Chrome Extension Data Collection: What the Numbers Show
A study of 442 AI-powered Chrome extensions found that 52 percent collect user data. A study of 1,237 general extensions found that 27 percent do. This is what those numbers mean, how Chrome permissions actually work, and what they imply for anyone who extracts data from the web.
There are roughly 130,000 extensions in the Chrome Web Store. Most of them are small utilities: a tab manager, a colour picker, a grammar checker. But the ones that interact with web pages — that read what you're looking at, fill in forms, or extract data — occupy a different category. They run inside your browser. They see everything the browser sees.
The question of what they do with that access has been studied a handful of times, and the results are consistent: a significant fraction of extensions that request access to page content use that access to collect data and send it to external servers. The user typically has no visibility into this. The Chrome Web Store privacy disclosures, when they exist, are written by the extension developers themselves and are not audited.
This is a look at what the research shows, how the Chrome permission model works, and what the structural difference is between an extension that processes data locally and one that routes it through a server.
"The tools that promise to help you collect data are themselves collecting data about you. The research on this is consistent enough that it probably deserves more attention than it gets."
How Chrome permissions actually work
When you install a Chrome extension, it declares the permissions it needs in a file called manifest.json. Chrome shows these to the user at install time, though the display is limited and often difficult to interpret. The permissions system is designed to limit what extensions can access, but the most commonly requested permission is also the broadest.
The activeTab permission gives an extension temporary access to the currently active tab when the user interacts with the extension. The tabs permission gives access to tab metadata, such as URLs and titles, across all open tabs. The storage permission lets the extension read and write data locally. Host permissions are declared as match patterns such as https://*/*. That particular pattern gives the extension access to the content of every HTTPS page the user visits.
That last one is the critical one. An extension with broad host permissions can run a content script on every page you load. That script can read the full text of the page, including anything you've typed into forms. It can also make network requests, sending that content to any server the extension developer controls.
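To make the reach of a pattern like https://*/* concrete, here is a simplified TypeScript sketch of how a Chrome match pattern decides whether it covers a URL. It is an illustration under stated assumptions, not Chrome's implementation.

- It handles the scheme, host and path parts, plus the special <all_urls> value.
- It ignores ports, file:// rules and other documented edge cases.
- It matches the path together with the query string, which is an assumption.
- The function names are hypothetical.

```typescript
// Simplified sketch of Chrome match-pattern semantics (hypothetical helper names).
// Assumptions: ports, file:// rules and other documented edge cases are ignored,
// and the path glob is tested against path + query string.

// Convert a path glob such as "/*" or "/docs/*" into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except '*'
    .replace(/\*/g, ".*"); // '*' matches any run of characters
  return new RegExp(`^${escaped}$`);
}

function matchesPattern(pattern: string, url: string): boolean {
  const u = new URL(url);
  const scheme = u.protocol.slice(0, -1); // "https:" -> "https"
  const isWeb = scheme === "http" || scheme === "https";

  // Simplification: treat <all_urls> as "every http or https page".
  if (pattern === "<all_urls>") return isWeb;

  const m = /^(\*|https?):\/\/([^/]*)(\/.*)$/.exec(pattern);
  if (!m) throw new Error(`Unsupported pattern in this sketch: ${pattern}`);
  const [, pScheme, pHost, pPath] = m;

  // A '*' scheme matches http and https only.
  const schemeOk = pScheme === "*" ? isWeb : pScheme === scheme;

  let hostOk: boolean;
  if (pHost === "*") {
    hostOk = true; // every host
  } else if (pHost.startsWith("*.")) {
    // "*.example.com" covers example.com itself and any of its subdomains.
    const base = pHost.slice(2);
    hostOk = u.hostname === base || u.hostname.endsWith(`.${base}`);
  } else {
    hostOk = u.hostname === pHost;
  }

  const pathOk = globToRegExp(pPath).test(u.pathname + u.search);
  return schemeOk && hostOk && pathOk;
}

// The pattern from the text covers every HTTPS page, including sensitive ones.
for (const page of [
  "https://mail.example.com/inbox",
  "https://bank.example/login",
  "http://intranet.example/",
]) {
  console.log(page, matchesPattern("https://*/*", page));
}
```

The loop prints true for the two HTTPS pages and false for the plain-HTTP one. A webmail inbox and a bank login page are exactly as reachable as a colour-picker demo site.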
"Host permissions are the extension equivalent of giving someone a key to your house and a copy of your diary. The combination is powerful. The question is what they do with it."
What each permission category actually accesses
Chrome's permission names don't map cleanly to what they enable. This is what the main categories cover in practice.
| Permission | What it enables | Risk level |
|---|---|---|
| activeTab | Read page content and run scripts on the active tab, triggered by user action | Medium: limited to one tab, requires user interaction |
| tabs | Access URLs, titles, and status of all open tabs | Medium: metadata only, but reveals the shape of browsing history |
| https://*/* (host) | Inject scripts and read full page content on every HTTPS site | High: unrestricted page access across all sites |
| storage | Read and write data in browser local storage | Low in isolation; context-dependent |
| cookies | Access session cookies for sites the extension has host permission to | High: enables session hijacking if combined with host permissions |
| webRequest | Intercept, block, or modify network requests | High: full visibility into all network traffic |
| identity | Authenticate via Google OAuth and access user identity | High: links extension activity to a specific user account |
Sources: Chrome Extension API documentation; Incogni 2026 Chrome Extension Privacy Report.
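The table can be applied mechanically to a manifest. The TypeScript sketch below is hypothetical and rests on a few assumptions:

- Field names follow Manifest V3 (`permissions` and `host_permissions`).
- <all_urls>, http://*/* and *://*/* are treated as equivalent to the table's https://*/* row.
- Anything the table does not cover is reported as "Unrated" rather than guessed.

```typescript
// Sketch: summarise a manifest's permissions using the risk levels from the table.
// Assumptions: Manifest V3 field names; the broad patterns below are treated like
// the table's https://*/* row; anything else is "Unrated".

type Risk = "Low" | "Medium" | "High" | "Unrated";
type Row = { item: string; risk: Risk };

interface ManifestPermissions {
  permissions?: string[]; // API permissions, e.g. "tabs", "storage"
  host_permissions?: string[]; // match patterns, e.g. "https://*/*"
}

// Risk levels transcribed from the table.
const API_PERMISSION_RISK: Record<string, Risk> = {
  activeTab: "Medium",
  tabs: "Medium",
  storage: "Low",
  cookies: "High",
  webRequest: "High",
  identity: "High",
};

const BROAD_HOST_PATTERNS = new Set(["https://*/*", "http://*/*", "*://*/*", "<all_urls>"]);

function summariseRisk(m: ManifestPermissions): Row[] {
  const apiRows = (m.permissions ?? []).map((p): Row => ({
    item: p,
    risk: API_PERMISSION_RISK[p] ?? "Unrated",
  }));
  const hostRows = (m.host_permissions ?? []).map((h): Row => ({
    item: h,
    risk: BROAD_HOST_PATTERNS.has(h) ? "High" : "Unrated",
  }));
  return [...apiRows, ...hostRows];
}

// Hypothetical manifest fragment for illustration.
const sample: ManifestPermissions = {
  permissions: ["storage", "cookies"],
  host_permissions: ["https://*/*"],
};
for (const row of summariseRisk(sample)) console.log(`${row.item}: ${row.risk}`);
```

Marking unknown permissions "Unrated" instead of "Low" is a deliberate choice. A permission that is missing from a lookup table is a reason to read the documentation, not a sign that it is safe.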
The most common combination in data-collection extensions is broad host permissions plus a background service worker — which means the extension can run code on every page and phone home with the results, even when the user isn't actively interacting with it.
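That combination is visible in the manifest itself. The TypeScript sketch below flags it, assuming Manifest V3 field names (`host_permissions`, `content_scripts[].matches`, `background.service_worker`). The function name and the sample manifest are hypothetical.

Note that this check reports capability, not behaviour. A flagged extension can read every page and send results from its background worker, but it may not actually do so.

```typescript
// Sketch: flag "broad host access + background service worker" in a Manifest V3 manifest.
// This detects capability only; it says nothing about what the code actually does.

interface Manifest {
  host_permissions?: string[];
  content_scripts?: Array<{ matches: string[]; js?: string[] }>;
  background?: { service_worker?: string };
}

// Assumption: these patterns count as "broad"; other wide patterns are not covered here.
const BROAD_PATTERNS = ["<all_urls>", "https://*/*", "http://*/*", "*://*/*"];

function flagsBroadAccessWithWorker(m: Manifest): boolean {
  const broadHost =
    (m.host_permissions ?? []).some((p) => BROAD_PATTERNS.includes(p)) ||
    (m.content_scripts ?? []).some((cs) => cs.matches.some((p) => BROAD_PATTERNS.includes(p)));
  // A declared service worker can run code without the popup being open.
  const hasWorker = typeof m.background?.service_worker === "string";
  return broadHost && hasWorker;
}

// Hypothetical manifest for illustration only.
const sampleManifest: Manifest = {
  host_permissions: ["https://*/*"],
  content_scripts: [{ matches: ["https://*/*"], js: ["content.js"] }],
  background: { service_worker: "background.js" },
};
console.log(flagsBroadAccessWithWorker(sampleManifest));
```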
The data collection rate by extension category
Not all extension categories carry the same risk. The pattern across studies is consistent.
Writing extensions have the highest data collection rate of any category. They run on every page where a user composes text, which means they see form inputs, emails, and documents. Many monetise through data rather than subscriptions.
Shopping extensions request broad host permissions to detect product pages. They see the products you browse, prices you compare, and purchase patterns. For extensions used in competitive price research, this means your sourcing strategy is visible to the extension provider.
AI extensions typically need to send page content to a remote model for processing — which means your data leaves the browser by design. The 52 percent figure covers extensions that collect data beyond what's needed for the stated function.
General extensions set the baseline across all extension types. One in four of them collects user data. That share is harder to justify when the stated function, such as a colour picker or a tab manager, has no data collection requirement.
Sources: BetaNews 2026 (442 AI-powered extensions); Incogni 2022 (1,237 extensions); Georgia Tech 2024.
The popularity paradox
One finding from the research that cuts against common intuition: popular extensions are more likely to collect user data than less popular ones, not less. The Incogni analysis found that extensions with higher install counts had a 36 percent data collection rate compared to 20.7 percent for less-installed extensions.
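Put as a ratio, the two Incogni rates quoted above work out as follows:

```latex
\frac{36\%}{20.7\%} \approx 1.74, \qquad 36\% - 20.7\% = 15.3 \text{ percentage points}
```

In that sample, a popular extension was roughly 1.7 times as likely to collect user data as a less-installed one.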
The explanation is structural. Popular extensions are often free — and free extensions need a revenue model. Data is that model for a significant fraction of them. A smaller, more specialised extension might be paid or supported by a niche community; a large, general-purpose extension is more likely to be covering its costs through data resale or advertising targeting.
The Chrome Web Store review process does not audit data collection practices. High ratings reflect user experience, not privacy behaviour. An extension can have a 4.8-star rating and half a million installs and be sending every page you visit to an ad targeting platform.
"A five-star rating in the Chrome Web Store tells you that users find the extension useful. It tells you nothing about what the extension does with the data it collects while being useful."
The Georgia Tech 2024 study adds a further dimension: of the 3,000 extensions identified as auto-collecting user-specific data, more than 200 were directly uploading sensitive information — including form content and browsing history — to external servers. The users had no indication this was happening.
Zero bytes of your data reach a SiteScoop server, because there is no server.
All SiteScoop processing happens inside your browser tab, in WebAssembly.
Why local processing changes the equation
The dominant architecture for browser extensions that process page content is: read the page, send it to a server, process it there, return a result. This model is efficient from a development perspective — servers are powerful and easy to update. But it means that every piece of data you extract passes through infrastructure the extension developer controls.
WebAssembly makes a structurally different approach viable. WASM code runs at near-native speed inside the browser tab itself. The extraction logic executes locally. The results never leave the machine unless the user explicitly exports them.
SiteScoop is built in Rust and compiled to WebAssembly. The parser that detects data patterns, the extractor that pulls structured rows, the transformer that cleans the output — all of it runs in your browser tab. There is no backend. There are no logs. The architecture makes data collection structurally impossible, not just against policy.
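The data flow described here can be illustrated with a small TypeScript sketch. It is not SiteScoop's code, which the text describes as Rust compiled to WebAssembly. It only shows the shape of a local pipeline:

- Rows that have already been read from the page are cleaned by a pure function.
- They are then serialised to CSV by another pure function.
- There is no fetch or other network call anywhere in the path.

The function names are hypothetical, and the quoting follows the common RFC 4180 convention.

```typescript
// Hypothetical local "transformer" step: collapse whitespace, trim cells, drop empty rows.
function cleanRows(rows: string[][]): string[][] {
  return rows
    .map((row) => row.map((cell) => cell.replace(/\s+/g, " ").trim()))
    .filter((row) => row.some((cell) => cell !== ""));
}

// Quote a CSV field when it contains a comma, quote or line break (RFC 4180 style).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialise rows to CSV text. This is pure computation on in-memory strings;
// nothing here touches the network.
function toCsv(rows: string[][]): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// Hypothetical rows, as if already read from a product table on the page.
const extracted = [
  ["Product", "Price"],
  ["  Widget, large ", "£12.50"],
  ["", ""],
  ['Gadget "Pro"', "£30.00"],
];
console.log(toCsv(cleanRows(extracted)));
```

In an extension built this way, the resulting string would leave the tab only when the user chooses to export it, for example by saving it as a file.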
This matters specifically for the use cases where web scraping is most common — competitive intelligence, procurement research, market analysis. The data being extracted in these contexts is commercially sensitive. Routing it through a third-party server is a risk that most users accept without realising they're accepting it.
What to look for before installing a data extraction extension
The Chrome Web Store privacy label shows a summary of what an extension collects, but the categories are broad and the disclosures are not verified. A more useful approach is to look at the permissions the extension requests and cross-reference them against what the stated function requires.
An extension that requests https://*/* can run on every site you visit. If its stated function only requires access to specific sites, broad host permissions are a signal worth investigating.
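That cross-check can be written down as a short TypeScript sketch, under a few assumptions:

- The requested permissions are Manifest V3 match-pattern strings.
- The list of hostnames the extension genuinely needs comes from reading its description. Chrome does not provide it.
- The function names and the example shop domain are hypothetical.

```typescript
// Sketch: compare requested host permissions with the sites a stated function needs.
// Assumption: neededHosts is written by the reviewer from the extension's description.

// Extract the host part of a match pattern ("*" for patterns that cover every host).
function hostOf(pattern: string): string | null {
  if (pattern === "<all_urls>") return "*";
  const m = /^[^:]+:\/\/([^/]*)\//.exec(pattern);
  return m ? m[1] : null;
}

// Return the requested patterns that reach beyond the hostnames the function needs.
function excessHostPermissions(requested: string[], neededHosts: string[]): string[] {
  return requested.filter((pattern) => {
    const host = hostOf(pattern);
    if (host === null || host === "*") return true; // unparseable, or every site
    const bare = host.startsWith("*.") ? host.slice(2) : host; // "*.x.com" -> "x.com"
    return !neededHosts.includes(bare);
  });
}

// Hypothetical price-tracking extension that claims to work on one shop only.
console.log(
  excessHostPermissions(["https://*/*", "https://*.example-shop.com/*"], ["example-shop.com"]),
);
```

Here only https://*/* is reported. That is the permission the stated function does not explain.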
Data extraction that can't phone home
Built in Rust, compiled to WebAssembly. No server, no logs, no account required. The architecture that makes your data stay yours — not a policy, a structural fact.