DATA & PRIVACY

Chrome Extension Data Collection: What the Numbers Show

A study of 442 AI-powered Chrome extensions found that 52 percent collect user data. A study of 1,237 general extensions found that 27 percent do. This is what those numbers mean, how Chrome permissions actually work, and what both imply for anyone who extracts data from the web.

Sources: BetaNews 2026 · Incogni 2022 · Georgia Tech 2024 · Chrome Web Store

There are roughly 130,000 extensions in the Chrome Web Store. Most of them are small utilities: a tab manager, a colour picker, a grammar checker. But the ones that interact with web pages — that read what you're looking at, fill in forms, or extract data — occupy a different category. They run inside your browser. They see everything the browser sees.

The question of what they do with that access has been studied a handful of times, and the results are consistent: a significant fraction of extensions that request access to page content use that access to collect data and send it to external servers. The user typically has no visibility into this. The Chrome Web Store privacy disclosures, when they exist, are written by the extension developers themselves and are not audited.

This is a look at what the research shows, how the Chrome permission model works, and what the structural difference is between an extension that processes data locally and one that routes it through a server.

"The tools that promise to help you collect data are themselves collecting data about you. The research on this is consistent enough that it probably deserves more attention than it gets."

BY THE NUMBERS
52% of AI-powered Chrome extensions collect user data, per a BetaNews study of 442 extensions (2026)
86% of the top 100 Chrome extensions by install count request high-risk permissions
3,000+ extensions identified auto-collecting user-specific data, per Georgia Tech 2024 study
200+ extensions found directly uploading sensitive data to external servers (Georgia Tech 2024)
79.5% of writing and productivity extensions collect user data — the highest rate of any category
130K+ extensions in the Chrome Web Store, with privacy disclosures written by developers themselves

How Chrome permissions actually work

When you install a Chrome extension, it declares the permissions it needs in a file called the manifest. Chrome shows these to the user at install time, though the display is limited and often difficult to interpret. The permissions system is designed to limit what extensions can access — but the most commonly requested permission is also the broadest.

The activeTab permission gives an extension access to the currently active tab when the user interacts with the extension. The tabs permission gives access to tab metadata across all open tabs. The storage permission lets the extension read and write data locally. And the host permissions pattern — which looks like https://*/* — gives the extension access to the content of every page the user visits.

That last one is the critical one. An extension with broad host permissions can inject a content script into every page you load. That content script can read the full text of the page, including any data you've typed into forms. The extension can also make network requests, sending that content to any server its developer controls.
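The check described above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the function name and the example manifest are hypothetical, and the set of "broad" patterns is an assumption that covers the common all-sites match patterns rather than every possible wildcard.

```python
import json

# Match patterns that grant access to every site (an assumed, non-exhaustive set).
BROAD_PATTERNS = {"<all_urls>", "*://*/*", "https://*/*", "http://*/*"}


def broad_host_patterns(manifest: dict) -> list[str]:
    """Return the broad match patterns a manifest declares, sorted."""
    found = set()
    # Manifest V3 lists host access under "host_permissions";
    # Manifest V2 mixed host patterns into "permissions".
    for key in ("host_permissions", "permissions"):
        for pattern in manifest.get(key, []):
            if isinstance(pattern, str) and pattern in BROAD_PATTERNS:
                found.add(pattern)
    # Content scripts declare the pages they are injected into under "matches".
    for script in manifest.get("content_scripts", []):
        for pattern in script.get("matches", []):
            if pattern in BROAD_PATTERNS:
                found.add(pattern)
    return sorted(found)


# Hypothetical manifest, for illustration only.
example = json.loads("""
{
  "manifest_version": 3,
  "name": "Example helper",
  "permissions": ["storage", "activeTab"],
  "host_permissions": ["https://*/*"],
  "content_scripts": [{"matches": ["<all_urls>"], "js": ["content.js"]}]
}
""")

print(broad_host_patterns(example))
```

An empty result does not prove an extension is safe. It only means the manifest does not ask for all-sites access up front.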

"Host permissions are the extension equivalent of giving someone a key to your house and a copy of your diary. The combination is powerful. The question is what they do with it."

What each permission category actually accesses

Chrome's permission names don't map cleanly to what they enable. This is what the main categories cover in practice.

Permission | What it enables | Risk level
activeTab | Read page content and run scripts on the active tab, triggered by user action | Medium: limited to one tab, requires user interaction
tabs | Access URLs, titles, and status of all open tabs | Medium: metadata only, but reveals browsing history shape
https://*/* (host) | Inject scripts and read full page content on every HTTPS site | High: unrestricted page access across all sites
storage | Read and write data in the extension's own storage area (chrome.storage) | Low in isolation: context-dependent
cookies | Access session cookies for sites the extension has host permission to | High: enables session hijacking if combined with host permissions
webRequest | Intercept, block, or modify network requests | High: full visibility into all network traffic
identity | Authenticate via Google OAuth and access user identity | High: links extension activity to a specific user account

Sources: Chrome Extension API documentation; Incogni 2026 Chrome Extension Privacy Report.
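As a rough illustration, the table can be turned into a lookup that reports the highest-rated permission an extension requests. The ratings are transcribed from the table. The function names, the "unrated" fallback, and the ordering of levels are assumptions made for this sketch.

```python
# Ratings transcribed from the table above.
RISK = {
    "activeTab": "medium",
    "tabs": "medium",
    "storage": "low",
    "cookies": "high",
    "webRequest": "high",
    "identity": "high",
}
# All-sites host patterns are treated like the https://*/* row (assumption).
BROAD_HOSTS = {"https://*/*", "*://*/*", "<all_urls>"}
ORDER = ["unrated", "low", "medium", "high"]


def rate(permission: str) -> str:
    """Look up one permission; anything not in the table is 'unrated'."""
    if permission in BROAD_HOSTS:
        return "high"
    return RISK.get(permission, "unrated")


def highest_risk(permissions: list[str]) -> str:
    """Return the highest rating among the requested permissions."""
    ratings = [rate(p) for p in permissions]
    return max(ratings, key=ORDER.index, default="unrated")


print(highest_risk(["storage", "tabs"]))
print(highest_risk(["storage", "https://*/*"]))
```

A single rating hides the combinations that matter most, such as cookies together with broad host access, so treat the output as a first filter only.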

The most common combination in data-collection extensions is broad host permissions plus a background service worker — which means the extension can run code on every page and phone home with the results, even when the user isn't actively interacting with it.
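A minimal sketch of spotting that combination in a manifest follows. The function name is hypothetical, and the broad-pattern set is an assumed shortlist. The "background" keys checked are the ones Chrome manifests use: "service_worker" in Manifest V3, and "scripts" or "page" in Manifest V2.

```python
BROAD_HOSTS = {"<all_urls>", "*://*/*", "https://*/*", "http://*/*"}


def can_collect_passively(manifest: dict) -> bool:
    """True when a manifest pairs all-sites page access with background code."""
    hosts = list(manifest.get("host_permissions", []))
    for script in manifest.get("content_scripts", []):
        hosts.extend(script.get("matches", []))
    has_broad_access = any(h in BROAD_HOSTS for h in hosts)

    background = manifest.get("background", {})
    # Manifest V3 uses "service_worker"; Manifest V2 used "scripts" or "page".
    has_background = any(
        key in background for key in ("service_worker", "scripts", "page")
    )
    return has_broad_access and has_background


# Hypothetical manifests, for illustration only.
passive = {
    "host_permissions": ["https://*/*"],
    "background": {"service_worker": "background.js"},
}
on_click_only = {"permissions": ["activeTab"]}

print(can_collect_passively(passive), can_collect_passively(on_click_only))
```

The function reports capability, not behaviour. Many legitimate extensions have both features and collect nothing.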

The data collection rate by extension category

Not all extension categories carry the same risk. The pattern across studies is consistent.

79.5%
Writing & Productivity

The highest data collection rate of any category. Writing extensions run on every page where a user composes text — which means they see form inputs, emails, and documents. Many monetise through data rather than subscriptions.

64.9%
Shopping & Price Comparison

Shopping extensions request broad host permissions to detect product pages. They see the products you browse, prices you compare, and purchase patterns. For extensions used in competitive price research, this means your sourcing strategy is visible to the extension provider.

52%
AI-Powered Extensions

AI extensions typically need to send page content to a remote model for processing — which means your data leaves the browser by design. The 52 percent figure covers extensions that collect data beyond what's needed for the stated function.

27%
General Extensions (avg)

The baseline across all extension types. More than one in four general extensions collects user data, a number that is harder to justify when the stated function (a colour picker, a tab manager) has no data collection requirement.

Sources: BetaNews 2026 (442 AI-powered extensions); Incogni 2022 (1,237 extensions); Georgia Tech 2024.
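To give the percentages a sense of scale, the sample sizes and rates quoted above can be converted into approximate extension counts. The published percentages are rounded, so the results are estimates and not figures reported by the studies. The function name is made up for this sketch.

```python
def implied_count(sample_size: int, percent: float) -> int:
    """Approximate number of extensions behind a rounded percentage."""
    return round(sample_size * percent / 100)


# (sample size, percent collecting user data), as quoted in this article.
studies = {
    "AI-powered extensions (BetaNews 2026)": (442, 52.0),
    "General extensions (Incogni 2022)": (1237, 27.0),
}

for name, (size, percent) in studies.items():
    print(f"{name}: about {implied_count(size, percent)} of {size}")
```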

The popularity paradox

One finding from the research that cuts against common intuition: popular extensions are more likely to collect user data than less popular ones, not less. The Incogni analysis found that extensions with higher install counts had a 36 percent data collection rate compared to 20.7 percent for less-installed extensions.
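The size of that gap can be worked out from the two rates quoted above. The variable names are arbitrary, and the calculation uses only the figures in this article.

```python
popular_rate = 36.0        # percent, higher-install extensions
less_popular_rate = 20.7   # percent, less-installed extensions

# Absolute gap in percentage points, and the relative ratio between the rates.
gap_points = round(popular_rate - less_popular_rate, 1)
ratio = round(popular_rate / less_popular_rate, 2)

print(f"{gap_points} percentage points, {ratio}x as likely")
```

On these figures, a popular extension is roughly one and three-quarter times as likely to collect user data as a less-installed one.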

The explanation is structural. Popular extensions are often free — and free extensions need a revenue model. Data is that model for a significant fraction of them. A smaller, more specialised extension might be paid or supported by a niche community; a large, general-purpose extension is more likely to be covering its costs through data resale or advertising targeting.

The Chrome Web Store review process does not audit data collection practices. High ratings reflect user experience, not privacy behaviour. An extension can have a 4.8-star rating and half a million installs and be sending every page you visit to an ad targeting platform.

"A five-star rating in the Chrome Web Store tells you that users find the extension useful. It tells you nothing about what the extension does with the data it collects while being useful."

The Georgia Tech 2024 study adds a further dimension: of the more than 3,000 extensions identified as auto-collecting user-specific data, more than 200 were directly uploading sensitive information, including form content and browsing history, to external servers. The users had no indication this was happening.

0 bytes of your data reach a SiteScoop server. There is no server.

100% of SiteScoop processing happens inside your browser tab, in WebAssembly.


Why local processing changes the equation

The dominant architecture for browser extensions that process page content is: read the page, send it to a server, process it there, return a result. This model is efficient from a development perspective — servers are powerful and easy to update. But it means that every piece of data you extract passes through infrastructure the extension developer controls.

WebAssembly makes a structurally different approach viable. WASM code runs at near-native speed inside the browser tab itself. The extraction logic executes locally. The results never leave the machine unless the user explicitly exports them.

SiteScoop is built in Rust and compiled to WebAssembly. The parser that detects data patterns, the extractor that pulls structured rows, the transformer that cleans the output — all of it runs in your browser tab. There is no backend. There are no logs. The architecture makes data collection structurally impossible, not just against policy.
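The parse, extract, and transform steps can be illustrated with a small local-only pipeline. This sketch is written in Python with the standard library purely for readability. It is not SiteScoop's code, which the article describes as Rust compiled to WebAssembly, and the class and function names are invented here. The point is structural: nothing in the pipeline opens a network connection.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, row by row, entirely in memory."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Transform step: collapse runs of whitespace in the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html: str) -> list[list[str]]:
    """Parse HTML and return table rows as lists of cleaned strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


page = "<table><tr><th>Item</th><th>Price</th></tr><tr><td> Widget </td><td>\n 9.99 </td></tr></table>"
print(extract_rows(page))
```

Because the input is a string and the output is a list, the extracted rows stay on the machine until the caller chooses to write or send them somewhere.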

This matters specifically for the use cases where web scraping is most common — competitive intelligence, procurement research, market analysis. The data being extracted in these contexts is commercially sensitive. Routing it through a third-party server is a risk that most users accept without realising they're accepting it.


What to look for before installing a data extraction extension

The Chrome Web Store privacy label shows a summary of what an extension collects, but the categories are broad and the disclosures are not verified. A more useful approach is to look at the permissions the extension requests and cross-reference them against what the stated function requires.
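That cross-reference amounts to a set difference. In the sketch below, the "needed" list is a judgement the reader supplies about what the stated function requires. The function name and the colour-picker example are hypothetical.

```python
def excess_permissions(requested: list[str], needed: list[str]) -> list[str]:
    """Permissions the extension requests beyond what its function needs."""
    return sorted(set(requested) - set(needed))


# Hypothetical colour picker: sampling a colour from the current page
# plausibly needs only activeTab.
requested = ["activeTab", "tabs", "cookies", "https://*/*"]
needed = ["activeTab"]

print(excess_permissions(requested, needed))
```

Anything left in the result is a question to put to the developer, not proof of data collection.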

01. Check the host permissions scope. An extension that requests https://*/* can run on every site you visit. If the function only requires access to specific sites, broad host permissions are a signal worth investigating.
02. Look for a background service worker. Extensions whose code can run in the background, not just when you click the icon, can collect data passively. The extension's manifest declares this under its "background" key, and the Chrome extension management page (chrome://extensions, with Developer mode on) lists the service worker under the extension's details.
03. Check the network requests. Browser DevTools (Network tab) will show you what connections an extension's content scripts open from a page; requests made by the service worker appear when you inspect the service worker itself. An extension making requests to unfamiliar domains while you browse is worth scrutinising.
04. Read the privacy policy for data resale language. Phrases like "we may share data with partners" or "aggregate anonymised data" are the ones to look for. Anonymisation is frequently reversible when combined with browsing behaviour.
05. Look for open-source code or WebAssembly architecture. Extensions with open-source code can be independently verified. An extension that runs its extraction logic locally in WebAssembly and has no backend, like SiteScoop, has no server to send your data to. WebAssembly alone is not a guarantee, so confirm it with the network check in step 03.
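The network check in step 03 can be partly automated. The DevTools Network tab can export a session as a HAR file, which is JSON with request URLs under log.entries. The sketch below lists the hosts contacted and filters out the ones you expect. The function names, the sample data, and the "expected" list are assumptions for illustration. A real HAR would be loaded with json.load from the exported file.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_contacted(har: dict) -> Counter:
    """Count requests per hostname in a parsed HAR document."""
    counts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname
        if host:  # skips data: URLs and other entries without a host
            counts[host] += 1
    return counts


def unfamiliar_hosts(har: dict, expected_domains: list[str]) -> list[str]:
    """Hosts that are not one of the expected domains or their subdomains."""
    def is_expected(host: str) -> bool:
        return any(host == d or host.endswith("." + d) for d in expected_domains)

    return sorted(h for h in hosts_contacted(har) if not is_expected(h))


# Hypothetical capture, shaped like a HAR export.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://shop.example.com/cart"}},
    {"request": {"url": "https://cdn.example.com/app.js"}},
    {"request": {"url": "https://collector.tracker.test/v1/events"}},
    {"request": {"url": "https://collector.tracker.test/v1/events"}},
]}}

print(unfamiliar_hosts(sample_har, ["example.com"]))
```

A capture taken from a page's DevTools covers requests made in that page's context, so it may not include traffic sent by an extension's service worker.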

TRY SITESCOOP

Data extraction that can't phone home

Built in Rust, compiled to WebAssembly. No server, no logs, no account required. The architecture that makes your data stay yours — not a policy, a structural fact.