There is something faintly transgressive about the word "scraping." It implies taking something - removing it from a surface where it wasn't meant to come off. Price scraping sounds like something you'd want to be careful about. Something that might not be entirely welcome.
Which is interesting, because what it actually describes is reading a number off a webpage.
The price is published. The page is public. The number was put there deliberately, by the company, to be seen by anyone who visits. When a person visits the product page and reads the price, nobody thinks twice. When software visits the same page, reads the same price, and records it - the word "scraping" appears, along with an ambient sense that something more charged is happening.
Something did change. Just not quite what the word implies.
The only thing that actually changed
When software collects prices from a website, the underlying operation is exactly what a browser does when we navigate to any URL: a request goes out, the server sends back the page, and something reads it. In the manual case, that something is a pair of human eyes and hands that type the number into a spreadsheet. In the automated case, it's software that finds the price and records it.
The data is identical. The source is identical. The server responding to the request has no way of knowing whether there's a person behind it or not.
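To make the equivalence concrete, here is what the automated case looks like as a minimal Python sketch. The URL and the ".price" selector are hypothetical placeholders; any real page would need its own.

```python
# Minimal sketch: fetch a public product page and read the price off it.
# The URL and the ".price" selector are hypothetical stand-ins.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/products/widget", timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
price_element = soup.select_one(".price")  # wherever the page puts its price
price = price_element.get_text(strip=True) if price_element else None

print(price)  # e.g. "$19.99" - the same number a visitor would read
```

Nothing in that exchange differs from a browser visit except who does the reading.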
What changed is the rate. A person visiting product pages manually might get through twenty an hour. Automated software can handle orders of magnitude more. That difference in scale - not the reading itself, but the speed of the reading - is the actual substance of price scraping.
Why speed is worth thinking about
The discomfort with scale isn't entirely without basis, and it's worth being clear about why.
Software that visits a site at very high rates - thousands of requests per minute - creates real load on a server. Most anti-scraping measures target exactly this: the resource consumption that aggressive, high-volume crawling imposes, not the act of collecting public data. A site designed to serve ten thousand simultaneous customers has a bad time when a hundred thousand automated requests arrive in rapid succession.
Price collection that operates at reasonable rates - browser-based tools, manual workflows, anything that isn't hammering a server - is in a different category from the kind of scraping that actually causes problems. The meaningful distinction isn't automated versus manual. It's considerate versus inconsiderate.
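What "considerate" means in practice is mostly pacing. A sketch, again with placeholder URLs, of collection that stays far below the rates that cause problems:

```python
# Sketch of paced collection: one request every few seconds, a volume
# no server will notice. The URLs are hypothetical placeholders.
import time
import requests

product_urls = [
    "https://example.com/products/widget",
    "https://example.com/products/gadget",
]

for url in product_urls:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # ... read the price out of response.text here ...
    time.sleep(5)  # a deliberate pause between requests
```

The pause is the entire difference between this and the crawling that anti-scraping measures exist to stop.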
The legal picture, in most jurisdictions, follows a similar logic. Publicly accessible information on pages that don't require logging in is generally fair game for automated collection. The court cases that have gone the other way have tended to involve circumventing access controls, violating terms of service through technical means, or collecting personal data. Prices published on public product pages don't typically fall into those categories. Prices are there to be seen.
What people are actually doing when they scrape prices
The practical applications are considerably more mundane than the word suggests.
Competitive price analysis is the most common: collecting what competitors charge for comparable products, at regular intervals, to understand where the market sits. This is the same information that was collected manually for decades - someone visiting competitor sites, writing down prices, entering them into a spreadsheet. The only difference is that it now happens faster and more completely.
Price monitoring is another: brands tracking how their products are being priced by retailers, checking whether minimum advertised price agreements are being respected. Market researchers building price indices for categories. Procurement teams tracking what suppliers charge over time.
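In code terms, "at regular intervals" just means the same read with a timestamp attached. A sketch of the record-keeping half, where record_price is a hypothetical helper and the price would come from a fetch like the one shown earlier:

```python
# Sketch: append each observation to a CSV so prices accumulate over time,
# one timestamped row per read. record_price is a hypothetical helper.
import csv
from datetime import datetime, timezone

def record_price(path: str, url: str, price: str) -> None:
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), url, price]
        )

record_price("prices.csv", "https://example.com/products/widget", "$19.99")
```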
In every case, the data being collected is information the source published on purpose. The prices exist because companies want to be found and bought from. They are, in a direct sense, an invitation.
The company that published the data
Here is the structural situation that makes this interesting: companies publish prices specifically so that customers will see them, compare them, and make purchasing decisions. A product page with a price on it is a deliberate act of communication. "Here is what this costs. Anyone who wants to know should know."
Price scraping is, at its most literal, the act of taking that communication seriously at scale.
The discomfort, where it exists, rarely comes from the question of whether the data should be visible. It comes from the question of who's looking and what they're planning to do with it. A competitor collecting prices to adjust their own is doing something the publishing company would rather they didn't. But the information that made it possible was published by the company itself. The page invited it.
This creates the slightly peculiar situation where a company publishes a price to be seen, a competitor reads it, and the company objects to the reading. The objection isn't really to the reading. It's to being read. There's a meaningful distinction there, even if it gets collapsed in the word "scraping."
The SiteScoop extension sits at the low-volume, browser-based end of this: extracting prices and product data from whatever page is currently open, structured and ready for a spreadsheet. No server-side crawlers, no requests beyond the browser session already running. Just the page that's open and the data that's on it - received at exactly the speed a person would receive it, if that person had very fast hands.
