Ethics, Law & Policy

Wednesday, June 17, 2026

The Free and Open Web Is Under Attack at the IETF

Original reporting by Electronic Frontier Foundation

Automated access, often referred to as crawling or scraping, is the ability to programmatically retrieve publicly available information on the internet. This capability is a foundational pillar of the free and open web. It powers indispensable tools for journalistic investigations, academic research, and archival efforts by non-profits like the Internet Archive. It also lets consumers find the best deals and helps organizations identify security flaws or discriminatory practices. The internet's openness to this kind of automated data collection has traditionally fostered innovation and the spread of information.

The Looming Threat

However, this essential principle is increasingly under siege. Fearing lost advertising and licensing revenues due to the rise of AI models, publishers and major tech companies are pushing to restrict automated access to public web content. They cite infrastructure strain and the disruption of their business models as AI overviews potentially replace direct website visits.

Alarmingly, some are attempting to embed these restrictive business models directly into internet standards through work at the Internet Engineering Task Force (IETF). Proposals from groups like "AI Preferences" aim to give publishers "preference signals" to block AI-related crawling, and these signals could potentially become legally binding. Another group, "Web Bot Auth," seeks to enable cryptographic identification of bots. That goes beyond protecting against aggressive actors and could allow sites to block competitors, researchers, or anyone unwilling to pay for access.

This shift threatens to turn the open web into a gated community, jeopardizing crucial public interest uses and favoring monetization over information access. The battle to preserve the internet's open nature is now being fought at the core of its technical infrastructure.
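For a concrete sense of what a machine-readable "preference signal" looks like, the closest long-standing analogue is the voluntary robots.txt convention. The sketch below is only an illustration of that existing convention, not of the AI Preferences or Web Bot Auth proposals, whose syntax is not described here. The bot names (`ExampleAIBot`, `ExampleArchiveBot`), the `example.org` URLs, and the `allowed` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is refused everywhere,
# every other crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "ExampleArchiveBot"):
        for url in (
            "https://example.org/news/story",
            "https://example.org/private/draft",
        ):
            print(f"{agent} -> {url}: {allowed(agent, url)}")
```

Nothing in this mechanism is enforced: a crawler has to choose to run a check like this before fetching a page. That voluntary character is what would change if such signals became legally binding or were paired with cryptographic bot identification.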

The ongoing debate within standards bodies like the IETF marks a critical juncture for the internet: will it remain an open commons for information, or will access to its public content become a monetized, permissioned privilege? Proposals from groups like AI Preferences and Web Bot Auth have some reasonable aims, such as managing infrastructure load, but they threaten to fundamentally alter foundational protocols. They risk granting website operators unprecedented power to block legitimate automated access for a broad range of beneficial purposes, from essential research and journalism to accessibility tools for disabled users. This isn't merely about managing server load or protecting revenue; it's about redrawing the lines of digital access itself and moving away from the principles of openness that have defined the internet.

The Web's Future

The implications of adopting such restrictive standards extend far beyond individual websites or specific AI models. A closed web, where crawling requires explicit payment or cryptographic authentication, would erect significant barriers to entry for startups, independent researchers, and non-profit archivists, stifling both innovation and critical oversight. It risks creating a two-tiered internet: one for well-funded entities able to pay for licensed access, and a diminished one for everyone else. Such a shift would degrade the internet's capacity to foster public discourse, enable accountability, and preserve collective knowledge, all values that underpin a healthy, democratic society. The fight waged by EFF and its allies is therefore not just about technical standards. It is about ensuring that the internet remains a free, open, and accessible resource for all.

Frequently asked questions

What is web crawling or scraping, and how does it benefit an open internet?

Web crawling, also known as scraping, is the automated process of accessing and collecting publicly available information from websites. This technology powers essential tools for various purposes, including preserving historical website copies for archives, enabling journalists to report news, helping researchers find security flaws, and assisting watchdog organizations in investigating discrimination. It also allows consumers to find better deals through comparison shopping, ultimately fostering a more informed and accessible online environment.

Why are some organizations attempting to restrict web crawling and scraping for AI purposes?

Website operators and tech companies are increasingly seeking to restrict automated access, primarily due to economic anxieties surrounding AI. They fear lost advertising and licensing revenues if AI models utilize their content or if users rely on AI overviews instead of visiting their sites directly. Additionally, AI bots can strain website infrastructure, degrading performance or taking sites offline, which incurs costs for upgrades that some publishers may struggle to afford.

What are the potential risks of proposed changes to internet standards concerning web crawling?

Proposed changes to internet standards, like those from the AI Preferences and Web Bot Auth working groups, could severely limit open web access. They might allow websites to block legitimate uses, including research, archival work, accessibility tools, and investigative journalism, by enabling site operators to express "preference signals" or cryptographically identify and block bots. This could lead to a monetized, permission-based internet, where access for automated tools is restricted to a few approved, paying entities, stifling innovation and information freedom.

Intro and outro generated by Printing Press AI from the source article above. Always consult the original reporting for verbatim quotes and primary sources.