The Internet Archive's Wayback Machine, one of the web's most critical preservation tools, is facing a coordinated effort by major media outlets to prevent it from archiving their content. USA Today, the New York Times, and 20 other significant news organizations, along with Reddit, have recently moved to block the Internet Archive's web crawler, according to an analysis by AI-detection startup Originality AI. The Guardian has taken a different tack, allowing the crawler but restricting access to its archived articles through the Archive's interface and API.
The irony is stark. USA Today's recent investigation into US Immigration and Customs Enforcement detention policies relied on the Wayback Machine to compile and analyze statistical data and track policy changes over time. Yet USA Today Co., the publishing conglomerate that owns both USA Today and over 200 additional media outlets, now actively prevents the Archive from preserving its work. Mark Graham, director of the Wayback Machine, spelled out the contradiction: "They're able to pull together their story research because the Wayback Machine exists. At the same time, they're blocking access."
Publishers have justified the restrictions on two fronts. First, they express concern that artificial intelligence companies are using archived content from the Wayback Machine, without permission, to train models that compete directly with news organizations. A New York Times spokesperson stated that "Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us." The Times declined to clarify whether this was happening in practice or remained a hypothetical concern. Reddit cited similar AI-related concerns when it moved to block the crawler. These concerns come against a backdrop of more than 100 ongoing AI copyright lawsuits in the United States.
Second, USA Today Co. spokesperson Lark-Marie Anton framed the restriction as part of a broader effort to block all scraping bots rather than a measure aimed specifically at the Internet Archive. The Guardian's director of business affairs and licensing cited "concerns over potential misuse by AI companies of content sets crawled for preservation purposes."
Journalists and advocacy groups are now rallying to protect the Wayback Machine's mission. This week, the Electronic Frontier Foundation and Fight for the Future coordinated a coalition that collected signatures from more than 100 working journalists, ranging from television mainstay Rachel Maddow to independent reporters. The journalists emphasized the Archive's role as a replacement for traditional preservation mechanisms (physical newspaper archives and local public libraries) that have largely disappeared.
In their letter of support, signatories wrote: "With many newspapers closed, and no clear path for local public libraries to preserve digital-only reporting, the work of safeguarding journalism's record increasingly falls to the Internet Archive." Laura Flynn, a supervising podcast producer at The Intercept, called the tool "essential" for fact-checking and sourcing audio clips. Chicago Reader writer Micco Caporale detailed how the Wayback Machine has been invaluable for union organizing work, allowing organizers to track job listings and pay changes over time.
The Internet Archive, a nonprofit that has operated for 30 years and archived over a trillion web pages, has weathered significant legal challenges. Most recently, it settled with major music publishers who sought damages of up to $700 million over the Archive's Great 78s project, which preserved vintage recordings. While no major financial penalty is at stake in the current dispute, the growing trend of media outlets restricting access poses a serious threat to the organization's core mission of preserving digital information for the public record.