NEWS Your grandchildren won't see this. Why today's news will disappear from public access.

pinkman

Major media outlets have begun blocking the Wayback Machine to protect content from AI models.
In the early 1990s, when the World Wide Web first emerged, its creators dreamed of an open space where anyone could share knowledge and collaborate. But today, the free and open internet is noticeably shrinking. One of the most alarming signs is that major media outlets are beginning to block access to their materials for the Internet Archive, a nonprofit organization that has been archiving the web since 1996 and makes those records available through the Wayback Machine.

For decades, the Internet Archive has automatically crawled websites and created "snapshots" of them to keep past versions of pages accessible to researchers, journalists, educators, and ordinary users. Now, several major publishers, including The Guardian, The New York Times, Financial Times, and USA Today, have confirmed they are blocking the archive's access to their content. While they formally acknowledge the importance of preserving digital history, they argue that unfettered access creates side effects.

Publishers are fighting a battle on two fronts. The first is AI. Generative systems like ChatGPT, Copilot, and Gemini require large data sets: news, books, scientific articles, images, and other materials that help the models learn and respond to user queries. Publishers increasingly claim that tech companies have been obtaining this data for free and without the consent of copyright holders. This has led to high-profile lawsuits: for example, The New York Times sued OpenAI for alleged copyright infringement, and News Corp is suing Perplexity AI, accusing the company of misusing content.

The second front is paywalls. The Wayback Machine has long been a way to "peek" behind paid access to articles: if a page was once captured by the archive, its previous version can sometimes be read without a subscription. This is painful for media outlets, because news is a business, and the traditional advertising model is increasingly under threat, in part from the same technology platforms that are siphoning away attention and advertising budgets. As a result, editorial teams try to protect their revenue through subscriptions. The paradox is that the more content moves behind paywalls, the less open the internet becomes, and the harder it is for people to find quality information without paying for it.

Moreover, publishers appear to be moving beyond simply shutting out bots: content archives are becoming a valuable commodity. Media and academic publishers are increasingly striking deals with tech companies for access to their databases. For example, News Corp's agreement with OpenAI is reportedly valued at over $250 million over five years. Similar processes are underway in academia: major publishers, previously criticized for hiding taxpayer-funded research behind commercial barriers, are now selling access to troves of journals to tech companies. For example, Taylor & Francis signed a non-exclusive $10 million contract with Microsoft, granting access to over 3,000 academic journals.

To stop unwanted "robot readers," media outlets are implementing technical restrictions against AI crawlers. This impacts not only commercial crawling but also the Internet Archive bot, which records web history. Some news organizations are even calling the archive a "backdoor" to their catalogs, claiming it allows unscrupulous players to continue collecting data or allows users to bypass subscriptions.
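The article does not say which technical restrictions each publisher uses. One common mechanism is a robots.txt file that names specific crawlers, and the sketch below shows how such rules are evaluated with Python's standard urllib.robotparser. The robots.txt content, the site news.example, and the is_allowed helper are hypothetical and made up for illustration; "GPTBot" and "ia_archiver" are used here as assumed crawler tokens for an AI crawler and the Internet Archive respectively, not as a record of what any outlet actually blocks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not any publisher's actual file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a made-up domain.
URL = "https://news.example/2025/some-article"

for agent in ("GPTBot", "ia_archiver", "SomeBrowser"):
    print(agent, is_allowed(ROBOTS_TXT, agent, URL))
```

Note that robots.txt is only a request that well-behaved crawlers honor voluntarily. A site that wants hard enforcement has to block requests on the server side as well.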

The problem is that blocking the Wayback Machine damages the public memory of the internet. If major news sites stop being archived, gaps will appear in the public record that cannot be filled retroactively. A telling example: the Wayback Machine still lets anyone view The New York Times's homepage from June 1997, when the archive first captured the newspaper's website. But 30 years from now, researchers and simply curious users will likely not be able to access today's homepage as easily, even if the Internet Archive continues to exist.
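For readers who want to look up an old snapshot themselves, here is a minimal sketch of how a Wayback Machine address is put together: the page URL is appended to https://web.archive.org/web/ after a 14-digit timestamp (year, month, day, hour, minute, second). The wayback_url helper is a hypothetical name used only for this example, and the date is simply the first day of the month mentioned above. The archive normally redirects to the capture nearest the requested time, so the exact timestamp does not need to match a real snapshot.

```python
from datetime import datetime


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for page_url near the given moment."""
    # Wayback timestamps use the form YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


print(wayback_url("https://www.nytimes.com/", datetime(1997, 6, 1)))
```

If a site has blocked the archive, no capture exists for the blocked period, and this kind of lookup can only return an older snapshot or nothing at all.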

Internet history is made up of everyday pages that will become sources for journalists, historians, and scholars tomorrow. Without anyone to preserve them, part of the digital era will be lost. Amid pressure from commercial interests and new challenges from AI, it is non-profit projects like the Internet Archive and Wikipedia that continue to uphold the idea of an open, collaborative, and transparent internet, even if this is becoming increasingly difficult.