The Internet Archive has lost a significant legal battle: the US Court of Appeals for the Second Circuit upheld a ruling in Hachette v. Internet Archive that its book digitization and lending practices violated copyright law. The case stemmed from the Archive’s National Emergency Library initiative during the pandemic, which allowed unrestricted digital lending of books, sparking backlash from publishers and authors. The court rejected the Archive’s fair use defense, although it acknowledged its nonprofit status. This ruling strengthens authors’ and publishers’ control over their works. But it immediately reminds me of how AI tools are trained on data from the Internet, including books. If the nonprofit Internet Archive’s work is not fair use, how can paid AI tools justify using the same data?
Despite numerous AI copyright lawsuits, disputes over text-based data from news outlets usually haven’t resulted in harsh rulings against AI tools, and they have often ended in partnerships with major players.
You might argue that the situations differ because the Internet Archive distributes the books themselves, whereas AI tools draw on all of their training data to generate your essay. Yet with a well-crafted prompt, you can still get specific excerpts or more detailed responses out of them.
The Hachette v. Internet Archive case highlights significant concerns about how AI models acquire training data, especially when it involves copyrighted materials like books. AI systems often rely on large datasets, including copyrighted texts, raising similar legal challenges regarding unlicensed use. If courts restrict the digitization and use of copyrighted works without permission, AI companies may need to secure licenses for the texts used in training, adding complexity and potential costs. This could limit access to diverse, high-quality datasets, ultimately affecting AI development and innovation.
Additionally, the case underlines the limitations of the fair use defense in the context of transformative use, which is often central to AI’s justification for using large-scale text data. If courts narrowly view what constitutes fair use, AI developers might face more restrictions on how they access and use copyrighted books. This tension between protecting authors’ rights and maintaining open access to knowledge could have far-reaching consequences for the future of AI training practices and the ethical use of data.
Need a deeper dive into the case? Here is everything you need to know about it.
Hachette v. Internet Archive explained
Hachette v. Internet Archive is a significant legal case that centers around copyright law and the limits of the “fair use” doctrine in the context of digital libraries. The case began in 2020, when several large publishing companies (Hachette, HarperCollins, Penguin Random House, and Wiley) sued the Internet Archive, a nonprofit organization dedicated to preserving digital copies of websites, books, and other media.
The case focused on the Archive’s practice of scanning books and lending them out online.
The story behind the Internet Archive lawsuit
The Open Library project, run by the Internet Archive, was set up to let people borrow books digitally. Here’s how it worked:
The Internet Archive thought this was legal because they only let one person borrow a book at a time. They called this system Controlled Digital Lending (CDL). The idea was to make digital lending work just like physical library lending.
When the COVID-19 pandemic hit in early 2020, many libraries had to close, making it hard for people to access books. To help, the Internet Archive launched the National Emergency Library (NEL) in March 2020. This program changed things:
While the NEL was meant to be temporary, it upset authors and publishers. They argued that letting many people borrow the same digital copy without permission was like stealing their work.
Publishers’ riot
In June 2020, the big publishers sued the Internet Archive. They claimed:
The publishers argued that the Internet Archive’s actions hurt the market for their books. They said people were getting free digital versions instead of buying ebooks or borrowing from licensed libraries.
Internet Archive’s defense
The Internet Archive defended itself by claiming that its work was protected by fair use. Fair use allows limited use of copyrighted material without permission for purposes like education, research, and commentary. The Archive made these points:
They also pointed to their Controlled Digital Lending system as a way to respect copyright laws. Under CDL, only one person could borrow a book at a time, just like in a physical library.
The court’s decisions
District Court Ruling (March 2023)
In March 2023, a federal court sided with the publishers. Judge John G. Koeltl ruled that the Internet Archive’s actions were not protected by fair use. He said:
The Internet Archive appealed the decision to a higher court, the US Court of Appeals for the Second Circuit, hoping to overturn the ruling. The appeals court also ruled in favor of the publishers, but it made one important clarification:
The Hachette v. Internet Archive case has shown that even nonprofits like the Internet Archive can’t freely digitize and lend books without violating copyright laws. This ruling could also affect how AI companies use copyrighted materials to train their systems. If nonprofits face such restrictions, AI tools might need to get licenses for the data they use. Some AI companies have already started to make such deals, but I still wonder: what about the data they used before those deals existed?