Fair use and access to illegal copies of books

Determine whether U.S. fair use analysis can accommodate access to illegally obtained copies of copyrighted books, such as those hosted in shadow libraries, so that a use may still qualify as fair use notwithstanding the unlawful acquisition of the copies.

Background

The article analyzes how large language models (LLMs) have been trained on book corpora, including unauthorized copies drawn from shadow libraries, and contrasts this practice with the legally sanctioned Google Books corpus. It surveys fair use case law that permits non-expressive uses and reverse engineering in cases where access to the works was lawful, and highlights that using illicitly obtained copies raises distinct legal concerns.

Against this backdrop, the authors explicitly note that it remains unsettled whether fair use analysis permits reliance on access to illegal copies, which creates uncertainty for both academic and commercial AI research. This unresolved issue sits at the core of ongoing lawsuits and policy debates about how training data for AI systems is acquired.

References

Nonetheless, whether access to illegal copies of books will be allowed as part of fair use analysis is an open question.

Between Copyright and Computer Science: The Law and Ethics of Generative AI (2403.14653 - Desai et al., 2024) in Section IV, The Path Ahead