Multi-Agent AI: Article Writing
Multi-agent GenAI app based on the crewAI framework. Evaluation via AgentOps.
- Crew: Manager (LLM: gpt-4.1-mini), 3 agents, 3 tasks, 2 tools, and sequential process
- Agents: Researcher (LLM: gpt-4.1), Author (LLM: gpt-4.1), and Editor (LLM: gpt-4.1)
- Tasks: Research, Author, and Edit
- Tools: Search and Scrape
Large Language Models and the Fair Use Doctrine
Author: Multi-Agent AI Assistant
Date: 05/12/2025
Introduction
Large Language Models (LLMs), such as OpenAI’s GPT-4, Google Gemini, and Anthropic’s Claude, represent a significant leap forward in artificial intelligence, capable of generating human-like text, answering complex questions, and creating new content. These capabilities are made possible by training on vast datasets that often include copyrighted works. The intersection of LLMs and copyright law, particularly the U.S. fair use doctrine, has become a critical issue as creators, publishers, tech companies, and policymakers debate the legal, ethical, and economic implications of AI training and output.
This article examines the evolving legal landscape surrounding LLMs and the fair use doctrine. Drawing from current litigation, legal precedent, and policy analysis, it discusses the competing perspectives and unresolved questions regarding the use of copyrighted material in AI development.
The Legal Framework: Copyright and Fair Use
Copyright law grants authors and creators exclusive rights over their original works, generally prohibiting unauthorized copying or use. However, the fair use doctrine provides an important exception, permitting limited use of copyrighted materials without permission under certain circumstances. The doctrine is designed to balance creators’ rights with the public interest in advancing education, research, commentary, and innovation.
The Four Factors of Fair Use
U.S. courts assess fair use claims by weighing four statutory factors (17 U.S.C. § 107):
1. Purpose and Character of the Use
Is the use commercial or non-commercial? Is it transformative (i.e., does it add new meaning, purpose, or utility to the original)?
2. Nature of the Copyrighted Work
Is the work more factual or creative? Is it published or unpublished?
3. Amount and Substantiality of the Portion Used
How much of the original work is used, and is it the “heart” of the work?
4. Effect on the Market for the Original Work
Does the use harm the potential market for, or value of, the original work?
No single factor is decisive; courts weigh all four in context. Transformative uses and those that do not harm the original’s market are often favored (BitLaw, n.d.).
LLMs and Copyright: The Two-Stage Dilemma
With generative AI, copyright concerns arise primarily in two phases:
Training Phase:
Developers collect and copy enormous datasets, including books, articles, images, and code, often without explicit permission, to train models. This “intermediate copying” is at the heart of current copyright debates.
Output Phase:
Occasionally, LLMs produce outputs that are identical or substantially similar to specific training data, raising concerns about direct substitution or infringement (Generative AI in the Newsroom, 2024).
Both stages have led to significant lawsuits, such as those filed by The New York Times and groups of authors and artists against OpenAI, Microsoft, Meta, and other AI developers.
The Fair Use Argument for AI Training
Transformative Use and Legal Precedent
AI developers argue that training LLMs is a transformative use of copyrighted works. The data is not used for its original expressive purpose but is processed to extract statistical patterns, enabling AI to generate new and original text. This concept of “non-expressive intermediate copying” has roots in established case law:
Authors Guild v. Google, Inc. (2d Cir. 2015):
Google’s digitization of millions of books to build a searchable index, which displays only limited snippets, was ruled fair use. The court emphasized the transformative nature of the use and its public benefit, finding that Google’s use did not substitute for the original works.
Perfect 10, Inc. v. Amazon.com, Inc. (9th Cir. 2007):
In this case, which also involved Google Image Search, Google’s display of thumbnail images in search results was found transformative and permissible under fair use because it provided a new function: helping users locate images (BitLaw, n.d.).
Library and academic organizations, such as the Association of Research Libraries (ARL) and the Library Copyright Alliance (LCA), strongly support this stance. They argue that fair use for AI training is essential for research, education, and the public interest, and that copyright law is sufficiently flexible to address these new technologies without legislative amendment (ARL, 2024).
The Stakes for Research and Society
Scholars and librarians warn that restricting AI training to public domain materials would limit research, hinder innovation, and exclude contemporary culture from scholarly analysis. Fair use is seen as essential for text and data mining (TDM) and for ensuring that AI models are representative of modern society (ARL, 2024).
Copyright Holders’ Perspective: Market Harm and Substitution
Authors, publishers, and other rights holders contend that generative AI is fundamentally different from previous transformative uses because LLM outputs can directly compete with original works. Examples include:
- AI-generated summaries substituting for book purchases.
- AI-generated news digests reducing visits to news sites, impacting advertising and subscription revenue.
Copyright holders argue that unchecked AI training and output could undermine the market for original works and erode incentives for creativity. They emphasize that large-scale, commercial use of copyrighted materials should require permission and compensation (BitLaw, n.d.).
The Four-Factor Fair Use Analysis Applied to LLMs
1. Purpose and Character of the Use
- Transformative?
Training is arguably transformative, as it extracts patterns rather than consuming or distributing the original expression.
- Commercial?
Most LLMs are developed for commercial purposes, which weighs against fair use, though this is not decisive.
2. Nature of the Copyrighted Work
- Factual vs. Creative:
Use of factual materials is more likely to be fair use than use of highly creative works.
- Published vs. Unpublished:
Use of published works favors fair use; use of unpublished works weighs against it.
3. Amount and Substantiality
- Whole Works Copied:
LLMs often ingest entire works, which weighs against fair use. However, courts have approved whole-work copying for transformative purposes (e.g., search indexing).
4. Effect on the Market
- Substitution:
If outputs replace the original, this weighs heavily against fair use.
- New Market:
If AI creates a new, non-competing market, this favors fair use.
(Based on BitLaw, n.d.; Generative AI in the Newsroom, 2024)
Global Perspectives: Legal Complexity Beyond the U.S.
While the U.S. relies on the flexible fair use doctrine, many other jurisdictions rely on narrower “fair dealing” provisions or closed lists of specific exceptions:
European Union:
Use of copyrighted materials for AI training may be infringement unless it is covered by a specific exception, such as the text and data mining exceptions in the 2019 Directive on Copyright in the Digital Single Market.
South Korea:
Although Korean law includes a general fair use provision, its application to AI training remains unclear, and AI-specific exceptions are still under development.
Global Litigation:
The lack of uniformity creates significant legal risk for AI companies operating internationally (Wallington, 2023).
Ethical and Market Considerations
Legal questions aside, several ethical issues remain:
Compensation for Creators:
Should creators be paid when their works are used to train commercial AI models?
Impact on Creative Professions:
The proliferation of AI-generated content may devalue original works and threaten creators’ livelihoods.
Transparency and Accountability:
Proposals include licensing regimes, opt-out mechanisms, and technical safeguards to protect rights holders (Wallington, 2023).
Ongoing Litigation and Uncertainty
As of the sources cited in this article, no court had ruled definitively on whether training LLMs on copyrighted works constitutes fair use. Pending cases, such as The New York Times v. Microsoft and OpenAI and lawsuits brought by authors and artists, are likely to set critical precedents (ARL, 2024; Generative AI in the Newsroom, 2024). The U.S. Copyright Office was also conducting a comprehensive study of AI and copyright that could shape future policy and legislation.
Conclusion
The relationship between large language models and the fair use doctrine represents a new frontier in copyright law. Proponents argue that transformative, socially beneficial uses by AI should be protected, while opponents caution against market harm and the erosion of creative incentives. With billions of dollars and the future of AI innovation at stake, the legal and ethical debates are far from settled. Courts, lawmakers, and stakeholders must find a path that supports both technological advancement and a vibrant creative economy.
References
Association of Research Libraries. (2024, January 23). Training generative AI models on copyrighted works is fair use. https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/
BitLaw. (n.d.). Fair use and the training of AI models on copyrighted works. https://www.bitlaw.com/copyright/fair-use-ai.html
Generative AI in the Newsroom. (2024, July 11). Decoding US copyright law and fair use for generative AI legal cases. https://generativeai.newsroom/july2024-copyright-fair-use
Wallington, G. (2023). The copyright conundrum: Fair use, LLMs, and the global legal maze. Medium. https://medium.com/@graham.wallington/the-copyright-conundrum-fair-use-llms-and-the-global-legal-maze-3c1f1ef7b6ce
This article is based on publicly available information as of June 2024. For specific legal advice, consult an intellectual property attorney.