Multi-Agent AI: Article Writing

Multi-agent GenAI app based on the crewAI framework. Evaluation via AgentOps.

  • Crew: Manager (LLM: gpt-4.1-mini), 3 agents, 3 tasks, 2 tools, and sequential process
  • Agents: Researcher (LLM: gpt-4.1), Author (LLM: gpt-4.1), and Editor (LLM: gpt-4.1)
  • Tasks: Research, Author, and Edit
  • Tools: Search and Scrape
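
The composition above can be pictured with a small, dependency-free sketch. It does not use the crewAI API: the Agent, Task, and Crew classes, the kickoff method, and the stubbed run functions are hypothetical stand-ins, and the tools are listed at crew level because the description does not say which agent uses them. It only illustrates how a sequential process hands each task's output to the next task.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    role: str
    llm: str


@dataclass
class Task:
    name: str
    agent: Agent
    # Stand-in for an LLM call: maps the previous output to this task's output.
    run: Callable[[str], str]


@dataclass
class Crew:
    manager_llm: str
    agents: List[Agent]
    tasks: List[Task]
    tools: List[str]

    def kickoff(self, topic: str) -> str:
        """Sequential process: each task consumes the previous task's output."""
        output = topic
        for task in self.tasks:
            output = task.run(output)
        return output


researcher = Agent("Researcher", "gpt-4.1")
author = Agent("Author", "gpt-4.1")
editor = Agent("Editor", "gpt-4.1")

crew = Crew(
    manager_llm="gpt-4.1-mini",
    agents=[researcher, author, editor],
    tasks=[
        # The lambdas are placeholders, not real model calls.
        Task("Research", researcher, lambda text: f"notes({text})"),
        Task("Author", author, lambda text: f"draft({text})"),
        Task("Edit", editor, lambda text: f"edited({text})"),
    ],
    tools=["search", "scrape"],
)

result = crew.kickoff("LLMs and fair use")
print(result)  # edited(draft(notes(LLMs and fair use)))
```

In the actual app each run step would be an LLM call made by the assigned agent; the nesting in the printed result shows only the order of hand-offs.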

See also the multi-agent, multi-modal, multi-model General AI Assistant.

Large Language Models and the Fair Use Doctrine

Author: Multi-Agent AI Assistant
Date: 05/12/2025


Introduction

Large Language Models (LLMs), such as OpenAI’s GPT-4, Google’s Gemini, and Anthropic’s Claude, represent a significant leap forward in artificial intelligence. They can generate human-like text, answer complex questions, and create new content. These capabilities depend on training with vast datasets that often include copyrighted works. The intersection of LLMs and copyright law, particularly the U.S. fair use doctrine, has become a critical issue as creators, publishers, tech companies, and policymakers debate the legal, ethical, and economic implications of AI training and output.

This article examines the evolving legal landscape surrounding LLMs and the fair use doctrine. Drawing from current litigation, legal precedent, and policy analysis, it discusses the competing perspectives and unresolved questions regarding the use of copyrighted material in AI development.


The Legal Framework: Copyright and Fair Use

Copyright law grants authors and creators exclusive rights over their original works, generally prohibiting unauthorized copying or use. However, the fair use doctrine provides an important exception, permitting limited use of copyrighted materials without permission under certain circumstances. The doctrine is designed to balance creators’ rights with the public interest in advancing education, research, commentary, and innovation.

The Four Factors of Fair Use

U.S. courts assess fair use claims by weighing four statutory factors (17 U.S.C. § 107):

  1. Purpose and Character of the Use
    Is the use commercial or non-commercial? Is it transformative (i.e., does it add new meaning, purpose, or utility to the original)?

  2. Nature of the Copyrighted Work
    Is the work more factual or creative? Is it published or unpublished?

  3. Amount and Substantiality of the Portion Used
    How much of the original work is used, and is it the “heart” of the work?

  4. Effect on the Market for the Original Work
    Does the use harm the potential market or value of the original work?

No single factor is decisive; courts weigh all four in context. Transformative uses and those that do not harm the original’s market are often favored (BitLaw, n.d.).


LLMs and Copyright: The Two-Stage Dilemma

With generative AI, copyright concerns arise primarily in two phases:

  1. Training Phase:
    Developers collect and copy enormous datasets, including books, articles, images, and code, often without explicit permission, in order to train models. This “intermediate copying” is at the heart of current copyright debates.

  2. Output Phase:
    Occasionally, LLMs produce outputs that are identical or substantially similar to specific training data, raising concerns about direct substitution or infringement (Generative AI in the Newsroom, 2024).

Both stages have led to significant lawsuits, such as those filed by The New York Times and groups of authors and artists against OpenAI, Microsoft, Meta, and other AI developers.
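
The output-phase concern, that generated text may be identical or nearly identical to a training document, can be illustrated with a rough measure of verbatim overlap. The sketch below is an assumption-laden illustration, not a legal test for substantial similarity: the function names, the choice of word 5-grams, and the sample sentences are all hypothetical.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also appear in the source."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


# Hypothetical sample texts for illustration only.
source = "the quick brown fox jumps over the lazy dog near the river bank"
verbatim = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leaps across a sleepy dog by the water"

verbatim_score = overlap_ratio(verbatim, source)      # 1.0
paraphrase_score = overlap_ratio(paraphrase, source)  # 0.0
```

A high ratio flags near-verbatim reproduction that may merit review. A low ratio says nothing about whether a paraphrase still substitutes for the original in the market.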


The Fair Use Argument for AI Training

Transformative Use and Legal Precedent

AI developers argue that training LLMs is a transformative use of copyrighted works. The data is not used for its original expressive purpose but is processed to extract statistical patterns, enabling AI to generate new and original text. This concept of “non-expressive intermediate copying” has roots in established case law:

  • Authors Guild v. Google, Inc. (2015):
    The Second Circuit held that Google’s digitization of millions of books to build a searchable index was fair use. The court emphasized the transformative purpose and public benefit, noting that Google’s use did not substitute for the original works.

  • Perfect 10, Inc. v. Amazon.com, Inc. (2007):
    In this consolidated appeal, which also involved Google, the Ninth Circuit found Google’s use of thumbnail images in search results transformative and permissible under fair use because of the new functionality it provided (BitLaw, n.d.).

Library and academic organizations, such as the Association of Research Libraries (ARL) and the Library Copyright Alliance (LCA), strongly support this stance. They argue that fair use for AI training is essential for research, education, and the public interest, and that copyright law is sufficiently flexible to address these new technologies without legislative amendment (ARL, 2024).

The Stakes for Research and Society

Scholars and librarians warn that restricting AI training to public domain materials would limit research, hinder innovation, and exclude contemporary culture from scholarly analysis. Fair use is seen as essential for text and data mining (TDM) and for ensuring that AI models are representative of modern society (ARL, 2024).


Copyright Holders’ Perspective: Market Harm and Substitution

Authors, publishers, and other rights holders contend that generative AI is fundamentally different from previous transformative uses because LLM outputs can directly compete with original works. Examples include:

  • AI-generated summaries substituting for book purchases.
  • AI-generated news digests reducing visits to news sites, impacting advertising and subscription revenue.

Copyright holders argue that unchecked AI training and output could undermine the market for original works and erode incentives for creativity. They emphasize that large-scale, commercial use of copyrighted materials should require permission and compensation (BitLaw, n.d.).


The Four-Factor Fair Use Analysis Applied to LLMs

1. Purpose and Character of the Use

  • Transformative?
    Training is arguably transformative, as it extracts patterns rather than consuming or distributing the original expression.
  • Commercial?
    Most LLMs are developed for commercial purposes, which weighs against fair use, though this is not decisive.

2. Nature of the Copyrighted Work

  • Factual vs. Creative:
    Use of factual materials is more likely to be fair use than use of highly creative works.
  • Published vs. Unpublished:
    Use of published works is more likely to be fair use than use of unpublished works.

3. Amount and Substantiality

  • Whole Works Copied:
    LLM training often ingests entire works, which weighs against fair use. However, courts have approved whole-work copying for transformative purposes (e.g., search indexing).

4. Effect on the Market

  • Substitution:
    If outputs replace the original, this weighs heavily against fair use.
  • New Market:
    If AI creates a new, non-competing market, this favors fair use.

(Based on BitLaw, n.d.; Generative AI in the Newsroom, 2024)


Global Perspectives: Legal Complexity Beyond the U.S.

While the U.S. relies on the flexible fair use doctrine, many other jurisdictions rely on narrower “fair dealing” regimes or closed lists of statutory exceptions:

  • European Union:
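    Use of copyrighted materials for AI training may be infringement unless covered by a specific statutory exception, such as the text and data mining exceptions in Articles 3 and 4 of the 2019 Directive on Copyright in the Digital Single Market.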

  • South Korea:
    Fair dealing is narrower, and sustainable exceptions for AI are still under development.

  • Global Litigation:
    The lack of uniformity creates significant legal risk for AI companies operating internationally (Wallington, 2023).


Ethical and Market Considerations

Legal questions aside, several ethical issues remain:

  • Compensation for Creators:
    Should creators be paid when their works train commercial AI?

  • Impact on Creative Professions:
    Proliferation of AI-generated content may devalue original works and threaten creators’ livelihoods.

  • Transparency and Accountability:
    Proposals include licensing regimes, opt-out mechanisms, and technical safeguards to protect rights holders (Wallington, 2023).
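
One opt-out mechanism already in common use is a robots.txt rule aimed at a specific AI crawler. The sketch below uses Python's standard urllib.robotparser to show how a crawler would read such a rule. The robots.txt contents, the example.com URL, and the ExampleSearchBot name are hypothetical; GPTBot appears only as an example crawler token. Compliance with robots.txt is voluntary, so this illustrates how a publisher signals a preference, not how that preference is enforced.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
ai_allowed = parser.can_fetch("GPTBot", url)               # False
other_allowed = parser.can_fetch("ExampleSearchBot", url)  # True
```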


Ongoing Litigation and Uncertainty

As of June 2024, no major court had ruled definitively on whether LLM training on copyrighted works constitutes fair use. Pending cases, such as New York Times v. Microsoft & OpenAI and lawsuits from authors and artists, are likely to set critical precedents (ARL, 2024; Generative AI in the Newsroom, 2024). The U.S. Copyright Office is also conducting a comprehensive study that could shape future policy and legislation.


Conclusion

The relationship between large language models and the fair use doctrine represents a new frontier in copyright law. Proponents argue that transformative, socially beneficial uses by AI should be protected, while opponents caution against market harm and the erosion of creative incentives. With billions of dollars and the future of AI innovation at stake, the legal and ethical debates are far from settled. Courts, lawmakers, and stakeholders must find a path that supports both technological advancement and a vibrant creative economy.


References

Association of Research Libraries. (2024, January 23). Training generative AI models on copyrighted works is fair use. https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/

BitLaw. (n.d.). Fair use and the training of AI models on copyrighted works. https://www.bitlaw.com/copyright/fair-use-ai.html

Generative AI in the Newsroom. (2024, July 11). Decoding US copyright law and fair use for generative AI legal cases. https://generativeai.newsroom/july2024-copyright-fair-use

Wallington, G. (2023). The copyright conundrum: Fair use, LLMs, and the global legal maze. Medium. https://medium.com/@graham.wallington/the-copyright-conundrum-fair-use-llms-and-the-global-legal-maze-3c1f1ef7b6ce


This article is based on publicly available information as of June 2024. For specific legal advice, consult an intellectual property attorney.