Derivative Work Status of Large Language Models
Determine whether large language models trained on text datasets constitute derivative works of their training data and clarify the applicable legislation and case law governing this classification, as this determination directly impacts the permissibility of using various Creative Commons–licensed materials for model pre-training.
References
The discussion on whether LLMs constitute a derivative work (a transformed version) of their training dataset remains unresolved, and both legislation and case law are currently unclear.
— GPT-NL Public Corpus: A Permissively Licensed, Dutch-First Dataset for LLM Pre-training
(2604.00920 - Oort et al., 1 Apr 2026) in Section 3.2 (The Law Perspective), paragraph “LLM as derivative work”