Training an AI is orthogonal to copyright since the process of training doesn’t involve distribution.
You can train an AI with whatever TF you want without anyone’s consent. That’s perfectly legal fair use. It’s no different than if you copy a song from your PC to your phone.
Copyright really only comes into play when someone uses an AI to distribute a derivative of someone’s copyrighted work. Even then, it’s really only the end user who is capable of doing such a thing, by uploading the output of the AI somewhere.
That’s assuming you own the media in the first place. Often AI is trained with large amounts of data downloaded illegally.
So, yes, it’s fair use to train on information you have or have rights to. It’s not fair use to illegally obtain new data. Even more, torrenting that data often means you also distribute it.
For personal use, I don’t have an issue with it anyway, but legally it’s not allowed.
Incorrect. No court has ruled in favor of any plaintiff bringing a copyright infringement claim against an AI LLM. Here’s a breakdown of the current court cases and their rulings:
https://www.skadden.com/insights/publications/2025/07/fair-use-and-ai-training
In both cases, the courts have ruled that training an LLM with copyrighted works is highly transformative and thus fair use.
The plaintiffs in one case couldn’t even come up with a single iota of evidence of copyright infringement (from the output of the LLM). This, IMHO, is the single most important takeaway from the case, because the only thing that really mattered was the point where the LLM generates output. That is, the point of distribution.
Until an LLM is actually outputting something, copyright doesn’t even come into play. Therefore, the act of training an LLM is just like I said: A “Not Applicable” situation.
Just a heads up that Anthropic has just agreed to a $1.5B settlement in a case over downloading and storing pirated copies of copyrighted works. That’s about $3,000 per book for roughly 500,000 books.
The wheels of justice move slowly, but fair use has limits. Commercial use generally weighs against it. Commentary and transformation weigh in favor, so we’ll see how this progresses with the many other cases.
Warner Brothers have recently filed another case, I think.