OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
It’s honestly a good question. It’s perfectly legal for you to memorize a copyrighted work. In some contexts, you can recite it, too (particularly under the perilous doctrine of fair use). And even if you don’t recite a copyrighted work directly, you are most certainly allowed to learn to write from reading copyrighted books, then try to come up with your own writing based on what you’ve read. You’ll probably try your best to avoid copying anyone, but you might still make mistakes, simply by forgetting that some idea isn’t your own.
But can AI? If we want to view AI as basically an artificial brain, then shouldn’t it be able to do what humans can do? At the same time, though, it’s neither a brain nor a human. Humans are pretty limited in what they can remember, whereas an AI’s memory could be virtually boundless.
If we’re looking at intent, the AI companies certainly aren’t trying to recreate copyrighted works. As we can see, they’ve actively tried to stop it. And LLMs don’t directly store the copyrighted works, either. They’re basically just storing super hard to understand sets of weights, which are a challenge even for experienced researchers to explain. The companies aren’t denying that their models read copyrighted works (like all of us do), but arguably they aren’t trying to write copyrighted works.