Three novelists, including comedian Sarah Silverman, sued tech giants OpenAI and Meta for copyright infringement, alleging that the companies used their books to train artificial intelligence software.
The suit alleges that Silverman’s books appear in a third-party dataset that was reportedly used to train Meta’s LLaMA and OpenAI’s ChatGPT and GPT-3, which are among the most popular generative AI models. These models are trained on large swaths of data, which they draw on to generate bodies of text such as articles, short stories, and summaries.
“This is going to be an interesting conversation with regards to copyright infringement,” said Bullish Studio CEO Brian Hanly. “We’ve heard a lot of publishers and websites and content creators be skeptical [of AI].”
Though the case might sound shaky, it is actually well-founded if one or more of the novelists’ works were indeed used to train the models. U.S. copyright law is very restrictive when it comes to derivative works, meaning works built upon the work of another person.
According to the United States Copyright Office, “The copyright in a derivative work covers only the additions, changes, or other new material appearing for the first time in the work.”
In fact, there are very few allowances for derivative works. One of them, fair use, is likely to be a topic of great contention in this case and others like it. Fair use covers creation for purposes such as scholarship, education, parody, and news reporting, along with other specific cases. However, it’s unclear whether the output of an AI model would fall under any of these categories.
As a result, a seemingly trivial case could upend the generative AI ecosystem, into which venture capitalists have pumped billions of dollars. It is neither the first such case nor the last: other suits filed against companies like Google allege that tech giants are “vacuuming up people’s whole lives to train AI.”