Sarah Silverman's Lawsuit Against OpenAI Is Full of Nonsense Claims

By: Elizabeth Nolan Brown

February 19, 2024, 17:30

Sarah Silverman | Amy Katz/ZUMAPRESS/Newscom

Is it a crime to learn something by reading a copyrighted book? What if you later summarize that book to a friend or write a description of it online? Of course, these things are perfectly legal when a person does them. But does that change when it's an artificial intelligence system doing the reading, learning, and summarizing?

Sarah Silverman, comedian and author of the book The Bedwetter, seems to think it does. She and several other authors are suing OpenAI, the tech company behind the popular AI chatbot ChatGPT, through which users submit text prompts and receive back AI-generated answers.

Last week, a federal judge largely rejected their claims.

The ruling is certainly good news for OpenAI and for ChatGPT users. It's also good news for the future of AI technology more broadly. AI tools could be completely hamstrung by the expansive view of copyright law that Silverman and the other authors in this case are advancing.

The Authors' Complaints and OpenAI's Response

Teaching AI to communicate and "think" like a human takes a lot of text. To this end, OpenAI used a massive dataset of books to train the language models that power its artificial intelligence. ("It is the volume of text used, more than any particular selection of text, that really matters," OpenAI explained in its motion to dismiss.)

Silverman and the others say this violates federal copyright law.

Authors Paul Tremblay and Mona Awad filed a class-action complaint to this effect against OpenAI last June. Silverman and authors Christopher Golden and Richard Kadrey filed a class-action complaint against OpenAI in July. The threesome also filed a similar lawsuit against Meta. In all three cases, the lead lawyer was antitrust attorney Joseph Saveri.

"As with all too many class action lawyers, the goal is generally enriching the class action lawyers, rather than actually stopping any actual wrong," suggested Techdirt Editor in Chief Mike Masnick when the suits were first filed. "Saveri is not a copyright expert, and the lawsuits…show that. There are a ton of assumptions about how Saveri seems to think copyright law works, which is entirely inconsistent with how it actually works."

In both complaints against OpenAI, Saveri claims that copyrighted works—including books by the authors in this suit—"were copied by OpenAI without consent, without credit, and without compensation."

This is a really weird way to characterize how AI training datasets work. Yes, the AI tools "read" the works in question in order to learn, but they don't need to copy them. It's also a weird understanding of copyright infringement, akin to arguing that someone who reads a book to learn about a subject for a presentation is infringing on the work, or that search engines are infringing when they scan webpages to index them.

The authors in these cases also object to ChatGPT spitting out summaries of their books, among other things. "When ChatGPT was prompted to summarize books written by each of the Plaintiffs, it generated very accurate summaries," states the Silverman et al. complaint.

Again, putting this in any other context shows how silly it is. Are book reviewers infringing on the copyrights of the books they review? Is someone who reads a book and tweets about the plot violating copyright law?

It would be different if ChatGPT reproduced copies of books in their entirety or spit out large, verbatim passages from them. But the activity the authors allege in their complaints is not that.

The copyright claims in this case "misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence," OpenAI argued in its motion to dismiss some of the claims.

It suggested that the doctrine of fair use—designed in recognition of the fact "that the use of copyrighted materials by innovators in transformative ways does not violate copyright"—applies in this case and the case of "countless artificial intelligence products [that] have been developed by a wide array of technology companies."

The Court Weighs In

If the authors prevailed here, it could seriously hamper the creation of AI language models. Fortunately, the court isn't buying many of their arguments. In a February 12 ruling, Judge Araceli Martínez-Olguín of the U.S. District Court for the Northern District of California dismissed most of the authors' claims against OpenAI.

This included the claims that OpenAI engaged in "vicarious copyright infringement," that it violated the Digital Millennium Copyright Act (DMCA), and that it was guilty of negligence and unjust enrichment. The judge also partially rejected a claim of unfair competition under California law while allowing the authors to proceed with that claim in part (largely because California's understanding of "unfair competition" here is so broad).

Silverman and the other authors in these cases "have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books," Martínez-Olguín noted. And they "fail to explain what the outputs entail or allege that any particular output is substantially similar – or similar at all – to their books."

The judge also rejected the idea that OpenAI removed or altered copyright management information (as prohibited by Section 1202(b) of the DMCA). "Plaintiffs provide no facts supporting this assertion," wrote Martínez-Olguín. "Indeed, the Complaints include excerpts of ChatGPT outputs that include multiple references to [the authors'] names."

And if OpenAI didn't violate the DMCA, then other claims based on that alleged violation—like that OpenAI distributed works with copyright management information removed or engaged in unlawful or fraudulent business practices—fail too.

More AI/Copyright Battles To Come

This isn't the end of the authors vs. OpenAI debate. The judge has not yet ruled on their direct copyright infringement claim because OpenAI has not yet sought to dismiss it. (The company said it will try to resolve that later in the case.)

The judge will also allow the plaintiffs to file an amended complaint if they want to.

Given the lameness of their legal arguments, and the judge's dismissal of some of the claims, "it's difficult to see how any of the cases will survive," writes Masnick. (See his post for a more detailed look at the claims involved here and why a judge dismissed them.)

Unfortunately, we're almost certain to keep seeing people sue AI companies (language models, image generators, etc.) on dubious grounds, because America is in the midst of a growing AI tech panic. And every time a new tech panic takes hold, we see people trying to make money and/or a name for themselves by flinging a bunch of flimsy accusations in lawsuit form. We've seen this with social media companies and Section 230, with social media and alleged mental health harms to teens, and with all sorts of popular tech companies and antitrust law.

Now that artificial intelligence is the darling of tech exuberance and hysteria alike, a lot of folks—from bureaucrats at the Federal Trade Commission to enterprising lawyers to all sorts of traditional media creators and purveyors—are seeking to extract money for themselves from these technologies.

"I understand why media companies don't like people training on their documents, but believe that just as humans are allowed to read documents on the open internet, learn from them, and synthesize brand new ideas, AI should be allowed to do so too," commented Andrew Ng, co-founder of Coursera and an adjunct professor at Stanford. "I would like to see training on the public internet covered under fair use—society will be better off this way—though whether it actually is will ultimately be up to legislators and the courts."

Unlike many people who write about technology, I don't foresee major disruptions, good or bad, coming from AI anytime soon. But there are many smaller benefits and efficiencies that AI can bring us—if we can keep people from hampering its development with a maximalist reading of copyright law.

Today's Image

Reason D.C. office bookshelves, 2020 (ENB/Reason)

The post Sarah Silverman's Lawsuit Against OpenAI Is Full of Nonsense Claims appeared first on Reason.com.