No Laughing Matter: Sarah Silverman’s Copyright Suit Against Meta
By Daniel Jason Ain
The class-action lawsuit filed against Meta (Facebook’s parent company) earlier this month by comedian and author Sarah Silverman and other authors may be just the first of many high-profile copyright lawsuits against companies that market, sell and otherwise exploit artificial intelligence (“AI”) products.
A feature of Meta’s AI product, and of others like OpenAI’s ChatGPT, is the ability to replicate a writer’s style. As an occasional user of these AI programs, I decided to try this out myself. When I asked ChatGPT to “Write a Seinfeld joke,” the program immediately returned a lengthy response that could have been written by Seinfeld himself: “Why do they call it ‘fast food’ when you spend half your life waiting in line? You walk into the joint thinking, ‘Hey, this will be quick. I’ll be in and out, no big deal.’ But then you get in line, and it’s like entering a time warp to the age of the dinosaurs. You could write a novel, watch a trilogy, and still, the line hasn’t moved an inch . . .”
Silverman’s complaint does not suggest that AI’s ability to create new jokes in someone’s comedic style is itself copyright infringement. Rather, it focuses on training datasets, the “corpus of input material” used to train LLaMA, Meta’s AI software designed to produce natural language. Meta’s massive dataset, as we know from papers published by Meta itself, includes hundreds of thousands of books and other copyrighted materials, including Silverman’s book of essays “The Bedwetter”. These materials, as part of Meta’s dataset, can then be used to train LLaMA, actively influencing the text it produces.
Among the plaintiffs’ numerous claims are alleged violations of Section 106 of the Copyright Act. Section 106 states, in relevant part: “[T]he owner of copyright under this title has the exclusive rights to do and to authorize any of the following: . . . (1) to reproduce the copyrighted work in copies or phonorecords; (2) to prepare derivative works based upon the copyrighted work . . .” (emphasis added). The Copyright Office defines a derivative work as “a work based on or derived from one or more already existing works. Common derivative works include translations, musical arrangements, motion picture versions of literary material or plays, art reproductions, abridgments, and condensations of preexisting works.”
Interestingly, the plaintiffs claim that both the LLaMA model itself and its output are infringing derivative works. They go on to claim that “because the output of the LLaMA language models is based on expressive information extracted” from the copyrighted works, “every output” is an infringing derivative work.
The suit is in its very early days, and Meta has not yet needed to answer. If the case proceeds, it is unclear which defenses Meta may mount to counter the claim that LLaMA itself and its outputs are derivatives of the copyrighted books at issue, or how a fair use defense might be asserted.
The implications of this case, and of others that are likely to follow, are massive not just for individual authors, but for entire industries. We are currently seeing this firsthand in the entertainment industry, where AI has been a major point of strike negotiations. While the use of copyrighted material to train AI remains in legal limbo, unions are having to spend precious negotiating capital to protect their members. The Writers Guild of America, still on strike, has taken the position that material covered under the WGA’s collective bargaining agreement (“CBA”) should not be used to train AI.
This article is intended as a general discussion of these issues only and is not to be considered legal advice or relied upon. For more information, please contact RPJ Senior Associate Daniel Jason Ain, who counsels clients in areas of entertainment, media and literary, intellectual property and employment law. Mr. Ain is admitted to practice law in the State of New York and the District of Columbia (admission pending).