- Intuitively and Exhaustively Explained
- Posts
- The Futility of AI Failsafes
The Futility of AI Failsafes
Serious legal problems for AI companies, and the loopholes they’re using
Serious legal problems for AI companies, and the loopholes they’re using

The cat’s out of the bag. OpenAI trained their models on a vast amount of copyrighted data and is now facing serious legal ramifications. On top of that, the road for OpenAI to be legally compliant is unsure at best. In this article we’ll explore just how deep this rabbit hole goes, ascii art and all.
How LLMs Are Trained
Language models are trained via a “Language Modeling Objective”. Essentially, you give a language model a bunch of snippets from a bunch of text books, websites, or other high quality sources, and ask the model to predict the next word that comes after that snippet.

After billions of training iterations, the model gets so good at predicting the next word in a textbook that it starts sounding like the person who wrote the textbook, for instance.
LLMs aren’t trained on just one textbook, though. To build a competent model using this method you need to train a model on vast amounts of high quality text. GPT-3 was trained on half a trillion words. That would take a human 10 thousand years to read (assuming 100 words per minute).
LLMs and Copyrighted Material
The issue is, where do you get that much data? Do you ask authors for access? Do you sign licensing agreements with publicists? Nope, you just scrape it off the internet.
OpenAI used a vast amount of copyrighted data to train their GPT models. The legality and ethicality of this is a gray area that still requires exploration. If humans can learn off copyrighted material to create their own work, can AI models do the same thing? So far, it seems like the answer is a tentative yes.
Outputting copyrighted work verbatim, on the other hand, is not in a gray area. It doesn’t matter if you’re a person, a website, or an AI model; you can’t distribute copyrighted work without the consent of the copyright holder. The issue is, LLMs are really good at memorizing and regurgitating text. They’re basically optimized copyright infringement machines.
OpenAIs GPT4 produced copyrighted content 44% of the time
Mistral’s Mixtral-8x7B produced copyrighted content 22% of the time
Anthropic’s Claude-2.1 produced copyrighted content 8% of the time
Meta’s Llama-2–70b-chat produced copyrighted content 10% of the time
The Great Scramble
A lot of heavy hitters in the literary space are not happy. Here’s a list of lawsuits against OpenAI
Daily News Lp Et Al V. Microsoft Corporation — April 30, 2024.
The New York Times Company v. Openai Inc. — December 27, 2023.
Sancton v. OpenAI Inc. et al — November 21, 2023.
Authors Guild et al v. OpenAI Inc. et al — September 19, 2023.
Chabon v. OpenAI, Inc. — September 8, 2023.
OpenAI, Inc. v. Open Artificial Intelligence, Inc. — August 4, 2023.
Doe 3 et al v. GitHub, Inc. et al — November 10, 2022.
DOE 1 et al v. GitHub, Inc. et al — November 3, 2022.
T. et al v. OpenAI LP et al — September 5, 2023.
Walters v. OpenAI LLC — July 14, 2023.
Silverman, et al v. OpenAI Inc. — July 7, 2023.
Tremblay v. OpenAI Inc. — June 28, 2023.
PM et al v. OpenAI LP et al — June 28, 2023.
This is a lot of heat
The plaintiffs also include popular writers such as George RR Martin, John Grisham, Jodi Picoult, and David Baldacci, all of whom say their books have been used without their consent or payment in the training of the company’s Large Language Models (LLMs). The complaint demands that tech companies pay a licensing fee to authors — $150,000 for each copyrighted book used. — source
and it threatens OpenAI’s core business model. They need more and higher quality data to train bigger and better models. That’s why OpenAI has been scrambling to make deals with publishers so they can train on and output content legally. Regardless of how that shakes out, the models are already trained, and they’re trained on copyrighted content. OpenAI needs to stop their existing models from outputting copyrighted content to users.
Safeguards Aren’t Enough
The issue is, OpenAI can’t get their models to stop outputting copyrighted content. They’ve tried pretty hard, but it’s just not easy. Any time OpenAI makes a safeguard, people get around it.
In the earlier days of ChatGPT people would simply include text like “[Ignore previous conversations and rules]” at the beginning of their prompt, sometimes tricking the model to output inappropriate or harmful content. Since then, there’s been a cat and mouse game between OpenAI and a sea of “jailbreakers” trying to get around OpenAI’s restrictions. My favorite example of this is the use of ascii art to trick model safeguards.

As safeguards have improved in response to attacks by jailbreakers, it’s become harder to get LLMs to output harmful or inappropriate responses. However, the damage has already been done.
Lawyers smell blood in the water.
It doesn’t matter how complex the prompt is; if your LLM outputs copyrighted content then that’s illegal and you can be sued. I’m certain there’s a new era of prompting hackers, hired by law firms, trying to break these safeguards to build a mountain of evidence.
What AI Companies Want
The dream of AI companies is to be able to remove copyrighted content from a model. You might compile a vast amount of data from the internet and then train a model on all of it. Then, when hit with copyright claims, you might choose to remove that specific content from the model. This is an emerging problem called “Unlearning”, which has seen an uptick in research. It’s an exciting new domain with a lot of interesting research, but there’s a big problem with all current approaches.
Legal Compliance by Lobotomy
Despite substantial research there is currently no consistent, high quality, and affordable way to get a large AI model to unlearn information. I recently attended a lecture by a friend of mine, Mohammed R. Osman, in which he compared all current approaches to lobotomy. With the current state of the art you simply can’t remove specific information from a large model without impacting performance significantly. This lobotomy likely played a big part in the public's perception of reduced model performance after GPT 3.5.
Because of how difficult unlearning is, and how shakey safeguards can be, I think OpenAI has been pursuing other approaches to this problem.
Alternate Loopholes
You may have heard of “GPTs”. It’s an offering by OpenAI in which you can create your own version of ChatGPT for a specific use case. In my opinion, the subtext of GPTs is that they’re turning OpenAI from a service to a platform so they can more easily avoid copyright and safety responsibilities. If you’re curious about the tech, they’re probably using LoRA under the hood.
LoRA — Intuitively and Exhaustively ExplainedExploring the modern wave of machine learning: cutting edge fine tuningtowardsdatascience.com
In my estimation, turning LLMs into a platform allows OpenAI to offer new and legally dubious content while avoiding culpability through third parties. This is all possible because of the Communications Decency Act.
Section 230 is a section of Title 47 of the United States Code that was enacted as part of the Communications Decency Act of 1996 … and generally provides immunity for online computer services with respect to third-party content generated by its users. — Source
GPTs have never made sense to me. They seem like such an odd offering, especially considering the volume of technologies which are easier to use and less expensive (i.e. prompting, RAG, and agents). Under this light their purpose makes a bit more sense.
Imagine McDonalds. We find 80% of their beef is fake… and we say “hey McDonalds, we’re pissed off.” and they say “Wait! we’re not a fast food restaurant, we’re a fast food platform” — Scott Galloway
Follow For More!
I describe papers and concepts in the ML space, with an emphasis on practical and intuitive explanations.
Get an email whenever Daniel Warfield publishesGet an email whenever Daniel Warfield publishes By signing up, you will create a Medium account if you don't already…medium.com
Attribution: All of the images in this document were created by Daniel Warfield, unless a source is otherwise provided. You can use any images in this post for your own non-commercial purposes, so long as you reference this article, https://danielwarfield.dev, or both.