OpenAI works on copyright solution for large AI models


According to Sam Altman, content producers whose work contributes to the capabilities of an AI model will benefit from it in the future. Exactly how is not yet clear.

At an AI summit at the White House, OpenAI CEO Sam Altman said his company is working on AI models that respect copyright. The goal, he said, is for content creators to be paid when their content or, in the case of images, their style is used. Technical details are not yet known.

When OpenAI introduced the ChatGPT plugins, it acknowledged the potential impact that a large language model with tools could have on the content ecosystem. The more interaction takes place inside the chatbot ecosystem, the less attention, and therefore money, content creators will receive for their products outside the chatbot.

“We appreciate that this is a new method of interacting with the web, and welcome feedback on additional ways to drive traffic back to sources and add to the overall health of the ecosystem,” OpenAI writes.


Two possible options for text generation are a Spotify-like streaming model, in which payouts are based on the tokens used, provided a generation can be uniquely attributed to its sources, and a flat rate based on the amount of data a provider supplies to OpenAI. Websites can already indicate technically whether they want to be crawled by ChatGPT, similar to how they control indexing by Google.
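To make the difference between the two payout ideas concrete, here is a minimal sketch in Python. It is purely illustrative: OpenAI has not published any payout scheme, and the function names, the revenue pool, the per-megabyte rate, and the source names are all hypothetical assumptions. The sketch also assumes that per-source token attribution is already available, which is the unsolved part.

```python
def pro_rata_payouts(pool_cents: int, attributed_tokens: dict[str, int]) -> dict[str, int]:
    """Streaming-style model (hypothetical): split a fixed revenue pool
    in proportion to the tokens attributed to each source.

    Amounts are whole cents, rounded down; any remainder stays unallocated.
    """
    total_tokens = sum(attributed_tokens.values())
    if total_tokens == 0:
        # Nothing was attributed, so nobody is paid from the pool.
        return {source: 0 for source in attributed_tokens}
    return {
        source: pool_cents * tokens // total_tokens
        for source, tokens in attributed_tokens.items()
    }


def flat_rate_payouts(cents_per_megabyte: int, megabytes_provided: dict[str, int]) -> dict[str, int]:
    """Flat-rate model (hypothetical): pay a fixed rate per megabyte of data
    supplied, regardless of how often the data is used in generations."""
    return {
        source: cents_per_megabyte * megabytes
        for source, megabytes in megabytes_provided.items()
    }


if __name__ == "__main__":
    # Assumed example figures, not real data.
    usage = {"news-site": 6000, "blog": 3000, "photo-archive": 1000}
    print(pro_rata_payouts(100_000, usage))

    supplied = {"news-site": 500, "blog": 20, "photo-archive": 2000}
    print(flat_rate_payouts(2, supplied))
```

In the streaming-style model, a source's income depends on how often its content actually shows up in generations. In the flat-rate model, it depends only on how much data was handed over, which avoids the attribution problem but ignores actual use.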

AI models and copyright – it’s complicated

The use of images and text to train large AI models without explicit consent is already controversial from a copyright perspective. In addition, generative AI models are capable of producing text or images that are very similar to the originals they were trained on. Lawsuits are pending in several countries. One of the larger ones is Getty Images vs. Stability AI (Stable Diffusion).

OpenAI and other AI companies could address this by training large AI models only on data they are clearly permitted to use. The question is whether it is commercially feasible to collect the necessary amount of data with permission.

Pre-trained language models are relatively static and are currently updated only every few months or even years. ChatGPT, however, can use a browser plugin to ingest information from the web in real time and combine it with knowledge from its training data. This real-time capability of large language models with tools (plugins) takes the copyright debate to a new level.

Microsoft’s Bing chatbot works similarly. Microsoft CEO Satya Nadella has promised publishers that the chatbot’s outbound traffic will be defined as a success factor for the product and that publishers will share in its success.

