Published on: 13 February 2024

NY Times vs OpenAI: copyright conflict explained

In late December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft in the United States. The newspaper claims that the companies are infringing its copyrights by using millions of its (news) articles without permission to train their AI systems, including ChatGPT. Moreover, according to The New York Times, the systems generate almost verbatim excerpts on demand from articles that would normally require a subscription to the newspaper.

According to the newspaper, the chatbots therefore now rival it as a source of reliable information, threatening its long-term survival. The New York Times fears not only that its number of subscribers will decline, but also that it will lose advertising revenue as a result.

So how does this work exactly? Are you allowed to use copyrighted works to train your AI system? And may that system then reproduce them one-to-one? In this article, I will tell you more about this.

What is the conflict about?

OpenAI and Microsoft train their chatbots by feeding them huge amounts of digital data. This data is automatically extracted from web pages, a technique known as ‘scraping’. Among this data are copyrighted works, such as, in this case, articles from The New York Times.

In copyright law, in principle, the copyright holder is the only one who may publish or reproduce the work. Anyone else who wants to use the work must ask permission to do so. The New York Times claims that it did not give OpenAI and Microsoft permission to scrape its copyrighted works and then reproduce them one-to-one. The newspaper therefore demands that OpenAI and Microsoft destroy the chatbots and training sets that incorporate its material. The newspaper does not demand a specific amount of damages, but its complaint puts the damages at billions of dollars.

In response, OpenAI posted a statement on its website. It states that ChatGPT is not intended to reproduce articles one-to-one and that the company is in the process of addressing this. In addition, OpenAI believes it does not need permission at all to use The New York Times’ articles. It invokes the ‘fair use’ exception in US copyright law.

What does the fair use exception mean?

Fair use is a concept from the United States. As such, it only applies in US copyright law.

Fair use means that there is no copyright infringement when the use of the protected work is so fair that the rights holder must tolerate it. When is the use fair? This is determined based on four factors:

The purpose and nature of the use: for example, use for research purposes is more likely to be considered fair than use for a commercial purpose.
The nature of the protected work: use of a work with little creative content is more likely to be allowed than use of a very creative work.
The size and content of the copied part: the smaller the part of the protected work that is used, the more likely the use is to be fair.
The effect of the use: if the use has no effect on the rights holder’s opportunities to exploit the work, the use is more likely to be fair.

In its statement, OpenAI argues that it is making fair use of The New York Times’ articles because ChatGPT uses the works to create an entirely new work. It also points out that many academics, organisations and countries see training AI systems with copyrighted works as fair use. However, whether OpenAI’s reliance on the exception will actually succeed is up to the courts.

Do we also know a fair use exception in the Netherlands?

No, in the Netherlands we do not have a general copyright exception such as the fair use concept. Instead, our Copyright Act contains more specific exceptions. Some of these correspond to the fair use concept.

Some examples are:

The education exception: provided the work has been lawfully made public, reasonable remuneration is paid and personality rights such as attribution are respected, parts of protected works may be used for educational purposes without the permission of the rights holder.
The press exception: news media are allowed to copy each other’s posts or articles without the permission of the rights holder, provided personality rights such as attribution are respected.
The citation exception: it is allowed to quote or paraphrase parts of protected texts without permission, provided that the quoted text has been lawfully made public, no more than necessary is copied, personality rights such as attribution are respected and the quotation appears in an announcement, review, polemic or scientific treatise.

Is scraping copyrighted works allowed in the EU?

In the EU, we have the DSM Directive, which contains rules on the scraping of copyrighted works. It distinguishes between two categories of scrapers:

Research organisations and cultural heritage institutions, such as universities and publicly accessible libraries; and
Other organisations and institutions.

Research organisations and cultural heritage institutions

In the EU, research organisations and cultural heritage institutions do not need to seek permission from the rights holder to scrape protected works. Note, however, that this only applies if they have lawful access to the protected works. This is the case, for example, if the rights holder has made the works freely accessible online.

Other organisations and institutions

For other organisations and institutions, scraping of copyrighted works is allowed unless the copyright owner has taken ‘appropriate measures’ to counteract it. For example, the copyright owner can state in its general terms and conditions that scraping is not allowed. Scraping can also be countered with machine-readable instructions, such as a robots.txt file on the website. If the copyright owner has taken such appropriate measures, scraping is not allowed without the copyright owner’s permission.

As for The New York Times, it has been using robots.txt to block scraping by OpenAI’s GPTBot since August 2023. Thus, if EU rules were to apply, OpenAI would no longer be allowed to simply scrape articles from The New York Times.
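To make the robots.txt mechanism more concrete, the sketch below shows how a well-behaved scraper could check such a file before fetching a page. It is only an illustration under stated assumptions: the robots.txt content and the example.com URL are hypothetical and are not the newspaper’s actual file, and ‘ExampleBot’ is a made-up crawler name. It uses Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not the actual file of any publisher). It blocks the crawler that
# identifies itself as "GPTBot" and allows all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_scrape(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a placeholder domain.
article_url = "https://www.example.com/2023/12/27/some-article.html"

gptbot_allowed = may_scrape(ROBOTS_TXT, "GPTBot", article_url)
other_allowed = may_scrape(ROBOTS_TXT, "ExampleBot", article_url)

print(f"GPTBot may scrape: {gptbot_allowed}")
print(f"ExampleBot may scrape: {other_allowed}")
```

Keep in mind that robots.txt is only an instruction: it does not technically prevent scraping, but relies on the scraper checking and respecting it.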

What does this conflict mean for the future of chatbots?

Many companies today use a chatbot in their service delivery. For instance, a chatbot can effortlessly and independently generate answers based on the content of a company’s website or knowledge base. This can save companies a lot of costs, for example on customer service. Still, the development of chatbots is under scrutiny, partly due to the lawsuit brought by The New York Times. OpenAI, for its part, points out that AI models can only solve new problems if they have access to the vast collection of human knowledge and creations. As such, the company still hopes to form a partnership with The New York Times.

The outcome of the lawsuit is still some time away, but it could potentially put a brake on the chatbot revolution.

Any questions?

Do you have any questions about this article or about other AI-related issues? Then contact one of our lawyers by email or telephone, or fill in the contact form for a free initial consultation. We will be happy to help you.

Britt Beumer

+31(0) 205 210 130

bbeumer@flib.nl

Contact form