Online only

The Gray Lady takes on the artificial hack

THE NEW YORK TIMES COMPANY, Plaintiff, v. MICROSOFT CORPORATION, OPEN AI, INC,...

The opening of a long and very expensive work

THE New York Times - traditionally known as the "Gray Lady" for its, er, pizzazz - on 27 December filed suit against OpenAI and its major shareholder Microsoft for mass breach of copyright in its articles. The claim (8 megabyte PDF) does not specify an amount of damages, but refers to "billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of the NYT’s uniquely valuable works". It further asks the US District Court for the Southern District of New York to order "destruction... of all GPT or other LLM models and training sets that incorporate Times Works". A "GPT" is, we discover, a "generative pre-trained transformer" and an "LLM" is a "large language model" - both accurate names for "machine-learning" technologies marketed as "artificial intelligence" or "AI".

The court filing notes that "The Times reached out to Microsoft and OpenAI in April 2023 to raise intellectual property concerns and explore the possibility of an amicable resolution, with commercial terms and technological guardrails that would allow a mutually beneficial value exchange between Defendants and The Times." Clearly no agreement has been reached. OpenAI told the NYT that it had been "moving forward constructively" with the paper and that it was "surprised and disappointed" by the lawsuit. Spokesperson Lindsey Held went on: "We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models..."

The main causes of action are copyright infringement and "damage to the... brand through so-called AI 'hallucinations'," where such false texts - or confabulations - are associated with the newspaper.

In one of the appendices to the filing - "Exhibit J" - the newspaper shows 100 examples in which ChatGPT - a text generator powered by OpenAI - has produced extracts of Times stories verbatim, or with a handful of words substituted for near synonyms. That appears to the Freelance to squash the argument made by fans of "AI" that these systems actually learn the essence of the data on which they are trained and create anew. No: as we have long argued, everything the system ingests is in there, re-arranged, and can be extracted whole.

Given this, we can for the moment park the argument that the outputs of AI are "derivative works" of the ingested originals - in the way that Forbidden Planet (1956) is a derivative of The Tempest (1611) or Brazil (1985) is a derivative of 1984 (1949), only much more so.

It seems rather likely that the NYT will seek a settlement that results in a very large payment for infringements to date and continuing royalties for the use OpenAI makes of its work. We do hope that the issues of journalism ethics will get a productive airing along the way.

There is some irony here, in that the NYT has form for ignoring the rights of actual human contributors. Back in 1995 Jonathan Tasini, then President of the US freelances' union the National Writers Union, and others sued the NYT for selling work online without a licence.

The outcome, after a second trip to the US Supreme Court, was a net $11 million paid to freelance contributors - and the NYT claiming outright ownership of subsequent works. The first trip to the Supreme Court was to appeal a strange ruling by District Judge Sonia Sotomayor that electronic databases were "revisions" of newspapers and therefore the paper's copying was permitted under a clause in US copyright law apparently intended to make it easier to update dictionaries. Sotomayor was elevated to the Supreme Court in 2009 and remains in post.

The New York Times Company v. Microsoft Corporation court documents at courtlistener.com