Who owns what robots eat and what they excrete?
HOW DOES copyright apply to so-called "artificial intelligence"? What is the legality of "feeding" robots? Who owns the resulting output?

A robot writing computer code, generated from that prompt by Dall-e-2
The second question is quite obscure. At least one cynical reader has suspected that the 2021 UK government consultation on AI and copyright was set up to distract us and our lawyers from the first question - on the use of our work - with the shiny legal and philosophical issues the second poses. Fortunately, that consultation has been shelved.
We shall therefore for now take as our motto the words of the Letter of Paul to the Colossians: "beware lest any man spoil you through philosophy and vain deceit." We sketch the discussion about the less important question later; and while proofreading this we were alerted to the need to cover artificial fake works.
Input
If we were faced with actual artificial intelligence (AI), we could ask it "how does copyright apply to you" and get at least as good an answer as we can arrive at using human brains.
We are not and cannot.
What we are talking about are computer systems - machines - that ingest vast amounts of words or pictures and "learn" from them. That is, they encode patterns in the "input sets" in a variety of fundamentally obscure ways that are inspired by researchers' current understanding of how brains make minds. The process is better called "machine learning" (ML).
A user may ask such a system "please write me an article on machine learning and copyright". It proceeds, roughly speaking, by restating the question and then predicting what the next words of the answer will be, based on the patterns of human language that it has learned... and then the next words...
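That word-by-word prediction can be caricatured in a few lines of Python. This is a toy sketch only: a bigram counter standing in, under loud assumptions, for a neural model billions of times larger; real systems predict over fragments of words, not whole words, and sample probabilistically rather than always taking the likeliest continuation.

```python
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    model = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, start, length=5):
    """Repeat the prompt, then keep appending the likeliest next word."""
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break  # nothing ever followed this word in training
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

model = train_bigrams("the cat sat on the mat and the cat slept on the mat")
print(generate(model, "the"))
```

The output is whatever the learned patterns make most probable next - which is the sense in which such systems "restate and continue" rather than understand.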
The generation of a response to "please generate an image of a machine, learning" is at a deep level similar, but much harder to put into words.

A machine, learning - generated from that prompt by Stable Diffusion
No serious commentator disputes that all the substantial contents of the "input sets" are - or have been - copyright works. Some - such as some works by US government employees - will be in the public domain by the operation of law. Others are in the public domain because copyright has expired, in most places and cases at the end of the seventieth year after the death of the author.
(This leads to the idea of a version of, say, ChatGPT that is trained only on public-domain works. Given the exponentially rising trend in the production of texts, such a system would, forsooth, tend to respond in an unspeakable mixture of late-19th-century prose and early-21st-century bureaucratese.)
And the process of "scraping" documents from the web (and perhaps other sources) is undoubtedly copying.
The question, then, is whether this copying is covered by any of the "exceptions" to copyright - that is, one of the rules under which your work can be used without permission or payment.
Exception: temporary copies
Machine-learning tool purveyors have claimed that they are covered by exceptions allowing "temporary copies". These exceptions are intended to make it legal, for example, for the "browser" app or program that you are using to read this text to make a temporary copy of it, to speed things up.
Indeed, when someone feeds this page into the training of a machine-learning system it is possible for them to delete the literal copy, leaving only the "pattern" stored in the system. But, crucially, researchers are increasingly finding that it is possible to prompt ML systems to spit out the originals on which they were trained. That means that this page will effectively be copied into the system.
(An analogy is a "concordance": a work that lists every word that occurs in another work, typically the Bible, and where. A biblical concordance is not the Bible; but it is possible to reconstruct the Bible from it.)
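The concordance analogy can be made concrete in a few lines of Python - a toy illustration of the principle, not a claim about how any ML system actually stores text. A concordance maps each word to the positions where it occurs; it contains no sentence of the original, yet the original can be rebuilt from it exactly.

```python
from collections import defaultdict

def concordance(text):
    """Map each word to the ordered list of positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(text.split()):
        index[word].append(pos)
    return dict(index)

def reconstruct(index):
    """Rebuild the original text from the concordance alone."""
    slots = {}
    for word, positions in index.items():
        for pos in positions:
            slots[pos] = word
    return " ".join(slots[i] for i in range(len(slots)))

text = "in the beginning was the word"
assert reconstruct(concordance(text)) == text
```

The lookup table is plainly not the text - but because the text is recoverable from it, deleting the "literal copy" does not mean the work has gone.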
So the exception for temporary copying looks dubious.
Exception: text and data mining
The law of the United Kingdom provides an exception to copyright for the purposes of making "copies for text and data analysis for non-commercial research" - in Section 29A of the Copyright, Designs and Patents Act 1988 (CDPA). This applies only where the user "has lawful access to the work" - so circumventing paywalls or ignoring specified licence conditions is not covered. It allows these users to "carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose".
EU law has a similar provision in Article 3 of the Directive on copyright and related rights in the Digital Single Market. It allows for "reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access".
Now, the owners of the LAION dataset of images claim that they gathered its images for non-profit research under the German implementation of that EU law. LAION was used to train the Midjourney and Stable Diffusion machine-learning image generators, among others; but both appear to be heading toward being for-profit operations.
This is... problematic.
The EU Directive does, unlike the UK law, mention "opt-outs": it says non-profit text and data mining is lawful "on condition that the use of works and other subject matter referred to... has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online". Corporations will claim that this is satisfied by the holders of copyright not explicitly forbidding each of their robots by name, which is ridiculous.
Exception: US ‘fair use’
ML companies based in the US sometimes fall into the trap of assuming that US law applies everywhere. But few countries share the exception to copyright for "fair use" in US law.
This leaves it largely to each court to decide whether the act of copying before it is permitted as an "exception" to copyright. Section 107 of the US Copyright Act states that courts shall take into account:
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for or value of the copyrighted work.
In interpreting this, US courts have placed emphasis on how much each use is "transformative". If you can get the original back out from the ML system, it seems to the Freelance that the answer is "not very".
Derivative works
In all copyright and authors' rights legal systems, creating a "derivative work" requires the permission of the author (or other owner). Clear examples of derivative works are translations into other languages or making a stage or film script from a novel. Even if you can't get the original back out from the ML system exactly, we suspect there is an argument to be made that its output is a derivative work.
All this will be settled only in the course of the many lawsuits launched in recent months, with more almost certainly to come. While each of these grinds its way through its system, the practices of the ML companies get entrenched and normalised. That's pretty much what happened with the UK and EU exceptions to copyright: they were seen as legalising what was already being done. Pro-author lobbying did keep both limited, in theory, to non-profit use.
Output
Now we come to the more philosophically challenging question: who or what owns the output of a machine-learning or "artificial intelligence" system?
UK law is fairly unusual in having specific provision for "computer-generated works". In section 9(3) of the CDPA we learn that in the case of "a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken" - very similar wording to that used to grant rights to film and music producers. The right lasts for 50 years.
But in section 178 "computer-generated" is defined as meaning that "the work is generated by computer in circumstances such that there is no human author of the work."
Many legal scholars therefore believe that no ML-generated works at all may in fact be protected, given for example the role of humans in telling the ML system what they want and in selecting from among a range of outputs.
In the "common-law" legal systems we especially rely on court cases in the higher courts to discover what Parliament meant when it passed a law. We have found just one piece of relevant case-law on the "computer-generated works" clause. In Nova Productions v Mazooma Games [2006] EWHC 24 (Ch) Nova Productions Limited sued two other makers of computer arcade games for producing works that bore a visual resemblance to its Pocket Money snooker arcade game.
Mr Justice Kitchin held that David Jones of Nova was the person by whom the arrangements necessary for the creation of the work were undertaken. He also held, however, that the defendants had not copied a "substantial part" of Pocket Money - what was copied were "generalised ideas at a high level of abstraction". All Nova's claims fell. So this case doesn't help much.
In the EU, in Case C-145/10, Painer, Advocate-General Verica Trstenjak recommended to the Court of Justice of the European Union (CJEU) that "only human creations are... protected." That's very clear. The Court, though, did not incorporate this wording into its December 2011 judgment.
The UK government did propose in its 2021 consultation to extend copyright in "AI works", in the belief that this would stimulate a Great British AI Industry. That proposal is shelved.
In the US, the Register of Copyrights in September 2022 accepted an application to register Zarya of the Dawn, a machine-generated comic. Then in February 2023 she (or her Office) withdrew registration of the images in the book, stating that the original application did not disclose that the images were created by an "AI model". The text remains registered.
This follows the Review Board of the United States Copyright Office in February 2022 confirming a refusal to register a work entitled A Recent Entrance to Paradise, a single two-dimensional image. The Board concluded that "because copyright law as codified in the 1976 Act requires human authorship, the Work cannot be registered."
A digression into UK patents
In his application to register that work, Steven Thaler had stated that it "was autonomously created by a computer algorithm running on a machine". Thaler also has a case awaiting judgment in the UK Supreme Court in which he claims that the same computer system should be named as the inventor of a patent, under the Patents Act 1977.
Update: The UK Supreme Court ruled on 20 December 2023 that Thaler had in fact withdrawn the patent application when, having been requested by the Patent Office to name a human inventor, he refused. Though this may not strictly bind the Court on copyright matters, it is certainly suggestive.
So the legal position is very cloudy, but much simpler than the multifarious philosophical positions. Probably there is no effective copyright in "AI works" at present, except in the cases that can be shown to be human works. An analysis by solicitors at Bird & Bird Singapore in response to that country's proposal to introduce "copyright protection for AI-generated works" comes to this conclusion.
False attribution and ‘passing off’
It never stops. While proofreading this we came across this message from Jane Friedman, who writes about the publishing industry:
"As of today [7 August 2023], there are about half a dozen books being sold on Amazon, with my name on them, that I did not write or publish. Some huckster generated them using AI. This promises to be a serious problem for the book publishing world."
Naturally, she complained to Amazon. She updated us:
"A brief update: After going back a few times with Amazon on this issue, I was notified the books would not be removed based on the information I provided. Since I do not own copyright in these AI works and since my name is not trademarked, I'm not sure what can be done."
The US Authors Guild offered support. In the morning she updated again:
"As of this morning, the books appear to have been removed from Amazon. How long until it happens again? What about authors who don't have the ability to raise a big red flag like I do?"
Under UK law, there is a specific cause of action in copyright against such "hucksters" for "false attribution" (CDPA §84). It is triggered by "a person who... issues to the public copies of a [literary, dramatic, musical or artistic] work... in or on which there is a false attribution". On the face of it, that would appear to include Amazon as well as the probably well-hidden culprit.
Irish copyright law has a similar provision (§113).
In the US, this would be covered by the civil offence of "passing off" under the "Lanham Act" §43(a) - or even in common law. There is also a civil offence of "reverse passing off", defined by law firm Finnegan as occurring "when a person falsely designates the 'origin' of someone else's goods or services, misrepresenting them as its own". Arguing this could be expensive, since the new Copyright Claims Board US small claims procedure appears not to apply to claims under the Lanham Act.
In EU member states other than Ireland we think an author would have to bring proceedings for breach of their moral right to be identified.
Doubtless something else will happen while this is uploading...
- 2 September 2023 Added a note about opt-outs.
