Longer online version: shorter for print here

Copyright and Artificial Intelligence

Notes for a response to a consultation

The UK government is consulting on proposed changes to copyright law. Its preferred option would allow companies such as Google and Facebook to use your journalistic work to "train" their so-called artificial intelligence tools without payment or explicit permission.

We present some notes toward a response: see the link below to the consultation, which closes on 25 February. We offer them with apologies that we have not had time to translate them from the rather special "consultationese" language. We hope they may assist you in responding yourself.

A robot writing computer code, generated from that prompt by Dall-e-2

Image probably not © Mike Holderness

Democracy is not possible unless members of the electorate have access to reliable news on which to base their voting decisions. That cannot happen without ethical, independent journalism.

Free markets, too, are impossible unless buyers and sellers have access to truthful independent information, and journalism is the prime means of transmitting such information.

The UK’s copyright framework has sustained journalists and other creative workers for centuries, through the fundamental principle that the author of a creative work has the exclusive right to authorise use of that work, whether directly as a freelance or through a contract of employment. The “creative industries” make an enormous contribution to the UK economy, thoroughly documented in others' responses.

Recently, however, mass breach of copyright in news reporting has played a substantial part in creating the crisis that now faces the news industry. Monopolistic digital advertising provision that depends on purloined news reporting to draw eyeballs has been particularly destructive. The EU, Canada and Australia have recognised this with measures designed to force the internet corporations to negotiate licences – which those corporations continue to resist.

More recently, digital corporations – including those that now face competition law proceedings internationally for their seizure of the advertising market – have “scraped” every word and image they can find to “train” their large language models (LLMs), which they market as “artificial intelligence” (AI). They have done so without permission or recompense.

Though this consultation deals only with the copyright and authors’ rights implications of this development, we cannot comment on the subject without noting the implications of LLMs for journalistic ethics and for the proliferation of misinformation and disinformation.

A sharp example, at the time of drafting, of the danger that LLMs pose to public information is the discovery that “Grok”, the LLM recently incorporated into Twitter (currently trading as “X”), took false assertions about the attendance of UK Prime Ministers at US presidential nominations, made for partisan political purposes, and summarised these falsehoods as facts. Partisan commentators proceeded to cite Grok as a reference for those assertions. (More on this here.)

There is another matter that is sometimes – as for example in the text of the government consultation document – not recognised as an aspect of copyright law.

Ultimately, the authenticity of a journalistic work is guaranteed by the byline on it: the named journalists take responsibility for their work. As the NUJ has often observed, the theoretical rights to demand attribution (a byline) and to defend the integrity of a piece of work against distortion (for example by the distribution of altered copies) are far too hard to enforce in UK law.

In practice, the experience of members of the NUJ is that these so-called “moral rights” to identification and integrity are applied only as aggravating factors causing an uplift to damages for clear breach of their economic rights. The provision against false attribution is for this reason rarely used: in order to bring a case someone who had a work attributed falsely to them, to the detriment of their honour or reputation, would have to establish “quantum” for the case. What would be the fairly negotiated price for publication of a work that you did not create and whose publication is damaging to you? Do you have to establish the cash value of your good name to bring a civil suit for that amount?

Even enforcing the economic rights is of course difficult for the freelance journalists and other authors who typically retain copyright in their work. These are almost all sole traders, and typically the infringers of their copyright are corporations that enjoy the services of fleets of lawyers.

LLMs are machines for generating altered copies of the works on which they are trained.

LLMs are also notorious for “confabulating” responses (called by those who over-estimate their intelligence “hallucinating”). They will invent source references for untruths from whole cloth, often using the names of authors who are active in the field of the question put to them.

LLMs will output “works” that are recognisable as being based on the work of particular writers and photographers. They will make explicit claims that named individuals have written words that they have not: claims that are contrary to the honour or reputation of those named.

This has two effects, one of global importance and one very particular to NUJ members. It will further diminish trust in news reporting, encouraging the view beloved of dangerous populists that there are no facts at all. And it will damage or destroy the reputations, and thus the careers, of individual journalists.

The above is just one of the reasons why the use of works to train LLMs may be permitted only with explicit prior consent and payment.

We note with sadness that when His Majesty’s Government launched this consultation on 17 December it was clearly designed to promote one response, which is a policy promoted in particular by Alphabet Inc (owner of Google). The need to explain and source that editorial judgement on the consultation text was removed by the Prime Minister’s announcement on 13 January that that option was indeed already government policy.

We cannot in good faith express a preference for any of the four options presented in the consultation, given the above and the way the options are expressed.

We instead insist:

That there must be a scheme to compensate journalists (and other authors and performers) for unauthorised use of our work to date by digital corporations.
Such a scheme must fairly apportion compensation between journalists, whether employed or freelance, and their publishers, except in cases where the payment is due entirely to a freelance author or photographer.
That the only basis on which such uses may be permitted in future is with explicit prior consent. The proposal for an “opt-out” regime is identical to the policy that Google (now Alphabet Incorporated) has pursued since its inception: namely that there shall be no controls except the ROBOTS.TXT file on a website. We recall the strenuous efforts made by Google and other internet corporations to kill news industry proposals for a comprehensive and flexible standard for machine-readable permissions. The ROBOTS.TXT mechanism clearly does not work. It does not in practice allow opt-out at the level of individual works. It does not in practice allow the owner of a website to opt out of being scraped by new “robots” since their existence is announced primarily by the site log recording the fact that they have just scraped the entire site.
As search engines based on deterministic indexing are increasingly replaced by LLM-based tools, opting a website out of being “crawled” to train LLMs effectively turns into opting it out of being findable. This replicates a tactic used by Google in its battle against regulation requiring it to compensate journalists (and media owners) for the contribution our work makes to its advertising revenues: if we do not agree to it making uncompensated use of our work, that work will be excluded from search and invisible to members of the public who do not already know of its existence. This threat was most recently deployed in New Zealand in October 2024.
No new exception to copyright is required or permissible.
It is not possible to craft an exception of the type suggested that does not fall foul of the UK’s international treaty obligations, in the shape of the “three-step test” set out in the TRIPS agreement, which we remind readers provides that exceptions are permissible only:
1. in certain special cases;
2. which do not conflict with a normal exploitation of the work or other subject-matter; and
3. do not unreasonably prejudice the legitimate interests of the right holders.
Further, it is hard to see how the proposed mechanism for “reserving rights” would not breach Article 5(2) of the Berne Convention, which holds that: “The enjoyment and the exercise of these rights shall not be subject to any formality...”
Any requirement to opt out would, besides breaching the two fundamental components of UK international commitments noted above, generate problems for the rule of law domestically. How could an opt-out be enforced? By an individual author suing Meta (Facebook) for the amount she would have charged for permission to train an LLM on her work, had she negotiated such a licence? It seems to be time to consider how a regime of statutory damages covering all abuses of the economic and moral rights in the works of an individual human author might be developed.
There should indeed be a “transparency” requirement on the owners of LLMs and on “AI developers” to disclose the sources of their training material.
That “transparency” requirement must provide highly “granular” data. It must for example provide a free interface that allows individual authors and performers to query the use made of all their works, searching either by author name or by the content of an individual work.
In the event that His Majesty’s Government were to ignore the argument we have made above and proceed with an opt-out mechanism, it would be absolutely vital to the entirety of the UK “creative industries” that this mechanism relate only to text and data mining and in particular that it not affect any of the existing exceptions that are linked to licensing schemes. The effects of the alternative are illustrated by the near-total collapse of Canada’s educational publishing industry following the passage of an ill-advised exception regime.
That the concept of “not-for-profit” text and data mining is of no use is illustrated by the history of image generation systems. A large part of the corpus of images on which these were trained was scraped by the LAION initiative. Its owners claim that it gathered images for non-profit research under the German implementation of the EU exception for non-profit text and data mining. LAION was used to train the Midjourney and Stable Diffusion machine-learning image generators, among others; but both appear to be heading toward being for-profit operations. More recently OpenAI has, we understand, been moving from an open-source to a for-profit model.
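
For readers who want to see the ROBOTS.TXT problem noted above in concrete terms, here is a minimal sketch in Python, using the standard library's urllib.robotparser. The site address, the file contents and the robot names (KnownAIBot, BrandNewAIBot) are all hypothetical, invented for illustration. It suggests why a site owner can refuse a robot only once its name is known.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site. The owner has
# refused one scraper by name; everyone else is kept out of /private/ only.
ROBOTS_TXT = """\
User-agent: KnownAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical address of a single article on that site.
ARTICLE = "https://news.example/articles/my-feature.html"


def build_parser(text):
    """Load robots.txt rules from a string rather than over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The robot the owner already knew about is refused.
known_blocked = not parser.can_fetch("KnownAIBot", ARTICLE)

# A robot launched after the file was written matches only the "*" rule,
# so the same article is open to it.
new_bot_allowed = parser.can_fetch("BrandNewAIBot", ARTICLE)

print("KnownAIBot refused:", known_blocked)
print("BrandNewAIBot admitted:", new_bot_allowed)
```

In this sketch the only way to refuse the unknown robot in advance would be to disallow every robot under the "*" rule, which would also shut out the search engines. Note too that the file is only a request: nothing in it compels a robot to comply.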

In conclusion, we cannot help advising that the entire policy exercise of which this consultation forms a part is ill-founded. The concept of “British AI” is laughable to anyone who has any experience of networked computing. There are many reasons for this: we shall restrict ourselves to noting two.

Firstly, “first-mover advantage” means that the chances of a UK-owned “industry” springing up and succeeding in competition with the US-based corporations that have invested dollars and precious carbon emissions (and our members’ work) thus far are slim indeed.

Secondly, technologies such as this do not have a geographical location. Many of the same US corporations that are building LLMs are at the same time propelling users to move everything “into the cloud”. A data centre in Oxfordshire may generate a handful of jobs for security staff, cleaners and maintenance: but there is no other “British” component. If a data centre’s users have any sense at all they will “mirror” elsewhere everything they do using the computational resources it contains – perhaps in Bloemfontein if recent reports are not hype. Either part of the operation could be moved to Portland, Oregon at the click of a button. In fact, the users of the computing resources may not bother to enquire where their bytes are.

The consultation gov.uk

'Artificial Intelligence' our coverage to date

'AI' opens up an old philosophical wound by retailing rubbish