Online only

Using ‘AI’ is never safe

A robot writing computer code, generated from that prompt by DALL-E 2

BACK IN March 2023 the Freelance asked: "Is ‘artificial intelligence’ always wrong?" We concluded that everyone should always assume that such a system is "confabulating". The advice stays the same: "as a journalist, take nothing at face value. Check the sources for everything."

It turns out that it's worse than that, and getting worse still. A paper published yesterday in Nature shows that humans are not good at checking the output of what the authors properly call "language models" - and are more likely to be misled into over-trusting newer models.

The paper, by Lexin Zhou at the University of Cambridge and colleagues, is entitled "Larger and more instructable language models become less reliable". The authors looked at 32 variants of three families of models: GPT (the family behind ChatGPT), LLaMA and BLOOM. Later variants, trained on more of your work, did in fact produce more correct answers. Whoop! Bonuses for the developers! But the larger models also spat out more false responses; and, crucially, they almost entirely stopped responding "can't answer that" or dodging the question by, for example, changing the subject. And they continued to make errors on objectively easier problems.

Enthusiasts will respond that they are cleverer at "prompt engineering" and can get the best out of their favourite AI. Sorry: the paper reports that "users may be swayed by prompts that work well for difficult instances but simultaneously get more incorrect responses for the easy instances."

Our take-away quote: the newer models "do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors". They will lie and you are likely to fail to catch the lies.

Libel liability

So: if you go ahead and publish the output of a machine-learning system and it turns out that it libels someone, where do you stand legally? On 24 September attention was diverted from the demise of the printed Evening Standard by a story from deadline.com that the online remnant is planning to run 'AI' reviews in the style of the late Brian Sewell - later clarified to be only a one-off. Those who recall the work of the waspish Brian would not be surprised if there were defamation in there.

Two teachers of journalism law, Mark Hanna and David Banks, were among those to point out that reviews by humans are often covered by the defence that they express honestly held opinions. Can an 'AI' hold an opinion of any kind? If it cannot, anyone who publishes such an artificial libel likely has no defence.

Copyright update

In August 2023 the Freelance asked: "Who owns what robots eat and what they excrete?" We concluded that AI companies' claim to be covered by the exception to copyright allowing temporary copying "looks dubious" and that use of the EU law permitting "text and data mining" for non-profit research on condition that copyright owners had not opted out was "ridiculous".

On the US doctrine of "fair use" we concluded that "if you can get the original back out from the ML system" - which you can - it seems that the answer to the question of how well training an 'AI' fits the criteria for fair use is "not very".

Now Daniel J Gervais of Vanderbilt University Law School and colleagues have published a paper, "The Heart of the Matter: Copyright, AI Training, and LLMs". We are pleased to find that the authors broadly agree with our conclusions - after much more, and much more professorial, consideration. Two of their conclusions stand out:

"Does the transient or 'imperfect' nature of these copies [made during training] alter the conclusion regarding infringement? In scenarios involving the training of AI models, the answer is likely to be negative as the cases have long considered imperfect or incomplete and temporary copies as potentially infringing."

"The fact is that these numerical representations could often be 'worked backwards' to recreate a precise and complete version of the original content used as training material. For example, it has been unequivocally demonstrated that by performing data extraction attacks, it is possible to recover individual training data examples."

The authors also raise an issue that we had overlooked: the prohibitions in various jurisdictions on removing "rights management information" - which would include bylines and "metadata" in photos. Their conclusion: "there is very little case law on this type of infringement, but it is likely to be considered by the courts in a number of pending cases in the United States."

These and the other issues are likely to be heard in the UK courts too, in the case Getty Images (US) Inc & Ors v Stability AI Ltd [2023] EWHC 3090 (Ch). Back in December Mrs Justice Joanna Smith declined to grant Stability AI's application for summary judgment in its favour, and we await hearing dates.

Of the other authors of this law paper, Noam Shemtov is at Queen Mary University of London, and Haralambos Marmanis and Catherine Zaller Rowland are with the US Copyright Clearance Center, a collective licensing body. The paper recommends collective licensing to compensate creators for the use made of our work by 'AI'. It stresses that this should be voluntary licensing: we would have preferred it to stress that work should be used only with explicit, informed prior consent.