The quality of output produced by generative AI systems depends largely on the quality of the data used to train them. AI firms have long sought access to large quantities of high-quality datasets to use in the development process. However, creative industry representatives continue to raise concerns over the use of their content in AI training, arguing that use without their consent, and without payment of a royalty in return, is copyright infringement.
A landmark case involving Getty Images and Stability AI is currently pending trial before the High Court in London. Getty Images claims that Stability AI, in developing and training its AI system ‘Stable Diffusion’, infringed Getty’s intellectual property rights.
In its consultation paper, however, the government said it “does not believe that waiting for ongoing legal cases to resolve will provide the certainty that our AI and creative industries need in a timely fashion, or, potentially, at all” and is therefore considering “a more direct intervention through legislation to clarify the rules in this area and establish a fair balance in law”. It said it has “not settled on the precise nature of that intervention – or, if necessary, the precise nature of any legislation” and intends for the responses it receives to its consultation to shape its thinking in that regard.
The consultation paper looks at issues of copyright in two main AI contexts: the training of AI models and the outputs produced by AI systems. The government has set out various options for reform in respect of each area.
In respect of AI training, the government acknowledged that the current copyright framework “does not meet the needs of [the] UK’s creative industries or AI sectors”.
“Creative and media organisations are concerned that their works are used to train AI without their permission, and they are unable to secure remuneration through licensing agreements,” the government said. “They have also highlighted a lack of transparency from AI developers about what content is or has been used and how it is acquired, which can make it difficult to enforce their copyright. Likewise, AI firms have raised concerns that the lack of clarity over how they can legally access training data creates legal risks, stunts AI innovation in the UK and holds back AI adoption.”
“The lack of clarity about the current regime means that leading AI developers do not train their models in the UK, and instead train in jurisdictions with clearer or more permissive rules. Since copyright law applies in the jurisdiction where copying takes place, this means that AI developers are not obliged to respect rights under UK law. This harms our UK AI sector too, as investment from the major AI developers is limited and UK-based SMEs who cannot train overseas are disadvantaged. We cannot allow this to continue,” it added.
One of the options the government is consulting on would require AI developers to obtain an “express licence” in all cases where they wish to train their models on copyright works in the UK. It said this would “provide legal certainty in how copyright law operates with AI models in the UK” and “provide a clear route to remuneration for creators”, but would likely make the UK “significantly less competitive compared to other jurisdictions – such as the EU and US”, with the risk that AI developers choose to invest in other countries because the UK would be “a less attractive location for AI development”.
Another option, at the other extreme, would be to extend the current text and data mining exception to copyright to allow data mining on copyright works – including for AI training – without right holders’ permission, for any purpose including commercial use, and with few or no restrictions. The government said that would change the UK copyright framework radically in a way that would be “highly likely to constrain the growth of the creative and media sectors”.
At this stage, the government’s preferred option is to introduce a new data mining exception – but to do so alongside mechanisms that enable right holders to opt their content out of being used for AI training. Those mechanisms would involve developers being transparent about the works their models are trained on and right holders, either individually or collectively, being able to “easily reserve their rights”.
Dan Conway, chief executive of UK publishing trade body the Publishers Association, said the measures being proposed “are as yet entirely untested and unevidenced”. He said: “There has been no objective case made for a new copyright exception, nor has a water-tight rights-reservation process been outlined anywhere around the globe.”