Achieving quality data for AI in healthcare and why it is important

Out-Law Analysis | 19 May 2022 | 2:38 pm

The potential of AI to streamline clinical and non-clinical processes, improve the diagnosis of diseases, and speed up the development of new medicines and other treatments depends to a large degree on confidence in the quality of data and transparency in how the technology operates and delivers specific outcomes.

The risk of harm to patients that can arise from the use of AI systems fed with poor quality data is not yet properly addressed in existing or proposed new regulation.

A drive to develop new standards has taken on growing importance as a result.

These are issues experts will explore at a forthcoming event being hosted by the Digital Leadership Forum and Pinsent Masons on 26 May.

The importance of quality data

While some life sciences companies and healthcare providers have been more cautious than organisations in other sectors about embracing the potential of AI, many are now investing heavily in AI-related innovation.

AI systems are only as good as the data they rely on. Quality data is needed to train the systems so that the outcomes derived from their use are appropriate and fair. If information is missing, inaccurate or flawed in some other way, the effectiveness of the AI tool will suffer.

In a healthcare context, ineffective or unreliable AI is dangerous – particularly where the faults with the underlying data are not identified or properly understood – with risks including misdiagnosis and the mis-prescribing of treatments. Where discrimination arises, there is a risk of non-compliance with equalities legislation as well as misdiagnosis and/or inappropriate treatment. The growing volume of research into AI use is shining a light on these risks.

A University of Oxford study published last year found that AI systems used to detect skin cancer risk were less accurate for people with dark skin because very few of the images from which those systems are modelled feature people with dark skin.

The Centre for Data Ethics and Innovation (CDEI) has also previously questioned whether cancer treatment research trials at major pharmaceutical companies that have “overwhelmingly white participants” are truly representative. It also cited a trial in the West Midlands of an algorithmic decision-making tool for radiology scans, which it said “was primarily trained on Hungarian populations” and “may not be representative of UK patients”.

The importance of ensuring that the data on which AI systems are built is representative of the people the tool is designed to help was also stressed in another study led by Imperial College London. The researchers said: “Efforts should be made to ensure the data on which algorithms are based is representative of the populations that they will be deployed in and with sufficient breadth and depth to capture the multitude of clinically important associations between ethnicity, demographic, social and clinical features that may exist.”

The study highlighted that the UK holds some of the best health data sets globally, but said more evidence is needed to determine “whether and how AI and data-driven technologies could be utilised to help improve the health of minority ethnic groups”.

The Imperial College study further warned of the risk of perpetuating existing inequalities. It is easy to see how this could happen if AI systems are trained on data drawn from only some demographic groups.

Reducing bias and ensuring that the data fed into AI systems is truly representative is vital to building trust in the technology. That trust, in turn, encourages people from all backgrounds to consent to their data being used in health research initiatives, such as the training of innovative new AI systems, which makes the outcomes those systems produce more reliable and effective for everyone. It is a virtuous circle.

The shortcomings of regulation

The need for quality data for use of AI in healthcare is well understood. However, there remains a lack of clarity over what organisations need to do in practice to achieve quality data.

The lack of clarity arguably stems from a gap in legislation and regulatory frameworks.

Where the data being fed into AI systems constitutes personal data, as it will be where it can lead to the identification of individuals, then its use is governed by data protection law. The data protection regime applies across all sectors and includes particularly stringent requirements around the processing of health data. Its emphasis is on safeguarding data privacy and, while it imposes other notable obligations on those processing the data – such as around the data’s accuracy – the data protection framework does not address many of the ethical issues that arise in the context of the use of data and AI in healthcare.

The prospect of a specific EU regulation to address the risk of harm arising from the use of AI systems goes a little way towards addressing the issue. The envisaged EU AI Act seeks to regulate the use of ‘high-risk’ AI systems in particular and promises to embed safeguards such as human oversight requirements into EU law. It also promises to promote data governance and management practices that support the use of quality data in the training of algorithms and the identification of biases. It will further require developers of high-risk AI systems to provide assurances over the accuracy of their systems.

Further new regulation has recently been proposed to facilitate the European Health Data Space (EHDS), which, it is envisaged, will allow researchers, companies or institutions to access health data, subject to obtaining a permit from a health data access body within an EU member state. Access will only be granted if the requested data is used for specific purposes, in closed, secure environments and without revealing the identity of the individual. It would also be strictly prohibited to use the data for decisions detrimental to citizens, such as designing harmful products or services or increasing an insurance premium.

However, while the draft EU AI Act and EHDS proposals remain open to amendment by EU lawmakers, they currently lack the granular detail that life sciences companies and healthcare providers developing or procuring AI systems would need to understand how they or their suppliers might meet the anticipated new obligations.

No UK equivalent of the EU AI Act or EHDS has yet been proposed, although an indication of future policy direction is expected to materialise in a paper the Office for AI is currently developing on governing and regulating AI.

That “pro-innovation national position” being worked on by the Office for AI was trailed in the UK’s national AI strategy published last year. The strategy also drew attention to the role for standards in facilitating trust in AI and growth in use of the technology. It said: “The integration of standards in our model for AI governance and regulation is crucial for unlocking the benefits of AI for the economy and society, and will play a key role in ensuring that the principles of trustworthy AI are translated into robust technical specifications and processes that are globally-recognised and interoperable.”

Standards initiatives and next steps for healthcare organisations

One action arising from the national AI strategy is the piloting of a new AI Standards Hub, led by the Alan Turing Institute and supported by both the British Standards Institution (BSI) and the National Physical Laboratory. The development of new standards for AI is seen as vital to giving bodies operating under a new AI assurance scheme, being developed by the CDEI, a benchmark to assess against.

In the national AI strategy, the government recognised the need for global alignment on new standards. It referenced the existing work the BSI is leading with international partners on international AI standards covering concepts and terminology; data; bias; governance implications; and data life cycles. The AI Standards Hub is expected to build on this existing work, “coordinate UK engagement in AI standardisation globally, and explore with stakeholders the development of an AI standards engagement toolkit to support the AI ecosystem to engage in the global AI standardisation landscape”.

Funding for specific data-related standards for AI in healthcare was also announced last year by the health secretary for England, Sajid Javid. It is another indication of growing awareness that organisations in the sector need more detailed requirements to help them manage data risk effectively when using the technology and to design data governance frameworks around AI development.

For life sciences companies and healthcare providers, the prospect of new standards that specify technical requirements around the use of AI, including around data quality, will be welcome. Vertical, sector-specific standards would help fill an existing gap in regulation. Organisations should take every opportunity to engage in the development of these standards to ensure they are effective and can be implemented in practice.

The issue of data quality boils down to ethics and strong governance. While ethics in AI is a new area for many organisations, ethics policies and ethics committees are core to the activities of life sciences companies and health providers. There is therefore a natural governance framework in place that can be tailored for AI development and use in healthcare and mapped to emerging regulatory requirements and new standards.

Where life sciences companies and health providers source their AI from third party technology suppliers, they will want to put in place contractual provisions that ensure that appropriate data quality standards have been applied and that other assurances are provided over the accuracy and reliability of the AI system.

Accelerating the use of quality and trustworthy data in the health and life sciences sector is the topic of a forthcoming event being hosted by the Digital Leadership Forum and Pinsent Masons. The event will feature speakers from the National Physical Laboratory and AI Standards Hub, the Ada Lovelace Institute, Lovedby and Pinsent Masons’ Annabelle Richard. To attend the event, please register on the Digital Leadership Forum website.