AFTER colour film came into general use in the mid-20th century, manufacturer Kodak sent out reference cards to help developers achieve the best colour matching in photographs.
The so-called Shirley cards, distributed for several decades from the 1950s, showed smiling Caucasian women with porcelain complexions.
On many of the reference cards, the women’s skin colour was labelled “normal”.
Defining one particular kind of skin – one particular kind of human – as normal, and all others as deviations from that norm, can have serious consequences for a range of social outcomes including health.
Historical under-representation of women in cardiovascular studies, and the resulting extrapolation from data collected in men, has been linked to underdiagnosis and undertreatment of cardiovascular disease in women.
The rapidly growing field of artificial intelligence (AI), or machine learning, promises improved and more accessible diagnostic tools for many conditions, but it also poses new risks of exclusion.
A common acronym in computer science is GIGO: “garbage in, garbage out”.
It seems obvious that basing an algorithm on inadequate or exclusionary data might fit that description. It is certainly unlikely to produce a clinical tool that is equally useful in all populations.
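To make the point concrete, the toy sketch below (in Python, using entirely synthetic data and made-up group labels, none of it drawn from any real study) trains a simple classifier on a sample in which one group makes up only a small fraction of the data, then reports accuracy separately for each group. The model can look accurate overall while performing much worse on the under-represented group.

```python
# Toy illustration of "garbage in, garbage out" with skewed training data.
# Everything here is synthetic; the groups and features are stand-ins only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    """Generate n synthetic cases for a group whose feature distribution
    and true decision boundary differ from the majority by `shift`."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

# Training sample: 95% group A, 5% group B. The skew is the point.
Xa, ya = make_group(1900, shift=0.0)
Xb, yb = make_group(100, shift=1.5)
model = LogisticRegression(max_iter=1000).fit(
    np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate on balanced held-out samples from each group.
for name, shift in [("group A (well represented)", 0.0),
                    ("group B (under-represented)", 1.5)]:
    X_test, y_test = make_group(1000, shift)
    print(f"{name}: accuracy = {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

The details are invented, but the mechanism is the general one: a model fitted mostly to one population learns that population's patterns, and its apparent accuracy tells you little about how it behaves elsewhere.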
US dermatologist Dr Adewole Adamson has, for example, questioned the accuracy of algorithms designed to distinguish between benign and malignant moles for people with darker skin.
The problem, Dr Adamson writes, is that the data used in AI tools mostly come from images of light skin.
While melanoma is significantly more common in pale-skinned people in the US, it is diagnosed later in people of colour and is more often fatal, he writes.
“I know that AI has huge potential to clear up health disparities,” Dr Adamson recently told the BMJ. “We just need to take the time to vet it properly like we would any other new health product.”
If an algorithm was not based on a representative sample, a provider should state that it was only suitable for use on white people, he said.
Lack of transparency around data sources and systems design is a general issue with AI, as Australian data expert Ellen Broad makes clear in her thought-provoking book, Made by Humans: The AI Condition.
One of the problems is that datasets always, by definition, paint a picture of the past, yet we use them to predict the future.
Broad describes AI systems built on historical data that have unintentionally entrenched prejudice in areas as diverse as human resources recruitment and the criminal justice system.
“It’s becoming clear that lots of the data sets we want to make predictions with, from hospital admissions to medical records to CVs to student report cards, are not as neutral as we would want them to be to ensure unbiased, ‘accurate’ results,” she writes.
Algorithms designed to predict the likelihood of criminal recidivism, for example, may be distorted if people from particular ethnic groups or socio-economic backgrounds are more likely to be arrested and charged than others.
Where such an algorithm is used to guide sentencing – as with the controversial COMPAS tool in the US – the potential to perpetuate disadvantage is obvious.
From a health care perspective, a team of US researchers looked at the ways bias might be embedded in clinical algorithms and the potential for this to exacerbate disparities in care.
Records might be inadequate, and the data therefore distorted, for those groups of patients who were more likely to receive fragmented care or to be seen only when their condition was severe, the researchers wrote. This was known to be the case for several vulnerable populations, including immigrants and people of lower socio-economic status or with psychosocial issues.
Some groups might also be more likely to be seen by less experienced clinicians who could make different clinical decisions and record data differently from those treating wealthier patients, leading to further inconsistencies in the dataset.
To avoid amplifying existing disparities in health care, clinical decision support algorithms should be tested for discriminatory elements at all stages of development, the researchers concluded.
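The researchers do not prescribe a single test, but one routine check, sketched below with hypothetical subgroup labels, a made-up flagging threshold and synthetic predictions, is to stratify a model's sensitivity and specificity by patient subgroup on held-out data and flag any group that lags well behind the best-performing one.

```python
# Sketch of a subgroup audit for a clinical prediction model.
# The metrics are standard; the subgroup labels, data and threshold
# below are hypothetical and exist only to show the mechanics.
import numpy as np
from sklearn.metrics import confusion_matrix

def subgroup_audit(y_true, y_pred, groups, gap_threshold=0.10):
    """Print sensitivity and specificity per subgroup and flag any group
    whose sensitivity trails the best-performing group by more than
    `gap_threshold`."""
    stats = {}
    for g in np.unique(groups):
        mask = groups == g
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
        stats[g] = {
            "n": int(mask.sum()),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    best = max(s["sensitivity"] for s in stats.values())
    for g, s in stats.items():
        flag = ("  <-- review before deployment"
                if best - s["sensitivity"] > gap_threshold else "")
        print(f"{g}: n={s['n']}  sensitivity={s['sensitivity']:.2f}  "
              f"specificity={s['specificity']:.2f}{flag}")

# Hypothetical held-out results: the model misses more disease in group B.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 500)
groups = np.where(rng.random(500) < 0.5, "group A", "group B")
y_pred = y_true.copy()
missed = (groups == "group B") & (y_true == 1) & (rng.random(500) < 0.4)
y_pred[missed] = 0
subgroup_audit(y_true, y_pred, groups)
```

A disparity surfaced this way does not say why a model underperforms for a group, only that it does, which is the cue to go back to the data and the design before the tool reaches patients.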
We tend to think of data as neutral but, like anything produced by humans, they are far from that.
If we want to avoid decision-making tools that are the 21st-century equivalent of Shirley cards, we need to make sure that both the system design and the underlying data adequately reflect the diversity of the human populations they will be applied to.
Jane McCredie is a Sydney-based health and science writer.
The statements or opinions expressed in this article reflect the views of the authors and do not represent the official policy of the AMA, the MJA or InSight+ unless so stated.