Is Your Data Biased? Lessons from COVID-19 and the Healthcare Industry

Artificial intelligence (AI) and machine learning (ML) algorithms have transformed products and industry by accelerating data-driven decisions and enhancing technology functionality. But they are not without flaws and can be inherently biased. The hope is that these tools will help bridge divides of class, race, gender, socioeconomic status, and more. But when real-world data is tainted with bias and misinformation, AI will inevitably fail us. How do we consciously and ethically use AI to reduce the inequities in society?

Racial Bias Permeates AI Scripting and Outcomes

Bias in AI algorithms permeates many industries, including healthcare and high tech. AI can produce misleading findings when its data is incomplete or poor. People of color have historically faced prejudice within the medical community through denial of care, lack of access, poor resource allocation, bias in medical diagnoses, and more. Furthermore, due to systemic racism, historical medical data on people of color is often biased, skewed, or missing. Feeding this kind of incomplete and inaccurate data into an AI model or algorithm only gives us biased, skewed, or missing results, and the consequences can be deadly.

For example, many of the diagnostic tools and algorithms that detect malignant moles or analyze images of skin are not as accurate on darker skin tones as they are on lighter ones. Melanoma, a skin cancer that affects more white Americans than people of color, has a much higher mortality rate among Black Americans due to failures to diagnose the condition in its early stages. In fact, according to the American Cancer Society, the five-year survival rate for melanoma among white Americans is 94% versus only 66% among Black Americans.

In the past year, the COVID-19 pandemic has exposed the deep divide in access to care and medical treatment between white Americans and communities of color. These at-risk communities have had limited access to COVID-19 testing sites. And, according to NPR, in the early stages of the pandemic, not all testing sites collected demographic data. AI and ML models built on this kind of incomplete or biased data, compounded by a history of implicit bias in the medical community, can therefore have far-reaching, unintended consequences for decisions about:

  • who needs emergency PPE;
  • who is in greater need of financial relief;
  • which hospitals and communities need more support; and
  • who needs the vaccine.

As a result, Black, Latinx, and Indigenous peoples have been disproportionately affected by COVID-19.

Taking the Bias Out of AI and ML Algorithms

AI and machine learning help us extract valuable information. In many ways, AI can be more efficient and accurate than humans in recognizing patterns and uncovering insights. However, if society’s implicit biases and history of systemic racism aren’t actively addressed, then these patterns are mirrored and perpetuated in the AI-driven solutions we create. So how can companies help?

  • An Intersectional Approach. Acknowledge that data are biased due to a history of systemic racism, sexism, and other prejudices. When using historical data, understand that the data may not completely reflect the attitudes, behaviors, and thinking of all people. A good data scientist will check data quality to detect biases before programming ML or AI algorithms. Raising questions about whether the data in use is ethical and complete is one way to ensure data-driven solutions are holistic in nature.
  • Analyze the In-Betweens. Do not assume that data yields all the answers or paints an accurate picture of society. Starting from the source of the data, analyze what is missing from the database and determine which additional unbiased data sources you need. Identify gaps in the data and focus on segments of the population that might otherwise be overlooked when developing data-driven solutions.
  • Conscious and Ethical AI Solutions. AI can be a powerful tool for social change and, when used correctly, can help eliminate the inequities we see in society. The advantage of using AI is that it is capable of processing and analyzing large volumes of data. By identifying data and information gaps, we can supplement the existing data in our machine learning models with proxies. The recommendations and conclusions that result from these supplemented models give us a better chance at developing more equitable solutions.
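
To make the data-quality check and gap analysis described in the list above more concrete, here is a minimal Python sketch of one possible audit. It is an illustration under stated assumptions, not a prescribed method: the function names, the `group` and `outcome` fields, the reference shares, and the 5-point tolerance are all hypothetical, and the records are made-up toy data. The sketch compares each group's share of a dataset against a reference population share, then reports how often a field is missing within each group.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares, tolerance=0.05):
    """Return groups whose share of the dataset trails the reference share.

    A group is flagged when observed_share < reference_share - tolerance.
    The tolerance is an arbitrary illustrative threshold, not a standard.
    """
    counts = Counter(r.get(group_key) for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed < expected - tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps


def missing_rate_by_group(records, group_key, field):
    """Fraction of records in each group where `field` is absent or None."""
    totals, missing = Counter(), Counter()
    for r in records:
        group = r.get(group_key)
        totals[group] += 1
        if r.get(field) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


# Toy, made-up records purely for illustration (not real patient data).
records = (
    [{"group": "A", "outcome": 1}] * 8
    + [{"group": "B", "outcome": None}] * 1
    + [{"group": "B", "outcome": 0}] * 1
)
# Hypothetical reference shares, e.g. from a census of the population served.
reference = {"A": 0.6, "B": 0.4}

gaps = representation_gaps(records, "group", reference)
rates = missing_rate_by_group(records, "group", "outcome")
print(gaps)   # group B is underrepresented relative to the reference
print(rates)  # group B also has a higher share of missing outcomes
```

In this toy data, group B makes up 20% of the records against a 40% reference share, and half of its outcome values are missing. A check like this does not remove bias on its own; it only shows where the gaps are, so that a team can decide whether to collect more data, add a proxy, or hold off on modeling a segment it cannot yet represent fairly.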

The Way Forward

Relying heavily on historical precedent to drive future solutions unintentionally perpetuates prejudice in our systems. AI and machine learning algorithms draw their data from the world as it currently stands, not from the just society we are striving to create. As innovations continue to advance, it is imperative to recognize bias in our data, identify where the gaps exist, and work to develop more inclusive databases for the future.