The launch of Nightingale Open Science last month – a new hub of medical data sets that could train AI (Artificial Intelligence) systems to predict and diagnose conditions earlier and better – was exciting to see. Since AI can only ever be as good as the data it’s trained on, the open-source, high-quality, deidentified data sets offered by the platform will be welcomed by an AI community that struggles to find or access appropriate data to train its models (as reported by the FT – subscription required). To really assess the platform’s value, it would be interesting to look under the hood and see exactly which data types and sources Nightingale Open Science is using, and how that data is being curated.
I see many comments on the FT article arguing that data sets like those offered by the Nightingale platform are nothing novel, citing the UK Biobank as an example. Electronic health records (EHRs) are another great existing data source, with the potential to provide comprehensive, high-quality data to our healthcare professionals, clinical researchers and data scientists. The point is that there are not enough high-quality data sets available, and access to them is very variable. Inadequate digital infrastructure and resources within national healthcare systems mean many can’t really make the most of the invaluable data they are sitting on, and so rely on third parties such as Nightingale to create meaningful data sets that can be used for public benefit.
Moving back to the issue of improving the quality of AI-based decision-making, for me the most positive aspect of the Nightingale platform is its aim for representativeness. Its data is mainly collected from the US and Taiwan, but it is soon expanding that collection to Kenya and Lebanon. AI algorithms have been shown to amplify existing health disparities when they’ve been trained on healthcare cost data as a proxy for healthcare needs. Minorities, for example, may seem less costly to the healthcare system, but that may simply mean that they haven’t been able to access the healthcare they need, and are therefore underserved by the healthcare system and underrepresented in the underlying data. It’s this bias that new, medically diverse and high-quality data sets, including those created by Nightingale Open Science, are aiming to overcome.
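
To make the cost-as-proxy problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made purely for illustration: the `Patient` class, the two groups "A" and "B", the `access` factor and all of the numbers are invented toy values, not drawn from Nightingale or any real data set. Both groups have identical health needs, but group B receives only half the care it needs, so its recorded costs are lower.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str     # hypothetical group label, "A" or "B"
    need: int      # true health need, in made-up units
    access: float  # fraction of needed care the patient actually receives

    @property
    def cost(self) -> float:
        # Recorded cost reflects care received, not care needed.
        return self.need * self.access


def build_cohort() -> list[Patient]:
    # Toy cohort: both groups have the same needs (1..10),
    # but group B only gets half of the care it needs.
    cohort = []
    for need in range(1, 11):
        cohort.append(Patient("A", need, access=1.0))
        cohort.append(Patient("B", need, access=0.5))
    return cohort


def select_top(patients, key, k):
    # Pick the k patients ranked highest by the given score.
    return sorted(patients, key=key, reverse=True)[:k]


def share_of_group(selected, group):
    # Fraction of the selected patients that belong to `group`.
    return sum(p.group == group for p in selected) / len(selected)


cohort = build_cohort()

# Ranking by recorded cost (the proxy) versus by true need.
by_cost = select_top(cohort, key=lambda p: p.cost, k=4)
by_need = select_top(cohort, key=lambda p: p.need, k=4)

share_by_cost = share_of_group(by_cost, "B")
share_by_need = share_of_group(by_need, "B")

print(f"Group B share when ranking by cost: {share_by_cost:.2f}")  # 0.00
print(f"Group B share when ranking by need: {share_by_need:.2f}")  # 0.50
```

In this toy setup, ranking by cost leaves group B out of the top four entirely, even though its needs are identical to group A's. Real systems are far messier, but the sketch shows why the choice of training target matters as much as the volume of data.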