
The Importance of High Quality Data for AI Reliability

Garbage In = Garbage Out = High Risk

By Lindsay Westervelt, J P Systems, Inc.

Artificial Intelligence (AI) is a hot topic in healthcare right now for numerous reasons, chief among them its potential to reduce clinician burnout and fatigue by improving Clinical Decision Support.

AI uses Machine Learning to create predictive logic and “act” accordingly (www.sas.com). In healthcare, AI and Machine Learning algorithms rely on copious amounts of granular, high quality data. That’s why we’ve asked the nationally recognized J P Systems Clinical Data Quality team to weigh in on how data quality affects AI algorithms and what healthcare organizations and developers can do to maximize AI-supported outcomes.

Question: What constitutes data quality and how does it affect healthcare?

Data Quality Team (hereafter DQT): Data quality work involves analyzing clinical data using a variety of tools. We assess data formats, completeness, adherence to prescribed standards, required value sets, and message constraints. We also review healthcare domains, which include not only demographics and payor values but all of the clinical domains (e.g., lab tests/results or allergies).
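To make the idea of completeness and value set checks concrete, here is a minimal sketch in Python. It is illustrative only and is not the DQT's actual tooling: the field names, the REQUIRED_FIELDS list, and the ALLOWED_ALLERGY_STATUS value set are all hypothetical stand-ins for whatever a real standard prescribes.

```python
# Hypothetical value set for illustration only; real value sets are
# published by standards bodies and are far larger.
ALLOWED_ALLERGY_STATUS = {"active", "inactive", "resolved"}

# Hypothetical list of fields a complete allergy record should carry.
REQUIRED_FIELDS = ("patient_id", "substance_code", "status", "recorded_date")


def find_defects(record: dict) -> list:
    """Return a list of short defect labels found in one allergy record."""
    defects = []
    # Completeness: every required field must be present and non-blank.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            defects.append(f"missing:{name}")
    # Value set adherence: a populated status must be an allowed value.
    status = record.get("status")
    if status and status not in ALLOWED_ALLERGY_STATUS:
        defects.append(f"not_in_value_set:status={status}")
    return defects


records = [
    {"patient_id": "p1", "substance_code": "7980",
     "status": "active", "recorded_date": "2021-03-04"},
    {"patient_id": "p2", "substance_code": "",
     "status": "Active?", "recorded_date": "2021-03-05"},
]
report = [find_defects(r) for r in records]
```

In this sketch the first record comes back with no defects, while the second is flagged for a blank substance code and a status outside the value set.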

The goal of this work is to improve the quality of healthcare data exchanged between health networks and their community partners. Healthcare providers need complete patient data so they can make critical decisions about patient care. However, 50-70% of exchanged patient records aren’t usable due to missing, miscoded, or misplaced data. When clinicians can’t trust the data, their burden increases, which is often the case when internal and external data are mixed within systems that don’t require high quality clinical data.

Ultimately, poor quality clinical data exacts a cost on everyone. Physicians become frustrated or struggle with increased burnout because they have to repeat tests or spend valuable time searching for data. The clinical environment suffers when physicians schedule multiple appointments for one patient to repeat tests, which denies other patients those time slots. Patients don’t receive proper care, but they do receive multiple bills that their insurance might not cover. Finally, researchers and developers have to create workarounds, define and implement data transformations, or hire more people to figure out data problems.

Q: How could data quality affect Machine Learning?

DQT: There are multiple types of Machine Learning, but all of them require inputting data into an algorithm to create AI logic. To support Machine Learning and AI efforts in healthcare, developers work closely with clinicians to refine the process by testing the algorithm, collecting feedback, and using that feedback to improve it.

Data is stored in EHRs and can be exchanged as HL7 Consolidated Clinical Document Architecture (C-CDA) documents. C-CDAs are broken up into clinically logical groups and domains (e.g., allergies, medications, and documents), and they are organized in a way that reflects clinician workflows and EHR vendors’ system architecture.
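As a rough illustration of how a C-CDA groups data into sections, the Python sketch below lists the sections found in a tiny, hand-written sample document. The SAMPLE text and the list_sections helper are hypothetical; a real C-CDA carries many more elements (templates, entries, narrative text) than this fragment shows.

```python
import xml.etree.ElementTree as ET

# C-CDA documents use the HL7 v3 XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Hand-written, heavily trimmed sample for illustration only.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="48765-2" codeSystem="2.16.840.1.113883.6.1"/>
      <title>Allergies</title>
    </section></component>
    <component><section>
      <code code="10160-0" codeSystem="2.16.840.1.113883.6.1"/>
      <title>Medications</title>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""


def list_sections(xml_text: str) -> list:
    """Return (section code, title) pairs for each top-level section."""
    root = ET.fromstring(xml_text)
    path = "hl7:component/hl7:structuredBody/hl7:component/hl7:section"
    sections = []
    for section in root.findall(path, NS):
        code = section.find("hl7:code", NS)
        title = section.find("hl7:title", NS)
        sections.append((
            code.get("code") if code is not None else None,
            title.text if title is not None else None,
        ))
    return sections


sections = list_sections(SAMPLE)
```

A check like this is one simple way to see whether the domains a receiving system expects are present in a document at all, before looking at the quality of the data inside each section.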

To properly structure data and system architecture, an organization must abide by rules, which are called data standards. These standards are used to provide guidance, explain data elements and groupings, and improve data exchange between various healthcare providers. Previously these rules fell under the Office of the National Coordinator (ONC) Common Clinical Data Set (CCDS); now they are moving to the United States Core Data for Interoperability (USCDI) and HL7 FHIR® resources.

Under FHIR®, EHR systems will need to have all FHIR® resources (think: clinical data message structures) in place so that a receiving system can accept the exchanged data. Otherwise, the receiving system will reject the sender’s data and return it with an error message. This could affect Machine Learning because tools like Natural Language Processing (NLP) rely on data to parse text for patterns and irregularities. 
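The accept-or-reject behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not a real FHIR server: the REQUIRED_ELEMENTS table and the receive function are hypothetical and check only a couple of elements, whereas real validation runs against the full published resource definitions and profiles. The error is shaped like a FHIR OperationOutcome, the resource FHIR uses to report issues.

```python
# Simplified, hypothetical table of required elements per resource type.
# Real FHIR validation covers far more than element presence.
REQUIRED_ELEMENTS = {
    "AllergyIntolerance": ["patient"],
    "Observation": ["status", "code"],
}


def receive(resource: dict) -> dict:
    """Accept the resource, or return an OperationOutcome-shaped error."""
    resource_type = resource.get("resourceType")
    if resource_type not in REQUIRED_ELEMENTS:
        # The receiver has no structure in place for this resource type.
        issues = [{
            "severity": "error",
            "code": "not-supported",
            "diagnostics": f"Unsupported resourceType: {resource_type}",
        }]
    else:
        # Report every required element the sender left out.
        issues = [
            {"severity": "error", "code": "required",
             "diagnostics": f"Missing element: {name}"}
            for name in REQUIRED_ELEMENTS[resource_type]
            if name not in resource
        ]
    if issues:
        return {"resourceType": "OperationOutcome", "issue": issues}
    return {"accepted": True}


rejected = receive({"resourceType": "Observation", "status": "final"})
accepted = receive({"resourceType": "AllergyIntolerance",
                    "patient": {"reference": "Patient/1"}})
```

Here the Observation is returned to the sender with an error because it has no code element, while the AllergyIntolerance is accepted. Any data rejected this way never reaches downstream tools, including Machine Learning pipelines.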

There’s a direct relationship between the quality of clinical data and AI functioning. If an AI doesn’t have the proper clinical data, in the proper format, to return the right answer, it can’t learn well, and it can’t discover new patterns. AI has a lot of functions, and none of them can be done with partial data.

Q: What would it take for AI to function in healthcare as needed?

DQT: When people think about interoperability, they’re usually only thinking as far as patient matching or harmony with value sets, but there’s more to it than that. EHR systems have been around since the 70s but didn’t start exchanging data until the late 90s/early 2000s. That means we have 25 to 30 years of clinical data sitting in systems, much of it old, irrelevant, or formatted incorrectly, and the migration of historic data into current standards is challenging.

We encourage healthcare organizations to start with baby steps; don’t just jump into AI and turn it on. First discuss the need for clinical data quality thresholds, agreed definitions of clinical data defects, and established taxonomies.
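One way to picture a data quality threshold is as a simple gate in front of any AI effort. The Python sketch below assumes each record has already been scored with a count of defects; the function names and the 0.95 default threshold are hypothetical choices for illustration, not recommended values.

```python
def usable_fraction(defect_counts: list) -> float:
    """Return the fraction of records that have zero defects."""
    if not defect_counts:
        return 0.0  # no records means nothing usable
    clean = sum(1 for count in defect_counts if count == 0)
    return clean / len(defect_counts)


def ready_for_ai(defect_counts: list, threshold: float = 0.95) -> bool:
    """Pass only if enough records are defect-free (threshold is hypothetical)."""
    return usable_fraction(defect_counts) >= threshold


# Four records: two clean, one with one defect, one with two defects.
sample_counts = [0, 0, 1, 2]
fraction = usable_fraction(sample_counts)
passes = ready_for_ai(sample_counts)
```

With the sample counts above, only half of the records are clean, so the gate stays closed. Agreeing on what counts as a defect, and on where the threshold sits, is exactly the kind of discussion an organization needs to have before turning AI on.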

We still have a lot more work to do, and everyone has a part to play. We need to not just check the boxes on the Cures Act and data standards; we need to make sure we’re doing all we can to exchange high quality data effectively. This includes auditing all types of data coming in from external partners. That starts at the data level because AI cannot reach its potential with low quality data. Period.