Data: Just What the Doctor Ordered

Researchers at the Center on Artificial Intelligence Research for Health will use AI to improve health outcomes.

You need to be screened for a disease. Would you rather have your photo taken or go through extensive, expensive and invasive genetic testing?

Sounds too good to be true, but it’s a real question thanks to advancements in artificial intelligence and the work of Wael AbdAlmageed, research associate professor at the Ming Hsieh Department of Electrical and Computer Engineering at USC Viterbi. He is using AI and facial recognition analysis to accurately predict congenital adrenal hyperplasia, a disease that causes subtle facial changes.

AbdAlmageed’s research is one example of the work being done at the Center on Artificial Intelligence Research for Health, or AI4Health. The center is part of the $1 billion-plus USC Frontiers of Computing initiative and the new USC Silicon Beach Campus. Housed at the Information Sciences Institute, AI4Health aims to nurture collaborations between researchers in AI and in the health sciences.

“ISI has already been using AI for health research,” said director Michael Pazzani, who works with co-directors AbdAlmageed, Jose-Luis Ambite, Abigail Horn and Carl Kesselman. “One of the goals of AI4Health is to do it more systematically to make it easier for medical school researchers to find people with expertise in AI.”

Pazzani sees huge potential in how AI might improve health care. “AI can help clinicians more accurately and quickly diagnose and treat patients; help researchers discover the relationships between symptoms, diseases and treatments; and potentially allow us to alert people when they need to seek medical care, particularly before a condition causes significant harm and requires costly treatments,” he said.

AI4Health already has more than a dozen researchers applying AI to health. Following are some of their projects.

Jose-Luis Ambite

How Hospitals Learn from One Another’s Data


Jose-Luis Ambite, research associate professor of computer science and ISI research team leader, and Dimitris Stripelis, ISI research scientist


Data (and lots of it) is what powers AI. How can AI be used in the medical field when patient data must remain private?


Generally, the more data an AI model is trained on, the better its performance and accuracy. In medical domains, where strict privacy regulations are in place to protect patients, data scarcity is a challenge. The solution? Federated learning. 

Federated learning is a method for training machine learning models collaboratively, using data from various distributed sources, without ever sharing the data itself.

For example, let’s say Hospital A has lots of MRI images; tumors have been identified on some of them. Hospital A could use these images to train their own local machine learning model, model A, to identify tumors on incoming MRI images. Hospitals B through Z might all do the same. So, we have local models A through Z, all trained to look for tumors on MRI images.

The hospitals can’t share their MRI images because of privacy regulations, but they can share their local models. So, local models A through Z are aggregated to create the global model. The global model is sent back to each of the hospitals for further training on their local MRI images. Those new local models A through Z are aggregated again to create a new global model.

This global model updating process repeats, becoming more accurate with each iteration. The goal is “model convergence,” the point at which additional training cannot improve the global model’s performance. For example, after many iterations, the global model is tested on a dataset and finds certain tumors. There’s another iteration of training and when it is tested again, the updated global model is no better at finding tumors than it was before the last iteration. At that point, it has reached convergence.
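The train, aggregate, and redistribute loop described above is the core of federated averaging. The sketch below is a generic illustration on synthetic data, not the ISI team's actual algorithm; every dataset, parameter, and threshold here is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for private hospital data: each "hospital" holds
# its own (features, labels) and never shares them.
true_w = np.array([2.0, -1.0])
hospitals = []
for _ in range(5):
    X = rng.normal(size=(200, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=200)
    hospitals.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """One hospital trains the shared model on its private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(2)
prev_loss = np.inf
for round_ in range(100):
    # Each site trains locally; only model weights leave the site.
    local_models = [local_update(w_global, X, y) for X, y in hospitals]
    # Aggregation: average the local models into a new global model.
    w_global = np.mean(local_models, axis=0)
    loss = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in hospitals])
    if prev_loss - loss < 1e-9:   # convergence: no further improvement
        break
    prev_loss = loss

print(round_, w_global)  # the global model ends up close to true_w
```

The stopping test mirrors the definition of convergence in the text: the loop ends when another round of training no longer improves the global model.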

With an eye on convergence, Ambite and Stripelis have developed a novel method of federated learning that is higher-performing and more secure than the current state of the art.

“Our focus has been on developing algorithms with fast convergence,” Ambite said, noting that this saves time and energy, which is particularly important as the types and number of data sources increase. The ISI team’s federated learning method can be applied to the “Internet of Things” — cellphones, sensors, anything collecting personal health data — which means potentially millions of devices acting as individual data sources.

And while the data itself doesn’t need to be encrypted because it always remains behind a firewall at the source of origin, the ISI team has increased privacy protections. Stripelis explained, “No private data leaves a source (i.e., a hospital), and all local models are exchanged and aggregated using fully homomorphic encryption.” 
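Homomorphic encryption lets the aggregator combine encrypted models without ever decrypting them. As a hedged illustration of the underlying idea only, here is a toy additively homomorphic scheme (Paillier, with deliberately tiny keys); the ISI system's actual cryptography may differ, and real deployments use vetted libraries with far larger keys.

```python
import math
import random

# Toy Paillier cryptosystem (tiny primes, illustration only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts,
# so an aggregator can sum encrypted model weights without seeing them.
w_a, w_b, w_c = 17, 25, 4   # toy "weights" from three hospitals
agg = encrypt(w_a) * encrypt(w_b) % n2 * encrypt(w_c) % n2
print(decrypt(agg))          # equals w_a + w_b + w_c
```

In a federated setting, each hospital would encrypt its model parameters, the server would combine the ciphertexts, and only the participants could decrypt the aggregated result.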


The team had access to a large body of biomedical data to test their methodology. First, they trained a model using the entire dataset, the centralized model. Then, they partitioned the dataset onto different servers to simulate a realistic biomedical federated environment and trained the federated model. Each of these servers represented a hospital that could not share its data with the larger group but could only share its own local model.

“And the surprise was, the performance of both the centralized and federated models was essentially the same. So that was very good news!” Ambite said. In other words, the model trained using Ambite’s and Stripelis’ federated learning method performed as well as the model trained using all the data with no privacy restrictions.


Ambite and Stripelis have a vision — and a provisional patent — for the future of this technology and the federated learning ecosystem at large. They envision the Federated Learning Marketplace, a platform where data owners form federations and a model provider allows participants either to use the federated models for free if they contribute their data or to pay a fee that is distributed back to the data owners. It's a way of incentivizing hospitals to share data and of allowing small hospitals or clinics to use a federated model at no cost if they contribute their datasets.

“This way, more people are incentivized to build better and bigger models,” Ambite said. “We think this could be very impactful in the medical domain.”

Yolanda Gil

Building AI Scientists


Yolanda Gil, ISI Fellow and senior director for strategic initiatives in AI and data science; research professor in computer science and in spatial sciences


Scientists use the scientific method to find answers to problems by observing, asking questions and using testing and experimentation. If AI could reason about data and science in the same way scientists do, we could scale up research, ask more questions, continually update findings and more. So, is it possible to create AI systems that think like scientists? 


“First, we study how scientists work, or what is technically called cognitive task analysis,” Gil explained. “This is when we try to understand the reasoning behind science research by interviewing scientists and looking at how they approach new questions. Then we reproduce their problem-solving process with an AI.” 

The AI system Gil and her team have developed is called DISK: automated Discovery of Scientific Knowledge. It can be deployed across many domains and is currently being used to tackle several health challenges at AI4Health. For example, DISK is being used by neuroscientists in the ENIGMA (Enhancing Neuro Imaging Genetics through Meta Analysis) Consortium, which brings together researchers in imaging genomics to understand brain structure, function and disease based on brain imaging and genetic data. With ENIGMA, data is collected from organizations (e.g., hospitals and universities) worldwide. The data is never shared. Instead, the organizations analyze their own data and combine their results into global findings.

“We start with a scientific question,” Gil said. “Perhaps I want to see if aging is correlated with the size of the hippocampus in the brain. Does the hippocampus grow or shrink as we age? We would talk to scientists, and they might tell us the first thing that they would do is find and gather together data that are in particular age ranges that include measurements of hippocampus size. DISK can do a very complicated data query. In this example, I want to have clinical studies that collect MRI imaging data that looks at hippocampus volume, includes age and can be used to estimate size for individuals of different ancestry or gender.”  
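The kind of "complicated data query" Gil describes might look roughly like the following sketch. The study catalog, field names and values here are invented for illustration; they are not DISK's actual schema or API.

```python
# Hypothetical catalog of clinical studies; in practice this metadata
# would live in a shared data repository, not a Python list.
studies = [
    {"id": "s1", "modality": "MRI",
     "variables": {"hippocampus_volume", "age", "sex", "ancestry"}},
    {"id": "s2", "modality": "CT",
     "variables": {"age"}},
    {"id": "s3", "modality": "MRI",
     "variables": {"hippocampus_volume", "age"}},
]

# Find MRI studies that measure hippocampus volume and record age,
# mirroring the hippocampus-and-aging question in the text.
needed = {"hippocampus_volume", "age"}
matches = [
    s for s in studies
    if s["modality"] == "MRI" and needed <= s["variables"]
]
print([s["id"] for s in matches])  # ['s1', 's3']
```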

Using DISK, researchers can quickly and efficiently analyze a scientific question like this, following the methodology a scientist would use but at a much larger scale and with far more data than would be possible without AI.


“In applying DISK to different scientific domains, I really thought that the methods that scientists typically apply would be far more diverse, that we would see fewer regularities,” said Gil. “But we see a lot of common methods that are followed across problems and domains that appear to be very different. So, when we take DISK to a new domain, we can actually reuse quite a bit in terms of the general problem-solving structure.”  


“With more data gathered from more individuals, the confidence of the results and findings increases,” Gil said. “Scientific papers often report initial findings based on the very few data points that are available.” 

Gil said she envisions a world where “a scientist would publish a paper with an original breakthrough question and then we’d let the AI update the results as we gather more data through more clinical trials, more sensor data and novel data collection instruments.”

Michael Pazzani

Explaining an AI Diagnosis to Clinicians and Patients


Michael Pazzani, ISI principal scientist and director, AI4Health


For a medical diagnosis from AI to be useful, both the clinician and the patient must understand it. Can AI be used to explain its diagnosis?


There are a few important steps in using AI to diagnose medical conditions. First, the AI system discovers patterns from electronic health records where the correct diagnosis is known. Then, it applies those patterns to new patients whose diagnoses are unknown. And finally, it explains the rationale for the diagnoses to the clinician and the patient. 

“Building AI systems that automate discovering patterns in electronic health records and applying these patterns to new patients is a mature area in which we have been involved for more than a decade,” Pazzani said. “For example, systems we have developed can identify whether a mole is benign or cancerous from a photograph of the mole. Other systems our group has worked on can diagnose diseases such as pneumonia from chest X-rays, or glaucoma from photos of the retina.” 

Deploying these systems in practice, however, requires explaining the diagnosis to the clinician and to the patient. “Current systems either output a probability, such as there is 94% chance a mole is cancerous, or produce a developer-centric explanation that requires a Ph.D. in engineering to understand,” Pazzani explained. His team is developing a user-centric approach that can learn from data and then produce explanations that clinicians and patients can easily understand. “Our explanations are informed by studying the explanations produced by clinicians as they interact with their peers, with interns and with patients,” he said.


“The key insight is that instead of learning the direct relationship between an image and a disease, we learn to recognize diagnostic features and then the relationship between these features and a disease,” Pazzani said. “In melanoma, diagnostic features include whether the mole has an asymmetric border, white streaks or variations in color. In glaucoma, they include whether there is a hemorrhage and whether the rim of the optic nerve is thin. So our AI would use these features to help explain the diagnosis to the clinician and patient.”
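The two-stage idea, recognizing named diagnostic features first and then mapping those features to a disease, can be sketched as follows. The feature names come from the melanoma example above, but the weights and probabilities are invented for illustration; this is not Pazzani's actual model.

```python
import numpy as np

# Named diagnostic features a first-stage image model would detect.
FEATURES = ["asymmetric_border", "white_streaks", "color_variation"]
weights = np.array([1.2, 0.8, 1.5])  # learned in practice; assumed here
bias = -2.0

def explain(feature_probs):
    """Return a risk estimate and a per-feature contribution breakdown."""
    contrib = weights * feature_probs
    logit = contrib.sum() + bias
    risk = 1 / (1 + np.exp(-logit))
    # Rank features by how strongly they push toward the diagnosis.
    reasons = sorted(zip(FEATURES, contrib), key=lambda t: -t[1])
    return risk, reasons

# Suppose the feature detector reports these probabilities for a mole:
risk, reasons = explain(np.array([0.9, 0.1, 0.8]))
print(f"melanoma risk: {risk:.0%}")
for name, c in reasons:
    print(f"  {name}: +{c:.2f} toward diagnosis")
```

Because the final stage is a simple model over human-interpretable features, the explanation ("high risk, mainly due to color variation and an asymmetric border") falls out of the model directly rather than being reverse-engineered afterward.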


Pazzani sees a not-too-distant future where, with such a system, the neighborhood optometrist might screen for glaucoma as effectively as an ophthalmologist, or the average person could determine if a mole should be biopsied in a medical clinic.

Carl Kesselman

For Good Science, You Need to Start with Good Data


Carl Kesselman, the William H. Keck Chair of Engineering and professor in the Daniel J. Epstein Department of Industrial and Systems Engineering; director of ISI’s Informatics Systems Research Division


“The AI revolution is predicated on good data,” said Kesselman. “Without good data, careful evaluation, transparency and reproducibility, you produce bad results.” So how do we make sure we’re using good data?


Kesselman and his team are making good data accessible to the dental, oral and craniofacial research community with FaceBase, a resource that provides open access to genetic, molecular and imaging data.

“We collect the data. We provide methods that help ensure the quality of the data. We distribute it so that it can be used as input for AI algorithms,” Kesselman explained. “We’re taking all this diverse, sometimes messy, sometimes bad data, and we’re providing rigor and structure to it so that it looks more like science and less like alchemy.” 
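One concrete form of the "rigor and structure" Kesselman describes is validating records against a declared schema before they are treated as AI-ready. A minimal sketch, with field names assumed for illustration and not FaceBase's real schema:

```python
# Declared schema: required fields and their expected types.
SCHEMA = {"subject_id": str, "age": int, "scan_type": str}

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, ftype in SCHEMA.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    return problems

good = {"subject_id": "S001", "age": 34, "scan_type": "CT"}
bad = {"subject_id": "S002", "age": "34"}
print(validate(good))  # []
print(validate(bad))   # ['age should be int', 'missing scan_type']
```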

An important goal of AI4Health is to ensure results are transparent, reproducible and correct. “Just because I have an AI algorithm that I think is better than your algorithm, or perhaps better than a doctor, how do I know?” Kesselman asked.

The answer is, you don’t — unless you’re able to reproduce the results. In fact, a 2016 survey of more than 1,500 scientists conducted by the publication Nature found that 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

“So, this notion of reproducibility and transparency is really important, especially if we’re making diagnoses associated with people’s health,” said Kesselman.

FaceBase has become such an important source of quality data that Kesselman and his team have received funding from the National Science Foundation and the National Institutes of Health specifically to work on making FaceBase data AI-ready.


“I think there are two surprises,” Kesselman said. “The first is just how bad this situation is. There are very few good solutions for taking care of your data. It is all very ad hoc, and there is a lot of room for improvement.” The second surprise? “What we found is that, given the types of technology and approaches that we’ve developed, we’ve been surprised by how willing many people are to try to do better.”


“If we are successful, we will get to the point of new knowledge and new results, using methods that are radically faster than what we currently have, because we’ve made the process of discovering new things much more efficient,” Kesselman explained. “But also, the things that we create will be correct. We will significantly reduce the number of erroneous and retracted results. Our goal is that we produce better results, better health care, better science, more rapidly and more correctly.

“So, that’s our big picture goal — nothing less than we want to revolutionize the way that medicine is done,” he added.

Wael AbdAlmageed

The Picture Worth a Thousand Tests


Wael AbdAlmageed, research associate professor, Ming Hsieh Department of Electrical and Computer Engineering; ISI research director and founding director of the Visual Intelligence and Multimedia Analytics Laboratory (VIMAL)


Many medical conditions have phenotypes associated with them, that is, observable characteristics that occur because of the condition. Given the advances being made in computer vision, can AI “see” phenotypes to diagnose medical conditions and improve the quality of care and life of patients?


With computer vision — the subfield of AI that enables computers to acquire, process and analyze digital images and make recommendations or decisions based on that analysis — images are quite literally more than meets the eye. AI can “see” subtleties in images that a human cannot. 

Consider, for example, the subtle changes in the face that correlate with congenital adrenal hyperplasia, or CAH, a genetic condition that affects the adrenal glands.

AbdAlmageed and his team are using AI to look for a phenotypic biomarker — an observable, measurable and objective characteristic that is a result of CAH. These are different from genotypic biomarkers that cannot be observed (e.g., bloodwork showing low cortisol) or patient-reported symptoms that are subjective (e.g., feeling tired).

“We are trying to find a cost-effective, noninvasive phenotypic biomarker that can be used to assess the severity and progression of congenital adrenal hyperplasia and personalize disease management and treatment to improve the quality of life and health outcomes for CAH patients,” said AbdAlmageed.

To do this, they look at morphology, the physical form and structure of a patient. “We are using AI to analyze the features and 3D morphology of the face and correlate these features with CAH,” he said.
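Correlating measurable facial features with a condition can be illustrated on synthetic data. The features, effect size and classifier below are all invented for this sketch and are not the VIMAL team's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "morphology features" (think inter-landmark distances),
# with the CAH group shifted subtly on one feature only.
controls = rng.normal(0.0, 1.0, size=(100, 4))
cah = rng.normal(0.0, 1.0, size=(100, 4))
cah[:, 0] += 1.5  # subtle, invented shift on feature 0

X = np.vstack([controls, cah])
y = np.array([0] * 100 + [1] * 100)

# Nearest-centroid classifier: assign each face to the closer group mean.
c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
pred = (np.linalg.norm(X - c1, axis=1)
        < np.linalg.norm(X - c0, axis=1)).astype(int)
print(f"training accuracy: {(pred == y).mean():.0%}")
```

Even with a shift that would be hard to spot by eye, a simple classifier separates the groups better than chance, which is the premise behind using subtle facial changes as a phenotypic biomarker.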


“We were the first to discover that facial features can be used as a phenotypic biomarker for this disease, which opens doors for similar biomarkers for similar genetic disorders, such as fetal alcohol syndrome and congenital hypoventilation syndrome,” AbdAlmageed noted.


AbdAlmageed hopes to develop “FDA-approved systems to help pediatricians manage this and similar diseases to improve the quality of life of children.”