Medical History May Help Predict Autism in Young Children

Summary: New machine learning models assess the connection between hundreds of clinical variables, including doctor visits and health records for seemingly unconnected conditions, to predict the likelihood of ASD in young children.

Source: Penn State

Medical insurance claims might do more than help pay for health concerns; they could help predict them, according to new findings from an interdisciplinary Penn State research team published in BMJ Health & Care Informatics.

The researchers developed machine learning models that assess the connections among hundreds of clinical variables, including doctor visits and health care services for seemingly unrelated medical conditions, to predict the likelihood of autism spectrum disorder in young children. 

“Insurance claim data, which is de-identified and widely available in MarketScan datasets, provides thorough, longitudinal medical details about the patient,” said corresponding author Qiushi Chen, assistant professor of industrial and manufacturing engineering in the Penn State College of Engineering.

“The scientific literature in the field suggests that kids with autism spectrum disorder also often have higher rates of clinical symptoms, such as different types of infections, gastrointestinal problems, seizures, as well as behavior indications.

“Those symptoms are not a cause of autism but are often manifested among kids with autism especially at young ages, so we were inspired to synthesize the medical information to quantify and predict that associated likelihood.”

The researchers fed the data into machine learning models, training them to assess hundreds of variables and find correlations associated with an increased likelihood of autism spectrum disorder.

“Autism spectrum disorder is a developmental disability,” said co-author Guodong Liu, associate professor of public health sciences, of psychiatry and behavioral health and of pediatrics at Penn State College of Medicine.

“It takes observation and several screenings for a clinician to make a diagnosis. The process is usually lengthy, and many kids miss the window for early interventions — the most effective way to improve outcomes.” 

One of the commonly used screening tools to help identify young children with an elevated likelihood of autism spectrum disorder is called the Modified Checklist for Autism in Toddlers (M-CHAT), which is normally given at routine well-child visits at 18 and 24 months old. It consists of 20 questions focused on behaviors related to eye contact, social interactions and some physical milestones such as walking.

Guardians answer based on their observations, but, according to Chen, development varies so significantly at these ages that the tool may misidentify children. As a result, children often are not officially diagnosed until they are four or five years old, meaning they miss years of potential early interventions. 

“Our new model, which quantifies the sum of identified risk factors together to inform the likelihood level, is already comparable to — and in some cases even slightly better than — the existing screening tool,” Chen said.

“When we combine the model with the screening tool, we have a very promising approach for clinicians.” 

According to Liu, it would be practically feasible to integrate the model with the screening tool for clinical use. 

“A unique strength of this work is that this clinical informatics approach can be easily incorporated into the clinical flow,” Liu said.

“The prediction model could be embedded in a hospital’s Electronic Health Record system, which is used to chart patient health, as a clinical decision support tool to flag the high-risk children so that both clinicians and the families could take actions sooner.” 

This work, funded by the National Institutes of Health, the Penn State Social Science Research Institute and the Penn State College of Engineering, is the basis of a new $460,000 grant awarded to Chen and Whitney Guthrie, clinical psychologist at the Children’s Hospital of Philadelphia Center for Autism Research and assistant professor of psychiatry and pediatrics at the University of Pennsylvania Perelman School of Medicine, by the National Institute of Mental Health. 

They are using the new grant to analyze precisely how well the combined hospital record data and screening results predict autism diagnoses, and to explore other potential screening tools that could better equip clinicians to help their patients.

“Not only is the current tool missing many children on the autism spectrum, but many children who are detected by our screening tools experience long waitlists because of our limited diagnosis capacity,” Guthrie said.

“Although it does detect many children, the M-CHAT also has very high rates of false positives and false negatives, which means that many autistic children are missed, and other children are referred for an autism evaluation when they may not need one. Both problems contribute to the long wait — often many months or even years — for further evaluation.

“The consequences for children who are missed by our current screening tools are particularly important because delayed diagnosis often means that children miss the window for early intervention entirely. Pediatricians need better screening tools to accurately identify all children who need an autism evaluation as early as possible.”

Part of the problem is the limited number of psychologists, developmental pediatricians and other experts in pediatric development who can make an autism spectrum disorder diagnosis. According to Chen, the solution may exist in industrial engineering. 

“The key idea is improving how we use resources,” Chen said. “With Dr. Guthrie’s clinical expertise and my group’s modeling capabilities, we aim to develop a tool that primary care physicians without specialized training can apply to make confident assessments to diagnose children as early as possible in order to get the care they need as soon as possible.” 

Additional paper contributors include first author Yu-Hsin Chen, a graduate student pursuing her doctorate in industrial and manufacturing engineering who will also write her dissertation on the grant work; and co-author Lan Kong, professor of public health sciences, Penn State College of Medicine.  

About this autism research news

Author: Adrienne Berard
Source: Penn State
Contact: Adrienne Berard – Penn State

Original Research: Open access.
“Early detection of autism spectrum disorder in young children with machine learning using medical claims data” by Qiushi Chen et al. BMJ Health & Care Informatics


Early detection of autism spectrum disorder in young children with machine learning using medical claims data


Early diagnosis and intervention are keys for improving long-term outcomes of children with autism spectrum disorder (ASD). However, existing screening tools have shown insufficient accuracy. Our objective is to predict the risk of ASD in young children between 18 months and 30 months based on their medical histories using real-world health claims data.


Using the MarketScan Health Claims Database 2005–2016, we identified 12 743 children with ASD and a random sample of 25 833 children without ASD as our study cohort. We developed logistic regression (LR) with least absolute shrinkage and selection operator (LASSO) and random forest (RF) models for predicting ASD diagnosis at ages of 18–30 months, using demographics, medical diagnoses and healthcare service procedures extracted from individuals’ medical claims during early years postbirth as predictor variables.
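
To make the LASSO idea concrete, here is a minimal sketch in plain Python of logistic regression with an L1 penalty, fitted by proximal gradient descent. It is not the authors' code and does not cover the random forest model. The toy data, the three binary indicator features (imagined as "a claim of type j is present") and the names fit_lasso_logistic, rows and labels are all hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; this is what produces exact zeros."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(rows, labels, lam, step=0.5, iterations=500):
    """L1-penalized logistic regression via proximal gradient descent.

    rows   : list of feature lists (here binary 0/1 indicators)
    labels : list of 0/1 outcomes
    lam    : L1 penalty strength; larger values zero out more weights
    The intercept (bias) is left unpenalized.
    """
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        bias -= step * grad_b
        # Gradient step on the smooth loss, then shrink each weight.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical toy data: feature 0 tracks the label (with some noise),
# features 1 and 2 are unrelated to it by construction.
labels = [i % 2 for i in range(200)]
rows = [[1 if (y == 1 or i % 10 == 0) else 0, (i // 2) % 2, (i // 4) % 2]
        for i, y in enumerate(labels)]

weights, bias = fit_lasso_logistic(rows, labels, lam=0.01)
print("weights:", weights, "bias:", bias)
```

The penalty strength lam controls how many predictors survive. With a light penalty the informative feature keeps a positive weight, while a heavy enough penalty drives every weight to exactly zero, which is how LASSO performs variable selection among hundreds of candidate predictors.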


For predicting ASD diagnosis at age of 24 months, the LR and RF models achieved the area under the receiver operating characteristic curve (AUROC) of 0.758 and 0.775, respectively. Prediction accuracy further increased with age. With predictor variables separated by outpatient and inpatient visits, the RF model for prediction at age of 24 months achieved an AUROC of 0.834, with 96.4% specificity and 20.5% positive predictive value at 40% sensitivity, representing a promising improvement over the existing screening tool in practice.
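
The metrics in this paragraph can be illustrated with a short sketch. The function names and the toy scores below are hypothetical. The abstract does not say what prevalence was used to compute the positive predictive value, so the 2.27% figure in the example is purely an assumption, chosen because it gives a value close to the reported 20.5%.

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the share of
    (positive, negative) pairs in which the positive case receives
    the higher score; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule:
    true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Toy example with four children: two with ASD (label 1), two without.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))

# Sensitivity and specificity are the values reported in the abstract;
# the prevalence of 0.0227 is an ASSUMPTION for illustration only.
print(ppv(0.40, 0.964, 0.0227))
```

Because positive predictive value depends heavily on prevalence, the same sensitivity and specificity would give a much higher value in a sample where a large share of children have ASD, and a lower one where the condition is rarer.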


Our study demonstrates the feasibility of using machine learning models and health claims data to identify children with ASD at a very young age. It is deemed a promising approach for monitoring ASD risk in the general children population and early detection of high-risk children for targeted screening.
