New causes of autism found in ‘junk’ DNA

Summary: Using artificial intelligence, researchers discover mutations in noncoding regions of the human genome that may result in autism. The noncoding mutations are associated with altered gene regulation in children with ASD. Additionally, the mutations affect gene expression in the brain and genes already linked to autism, such as those responsible for neuron development and migration.

Source: Simons Foundation

Leveraging artificial intelligence techniques, researchers have demonstrated that mutations in so-called ‘junk’ DNA can cause autism. The study, published May 27 in Nature Genetics, is the first to functionally link such mutations to the neurodevelopmental condition.

The research was led by Olga Troyanskaya in collaboration with Robert Darnell. Troyanskaya is deputy director for genomics at the Flatiron Institute’s Center for Computational Biology (CCB) in New York City and a professor of computer science at Princeton University. Darnell is the Robert and Harriet Heilbrunn Professor of Cancer Biology at Rockefeller University and an investigator at the Howard Hughes Medical Institute.

Their team used machine learning to analyze the whole genomes of 1,790 individuals with autism and their unaffected parents and siblings. These individuals had no family history of autism, meaning the genetic cause of their condition was probably spontaneous mutations rather than inherited mutations.

The analysis predicted the ramifications of genetic mutations in parts of the genome that do not encode proteins, regions often mischaracterized as ‘junk’ DNA. The number of autism cases linked to the noncoding mutations was comparable to the number of cases linked to protein-coding mutations that disable gene function.

The implications of the work extend beyond autism, Troyanskaya says. “This is the first clear demonstration of non-inherited, noncoding mutations causing any complex human disease or disorder.”

Scientists can apply the same techniques used in the new study to explore the role noncoding mutations play in diseases such as cancer and heart disease, says study co-author Jian Zhou of CCB and Princeton. “This enables a new perspective on the cause of not just autism, but many human diseases.”

Only 1 to 2 percent of the human genome is made up of genes that encode the blueprints for making proteins. Those proteins carry out tasks throughout our bodies, such as regulating blood sugar levels, fighting infections and sending communications between cells. The remaining 98 percent or so of our genome isn’t genetic dead weight, though. The noncoding regions help regulate when and where genes make proteins.

Mutations in protein-coding regions account for at most 30 percent of autism cases in individuals without a family history of autism. Evidence suggested that autism-causing mutations must happen elsewhere in the genome as well.

Uncovering which noncoding mutations may cause autism is tricky. A single individual may have dozens of noncoding mutations, most of which will be unique to the individual. This makes the traditional approach of identifying common mutations among affected populations nonviable.

Troyanskaya and her colleagues took a new approach. They trained a machine learning model to predict how a given sequence would affect gene expression.

“This is a shift in thinking about genetic studies that we’re introducing with this analysis,” says Chandra Theesfeld, a research scientist in Troyanskaya’s lab at Princeton. “In addition to scientists studying shared genetic mutations across large groups of individuals, here we’re applying a set of smart, sophisticated tools that tell us what any specific mutation is going to do, even those that are rare or never observed before.”
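To make the idea concrete, here is a minimal Python sketch of scoring a single mutation by comparing a model's prediction for the original sequence with its prediction for the mutated one. It is an illustration under loud assumptions, not the study's deep-learning framework: `predict_activity`, `HYPOTHETICAL_MOTIF`, `apply_mutation` and `predicted_effect` are hypothetical names, and the "model" is a toy motif matcher standing in for a trained network.

```python
# Toy stand-in for a trained sequence model. This is NOT the study's model;
# it scores a DNA sequence by its best match to one made-up regulatory motif.
HYPOTHETICAL_MOTIF = "TATAAA"


def predict_activity(seq: str) -> float:
    """Return the best fraction of motif positions matched by any window."""
    k = len(HYPOTHETICAL_MOTIF)
    best = 0.0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        matches = sum(a == b for a, b in zip(window, HYPOTHETICAL_MOTIF))
        best = max(best, matches / k)
    return best


def apply_mutation(ref_seq: str, pos: int, alt_base: str) -> str:
    """Return a copy of ref_seq with the base at 0-based pos replaced."""
    return ref_seq[:pos] + alt_base + ref_seq[pos + 1:]


def predicted_effect(ref_seq: str, pos: int, alt_base: str) -> float:
    """Predicted change in activity: mutated sequence minus original."""
    alt_seq = apply_mutation(ref_seq, pos, alt_base)
    return predict_activity(alt_seq) - predict_activity(ref_seq)


ref = "GGCTATAAAGGC"                             # made-up reference sequence
effect_in_motif = predicted_effect(ref, 5, "G")  # mutation inside the motif
effect_outside = predicted_effect(ref, 0, "A")   # mutation outside the motif
print(effect_in_motif, effect_outside)
```

Because the score depends only on the sequence itself, the same function can be applied to a mutation seen in just one person, which is the property Theesfeld describes.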

The researchers studied the genetic basis of autism by applying the machine learning model to a treasure trove of genetic data called the Simons Simplex Collection. The Simons Foundation, the Flatiron Institute’s parent organization, produced and maintains the repository. The Simons Simplex Collection contains the whole genomes of nearly 2,000 ‘quartets’ made up of a child with autism, an unaffected sibling and their unaffected parents.

Genes predicted to be disrupted by regulatory mutations in people with autism tended to be involved in brain cell functioning and fell into two categories. One category relates to synapses, communication hubs between neurons, and the other relates to chromatin, the highly structured form of DNA and proteins required for proper gene expression from chromosomes. Image courtesy of the Troyanskaya lab.

These foursomes had no previous family history of autism, meaning that non-inherited mutations were probably responsible for the affected child’s condition. (Such mutations occur spontaneously in sperm and egg cells as well as in embryos.)

The researchers used their model to predict the impact of non-inherited, noncoding mutations in each child with autism. They then compared those predictions with the effects of the same, unmutated strand in the child’s unaffected sibling.

“The design of the Simons Simplex Collection is what allowed us to do this study,” says Zhou. “The unaffected siblings are a built-in control.”
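The sibling-as-control idea can be sketched in a few lines of Python. The example below is only an illustration: the scores are invented numbers, not data from the study, and the permutation test shown is a generic choice that may differ from the statistics the researchers actually used. The function name `permutation_test` and its parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_test(proband_scores, sibling_scores, n_permutations=2000, seed=0):
    """One-sided permutation test on the difference in mean predicted impact.

    Asks how often a random relabeling of children as 'proband' or 'sibling'
    gives a difference in means at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(proband_scores) - mean(sibling_scores)
    pooled = list(proband_scores) + list(sibling_scores)
    n = len(proband_scores)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Invented predicted-impact scores, for illustration only.
probands = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
siblings = [0.48, 0.59, 0.51, 0.63, 0.45, 0.57]

observed, p_value = permutation_test(probands, siblings)
print(f"difference in means: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value here would mean that mutations in the affected children are predicted to be more disruptive than those in their siblings more often than chance relabeling would explain.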

Noncoding mutations in many of the children with autism altered gene regulation, the analysis suggested. Moreover, the results suggested that the mutations affected gene expression in the brain and genes already linked to autism, such as those responsible for neuron migration and development. “This is consistent with how autism most likely manifests in the brain,” says study co-author Christopher Park, a research scientist at CCB. “It’s not just the number of mutations occurring, but what kind of mutations are occurring.”

The researchers tested the effects of some of the noncoding mutations in laboratory experiments. They inserted predicted high-impact mutations found in children with autism into cells and observed the resulting changes in gene expression. These changes affirmed the model’s predictions.

Troyanskaya says she and her colleagues will continue improving and expanding their method. Ultimately, she hopes the work will improve how genetic data are used for diagnosing and treating diseases and disorders. “Right now, 98 percent of the genome is usually being thrown away,” she says. “Our work allows you to think about what we can do with the 98 percent.”

About this neuroscience research article

Media Contact: Anastasia Greenebaum – Simons Foundation

Original Research: Closed access
“Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk”. Jian Zhou, Christopher Y. Park, Chandra L. Theesfeld, Aaron K. Wong, Yuan Yuan, Claudia Scheckel, John J. Fak, Julien Funk, Kevin Yao, Yoko Tajima, Alan Packer, Robert B. Darnell & Olga G. Troyanskaya.
Nature Genetics. doi:10.1038/s41588-019-0420-0


Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk

We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts the specific regulatory effects and the deleterious impact of genetic variants. Applying this framework to 1,790 autism spectrum disorder (ASD) simplex families reveals a role in disease for noncoding mutations—ASD probands harbor both transcriptional- and post-transcriptional-regulation-disrupting de novo mutations of significantly higher functional impact than those in unaffected siblings. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development and, taken together with previous studies, reveals a convergent genetic landscape of coding and noncoding mutations in ASD. We demonstrate that sequences carrying prioritized mutations identified in probands possess allele-specific regulatory activity, and we highlight a link between noncoding mutations and heterogeneity in the IQ of ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD and prioritizes mutations with high impact for further study, and is broadly applicable to complex human diseases.
