AI Learns to Design the Human Body’s Most Elusive Proteins

Summary: A new machine learning method has achieved what even AlphaFold cannot — the design of intrinsically disordered proteins (IDPs), the shape-shifting biomolecules that make up nearly 30% of all human proteins. These unstable proteins play key roles in cellular communication, sensing, and disease, yet their ever-changing structures have defied traditional AI prediction models.

Using automatic differentiation and physics-based simulations, scientists created an algorithm that can fine-tune amino acid sequences for specific functions. This breakthrough could transform synthetic biology, drug discovery, and our understanding of disorders like Parkinson’s and cancer.

Key Facts:

New AI Method: Uses automatic differentiation to design disordered proteins based on real molecular physics, not predictions.
Unlocking the Unknown: IDPs, which never settle into fixed structures, are essential for cell signaling and are linked to neurodegenerative diseases.
Broad Impact: The discovery paves the way for designing synthetic proteins for medicine, sensors, and molecular engineering.

Source: Harvard

In synthetic and structural biology, advances in artificial intelligence have led to an explosion in the design of new proteins with specific functions, from antibodies to blood clotting agents, by using computers to accurately predict the 3D structure of a given amino acid sequence.

But the structures of close to 30% of all proteins expressed by the human genome are challenging to predict for even the most powerful AI tools, including the Nobel-winning AlphaFold.

Never settling into a fixed shape but constantly shifting around, these so-called intrinsically disordered proteins are key to countless biological functions like cross-linking molecules, sensing, or signaling, but their inherent instability makes them difficult to design from scratch.

A team at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Northwestern University has demonstrated a new machine learning method that can design intrinsically disordered proteins with tailored properties.

The work opens doors to new understanding of these mysterious biomolecules and possible new insights into origins of and treatments for diseases.

The work is published in Nature Computational Science and was co-led by SEAS graduate student Ryan Krueger and former NSF-Simons QuantBio Fellow Krishna Shrinivas, now an assistant professor at Northwestern, in collaboration with Michael Brenner, the Catalyst Professor of Applied Mathematics and Applied Physics at SEAS.

Shrinivas said he became interested in studying intrinsically disordered proteins because they are out of reach of current AI-based methods, such as Google DeepMind’s AlphaFold, for predicting and designing proteins with distinct conformations.

Yet, such disordered proteins are important to many fundamental aspects of biology, and it is known that mutations to these proteins are linked to diseases like cancer and neurodegeneration.

One example of a disordered protein is alpha-synuclein, long implicated in Parkinson’s and other diseases.

To design IDPs for synthetic or therapeutic uses, Shrinivas said, “we needed to either come up with better AI models, or, we needed to come up with a way to actually take those physics models where you not only get good predictions, but you also get the physics for free.”

Automatic differentiation algorithms

The paper describes a computational method powered by algorithms that can perform “automatic differentiation,” or automatic computation of derivatives – instantaneous rates of change – in order to rationally select for protein sequences with desired behaviors or properties.
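To make the idea of automatic differentiation concrete, here is a minimal sketch of one flavor of it (forward mode, using "dual numbers") in plain Python. It is an illustration only, not the researchers' code: the `Dual` class, the `derivative` helper, and the `toy_property` function are hypothetical names invented for this example, and the "property" is an arbitrary smooth formula rather than anything from a protein simulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value carried together with its derivative with respect to one input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: derivatives add.
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def exp(x):
    """Exponential with the chain rule applied to the derivative part."""
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).dot


def toy_property(x):
    # Hypothetical smooth "property" of one tunable parameter x.
    return x * x + 3 * x + exp(x)


slope_at_zero = derivative(toy_property, 0.0)  # exact slope, no finite differences
```

The derivative is obtained by running the ordinary calculation once while every arithmetic step also updates a rate of change, which is why the result is exact rather than an approximation from nudging the input.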

The technique is a widely used tool for deep learning and training neural networks, but Brenner and his lab were among the first to recognize other potential use cases, such as optimizing physics-based molecular dynamics simulations.

With automatic differentiation, the researchers were able to train a computer to recognize how small changes in protein sequences – even single amino acid changes – affect the final desired properties of proteins.

They likened their method to a very powerful search engine for amino acid sequences that fit the criteria needed to perform a function – say, one that creates loops or connectors, or can sense different things in the environment.

“We didn’t want to have to take a bunch of data and train a machine learning model to design proteins,” Krueger said.

“We wanted to leverage existing, sufficiently accurate simulations to be able to design proteins at the level of those simulations.”

The method leverages gradient-based optimization, a traditional framework for training neural networks, to identify new protein sequences with efficiency and precision.
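As a rough illustration of gradient-based optimization applied to sequence design, the sketch below tunes a "relaxed" sequence, in which each position holds probabilities over residue types, until a target property is met. Everything here is an assumption made for the example: the three residue classes and their charges, the use of expected net charge as the property, and the function names. The physics simulation is replaced by a closed-form formula, and the gradient is written out by hand instead of coming from automatic differentiation through a simulation.

```python
import math

# Hypothetical residue classes and charges (illustrative values only).
CHARGES = {"K": 1.0, "E": -1.0, "G": 0.0}
RESIDUES = list(CHARGES)


def softmax(logits):
    """Turn one position's logits into probabilities over residue types."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_charge(logits):
    """Expected net charge of a relaxed sequence (one logit row per position)."""
    q = 0.0
    for row in logits:
        probs = softmax(row)
        q += sum(p * CHARGES[r] for p, r in zip(probs, RESIDUES))
    return q


def loss_and_grad(logits, target):
    """Squared error to the target charge and its gradient w.r.t. every logit."""
    err = expected_charge(logits) - target
    grad = []
    for row in logits:
        probs = softmax(row)
        mean = sum(p * CHARGES[r] for p, r in zip(probs, RESIDUES))
        # Chain rule through softmax: dQ/dz_a = p_a * (charge_a - mean).
        grad.append([2.0 * err * p * (CHARGES[r] - mean)
                     for p, r in zip(probs, RESIDUES)])
    return err * err, grad


def design(length=8, target=3.0, lr=0.1, steps=200):
    """Plain gradient descent on the logits, starting from uniform probabilities."""
    logits = [[0.0] * len(RESIDUES) for _ in range(length)]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(logits, target)
        history.append(loss)
        logits = [[z - lr * g for z, g in zip(row, grow)]
                  for row, grow in zip(logits, grad)]
    return logits, history


logits, history = design()
```

The sketch stops at the relaxed probabilities. Turning them into one concrete amino acid sequence is a separate step that is not shown here.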

The result is that the proteins the researchers designed are “differentiable,” that is, they are not best-guesses predicted by AI, but rather based in molecular dynamics simulations, using real physics, that take into account how proteins actually, dynamically behave in nature.

Funding: The research received federal support from the National Science Foundation AI Institute of Dynamic Systems, the Office of Naval Research, the Harvard Materials Research Science and Engineering Center, and the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard.

Key Questions Answered:

Q: Why can’t current AI systems like AlphaFold predict all protein structures?

A: Because nearly 30% of human proteins, called intrinsically disordered proteins, lack a stable shape, constantly shifting and defying fixed 3D models.

Q: How does this new AI method differ from tools that predict protein structures?

A: Instead of predicting static shapes, it uses physics-based molecular simulations and automatic differentiation to “teach” AI how sequence changes affect protein behavior.

Q: Why are disordered proteins so important?

A: These proteins drive essential biological processes and are linked to diseases such as Parkinson’s, cancer, and Alzheimer’s, making them key to future treatments.

About this AI and genetics research news

Author: Anne Manning
Source: Harvard
Contact: Anne Manning – Harvard
Image: The image is credited to Neuroscience News

Original Research: Closed access.
“Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins” by Ryan Krueger et al. Nature Computational Science

Abstract

Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins

The design of folded proteins has advanced substantially in recent years. However, many proteins and protein regions are intrinsically disordered and lack a stable fold, that is, the sequence of an intrinsically disordered protein (IDP) encodes a vast ensemble of spatial conformations that specify its biological function. This conformational plasticity and heterogeneity makes IDP design challenging.

Here we introduce a computational framework for de novo design of IDPs through rational and efficient inversion of molecular simulations that approximate the underlying sequence–ensemble relationship. We highlight the versatility of this approach by designing IDPs with diverse properties and arbitrary sequence constraints.

These include IDPs with target ensemble dimensions, loops and linkers, highly sensitive sensors of physicochemical stimuli, and binders to target disordered substrates with distinct conformational biases.

Overall, our method provides a general framework for designing sequence–ensemble–function relationships of biological macromolecules.