Summary: Researchers developed an innovative AI tool, DeepGO-SE, that excels in predicting the functions of unknown proteins, marking a significant advance in bioinformatics. Leveraging large language models and logical entailment, this tool can deduce molecular functions even for proteins without existing database matches, offering a groundbreaking approach to understanding cellular mechanisms.
Its precision has placed DeepGO-SE among the top algorithms in an international function prediction competition, demonstrating its potential in drug discovery, metabolic pathway analysis, and beyond. The team aims to apply this tool to explore proteins in extreme environments, opening new doors for biotechnological advancements.
Key Facts:
- DeepGO-SE outperforms existing methods in predicting the functions of proteins, including those previously uncharacterized.
- The tool applies large language models and logical entailment to infer protein functions based on biological principles and amino acid sequences.
- Ranked in the top 20 of over 1,600 algorithms, DeepGO-SE shows promise for applications in drug discovery, protein engineering, and more.
Source: KAUST
A new artificial intelligence (AI) tool that draws logical inferences about the function of unknown proteins promises to help scientists unravel the inner workings of the cell.
Developed by KAUST bioinformatics researcher Maxat Kulmanov and colleagues, the tool outperforms existing analytical methods for forecasting protein functions and is even able to analyze proteins with no clear matches in existing datasets.
The model, termed DeepGO-SE, takes advantage of large language models similar to those used by generative AI tools such as ChatGPT. It then employs logical entailment to draw meaningful conclusions about molecular functions based on general biological principles about the way proteins work.
In essence, it lets computers reason logically about outcomes by constructing models of part of the world (in this case, protein function) and inferring the most plausible scenario from common sense and reasoning about what should hold in these world models.
“This method has many applications,” says Robert Hoehndorf, head of the KAUST Bio-Ontology Research Group, who supervised the research, “especially when it is necessary to reason over data and hypotheses generated by a neural network or another machine learning model.”
Kulmanov and Hoehndorf collaborated with KAUST’s Stefan Arold, as well as researchers at the Swiss Institute of Bioinformatics, to assess the model’s ability to decipher the functions of proteins whose roles in the body are unknown.
Given the amino acid sequence of a poorly understood protein and its known interactions with other proteins, the tool precisely predicted the protein’s molecular functions. Its accuracy placed DeepGO-SE in the top 20 of more than 1,600 algorithms in an international competition of function prediction tools.
The KAUST team is now using the tool to investigate the functions of enigmatic proteins discovered in plants that thrive in the extreme environment of the Saudi Arabian desert. They hope that the findings will be useful for identifying novel proteins for biotechnological applications and would like other researchers to embrace the tool.
As Kulmanov explains: “DeepGO-SE’s ability to analyse uncharacterized proteins can facilitate tasks such as drug discovery, metabolic pathway analysis, disease associations, protein engineering, screening for specific proteins of interest and more.”
About this artificial intelligence research news
Author: Michael Cusack
Source: KAUST
Contact: Michael Cusack – KAUST
Original Research: Open access.
“Protein function prediction as approximate semantic entailment” by Robert Hoehndorf et al. Nature Machine Intelligence
Abstract
Protein function prediction as approximate semantic entailment
The Gene Ontology (GO) is a formal, axiomatic theory with over 100,000 axioms that describe the molecular functions, biological processes and cellular locations of proteins in three subontologies.
Predicting the functions of proteins using the GO requires both learning and reasoning capabilities in order to maintain consistency and exploit the background knowledge in the GO. Many methods have been developed to automatically predict protein functions, but effectively exploiting all the axioms in the GO for knowledge-enhanced learning has remained a challenge.
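To make the consistency requirement concrete, here is a minimal Python sketch of one simple kind of background knowledge: subclass axioms, under which a protein that has a specific function also has every more general function above it. This is a generic post-hoc propagation step, not the DeepGO-SE method, and it covers only one of the axiom types in the GO. The function name `propagate_to_ancestors`, the toy hierarchy, and the scores are all assumptions made for illustration.

```python
def propagate_to_ancestors(scores, parents):
    """Make scores consistent with subclass axioms.

    scores:  dict mapping a term to a predicted score in [0, 1]
    parents: dict mapping a term to its direct parent terms
    Every ancestor ends up scoring at least as high as its descendants.
    """
    consistent = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            # A more general function cannot be less likely than a specific one.
            consistent[ancestor] = max(consistent.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, ()))
    return consistent


# Simplified toy hierarchy (hypothetical; intermediate GO terms are omitted).
toy_parents = {
    "kinase activity": ["transferase activity"],
    "transferase activity": ["catalytic activity"],
}
# Hypothetical raw scores that violate the hierarchy.
raw_scores = {
    "kinase activity": 0.9,
    "transferase activity": 0.3,
    "catalytic activity": 0.6,
}
consistent_scores = propagate_to_ancestors(raw_scores, toy_parents)
```

In this toy case the raw prediction is inconsistent, because the specific term scores higher than its more general ancestors; propagation lifts the ancestors to match it.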
We have developed DeepGO-SE, a method that predicts GO functions from protein sequences using a pretrained large language model. DeepGO-SE generates multiple approximate models of GO, and a neural network predicts the truth values of statements about protein functions in these approximate models. We aggregate the truth values over multiple models so that DeepGO-SE approximates semantic entailment when predicting protein functions.
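The aggregation step can be pictured with a short Python sketch. In classical logic a statement is entailed when it is true in every model, so taking the minimum truth value across the approximate models is the closest analogue, and the mean is a softer alternative. The abstract does not say which aggregation is used, so both options here are assumptions, as are the function name `approximate_entailment` and the scores; the two GO identifiers (catalytic activity and protein binding) serve only as example statements.

```python
from statistics import mean

# Candidate aggregation functions (an assumption; the source does not name one).
AGGREGATORS = {"min": min, "mean": mean}


def approximate_entailment(truth_values, aggregate="min"):
    """Combine per-model truth values into one score per statement.

    truth_values: list of dicts, one per approximate model, each mapping
                  a statement (here a GO term ID) to a value in [0, 1]
    aggregate:    "min" mirrors "true in all models"; "mean" is softer
    """
    if not truth_values:
        raise ValueError("need truth values from at least one model")
    if aggregate not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {aggregate!r}")
    reducer = AGGREGATORS[aggregate]
    # Only statements scored by every model can be aggregated.
    shared = set(truth_values[0]).intersection(*truth_values[1:])
    return {s: reducer([m[s] for m in truth_values]) for s in shared}


# Hypothetical truth values from three approximate models for one protein.
model_scores = [
    {"GO:0003824": 0.9, "GO:0005515": 0.4},
    {"GO:0003824": 0.8, "GO:0005515": 0.7},
    {"GO:0003824": 0.95, "GO:0005515": 0.1},
]
strict = approximate_entailment(model_scores, "min")
soft = approximate_entailment(model_scores, "mean")
```

With the minimum, a single model that rejects a statement is enough to pull its score down, which is why the second term ends up with a low score even though one model rates it at 0.7.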
We show, using several benchmarks, that the approach effectively exploits background knowledge in the GO and improves protein function prediction compared to state-of-the-art methods.