Summary: Researchers used artificial intelligence to predict how enzymes interact with various substrates. The team developed an AI model that can accurately predict whether an enzyme can work with a particular molecule.
Their enzyme substrate prediction (ESP) model provides a valuable tool for drug research and biotechnology, with applications ranging from the creation of new drugs to the production of biofuels.
The AI-based method predicts with 91% accuracy whether an enzyme can work with a specific molecule.
The ESP model can work with any combination of an enzyme and over 1,000 different substrates.
The method will assist in drug research, biotechnology, and the metabolic simulation of cells, aiding the understanding of physiology across various organisms.
Source: Heinrich Heine University Düsseldorf
Enzymes are molecule factories in biological cells. However, which basic molecular building blocks they use to assemble target molecules is often unknown and difficult to measure.
An international team including bioinformaticians from Heinrich Heine University Düsseldorf (HHU) has now taken an important step forward in this regard: Their AI method predicts with a high degree of accuracy whether an enzyme can work with a specific substrate.
They now present their results in the scientific journal Nature Communications.
Enzymes are important biocatalysts in all living cells: They facilitate chemical reactions, through which all molecules important for the organism are produced from basic substances (substrates). Most organisms possess thousands of different enzymes, with each one responsible for a very specific reaction. The collective function of all enzymes makes up the metabolism and thus provides the conditions for the life and survival of the organism.
Even though genes that encode enzymes can easily be identified as such, the exact function of the resulting enzyme is unknown in the vast majority of cases (over 99%). This is because experimental characterization of enzyme function, i.e. determining which starting molecules a specific enzyme converts into which end molecules, is extremely time-consuming.
Together with colleagues from Sweden and India, the research team headed by Professor Dr Martin Lercher from the Computational Cell Biology research group at HHU has developed an AI-based method for predicting whether an enzyme can use a specific molecule as a substrate for the reaction it catalyzes.
Professor Lercher: “The special feature of our ESP (Enzyme Substrate Prediction) model is that we are not limited to individual, special enzymes and others closely related to them, as was the case with previous models. Our general model can work with any combination of an enzyme and more than 1,000 different substrates.”
PhD student Alexander Kroll, lead author of the study, developed a deep learning model in which information about enzymes and substrates is encoded as numerical vectors.
The vectors of around 18,000 experimentally validated enzyme-substrate pairs, in which the enzyme and substrate are known to work together, were used as input to train the model.
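As a rough illustration of what encoding an enzyme-substrate pair as numerical vectors can look like, the sketch below turns an enzyme sequence and a substrate into fixed-length lists of numbers and joins them into a single input vector. The encodings used here (amino acid frequencies, and hashed character pairs of a SMILES string) and all function names are assumptions chosen for simplicity. They are not the representations used in ESP, which according to the study's abstract represents enzymes through a modified transformer model.

```python
from collections import Counter
import zlib

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_enzyme(sequence):
    """Toy enzyme vector (assumption): fraction of each standard amino acid."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_substrate(smiles, size=16):
    """Toy substrate vector (assumption): hashed counts of adjacent characters."""
    vector = [0.0] * size
    for i in range(len(smiles) - 1):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(smiles[i:i + 2].encode("utf-8")) % size
        vector[bucket] += 1.0
    return vector


def encode_pair(sequence, smiles):
    """Concatenate both vectors into one model input of length 20 + 16."""
    return encode_enzyme(sequence) + encode_substrate(smiles)


if __name__ == "__main__":
    # Hypothetical inputs: a short peptide fragment and ethanol as SMILES.
    print(encode_pair("MKTAYIAK", "CCO"))
```

A model trained on such vectors sees each enzyme-substrate pair as one list of numbers together with a label saying whether the pair is known to work.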
Alexander Kroll: “After training the model in this way, we then applied it to an independent test dataset where we already knew the correct answers. In 91% of cases, the model correctly predicted which substrates match which enzymes.”
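The accuracy figure Kroll describes is the share of test pairs for which the prediction matches the known answer. A minimal sketch of that calculation follows. The function name and the example labels are illustrative assumptions, not data from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that agree with the known labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Made-up labels: 1 = enzyme accepts the substrate, 0 = it does not.
    known = [1, 0, 1, 1]
    predicted = [1, 0, 0, 1]
    print(accuracy(known, predicted))
```

The test data must not have been used for training; otherwise the figure would overstate how well the model handles unseen pairs.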
This method offers a wide range of potential applications. In both drug research and biotechnology it is of great importance to know which substances can be converted by enzymes.
Professor Lercher: “This will enable research and industry to narrow a large number of possible pairs down to the most promising, which they can then use for the enzymatic production of new drugs, chemicals or even biofuels.”
Kroll adds: “It will also enable the creation of improved models to simulate the metabolism of cells. In addition, it will help us understand the physiology of various organisms – from bacteria to people.”
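Narrowing many candidate pairs down to the most promising ones, as Lercher describes, amounts to scoring each pair and keeping the highest-scoring ones. The sketch below assumes a scoring function that returns a number per pair. The function `shortlist`, its parameters and the dummy scores are hypothetical and do not reflect the interface of the ESP model or web server.

```python
import heapq


def shortlist(pairs, score_fn, top_k=3):
    """Return the top_k (score, enzyme, substrate) tuples, best first."""
    scored = [(score_fn(enzyme, substrate), enzyme, substrate)
              for enzyme, substrate in pairs]
    return heapq.nlargest(top_k, scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical candidates and a stand-in scorer with made-up values.
    candidates = [("E1", "A"), ("E1", "B"), ("E2", "A")]
    dummy_scores = {("E1", "A"): 0.9, ("E1", "B"): 0.2, ("E2", "A"): 0.5}
    for score, enzyme, substrate in shortlist(candidates, lambda e, s: dummy_scores[(e, s)], top_k=2):
        print(enzyme, substrate, score)
```

Only the shortlisted pairs would then need to be checked in the laboratory.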
Alongside Kroll and Lercher, Professor Dr Martin Engqvist from the Chalmers University of Technology in Gothenburg, Sweden, and Sahasra Ranjan from the Indian Institute of Technology in Mumbai were also involved in the study. Engqvist helped design the study, while Ranjan implemented the model which encodes the enzyme information fed into the overall model developed by Kroll.
The substrate scopes of enzymes: a general prediction model based on machine and deep learning
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples.
Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data.
ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families.
ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates.
By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
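The augmentation step mentioned in the abstract, adding randomly sampled small molecules as assumed non-substrates, can be sketched as follows. This is a simplified illustration under stated assumptions: molecules are drawn uniformly at random from a pool, known substrates of the same enzyme are excluded, and the function name and parameters are hypothetical. The abstract does not spell out the sampling procedure, so the details here should not be read as the authors' method.

```python
import random


def augment_with_negatives(positive_pairs, molecule_pool, per_positive=1, seed=0):
    """Add randomly sampled molecules as assumed non-substrates (label 0)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Collect the known substrates of each enzyme so they are never sampled.
    known = {}
    for enzyme, substrate in positive_pairs:
        known.setdefault(enzyme, set()).add(substrate)

    dataset = []
    for enzyme, substrate in positive_pairs:
        dataset.append((enzyme, substrate, 1))  # experimentally validated pair
        candidates = sorted(m for m in molecule_pool if m not in known[enzyme])
        count = min(per_positive, len(candidates))
        for molecule in rng.sample(candidates, count):
            # Assumed negative; the same molecule may be drawn again
            # for another positive pair of the same enzyme.
            dataset.append((enzyme, molecule, 0))
    return dataset


if __name__ == "__main__":
    # Hypothetical enzyme and molecule identifiers.
    positives = [("E1", "A"), ("E1", "B"), ("E2", "C")]
    pool = ["A", "B", "C", "D"]
    for row in augment_with_negatives(positives, pool):
        print(row)
```

Sampled negatives are assumptions rather than measurements: a randomly chosen molecule could in fact be a substrate that has not yet been tested.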