Summary: A novel study using AI has uncovered 161,979 new RNA viruses, significantly expanding our understanding of Earth’s viral diversity. These discoveries were made by analyzing genetic data using a machine learning model, which identified previously unrecognized viruses hidden in public databases.
The findings reveal a vast array of viruses in extreme environments worldwide, showing the resilience and adaptability of RNA viruses. This research paves the way for further exploration of viral and microbial diversity, potentially reshaping how scientists study Earth’s ecosystems.
Key Facts
- AI identified over 161,000 new RNA virus species from genetic data.
- Viruses were found in extreme environments, highlighting their adaptability.
- This study is the largest viral discovery to date, vastly expanding knowledge of viral diversity.
Source: University of Sydney
Artificial intelligence (AI) has been used to reveal details of a diverse and fundamental branch of life living right under our feet and in every corner of the globe.
A total of 161,979 new species of RNA virus have been discovered using a machine learning tool that researchers believe will vastly improve the mapping of life on Earth and could aid in the identification of many millions more viruses yet to be characterised.
Published in Cell and conducted by an international team of researchers, the study is the largest virus species discovery paper ever published.
“We have been offered a window into an otherwise hidden part of life on earth, revealing remarkable biodiversity,” said senior author Professor Edward Holmes from the School of Medical Sciences in the Faculty of Medicine and Health at the University of Sydney.
“This is the largest number of new virus species discovered in a single study, massively expanding our knowledge of the viruses that live among us,” Professor Holmes said.
“To find this many new viruses in one fell swoop is mind-blowing, and it just scratches the surface, opening up a world of discovery. There are millions more to be discovered, and we can apply this same approach to identifying bacteria and parasites.”
Although RNA viruses are commonly associated with human disease, they are also found in extreme environments around the world and may even play key roles in global ecosystems. In this study they were found living in the atmosphere, hot springs and hydrothermal vents.
“That extreme environments carry so many types of viruses is just another example of their phenomenal diversity and tenacity to live in the harshest settings, potentially giving us clues on how viruses and other elemental life-forms came to be,” Professor Holmes said.
HOW THE AI TOOL WORKED
The researchers built a deep learning algorithm, LucaProt, to process vast troves of genetic sequence data, including genomically complex information and lengthy virus genomes of up to 47,250 nucleotides, and used it to discover more than 160,000 viruses.
“The vast majority of these viruses had been sequenced already and were on public databases, but they were so divergent that no one knew what they were,” Professor Holmes said.
“They comprised what is often referred to as sequence ‘dark matter’. Our AI method was able to organise and categorise all this disparate information, shedding light on the meaning of this dark matter for the first time.”
The AI tool was trained to sift through this dark matter and identify viruses based on the sequences and secondary structures of the protein that all RNA viruses use for replication, the RNA-dependent RNA polymerase (RdRP).
It significantly fast-tracked virus discovery, which would be time-intensive using traditional methods.
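As a rough illustration of what combining sequence and structural information can mean in practice, here is a minimal Python sketch. It is not LucaProt and makes no claim about how the published model is built: the feature choices (amino-acid composition plus helix, strand and coil fractions), the function names, and the untrained placeholder weights are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(protein: str) -> list[float]:
    """Fraction of each standard amino acid in the protein sequence."""
    counts = Counter(protein)
    total = max(len(protein), 1)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]


def structure_features(secondary: str) -> list[float]:
    """Fraction of helix (H), strand (E) and coil (C) in a predicted
    secondary-structure string (hypothetical input format)."""
    total = max(len(secondary), 1)
    return [secondary.count(state) / total for state in "HEC"]


def combined_features(protein: str, secondary: str) -> list[float]:
    """Concatenate both channels into one 23-value feature vector."""
    return sequence_features(protein) + structure_features(secondary)


def rdrp_score(protein: str, secondary: str,
               weights: list[float], bias: float) -> float:
    """Logistic score in (0, 1); the weights are placeholders, not a trained model."""
    features = combined_features(protein, secondary)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

In a real system the weights would be learned from labelled examples and the hand-made features would be replaced by a deep network; the sketch only shows the idea of scoring a candidate protein from two information sources at once.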
Co-author from Sun Yat-sen University, the study’s institutional lead, Professor Mang Shi said: “We used to rely on tedious bioinformatics pipelines for virus discovery, which limited the diversity we could explore.
“Now, we have a much more effective AI-based model that offers exceptional sensitivity and specificity, and at the same time allows us to delve much deeper into viral diversity. We plan to apply this model across various applications.”
Co-author Dr Zhao-Rong Li, who researches in the Apsara Lab of Alibaba Cloud Intelligence, said: “LucaProt represents a significant integration of cutting-edge AI technology and virology, demonstrating that AI can effectively accomplish tasks in biological exploration.
“This integration provides valuable insights and encouragement for further decoding of biological sequences and the deconstruction of biological systems from a new perspective. We will also continue our research in the field of AI for virology.”
Professor Holmes said: “The obvious next step is to train our method to find even more of this amazing diversity, and who knows what extra surprises are in store.”
Funding: The researchers declare no competing interests. The research was supported by the National Natural Science Foundation of China, the Shenzhen Science and Technology Program, the Natural Science Foundation of Guangdong Province, the Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project, the Hong Kong Innovation and Technology Fund (ITF) and the Health and Medical Research Fund. Professor Holmes is funded by a National Health and Medical Research Council of Australia Investigator grant and by AIR@InnoHK administered by the Innovation and Technology Commission, Hong Kong Special Administrative Region, China.
About this artificial intelligence and genetics research news
Author: Luisa Low
Source: University of Sydney
Contact: Luisa Low – University of Sydney
Original Research: Open access.
“Using artificial intelligence to document the hidden virosphere” by Edward Holmes et al. Cell
Abstract
Using artificial intelligence to document the hidden virosphere
Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems.
LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences.
Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing.
Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems.
This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.