There is so much we don't know about microbes, even though these organisms are the most numerous on the planet. A recent study called our understanding of microbial life 'profoundly ignorant,' and highlighted the fact that we can't even say whether microbial biodiversity is increasing, decreasing, or staying the same. There are so many microbes that researchers have struggled to find ways to help people visualize these vast numbers. For example, it's been said that "there are 100 million times as many bacteria in the oceans (13 × 10^28) as there are stars in the known universe." There are so many microbes in dental plaque that a single gram of the stuff contains around as many microbes as there are humans who have ever existed (about 1 × 10^11). (It's been estimated that only around 1,500 microbial pathogens can infect humans, which shows how many microbes are harmless or potentially beneficial to us.)
Scientists have recently begun to dig into this vast unknown world, sometimes called microbial dark matter. In a new study, metagenomic tools were used to reveal more about mystery bacteria. The researchers used powerful computational tools to assess 1.2 billion microbial proteins, some of which were bacterial and some viral. The findings have been reported in Nature.
This effort took advantage of a massive dataset, including information from over 26,000 microbiomes, that was used to create the Novel Metagenome Protein Families (NMPF) Catalog. New functions for these protein families can now be predicted, or fresh datasets can be compared to them, suggested senior study author Nikos Kyrpides, head of the Microbiome Data Science group at the Joint Genome Institute (JGI).
In this work, the investigators began with 8 billion metagenome genes, then removed any genes that had some similarity to genes that had already been described. About 1.2 billion unknown genes remained. These were clustered into groups, or families. Then the researchers focused on protein families that had 100 or more members. The study concluded that there is more than twice as much diversity among these protein families as there is in microbial reference genes. There may be even greater diversity, since not all microbial proteins have been added yet.
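The three steps described above (filter out anything resembling known genes, cluster what remains, keep only the large families) can be illustrated with a toy sketch. This is not the study's actual pipeline: the function name, the gene IDs, and the use of simple connected-component clustering over precomputed "similar" pairs are all assumptions made for illustration, and the size cutoff is lowered from 100 to 3 so the example stays small.

```python
from collections import defaultdict


def build_families(genes, known_ids, similar_pairs, min_size=100):
    """Toy filter -> cluster -> size-threshold sketch (hypothetical helper)."""
    # Step 1: drop genes that resemble something already described.
    unknown = [g for g in genes if g not in known_ids]
    unknown_set = set(unknown)

    # Step 2: cluster the unknown genes. As a stand-in for real sequence
    # clustering, group genes linked by any "similar" pair (union-find).
    parent = {g: g for g in unknown}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in similar_pairs:
        if a in unknown_set and b in unknown_set:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for g in unknown:
        families[find(g)].append(g)

    # Step 3: keep only families with at least min_size members.
    return [sorted(m) for m in families.values() if len(m) >= min_size]


# Made-up example data: g7 is "known", so it is removed before clustering.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
known_ids = {"g7"}
similar_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5"), ("g6", "g7")]

print(build_families(genes, known_ids, similar_pairs, min_size=3))
```

With these made-up inputs, only the family containing g1, g2, and g3 is large enough to pass the cutoff; the g4/g5 pair and the lone g6 are dropped, much as small families were set aside in the study.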
"We've more than doubled the number of protein families known up until now, and identified many novel structure predictions," said lead study author Georgios Pavlopoulos, now a research director at the Biomedical Sciences Research Center Alexander Fleming.
Microbial dark matter is very challenging to study for a variety of reasons, one of which is that the growth conditions for these microbes are often unknown, and cultivating them in the lab for study can be difficult. Communities of microbes, or microbiomes, can also be nearly impossible to accurately replicate in the lab, because they arise from a natural mix of microbes that often behave in a unique way. Analyzing the genomes of these microbes could still provide many insights.
However, genetic research often relies on reference sequences. Without anything to compare a sequence to, it's simply another string of nucleotide bases.
There are some microbial proteins and genes that are similar to ones that are already known and, in some cases, well understood. But in other cases, microbial genes or proteins are totally unlike anything that has ever been described. With nothing to compare it to, this type of information is often of little use. Computational tools like artificial intelligence could help change that. Scientists are also delving into these mystery genes and proteins to learn more about them.
"In this endeavor, we have not only ventured into the uncharted territory of understanding the vast landscape of functional diversity, but we have also pushed the boundaries by applying AI methodologies to unravel their roles," Pavlopoulos said. "Consequently, we have amassed an extensive repository of groundbreaking insights, significantly expanding the horizons of potential functions across various categories of proteins, including those with pivotal applications in biotechnology, such as DNA editing enzymes."
We still have a lot to learn about these proteins. This study did not perform a close investigation of the function of these proteins. A structural analysis was done, however, which suggested that some of these proteins have a form unlike anything that has been seen before, while others did have similarities to some known protein structures.
"There is still 70 to 80 percent of known microbial diversity out there that is not yet captured genomically," Kyrpides said. "So, that diversity is definitely holding a lot of new secrets in terms of functional diversity as well."
Sources: Lawrence Berkeley National Laboratory, Nature