The human genome contains about 20,000 protein-coding genes, and we don't know much about many of them. Since the majority of the human genome was sequenced many years ago (and the small gaps in that sequence have since been filled in), scientists have also confirmed that these protein-coding genes are actually expressed in the human body. Many of these genes have been maintained in the genome for a very long time; they are said to be conserved, which suggests they are important. But because the proteins they encode are not similar to any whose roles are understood, we have no clues to their function. Research has also revealed, however, that many of these genes are related to human disease.
Researchers are now calling attention to these neglected genes. In new work published in PLoS Biology, scientists report the creation of an “Unknome database.” In this effort, proteins from various organisms are assigned a score that measures their “knownness.” The study authors used this database to show that thousands of proteins have a knownness score close to zero, and that we are not making much progress on the unknome: the number of proteins in the database is shrinking only slowly. This new effort can help address the problem, however.
This work also indicated that a group of proteins in this database has important roles in cell function. The researchers selected 260 human genes that have counterparts in fruit flies and knownness scores below one in both species, meaning there was virtually no information about them. Knocking out many of these genes was lethal to the fruit fly, showing how essential they are. The genes were found to be related to processes such as growth, development, the cell's stress response, protein quality control, and fertility.
One major challenge in the study of unknown proteins is a lack of reagents. When scientists want to study a known protein, it is often very easy to obtain antibodies specific to that protein, which can be used in assays like western blots or immunohistochemistry. These can show whether a protein is present in a tissue and how much might be there, or where proteins are expressed in a tissue sample, respectively. But if an investigator wants to do these studies on an unknown protein, they have to develop and validate their own antibodies, which can be a costly and time-consuming process.
However, even considering those issues, we can no longer afford to ignore the proteins we don't know about. Even though scientists have been diligently studying proteins for decades, there are thousands in fruit flies and humans about which we don't know even basic things. They could open up a new understanding of different diseases and how to treat them, the researchers suggested.
"These uncharacterized genes have not deserved their neglect," said Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, England. "Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents."
Sources: Public Library of Science, PLoS Biology