Around the year 2000, the human genome was estimated to contain anywhere from 50,000 to 90,000 genes, and that number has been steadily revised downward since. After the human genome was sequenced, it appeared that there were around 20,500 genes. Now, a research team led by scientists at the Spanish National Cancer Research Centre (CNIO) has found that there are probably even fewer genes that code for protein; they estimate that about 20 percent of the genes classified as coding may actually be non-coding. This work, reported in Nucleic Acids Research, may have serious implications for biomedical research.
Determining the exact number of coding genes has been challenging because the human genome is complex and contains thousands of genes. The team started by carefully comparing the human proteome, the set of proteins expressed in cells, as annotated in the GENCODE/Ensembl, RefSeq and UniProtKB reference databases. Of the 22,210 coding genes listed, only 19,446 were found in all three databases.
The team focused on the 2,764 genes that were found in only one or two of the references. After reviewing annotations and experimental evidence, they found that nearly all of them were predicted to be pseudogenes (which have unknown functions but seem to be non-coding) or other genes that don't code for protein.
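The comparison described above amounts to simple set logic: genes present in all three references form one group, and genes present in only one or two form the group that drew extra scrutiny. The Python sketch below illustrates the idea under stated assumptions; the gene identifiers and list contents are hypothetical placeholders, not records from GENCODE/Ensembl, RefSeq or UniProtKB, and this is not the researchers' actual pipeline.

```python
# Toy gene lists standing in for the three reference databases.
# The identifiers are hypothetical placeholders, not real database entries.
gencode = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
refseq = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
uniprot = {"GENE_A", "GENE_B", "GENE_F"}

# Every gene listed as coding in at least one reference.
all_listed = gencode | refseq | uniprot

# Genes that all three references agree on.
in_all_three = gencode & refseq & uniprot

# Genes missing from at least one reference: the candidates for closer review.
in_one_or_two = all_listed - in_all_three

print(len(all_listed), "listed in total")
print(sorted(in_all_three), "found in all three")
print(sorted(in_one_or_two), "found in only one or two")
```

In the study, the equivalent of `all_listed` held 22,210 genes and the equivalent of `in_all_three` held 19,446.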
The team also identified 1,470 genes listed as coding in the databases that don't evolve like other coding genes and probably aren't protein-coding. Combined with the 2,764 genes above, the researchers concluded that 4,234 genes in all are likely to be non-coding.
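The figures quoted so far can be checked with a few lines of arithmetic. The Python sketch below uses only numbers from this article; the variable names are illustrative, and dividing the flagged genes by the 22,210 listed coding genes is an assumption about how the estimate of roughly 20 percent was reached.

```python
# Figures quoted in the article.
listed_coding_genes = 22210   # coding genes listed across the three databases
found_in_all_three = 19446    # genes present in all three databases
unusual_evolution = 1470      # genes that don't evolve like other coding genes

# Genes present in only one or two of the databases.
in_one_or_two = listed_coding_genes - found_in_all_three

# Total genes flagged as probably non-coding.
flagged = in_one_or_two + unusual_evolution

# Share of listed coding genes that were flagged (assumed basis for "20 percent").
share = flagged / listed_coding_genes

print(in_one_or_two)        # 2764
print(flagged)              # 4234
print(f"{share:.1%}")       # 19.1%
```

The result, about 19 percent, is consistent with the rounded figure of 20 percent given at the top of the article.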
"We have been able to analyze many of these genes in detail," explained Michael Tress of the CNIO Bioinformatics Unit, "and more than 300 genes have already been reclassified as non-coding." The results are already being included in the new annotations of the human genome by the GENCODE international consortium, of which the CNIO researchers are part.
More work remains before we know everything about the human genome. "Our evidence suggests that humans may only have 19,000 coding genes, but we still do not know which 19,000 genes they are," noted first author Federico Abascal of the Wellcome Trust Sanger Institute in the United Kingdom.
The study may have significant repercussions for existing research. "Surprisingly, some of these unusual genes have been well studied and have more than 100 scientific publications based on the assumption that the gene produces a protein," added David Juan of the Pompeu Fabra University.
Sources: AAAS/EurekAlert! via CNIO, Nature Genetics, Nucleic Acids Research