In the genome, genes that code for protein are often divided into sections called exons, which are separated by spacers called introns. When a region of DNA is transcribed into messenger RNA (mRNA) or other types of RNA molecules, the introns are spliced out of the transcripts. A complex system of cellular regulation determines whether a sequence is an exon or an intron, and some intronic splice sites can be chosen over others. There are around 20,000 protein-coding genes in the human genome, and within those genes are about 180,000 exons. Some exons are also used to make RNA molecules that do not get translated into proteins. The exome is the portion of the genome composed of exons. Researchers have now found about one million novel exons in the human genome. The findings have been reported in Genome Research.
Protein-coding genes account for only about one percent of the human genome, and very little is known about the function of the remainder of the DNA, which was once called junk DNA and is sometimes referred to as the dark genome.
"We've started to chip away at the dark genome by finding nearly one million previously unknown exons through a method called exon trapping," said senior study author Timothy Hughes, a professor and chair of the department of molecular genetics at the University of Toronto.
Plasmids are small bits of DNA and are often used in the laboratory. In exon trapping, plasmids can be used to search for exons in regions of DNA. "While exon trapping is not widely used anymore, it proved to be effective when used in combination with high-throughput sequencing to scan the entire human genome."
Autonomous exons do not need additional support to be made into mature RNA molecules.
This research team has found that while introns are meant to be spliced out of mature RNA transcripts in certain places, the system isn't perfect, and there are mRNA transcripts that carry sequences with non-functional pieces.
The roles of these exons are still unclear. "They seem to appear in the human genome mainly due to random mutation and are unlikely to play a significant role in our biology. This is evidence that evolution in humans involves a lot of trial and error, most likely enabled by the vast size of our genome," Hughes suggested.
Exons in the human genome that don't sit within a known gene, but carry mutations, may be harmful and should be documented, noted the researchers. For example, long non-coding RNA (lncRNA) molecules often have regulatory roles that can affect the expression of other genes, although the function of many lncRNAs is still unknown. Mutations in lncRNAs have also been associated with certain types of cancer. The majority of the exons found in this study are thought to be lncRNAs.
When portions of introns are included in transcribed RNA sequences, they are referred to as pseudoexons. Research has indicated that these pseudoexons can strengthen weak splice sites, where introns are meant to be spliced out of transcripts. In these cases, the pseudoexon ends up in the mature RNA transcript, and could cause problems. Other research has linked pseudoexons to human disease.
Sources: University of Toronto, Genome Research