DEC 31, 2019

Using Machine Learning to Analyze Gene Regulation

WRITTEN BY: Carmen Leitch

Computational tools are becoming increasingly important in biological research. Massive amounts of data have been generated with powerful microscopes and with techniques like high-throughput robotics and advanced genetic sequencing. Machine learning has been used to sift through this enormous amount of information, but the algorithms are often created by computer scientists who don’t know much about how biologists ask questions. New work reported on bioRxiv aims to bridge that gap.

Learn more about computational biology from the video.

Quantitative biologists have now developed an approach for designing machine learning algorithms that make sense to biologists. These algorithms take advantage of artificial neural networks (ANNs). Inspired by human neuroanatomy, a neural network consists of interconnected nodes, which act like artificial neurons, and weights, which control the strength of the connections between the nodes. In the new approach, these nodes and weights were given a physicochemical definition.
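To make the idea of nodes and weights with a physicochemical meaning concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' published model: it assumes a single hypothetical protein binding a three-letter DNA site, treats each weight as an energy contribution from one base, and treats the node's output as the probability that the site is bound. The weight values, the `BIAS` term, and the function names are all made up for this example.

```python
import math

# Hypothetical energy contributions (in units of kT) for each base at each
# of three positions in a binding site. These numbers are invented for
# illustration and do not come from the study.
ENERGY_WEIGHTS = [
    {"A": -1.0, "C": 0.5, "G": 0.5, "T": 0.0},
    {"A": 0.5, "C": -1.0, "G": 0.0, "T": 0.5},
    {"A": 0.0, "C": 0.5, "G": -1.0, "T": 0.5},
]
BIAS = 0.5  # hypothetical constant offset added to every energy


def binding_energy(seq):
    """Weighted sum of the inputs: each weight is read as an energy term."""
    if len(seq) != len(ENERGY_WEIGHTS):
        raise ValueError("sequence length must match the number of positions")
    return BIAS + sum(w[base] for w, base in zip(ENERGY_WEIGHTS, seq))


def occupancy(seq):
    """Node activation: a sigmoid of the negative energy, read as the
    probability that the site is bound (lower energy, higher probability)."""
    return 1.0 / (1.0 + math.exp(binding_energy(seq)))


if __name__ == "__main__":
    for site in ("ACG", "TTT"):
        print(site, round(binding_energy(site), 2), round(occupancy(site), 3))
```

In this sketch the sigmoid is the same function many neural networks use as an activation, but here every number can be read in physical terms: a lower energy means tighter binding and a higher chance that the site is occupied.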

Cold Spring Harbor Lab researchers Justin B. Kinney, an Assistant Professor, and postdoctoral fellow Ammar Tareen used this strategy to learn more from massively parallel reporter assays (MPRAs) about how genes are regulated. MPRAs involve assessing how the expression of thousands or even hundreds of thousands of genes is controlled.

While our cells all carry a copy of the genome, which contains all our genes, not every gene is expressed by every cell all the time. To carry out their functions, cells have to carefully control which genes are expressed and when, and they have a variety of ways to turn genes on and off. If that regulation (which is discussed in the video below) goes awry, serious problems can result that often lead to disease. Reporter assays can show which genes are being expressed: when a gene is 'on' and is transcribed, a detectable reporter is transcribed with it, and these reporters can be quantitatively measured.

"That mechanistic knowledge—understanding how something like gene regulation works—is very often the difference between being able to develop molecular therapies against diseases, and not being able to," Kinney said.

Kinney and Tareen have created ANNs that reflect common biological concepts about genes and how they are controlled. The machine learning tools are constrained to analyze data from reporter assays in a way that a research biologist would understand.

Kinney believes that industrial artificial intelligence technologies will be useful in the life sciences. His team is already using the new strategy to study human biology; for example, they are looking at the gene circuits that are involved in human disease.

Sources: Phys.org via Cold Spring Harbor Laboratory, bioRxiv