A recent study published in Nature explores how artificial intelligence (AI) is being used to better understand how human genetic disruptions lead to a variety of diseases. For the study, a new machine learning model known as Geneformer was developed to help scientists learn how genes interact with each other.
“Geneformer has vast applications across many areas of biology, including discovering possible drug targets for disease,” said Dr. Christina Theodoris, MD, who is an assistant professor in the Department of Pediatrics at UC San Francisco, and lead author of the study. “This approach will greatly advance our ability to design network-correcting therapies in diseases where progress has been obstructed by limited data.”
Study lead author, Dr. Christina Theodoris, MD. (Credit: Michael Short/Gladstone Institutes)
Mapping the links between gene activity can be a tricky process. Active genes can switch other genes on and off, and the first gene is often shut off in turn by the genes it activated, so the resulting map can end up looking like a tangled mess. The process becomes even more complicated when trying to map the entire human genome of roughly 20,000 genes. However, a better understanding of genetic networks at this scale would give scientists greater insight into how gene activity causes diseases.
“If a drug targets a gene that is peripheral within the network, it might have a small impact on how a cell functions or only manage the symptoms of a disease,” said Dr. Theodoris. “But by restoring the normal levels of genes that play a central role in the network, you can treat the underlying disease process and have a much larger impact.”
Traditionally, machine learning algorithms are trained on datasets pertaining to one disease and must be retrained on new datasets for each additional disease the researchers want to study. This is where Geneformer comes in: it uses a machine learning method known as “transfer learning,” in which knowledge gained from one large, general dataset is reused for new tasks, meaning it can be used to conduct analyses on multiple diseases.
To test this “transfer learning” process, the researchers first pretrained Geneformer on gene activity data from approximately 30 million cells across a range of human tissues. They then made small adjustments within Geneformer so it could make predictions about gene activity and how it might cause diseases to form. In the end, the researchers found that Geneformer’s predictions were far more accurate than those of traditional methods, owing to the pretraining it received at the outset.
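For readers who want to see the pretrain-then-adjust idea in code, here is a minimal sketch in Python. It is a toy illustration only and not Geneformer's actual architecture, data, or code: the model is a plain logistic regression, the "expression profiles" are random numbers, and every name in it (SOURCE_RULE, TARGET_RULE, make_dataset, train, accuracy) is hypothetical. It simply shows a model being pretrained on a large general dataset and then fine-tuned on a much smaller, related one.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_dataset(true_weights, n_samples, rng):
    """Toy stand-in for expression profiles: random features, label from a hidden linear rule."""
    data = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in true_weights]
        y = 1 if sum(w * xi for w, xi in zip(true_weights, x)) > 0 else 0
        data.append((x, y))
    return data


def train(weights, data, epochs, lr):
    """Logistic regression fitted by stochastic gradient descent, starting from `weights`."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def accuracy(weights, data):
    """Fraction of samples whose predicted label matches the true label."""
    hits = 0
    for x, y in data:
        predicted = 1 if sum(wi * xi for wi, xi in zip(weights, x)) > 0 else 0
        hits += int(predicted == y)
    return hits / len(data)


rng = random.Random(0)

# Assumption: the "disease" task follows a rule close to the general one.
SOURCE_RULE = [1.0, -2.0, 0.5, 1.5]
TARGET_RULE = [1.0, -2.0, 1.0, 1.5]

large_general = make_dataset(SOURCE_RULE, 2000, rng)  # plentiful general data
small_disease = make_dataset(TARGET_RULE, 20, rng)    # scarce task-specific data
held_out = make_dataset(TARGET_RULE, 500, rng)        # unseen data for evaluation

# Step 1: pretrain on the large general dataset.
pretrained = train([0.0] * 4, large_general, epochs=5, lr=0.1)

# Step 2: fine-tune, i.e. keep training from the pretrained weights on the small dataset.
fine_tuned = train(pretrained, small_disease, epochs=5, lr=0.05)

# Baseline: the same small dataset, but starting from scratch.
from_scratch = train([0.0] * 4, small_disease, epochs=5, lr=0.05)

print("fine-tuned accuracy:  ", accuracy(fine_tuned, held_out))
print("from-scratch accuracy:", accuracy(from_scratch, held_out))
```

The only difference between the fine-tuned model and the baseline is the starting point: one begins from weights learned on the large dataset, the other from zeros. In setups like this, the pretrained start typically helps most when task-specific data is scarce, which is the situation the researchers describe.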
“In the course of learning what a normal gene network looks like and what a diseased gene network looks like, Geneformer was able to figure out what features can be targeted to switch between the healthy and diseased states,” said Dr. Theodoris. “The transfer learning approach allowed us to overcome the challenge of limited patient data to efficiently identify possible proteins to target with drugs in diseased cells.”
Going forward, the team is looking to expand Geneformer’s gene network analysis capabilities. They have also already made Geneformer available as open source so other researchers can use it in their own studies.
What new discoveries will scientists make about genetic activity, and how can AI be used in these endeavors? Only time will tell, and this is why we science!
Sources: Nature, Gladstone Institutes, SAS Institute, Built In, Hugging Face
As always, keep doing science & keep looking up!