The race to crack one of biology's greatest challenges, predicting the 3D structures of proteins from their amino-acid sequences, is intensifying thanks to new artificial-intelligence (AI) approaches.
At the end of last year, Google's AI firm DeepMind unveiled an algorithm called AlphaFold, which combined two techniques that had been emerging in the field and outperformed established competitors. And in April this year, a US researcher revealed an algorithm that takes a completely different approach. He claims that his AI predicts structures up to a million times faster than DeepMind's, although probably not as accurately in all situations.
More broadly, biologists are wondering how else deep learning, the AI technique used by both approaches, might be applied to predicting protein arrangements, which ultimately determine a protein's function. Computational approaches like these are cheaper and faster than existing laboratory techniques such as X-ray crystallography, and knowing a protein's structure could help researchers to better understand diseases and design drugs. "There's a lot of excitement about where things could go now," says John Moult, a biologist at the University of Maryland in College Park and the founder of the biennial Critical Assessment of Protein Structure Prediction (CASP) competition, in which teams design computer programs that predict protein structures from their sequences.
The creator of the newer algorithm, Mohammed AlQuraishi, a biologist at Harvard Medical School in Boston, Massachusetts, has not yet directly compared the accuracy of his method with that of AlphaFold, and he suspects that AlphaFold would beat his technique in accuracy when proteins with sequences similar to the one being analysed are available for reference. But he says that because his algorithm uses a mathematical function to calculate protein structures in a single step, rather than in two steps like AlphaFold (which uses the similar structures as groundwork in its first step), it can predict structures in milliseconds rather than hours or days.
"AlQuraishi's approach is very promising. It builds on advances in deep learning as well as some new tricks that AlQuraishi has invented," says Ian Holmes, a computational biologist at the University of California, Berkeley. "It's possible that in the future his idea can be combined with others to move the field forward," says Jinbo Xu, a computer scientist at the Toyota Technological Institute at Chicago, Illinois, who took part in CASP13.
At the core of AlQuraishi's system is a neural network, a type of algorithm inspired by the brain's wiring that learns from examples. It is fed known data on how amino-acid sequences map to protein structures, and it then learns to produce new structures from unfamiliar sequences. What is new about his network is its ability to create such mappings end to end: other systems use a neural network to predict certain features of a structure, and then another algorithm to laboriously search for a plausible structure that incorporates those features. AlQuraishi's network takes months to train, but once trained, it can turn a sequence into a structure almost immediately.
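One way to see why an end-to-end mapping can be fast is that the last step, going from angles to 3D coordinates, can be an ordinary mathematical function with no search involved. The sketch below assumes, for illustration only, that a network outputs one torsion angle per residue and that the protein is treated as a simplified chain of points spaced 3.8 Å apart with an assumed fixed bond angle of 110°. It uses the standard NeRF chain-building construction and is not AlQuraishi's code; the names `place_next` and `build_chain` are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / length, a[1] / length, a[2] / length)


def place_next(a, b, c, bond_length, bond_angle, torsion):
    """Place point d so that |cd| = bond_length, angle(b, c, d) = bond_angle
    and the dihedral a-b-c-d equals torsion (NeRF construction, radians)."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    # Displacement of d expressed in the local orthonormal frame (bc, m, n).
    local = (-bond_length * math.cos(bond_angle),
             bond_length * math.sin(bond_angle) * math.cos(torsion),
             bond_length * math.sin(bond_angle) * math.sin(torsion))
    return tuple(c[k] + bc[k] * local[0] + m[k] * local[1] + n[k] * local[2]
                 for k in range(3))


def build_chain(torsions, bond_length=3.8, bond_angle=math.radians(110.0)):
    """Turn one torsion angle per new point into 3D coordinates (hypothetical helper)."""
    # Three seed points fix the starting frame; the angle at the second one is bond_angle.
    coords = [(0.0, 0.0, 0.0),
              (bond_length, 0.0, 0.0),
              (bond_length - bond_length * math.cos(bond_angle),
               bond_length * math.sin(bond_angle), 0.0)]
    for torsion in torsions:
        coords.append(place_next(coords[-3], coords[-2], coords[-1],
                                 bond_length, bond_angle, torsion))
    return coords
```

For example, `build_chain([math.radians(-60.0)] * 6)` returns nine points spaced 3.8 Å apart. Because every operation here is differentiable, an error measured on the final coordinates can in principle be traced back to the angles, and on to whatever network predicted them, which is what allows a system of this kind to be trained end to end.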
His approach, which he calls a recurrent geometric network, predicts the structure of one segment of a protein partly on the basis of what comes before and after it. This is similar to how the interpretation of a word in a sentence can be influenced by the surrounding words, whose own interpretations are in turn influenced by that central word.
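The idea of letting both neighbours inform each segment can be illustrated with a bidirectional recurrence: one pass runs along the chain from the start, another from the end, and each position combines the two. In this minimal sketch a fixed exponential-decay update stands in for the learned recurrent units (such as LSTMs) that real networks of this kind typically use; `bidirectional_context` and its `decay` parameter are hypothetical.

```python
def bidirectional_context(features, decay=0.5):
    """Give each position a summary of what comes before and after it.

    A fixed exponential-decay recurrence stands in for the learned
    recurrent units a real network would use.
    """
    n = len(features)
    forward, backward = [0.0] * n, [0.0] * n
    state = 0.0
    for i in range(n):  # left-to-right pass: sees positions 0..i
        state = decay * state + features[i]
        forward[i] = state
    state = 0.0
    for i in reversed(range(n)):  # right-to-left pass: sees positions i..n-1
        state = decay * state + features[i]
        backward[i] = state
    # Each pass already includes position i itself, so subtract one copy.
    return [f + b - x for f, b, x in zip(forward, backward, features)]


left_only = bidirectional_context([1.0, 0.0, 0.0, 0.0])
with_right = bidirectional_context([1.0, 0.0, 0.0, 1.0])
```

Changing only the last feature changes the context computed for the first position (`with_right[0]` versus `left_only[0]`), which a single left-to-right pass could not do.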
Technical difficulties meant that AlQuraishi's algorithm did not perform well at CASP13. He published details of the AI in Cell Systems in April¹ and made his code publicly available on GitHub, in the hope that others will build on the work. (The structures of most of the proteins tested in CASP13 have not yet been made public, so he has still not been able to compare his method directly with AlphaFold.)
AlphaFold competed successfully at CASP13 and caused a stir when it outperformed all other algorithms on hard targets by nearly 15%, according to one measure.
AlphaFold works in two steps. Like other approaches used in the competition, it starts with what are called multiple sequence alignments. It compares a protein's sequence with similar sequences in a database to reveal pairs of amino acids that do not lie next to each other in the chain but that tend to change together across the related sequences. This suggests that the two amino acids sit close together in the folded protein. DeepMind trained a neural network to take such pairings and predict the distance between the two paired amino acids in the folded protein.
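Spotting amino-acid positions that change together can be sketched with mutual information between alignment columns, one of the simplest covariation scores. This is an illustration only: competitive pipelines use more sophisticated statistics and can hand such signals to a neural network, as described above. The toy alignment and the functions `mutual_information` and `coupled_pairs` are hypothetical, and `min_separation` skips positions that already sit close together in the chain.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    total = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        total += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return total


def coupled_pairs(msa, min_separation=3, top=3):
    """Rank column pairs that co-vary, skipping near neighbours in the chain."""
    length = len(msa[0])
    scored = [(mutual_information(msa, i, j), i, j)
              for i in range(length)
              for j in range(i + min_separation, length)]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:top]]


# Toy alignment: columns 1 and 6 change together (K goes with E, R with D),
# while column 3 varies independently of both.
msa = ["MKLAVTEW",
       "MRLAVTDW",
       "MKLGVTEW",
       "MRLGVTDW"]
```

Here `coupled_pairs(msa, top=1)` picks out columns 1 and 6, the only pair whose residues change in step; column 3 varies too, but independently, so it scores zero.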
By comparing its predictions with precisely measured distances in real proteins, the network learned to make better guesses about how proteins would fold. A parallel neural network predicted the angles of the joints between consecutive amino acids in the folded protein chain.
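The training signal described here can be made concrete: measured coordinates give a table of true distances to score predicted distances against, and four consecutive points define a joint (torsion) angle. This is a minimal sketch in plain Python; the squared-error score is an illustrative choice rather than DeepMind's actual training objective, and the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between residue positions (e.g. in ångströms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mean_squared_error(predicted, measured):
    """Average squared gap between predicted and measured distance tables."""
    n = len(measured)
    return sum((predicted[i][j] - measured[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def dihedral(p0, p1, p2, p3):
    """Torsion angle (radians, in [-pi, pi]) about the p1-p2 bond."""
    def sub(a, b): return [a[k] - b[k] for k in range(3)]
    def dot(a, b): return sum(a[k] * b[k] for k in range(3))
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]
    return math.atan2(dot(cross(b1, v), w), dot(v, w))
```

For two points at `(0, 0, 0)` and `(3, 4, 0)` the measured distance is 5, so a prediction of 4 for that pair contributes a squared error of 1 in each direction of the table.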
But these steps cannot predict a structure by themselves, because the exact set of predicted distances and angles might not be physically possible. So in a second step, AlphaFold created a physically possible, but nearly random, folding arrangement for a sequence. Instead of another neural network, it used an optimization method called gradient descent to iteratively refine the structure so that it came close to the (not-quite-possible) predictions from the first step.
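A minimal sketch of that refinement idea, assuming the first step's output is a full table of target distances: start from a nearly random arrangement of points and repeatedly move every point downhill on the squared mismatch between current and target distances. This is a simplification (it moves free 3D points rather than adjusting a chain's angles, and the coordinates, learning rate and step count are made up), not DeepMind's implementation; `stress` and `refine` are hypothetical names.

```python
import math
import random


def stress(coords, target):
    """Sum of squared gaps between current and target pairwise distances."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def refine(coords, target, learning_rate=0.01, steps=2000):
    """Plain gradient descent on stress(), moving every point at once."""
    coords = [list(c) for c in coords]
    n = len(coords)
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                delta = [coords[i][k] - coords[j][k] for k in range(3)]
                d = max(math.sqrt(sum(x * x for x in delta)), 1e-9)
                # Derivative of (d - t)^2 with respect to point i: 2 (d - t) (x_i - x_j) / d
                coeff = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    grad[i][k] += coeff * delta[k]
                    grad[j][k] -= coeff * delta[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= learning_rate * grad[i][k]
    return [tuple(c) for c in coords]


# Hypothetical "predicted" distances, taken from a small known 3D chain.
true_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.6, 0.0),
               (8.9, 3.6, 0.0), (10.2, 7.2, 1.5)]
target = [[math.dist(a, b) for b in true_coords] for a in true_coords]

# A nearly random starting arrangement, seeded so the run is repeatable.
rng = random.Random(0)
start = [(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10))
         for _ in true_coords]
refined = refine(start, target)
```

Nothing in the update rule is specific to proteins: once the mismatch can be written as a differentiable function, the same few lines of gradient arithmetic apply.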
Some other teams used one of these approaches, but none used both. In the first step, most teams merely predicted contacts between pairs of amino acids, not distances. In the second step, most used complex optimization rules instead of gradient descent, which is almost automatic.
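The difference between predicting contacts and predicting distances is easy to see in code: a contact map keeps only whether each pair is closer than a cutoff. This sketch assumes an 8 Å cutoff (the conventional threshold in contact-prediction assessments) and uses two made-up distance tables; `contact_map` is a hypothetical helper.

```python
CONTACT_CUTOFF = 8.0  # ångströms; assumed conventional threshold


def contact_map(distances, cutoff=CONTACT_CUTOFF):
    """Reduce a distance table to yes/no contacts."""
    return [[d < cutoff for d in row] for row in distances]


# Two hypothetical three-residue distance tables that differ only in how far
# apart residues 0 and 2 are (both values are geometrically possible).
compact = [[0.0, 6.0, 9.0],
           [6.0, 0.0, 7.0],
           [9.0, 7.0, 0.0]]
extended = [[0.0, 6.0, 12.5],
            [6.0, 0.0, 7.0],
            [12.5, 7.0, 0.0]]
```

Both tables produce the same contact map even though residues 0 and 2 are 9 Å apart in one and 12.5 Å apart in the other; a distance prediction keeps that difference.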
"They did a great job. They're about a year ahead of the other groups," says Xu.
DeepMind has yet to publish all the details of AlphaFold, but other groups have started adopting tactics demonstrated by DeepMind and other leading teams at CASP13. Jianlin Cheng, a computer scientist at the University of Missouri in Columbia, says that he will modify his deep neural networks to incorporate some of AlphaFold's features, for example by adding more layers to the neural network in the distance-prediction stage. Having more layers (a deeper network) often allows a network to process information more deeply, hence the name deep learning.
"We look forward to seeing similar systems put to use," says Andrew Senior, the DeepMind computer scientist who led the AlphaFold team.
Moult said there was a lot of discussion at CASP13 about how else deep learning might be applied to protein folding. It might help to refine approximate structure predictions, to report how confident the algorithm is in a folding prediction, or to model interactions between proteins.
Although computational predictions are not yet accurate enough to be used in drug development, their increasing accuracy is enabling other applications, such as identifying which part of a protein to turn into a vaccine for immunotherapy. "These models are starting to come in handy," says Moult.