Partnerships ~ Reproduced with the consent of AI Juniors
By Sunny Saito
"The protein challenge" is a roughly 50-year-old unsolved problem: how does a protein's amino acid sequence determine its three-dimensional atomic structure? The ideas behind it emerged in the 1960s, and it was stated explicitly in 1972, when the American biochemist Christian Anfinsen proposed in his Nobel Prize lecture that a protein's structure should be fully determined by its amino acid sequence. The problem has puzzled scientists ever since. Early attempts to predict protein structures by computer in the 1980s and 1990s performed poorly, and methods from published papers often broke down when applied to different proteins. Experimental methods such as X-ray crystallography did make it possible to obtain the three-dimensional structure of a protein. However, they came with their own downsides, including inefficiency and high cost: the process could take a whole year and cost $120,000 for a single protein.
When all seemed lost, DeepMind, a London-based AI company, made a breakthrough on this 50-year-old protein challenge in November 2020. It created an artificial intelligence system called AlphaFold, which showed great potential for solving the long-unsolved problem.
AlphaFold is a deep learning system based on the idea that a folded protein can be represented as a spatial graph, with its residues as the nodes. Treating protein structures as graphs leads to two crucial steps in solving the protein challenge: predictive modeling and optimization.
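To make the graph idea concrete, here is a minimal Python sketch, not DeepMind's code. It treats each residue as a node and links two residues when they sit close together in space. The function name `build_residue_graph`, the toy coordinates, and the 8-angstrom cutoff are all assumptions chosen for illustration.

```python
import math


def build_residue_graph(coords, cutoff=8.0):
    """Return an adjacency dict linking residues closer than `cutoff`.

    `coords` holds one (x, y, z) position per residue. The cutoff value
    is an illustrative assumption, not a figure from the article.
    """
    n = len(coords)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Residues are neighbours when their positions are close in space.
            if math.dist(coords[i], coords[j]) < cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


# Hypothetical positions (in angstroms) for a toy four-residue chain.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
]
toy_graph = build_residue_graph(toy_coords)
print(toy_graph)
```

In this toy chain the two end residues are too far apart to be linked, so the graph records which parts of the chain are spatial neighbours and which are not.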
The first step, predictive modeling, uses deep learning, specifically deep convolutional neural networks (CNNs), to take protein sequences as input and convert them into matrices. The training data comprise over 170,000 protein structures from the Protein Data Bank, together with other protein sequences of unknown structure. AlphaFold is trained on this immense quantity of data and further refined using multiple sequence alignments (MSAs) of evolutionarily related sequences. After repeated rounds of input, training, and refinement, the network produces two significant matrices: the distance distribution matrix and the torsion angle matrix.
The second step, optimization, translates these two matrices into a 3D structure using an iterative gradient descent method. The algorithm begins with a smooth 3D structure model, then repeatedly updates the model until the distogram (the matrix of distances between different parts of the protein) of the predicted structure nearly matches the network's output. Through these two crucial steps, AlphaFold can produce reliable predictions of proteins' physical structures within just a few days. The algorithm can also estimate, with great accuracy, how reliable its predictions are for each part of the protein structure.
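The optimization step can be illustrated with a small Python sketch. It is a simplified stand-in, not AlphaFold's actual procedure: it assumes a single target distance per residue pair (rather than a full distance distribution), ignores torsion angles, and starts from random coordinates. The names `stress` and `fit_coordinates`, the learning rate, and the toy structure are hypothetical.

```python
import math
import random


def distance_matrix(coords):
    """Pairwise distances between all points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def stress(coords, target):
    """Sum of squared gaps between current and target distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def fit_coordinates(target, start, steps=2000, lr=0.01):
    """Plain gradient descent on `stress`, starting from `start`."""
    coords = [list(p) for p in start]  # work on a copy
    n = len(coords)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                if d == 0.0:
                    continue  # gradient is undefined for coincident points
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Hypothetical "true" structure, used only to produce target distances.
true_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (3.0, 5.0, 3.0)]
target = distance_matrix(true_coords)

rng = random.Random(0)
start = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in true_coords]

fitted = fit_coordinates(target, start)
initial_stress = stress(start, target)
final_stress = stress(fitted, target)
print(f"stress before: {initial_stress:.3f}, after: {final_stress:.3f}")
```

Because only distances are fitted, the recovered coordinates can differ from the originals by a rotation, translation, or mirror image; what the descent drives down is the mismatch between the two distance matrices.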
AlphaFold's protein prediction system does not merely offer a solution to the protein challenge; it also has the potential for significant real-world impact. Take the current COVID-19 pandemic, for instance. Earlier this year, DeepMind used AlphaFold to predict several protein structures of SARS-CoV-2, the virus that causes COVID-19. Experimentalists who examined these predictions commented that they had achieved a "high degree of accuracy". This work deepens our understanding of the virus's structure and could advance future pandemic response efforts.
This groundbreaking development by DeepMind demonstrates the impact AI can have on scientific discovery and how it can accelerate progress in some of the most fundamental fields that shape our world. As Professor Venki Ramakrishnan, a Nobel laureate and President of the Royal Society, stated: "This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research."