Google DeepMind’s AlphaFold shows why science may be the killer application for artificial intelligence


While many companies continue to search for the killer application of AI, biochemists have already found it. That application is protein folding. This week marks the fifth anniversary of the debut of AlphaFold 2, an artificial intelligence system created by Google DeepMind that can predict the structure of a protein from its DNA sequence with a high degree of accuracy.

In those five years, AlphaFold 2 and the AI models that succeeded it have become essential and ubiquitous tools in biochemical research, just like microscopes, Petri dishes, and pipettes. The AI models are beginning to change the way scientists search for new drugs, promising faster and more successful drug development. They have also begun helping scientists find solutions to everything from ocean pollution to creating crops that are more resilient to climate change.

“The impact has already exceeded all of our expectations,” John Jumper, Google DeepMind’s chief scientist who leads the company’s protein structure prediction team, told Fortune. In 2024, Jumper and Demis Hassabis, Google DeepMind’s co-founder and CEO, shared the Nobel Prize in Chemistry for their work creating AlphaFold 2.

Using AlphaFold to make protein structure predictions is now taught as a standard skill to many graduate-level biology students around the world. “It’s just part of the training to become a molecular biologist,” Jumper said.

Fortune chronicled Google DeepMind’s quest to solve what is known as the “protein folding problem” in a 2020 feature story. Proteins have complex physical shapes, and before AlphaFold, describing those shapes required expensive and time-consuming laboratory experiments.

The company eventually solved the problem with a Transformer, the same type of AI that powers popular chatbots like ChatGPT. But instead of training the Transformer on text to output the next most likely word, the AI model was trained on a database of known protein DNA sequences and protein structures, as well as information about DNA sequences that appear to co-evolve, as this provides clues to protein structure. It was then asked to predict the structure of the protein.

“Sometimes, I have to reassure myself that it worked,” said Pushmeet Kohli, vice president of research at Google DeepMind, who leads its efforts to apply AI to science. “There could be many, many ways we failed.”

Kohli also said that AlphaFold has proven that AI can not only make technology companies a lot of money, but can contribute to science and, ultimately, improve humanity. “AlphaFold really underscored the basic principle and vision that if we’re developing this technology, this AI, what is the most important thing that humanity can use this thing for? And I think science is the ideal use case for AI. I wouldn’t say it’s the only use case, but it’s certainly the most compelling use case.”

From 180,000 protein structures to 240 million

Proteins are long chains of amino acids that act as engines of life, controlling most biological processes. How a protein works depends, in turn, on its shape. When cells produce proteins, amino acids spontaneously fold into tangled, twisted structures, with pockets, protrusions, and sometimes long, dangling tails.

The laws of chemistry and physics determine this folding. That is why Nobel Prize-winning chemist Christian Anfinsen hypothesized in 1972 that DNA alone should fully determine the final structure that a protein takes. It was a remarkable guess. At that time, not a single genome had yet been sequenced. But Anfinsen’s theory launched an entire subfield of computational biology with the goal of using complex mathematics, rather than laboratory experiments, to model proteins. The problem is that there are more possible protein structures than there are atoms in the universe, so modeling them, even using high-powered computers, is extremely difficult.
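The scale of that search space can be illustrated with a back-of-the-envelope count in the spirit of what is known as Levinthal's paradox. The sketch below assumes a small protein of 100 amino acids, two rotatable backbone angles per peptide bond, and just three allowed values per angle. These numbers, and the commonly quoted rough estimate of 10^80 atoms in the observable universe, are illustrative assumptions and not figures from Google DeepMind.

```python
# Back-of-the-envelope count of backbone conformations.
# Every constant here is an illustrative assumption, not a measured value.
RESIDUES = 100                # assumed length of a small protein
ANGLES_PER_BOND = 2           # assumed: two rotatable backbone angles per peptide bond
STATES_PER_ANGLE = 3          # assumed: three allowed values per angle
ATOMS_IN_UNIVERSE = 10 ** 80  # commonly quoted rough estimate


def count_conformations(residues: int) -> int:
    """Count distinct backbone conformations under the assumptions above."""
    peptide_bonds = residues - 1
    return STATES_PER_ANGLE ** (ANGLES_PER_BOND * peptide_bonds)


conformations = count_conformations(RESIDUES)
# Number of decimal digits minus one gives the order of magnitude.
order_of_magnitude = len(str(conformations)) - 1

print(f"roughly 10^{order_of_magnitude} possible conformations")
print("more than the atoms in the universe:", conformations > ATOMS_IN_UNIVERSE)
```

Even with these deliberately coarse assumptions, the count dwarfs the number of atoms, which is why checking every possible shape one by one is not a workable strategy.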

Before AlphaFold 2, the only way for a scientist to know a protein’s structure with any confidence was through one of a few expensive and lengthy experimental processes. As a result, scientists were only able to determine the structure of about 180,000 proteins before AlphaFold 2. Other computational methods for predicting protein structure were only accurate about 50% of the time, which didn’t help biochemists much, especially since they had no way of knowing in advance when a prediction might be trustworthy.

Thanks to AlphaFold 2, there are now predicted structures for more than 240 million proteins. These include every protein produced by the human body, as well as proteins associated with major human diseases, such as Covid, malaria and Chagas disease.

Google DeepMind has made AlphaFold 2 freely available for researchers to download and run on their own computers. But, to make its predictions more accessible, it also created an online server where researchers can upload the DNA sequence of a protein and get a structure prediction. Google DeepMind has also generated structural predictions for almost every known protein and deposited them in a database managed by the European Bioinformatics Institute of the European Molecular Biology Laboratory, which is located outside Cambridge, England.

To date, more than 3.3 million people have used AlphaFold 2. The original AlphaFold paper has been directly cited in more than 40,000 academic papers, 30% of which focused on studying various diseases. One study found that the AI model directly or indirectly contributed to about 200,000 research publications. The tool has also been mentioned in more than 400 successful patent applications, according to data from Google DeepMind.

Jumper told Fortune he has been very pleased with how scientists have been able to use AlphaFold to find keys to vital processes “where they didn’t even know what to look for.” For example, scientists recently used AlphaFold to help discover a previously unknown protein complex that is necessary to allow sperm to fertilize an egg.

Andrea Pauli, a biochemist at the Research Institute for Molecular Pathology in Vienna, Austria, who discovered this protein on the surface of sperm, told the scientific journal Nature that her team uses AlphaFold 2 “for every project” because it “accelerates discovery.”

Uncovering life’s secrets, from heart disease to honeybees

Among the discoveries AlphaFold has played a role in is determining the structure of a key protein at the heart of low-density lipoprotein, or LDL, more commonly known as “bad cholesterol” and a major contributor to heart disease. This protein, called apoB100, had not previously been mappable due to its large size and complex interactions with other proteins. But two scientists at the University of Missouri combined an imaging method — cryogenic electron microscopy — with AlphaFold’s predictions to find the structure of apoB100. This, in turn, may help scientists find better treatments for high cholesterol.

Other scientists have used AlphaFold to discover the structure of vitellogenin, a protein that plays a key role in the honeybee’s immune system. The hope is that knowing the structure of the protein may help scientists better understand the collapse of honey bee populations globally and perhaps come up with genetic modifications that could produce more disease-resistant bee species.

The overall accuracy of AlphaFold predictions varies depending on the protein type. But AlphaFold also provides a confidence score that gives scientists some indication of whether they should trust the AI’s prediction for the structure of each particular part of the protein. For human proteins, about 36% of predictions are highly confident, while for E. coli, AlphaFold has a high confidence score for the structure in about 73% of cases.
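AlphaFold's per-residue confidence score is known as pLDDT and runs from 0 to 100. In AlphaFold's PDB-format output it is stored in the column normally used for the B-factor. The sketch below shows one way a researcher might summarize it for a single prediction. The cutoffs of 90, 70 and 50 follow the bands displayed in the AlphaFold database; the three-residue sample, the template and the function names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical PDB-style ATOM records (alpha carbons only), truncated after
# the B-factor column. Columns 61-66 hold the B-factor, which AlphaFold
# output reuses to store pLDDT.
ATOM_TEMPLATE = (
    "ATOM  {serial:5d}  CA  {res:>3s} A{resseq:4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{plddt:6.2f}"
)

# Made-up coordinates and scores, not taken from any real prediction.
sample_pdb = "\n".join([
    ATOM_TEMPLATE.format(serial=1, res="MET", resseq=1, x=1.0, y=2.0, z=3.0, occ=1.0, plddt=95.2),
    ATOM_TEMPLATE.format(serial=2, res="ALA", resseq=2, x=2.5, y=3.1, z=4.2, occ=1.0, plddt=82.0),
    ATOM_TEMPLATE.format(serial=3, res="GLY", resseq=3, x=4.0, y=4.4, z=5.9, occ=1.0, plddt=41.5),
])


def read_plddt(pdb_text: str) -> list[float]:
    """Return one pLDDT value per residue, read from alpha-carbon ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name sits in columns 13-16, B-factor in columns 61-66 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(plddt: float) -> str:
    """Map a pLDDT score to the band names used in the AlphaFold database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "high"
    if plddt > 50:
        return "low"
    return "very low"


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the fraction of residues that fall into each confidence band."""
    counts = Counter(confidence_band(score) for score in scores)
    return {band: count / len(scores) for band, count in counts.items()}


scores = read_plddt(sample_pdb)
for band, fraction in summarize(scores).items():
    print(f"{band}: {fraction:.0%} of residues")
```

A summary like this makes it easy to see at a glance which stretches of a predicted structure deserve trust and which should be treated with caution.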

Some proteins contain regions that are called “intrinsically disordered” because their shape varies greatly depending on the other proteins and molecules surrounding them. Neither experimental imaging methods nor AI-based models provide good information about what these disordered regions will look like. (AlphaFold 3, a more powerful AI model that Google DeepMind debuted in 2024, can sometimes, but not always, predict how these disordered regions will bind to a protein or other molecule.)

The impact of AlphaFold on drug discovery has not yet been proven

AlphaFold is likely to eventually have a significant impact on drug discovery, although it is difficult so far to assess how much of a difference the AI model has made. In one case, scientists used AlphaFold to find two existing FDA-approved drugs that could be repurposed to treat Chagas disease, a tropical parasitic disease that affects up to 7 million people and results in more than 10,000 deaths annually.

The AI models that followed AlphaFold 2 are, to some extent, more likely to play a direct role in drug discovery than the original structure prediction tool, Jumper said. For example, AlphaFold 3 predicts not only protein structures, but also several important aspects of how proteins bind to each other and to small molecules. This is essential because most drugs are either small molecules that bind to a target site on a protein to change its function, or in some cases are proteins themselves. Meanwhile, AlphaFold-Multimer, an extension of AlphaFold 2, predicts protein-protein interactions that can also aid in drug design.

Google DeepMind has spun off a sister company called Isomorphic Labs, which uses AlphaFold 3 and other tools to design drugs. It has partnerships with Novartis and Eli Lilly, although it has not yet announced which drug candidates it is working on. AlphaFold 3 is available to academic researchers for free, but commercial entities outside of Isomorphic and Google are not permitted to use the software.

Google DeepMind has also created an AI model called AlphaProteo that can design new proteins with specific binding properties. The AI lab created a system called AlphaMissense that can predict how harmful single-point gene mutations are, which could help scientists understand the root cause of many diseases and potentially find treatments, including potential gene therapies.

Jumper said he’s personally interested in exploring whether large language models, like Google’s Gemini AI, can play a role in science. Some AI startups have begun experimenting with LLMs that let scientists specify the function of a protein, after which the LLM outputs the DNA recipe for that protein. (The resulting protein still needs to be experimentally tested to see if it actually works.) But Jumper said he’s somewhat skeptical about how successful these types of LLMs will be at designing truly novel proteins. Jumper said he also knew that some people had created chatbot front ends for AlphaFold, but said that “wasn’t very interesting.”

Instead, he said what excites him is the idea of using the power of LLMs to develop new hypotheses and design new experiments to test them. DeepMind has created a prototype “AI scientist” based on Gemini that can do some of this. But Jumper said he believes the concept has much greater potential. “The really exciting data set and the really large data set is the entirety of the scientific literature,” he said.
