Deep learning for protein structure discovery

July 4, 2023
Duration: 00:51:48

Lecture given by Thibaut Véry, IDRIS


Proteins are involved in almost all biological processes. For instance, they can carry oxygen, make muscles move, catalyze chemical reactions, or serve as the keys that viruses use to enter cells. Chemically, up to thousands of small molecules (amino acids) drawn from a set of 22 react together to form long chains. An analogy with Natural Language Processing can be drawn if we view the amino acids as a sequence of letters forming a sentence. As with a sentence, we need to find how the parts of the sequence interact to get the meaning. For proteins, physical interactions drive the folding of the amino acid chain into a 3D structure.
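To make the analogy concrete, here is a minimal Python sketch that treats a protein as a string of letters and turns it into integer tokens, much as an NLP pipeline tokenizes a sentence. It assumes the usual one-letter codes, with U (selenocysteine) and O (pyrrolysine) added to the 20 standard amino acids to reach the set of 22. The function names and the example fragment are hypothetical and are not taken from the lecture or from AlphaFold.

```python
# One-letter codes: the 20 standard amino acids plus U (selenocysteine)
# and O (pyrrolysine). The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYUO"
TOKEN_OF = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map a protein sequence to integer tokens, one per residue."""
    try:
        return [TOKEN_OF[aa] for aa in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]!r}") from None


def detokenize(tokens: list[int]) -> str:
    """Inverse of tokenize: rebuild the letter sequence from tokens."""
    return "".join(AMINO_ACIDS[t] for t in tokens)


# Illustrative fragment only, not a real protein.
example = "MKTAYIAK"
tokens = tokenize(example)
print(tokens)               # [10, 8, 16, 0, 19, 7, 0, 8]
print(detokenize(tokens))   # MKTAYIAK
```

A model would then learn how these tokens interact with one another, just as a language model learns how the words of a sentence relate.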

Only by finding the structure can we decipher the exact role of a protein. Researchers have been working on this problem since the 1920s. Several experimental techniques can determine a structure, but preparing the protein samples is difficult because of the requirements these methods impose. As computers grew more powerful, it became possible to predict structures numerically, and the toolbox of numerical methods now includes machine learning.

Every other year, a competition assesses the quality of numerical methods on structures that have not yet been published. For the 2018 edition, DeepMind (Alphabet) entered a deep learning model, AlphaFold, which beat the other models thanks to higher-quality predictions. The real breakthrough came in 2020, when AlphaFold2, built around Transformer-style attention, entered the competition. The results were impressive because the quality of many predictions was comparable to that of experimental structures.

This presentation introduces the concepts needed to understand how protein structures are discovered. We will focus on the AlphaFold2 model to understand how it achieves such good results.


Thibaut Véry received a PhD in theoretical chemistry from Université de Lorraine in 2012, on the modelling of the photochemistry of complex biological systems. Since 2016 he has been a member of the user support team at the French supercomputing center IDRIS, where he is in charge of atomistic simulation for both High Performance Computing and Artificial Intelligence topics. His job is to help users get the best performance from the supercomputer through training courses, software management, documentation, and similar activities.

Tags: deep nets, AI, information retrieval, computer science, jsalt, linear algebra, nlp, workshop