A group of researchers in Spain, specifically at the Institute for Bioengineering of Catalonia (IBEC) in collaboration with the Centre for Genomic Regulation (CRG), has developed a tool to decipher the "secret" language that proteins use to decide whether or not to stick together. These sticky clumps are linked to Alzheimer's and some fifty other human diseases. The new tool, which they have called Canya, is designed to explain its decisions, revealing the specific chemical patterns that drive or prevent the harmful aggregation of proteins.
The discovery, in which Cold Spring Harbor Laboratory (CSHL) and the Wellcome Sanger Institute also took part, has been published in the journal Science Advances. It was made possible by the largest data set created so far on protein aggregation. This phenomenon, known as protein agglomeration or amyloid aggregation, alters the normal function of cells, which poses a health risk. If certain parts of proteins stick to each other, they form dense masses with pathological consequences.
As expected, the study has implications for speeding up research into certain neurodegenerative diseases, although its most immediate impact will be in biotechnology. Many drugs are proteins, and unwanted aggregation sometimes impairs their function.
Protein aggregation follows a language that is not yet well understood. This language has twenty different letters, the amino acids, rather than the four letters that make up the language of DNA. Combinations of these twenty letters form words, or motifs, that determine, among other things, whether or not proteins stick together.
The size of the data set, key to the study
For a long time, researchers have tried to decipher which combinations of these letters generate the glue of proteins and which allow proteins to fold without faults. AI tools that treat amino acids as the alphabet of a secret language can help identify these specific words or motifs. But so far, the quality and volume of protein aggregation data needed to feed such AI models have been scarce or restricted to very small fragments of proteins.
To overcome this difficulty, the study carried out several large-scale experiments. Its authors, led by Dr. Benedetta Bolognesi, created more than 100,000 completely random protein fragments from scratch, each 20 amino acids long.
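To give a sense of what "completely random fragments of 20 amino acids" means, the short sketch below draws such sequences in software. It is only an illustration under stated assumptions: the function name `random_fragments`, the uniform sampling and the fixed seed are hypothetical choices, and the code does not reproduce how the authors actually built their experimental library.

```python
import random

# One-letter codes for the twenty standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_fragments(n, length=20, seed=0):
    """Return n random peptide strings of the given length.

    Assumption: every amino acid is drawn uniformly and independently,
    which is a simplification chosen for illustration.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(n)
    ]

# A small toy library; the study reports more than 100,000 fragments.
library = random_fragments(1000)
```

With twenty possible letters at each of twenty positions, the number of possible fragments is 20 to the power of 20, so even 100,000 sequences sample only a tiny fraction of that space.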
The ability of each synthetic fragment to aggregate was then tested in living yeast cells. If a specific fragment triggers the formation of protein aggregates, the yeast cells grow in a particular way, which can be measured to establish both the cause of that formation and its effect.
The study showed that about one in five of the fragments caused agglomeration, while the rest did not. Although several previous studies had been able to track the behavior of a group of sequences, the new data set covers many more protein variants capable of producing amyloid aggregation.
Training Canya
With the data generated in these experiments, the team trained Canya, which was built on the principles of explainable AI so that its decision-making processes are transparent and understandable to humans. This cost it part of its predictive power, but even so Canya proved to be 15% more accurate than existing models.
Canya is what is known as a "convolution-attention" model. This means it borrows from two different areas of AI. Convolution models, such as those used in image recognition, scan photos to locate specific features. In the same way, Canya scans the protein chain to detect significant features, which in this case are not parts of an image but words or motifs.
The second area Canya draws on is that of language translation tools, which use AI models to identify key phrases in a sentence before deciding on the best translation. By incorporating this technique, Canya can determine which motifs matter most across the whole protein. This combination lets Canya examine motifs up close while also gauging their importance at a larger scale.
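The two ideas described above, scanning a sequence for local patterns and then weighing which positions matter most, can be sketched in a few lines of NumPy. This is a minimal toy under stated assumptions, not Canya's actual architecture: the function names, the filter width, the single attention query and the random (untrained) weights are all hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a peptide as a (length, 20) matrix of zeros and ones."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x

def conv_scan(x, filters):
    """Slide each filter along the sequence (the convolution step).

    filters has shape (n_filters, width, 20); the result has one row
    per window position and one column per filter, after a ReLU.
    """
    n_filters, width, _ = filters.shape
    n_pos = x.shape[0] - width + 1
    out = np.empty((n_pos, n_filters))
    for p in range(n_pos):
        window = x[p:p + width]
        out[p] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

def attention_pool(features, query):
    """Weigh positions with a softmax and average them (the attention step)."""
    scores = features @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ features, weights

def predict(seq, filters, query, w_out, b_out=0.0):
    """Return a toy aggregation probability and the per-position weights."""
    feats = conv_scan(one_hot(seq), filters)
    pooled, weights = attention_pool(feats, query)
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit)), weights

# Random, untrained parameters: the output only demonstrates the data flow.
rng = np.random.default_rng(0)
filters = rng.normal(size=(4, 3, 20))   # 4 hypothetical motif detectors, width 3
query = rng.normal(size=4)
w_out = rng.normal(size=4)
prob, weights = predict("ACDEFGHIKLMNPQRSTVWY", filters, query, w_out)
```

The `weights` vector is what makes a design like this inspectable: it has one entry per window position, so after training one could read off which stretch of the sequence contributed most to the prediction.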
This information can be used not only to predict which motifs in the protein chain cause agglomeration, block it, or lead to an intermediate stage, but also to understand why each of these outcomes occurs.
The tool has shown that small regions of water-repellent amino acids are more likely to cause agglomeration. In addition, some motifs have more impact on agglomeration when they sit toward the beginning of a protein sequence rather than toward the end. Canya identified several rules that govern the aggregation of proteins. Contrary to what was believed, it also showed that certain amino acids can promote agglomeration.
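The observation about short water-repellent stretches can be illustrated with a classic sliding-window calculation. The sketch below uses the Kyte-Doolittle hydropathy scale, a long-established textbook measure, to find the most water-repellent window in a fragment. It is a hand-written heuristic chosen for illustration, not one of the rules Canya learned; the function name, the window size of five and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values: higher means more water-repellent.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophobic_window(seq, window=5):
    """Return (start index, window text, mean hydropathy) of the top window.

    Ties are resolved in favor of the earliest window.
    """
    best = None
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        if best is None or score > best[2]:
            best = (start, chunk, score)
    return best

# Hypothetical 20-residue fragment with a hydrophobic stretch in the middle.
fragment = "DKSEVIVLIFGNRQTKDESP"
start, chunk, score = most_hydrophobic_window(fragment)
```

A scan like this only captures average water repellence within a window. It says nothing about the position effects or the specific motif combinations the article attributes to Canya.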
As currently designed, Canya describes protein aggregation mainly as a yes-or-no outcome, so it works as a classifier. Looking ahead, the team that developed it wants to adjust the system so that it can predict and compare aggregation rates rather than only the probability of aggregation.
This would make it possible to predict which protein variants form aggregates more quickly and which do so more slowly, something very important in neurodegenerative diseases.
Opening image: Benedetta Bolognesi