A group of scientists from the European Molecular Biology Laboratory (EMBL) and the German Cancer Research Center (DKFZ) has developed a generative AI model that can analyze large-scale medical records and estimate how human health evolves over time. As a result, it can predict both the risk and the likely time of onset of more than a thousand diseases.
To train the system, the researchers used anonymized data from more than 400,000 patients in the UK Biobank, and validated its performance with data from 1.9 million people in the Danish National Patient Registry. All data used in the process were anonymized and handled under strict ethical standards.
UK Biobank participants gave their informed consent, and the Danish data were analyzed in accordance with national regulations, without leaving the country. The researchers used secure virtual systems to ensure privacy and compliance with ethical standards.
According to its creators, the model is one of the most comprehensive demonstrations to date of how generative AI can model the progression of human disease across different health systems.
The work detailing the development and training of the model, published in the journal Nature, is the result of a collaboration between the DKFZ, the EMBL and the University of Copenhagen. The AI is based on principles similar to those of large language models. It learns from health data by representing medical records as sequences of events, such as diagnoses or lifestyle factors like smoking, that occur in a certain order and with time intervals between them.
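The idea of treating a medical record as an ordered sequence of events separated by time intervals can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' actual encoding: the `HealthEvent` class, the `to_sequence` function and the event labels are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthEvent:
    age_years: float  # age at which the event was recorded
    code: str         # hypothetical label for a diagnosis or lifestyle factor


def to_sequence(events):
    """Sort events by age and pair each one with the gap since the previous event."""
    ordered = sorted(events, key=lambda e: e.age_years)
    sequence = []
    previous_age = None
    for event in ordered:
        gap = 0.0 if previous_age is None else event.age_years - previous_age
        sequence.append((event.code, round(gap, 2)))
        previous_age = event.age_years
    return sequence


# Invented toy record, given out of order on purpose.
record = [
    HealthEvent(52.5, "hypertension"),
    HealthEvent(45.0, "smoking"),
    HealthEvent(58.25, "myocardial_infarction"),
]

sequence = to_sequence(record)
print(sequence)
```

A sequence model would then be trained to predict the next event and the waiting time until it occurs, much as a language model predicts the next word.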
According to its developers, the system so far works best for diseases with consistent progression patterns, such as certain types of cancer, heart attacks or sepsis. As with weather forecasts, it offers probabilities, not certainties. For example, it can estimate the risk that a person will develop cardiovascular disease in the next year, expressed as percentages over time.
Some events, such as hospitalization for a heart attack, can be anticipated with greater precision, while others carry more uncertainty. In any case, the model's short-term predictions are more reliable than its long-term ones.
Another example shows that in UK Biobank cohorts aged 50 to 55, the annual risk of a heart attack ranges from 1 in 10,000 to 1 in 100, depending on previous diagnoses and lifestyle. Women have a lower risk, but with a similar distribution. The model also finds that, in general, the probability increases with age, and a systematic evaluation showed that the risks calculated by the model correspond well to the cases observed.
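Checking whether calculated risks "correspond well" to observed cases is usually called a calibration check: people are grouped by predicted risk, and the average prediction in each group is compared with the fraction who actually developed the condition. The sketch below shows one simple way to do this. It is not the evaluation used in the study; the function name, the bin edges and the toy numbers are all invented for illustration.

```python
def calibration_table(predicted, observed, edges):
    """Compare mean predicted risk with observed event frequency in each risk bin.

    predicted: list of predicted probabilities (0 to 1)
    observed:  list of outcomes (1 = event occurred, 0 = it did not)
    edges:     ascending bin boundaries; each bin is [low, high)
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        members = [(p, o) for p, o in zip(predicted, observed) if low <= p < high]
        if not members:
            continue  # skip empty bins
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(o for _, o in members) / len(members)
        rows.append((low, high, len(members), mean_predicted, observed_rate))
    return rows


# Invented toy data: ten people predicted at 10% risk, four at 50%.
predicted = [0.1] * 10 + [0.5] * 4
observed = [0] * 9 + [1] + [1, 0, 1, 0]

for low, high, n, mean_predicted, observed_rate in calibration_table(
    predicted, observed, edges=[0.0, 0.25, 1.0]
):
    print(f"risk {low:.2f}-{high:.2f}: n={n}, "
          f"predicted={mean_predicted:.2f}, observed={observed_rate:.2f}")
```

In a well-calibrated model the predicted and observed columns are close in every bin, as they are in this deliberately tidy toy example.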
The model is calibrated to produce accurate estimates at population scale, but it has limitations. The UK Biobank data focus on people between 40 and 60 years old, which leaves out pediatric diseases and those of adolescents or the elderly. The model also has demographic biases owing to the lack of diversity in the data and the underrepresentation of certain ethnic groups.
This, among other things, means that it is not yet ready for clinical use, although it already allows researchers to study how diseases develop, explore the impact of lifestyle and medical history on long-term risk, and simulate health outcomes with artificial data in contexts where real data are inaccessible. In the future, similar models trained on more representative data are expected to help identify high-risk patients and better plan health resources.
Ewan Birney, interim director general of the EMBL, said that this AI model "is a proof of concept: it shows that it is possible to learn from our long-term health patterns and use this information to generate valuable predictions. If we model how diseases develop over time, we can begin to explore when certain risks start to emerge, and this allows us to plan preventive interventions. It is a major step towards a personalized health system and towards preventive medicine."
Tom Fitzgerald, a researcher at the EMBL, said: "Medical events often follow predictable patterns. Our AI model learns those patterns and can predict health outcomes. It gives us a way to explore what could happen to a person based on their medical history and other key factors. Obviously, a prediction is not a certainty, but an estimate of potential risks."
Moritz Gerstung, head of the Division of AI in Oncology at the DKFZ, stressed: "This is the beginning of a new way of understanding human health and the development of diseases. Someday, generative models like ours could help personalize care and anticipate large-scale health needs. By learning from large populations, these models offer a powerful perspective on how diseases develop and, in the long run, could help make interventions more preventive and more personalized."