David Baker’s lab at the University of Washington is announcing two major leaps in the field of AI-powered protein design. The first is a souped-up version of its existing RFdiffusion2 tool that can now design enzymes with performance nearly on par with those found in nature. The second is the release of a new, general-purpose version of its model, named RFdiffusion3, which the researchers are calling their most powerful and versatile protein engineering technology to date.
Last year, Baker received the Nobel Prize in Chemistry for his pioneering work in protein science, which includes a deep-learning model called RFdiffusion. The tool allows scientists to design novel proteins that have never existed. These machine-made proteins hold immense promise, from developing medicines for previously untreatable diseases to solving knotty environmental challenges.
Baker leads the UW’s Institute for Protein Design, which released the first version of the core technology in 2023, followed by RFdiffusion2 earlier this year. The second model was fine-tuned for creating enzymes — proteins that orchestrate the transformation of molecules and dramatically speed up chemical reactions.
The latest accomplishments are being shared today in publications in the leading scientific journals Nature and Nature Methods, as well as a preprint last month on bioRxiv.
A better model for enzyme construction
In the improved version of RFdiffusion2, the researchers took a more hands-off approach to guiding the technology, giving it a specific enzymatic task to perform but not specifying other features. Or as the team described it in a press release, the tool produces “blueprints for physical nanomachines that must obey the laws of chemistry and physics to function.”
“You basically let the model have all this space to explore and … you really allow it to search a really wide space and come up with great, great solutions,” said Seth Woodbury, a graduate student in Baker’s lab and an author of both papers being published today.
In addition to UW scientists, researchers from MIT and Switzerland’s ETH Zurich contributed to the work.
The new approach is remarkable for quickly generating higher-performing enzymes. In a test, the tool solved all 41 of 41 difficult enzyme design challenges, compared with only 16 for the previous version.
“When we designed enzymes, they’re always an order of magnitude worse than native enzymes that evolution has taken billions of years to find,” said Rohith Krishna, a postdoctoral fellow and lead developer of RFdiffusion2. “This is one of the first times that we’re not one of the best enzymes ever, but we’re in the ballpark of native enzymes.”
The researchers successfully used the model to create proteins called metallohydrolases, which accelerate difficult reactions using a precisely positioned metal ion and an activated water molecule. The engineered enzymes could have important applications, including the destruction of pollutants.
The promise of rapidly designed catalytic enzymes could unleash wide-ranging applications, Baker said.
“The first problem we really tackled with AI, it was largely therapeutics, making binders to drug targets,” he said. “But now with catalysis, it really opens up sustainability.”
The researchers are also working with the Gates Foundation to figure out lower-cost ways to build what are known as small molecule drugs, which interact with proteins and enzymes inside cells, often by blocking or enhancing their function to affect biological processes.
The most powerful model to date
While RFdiffusion2 is fine-tuned to make enzymes, the Institute for Protein Design researchers were also eager to build a tool with wide-ranging functionality. RFdiffusion3 is that new AI model. It can create proteins that interact with virtually every type of molecule found in cells, including proteins that bind DNA, other proteins and small molecules, as well as proteins with enzyme-related functions.
“We really are excited about building more and more complex systems, so we didn’t want to have bespoke models for each application. We wanted to be able to combine everything into one foundational model,” said Krishna, a lead developer of RFdiffusion3.
Today the team is publicly releasing the code for the new machine learning tool.
“We’re really excited to see what everyone else builds on it,” Krishna said.
And while the steady stream of model upgrades, breakthroughs and publications in top-notch journals seems to continue unabated from the Institute for Protein Design, there are plenty of behind-the-scenes stumbles, Baker said.
“It all sounds beautiful and simple at the end when it’s done,” he said. “But along the way, there’s always the moments when it seems like it won’t work.”
But the researchers keep at it, and so far at least, they keep finding a path forward. And the institute continues minting new graduates and further training postdocs who go on to launch companies or establish their own academic labs.
“I don’t surf, but I sort of feel like we’re riding a wave and it’s just fun,” Baker said. “I mean, it’s so many, so many problems are getting solved. And yeah, it’s really exhilarating, honestly.”
The Nature paper, titled “Computational design of metallohydrolases,” was authored by Donghyo Kim, Seth Woodbury, Woody Ahern, Doug Tischer, Alex Kang, Emily Joyce, Asim Bera, Nikita Hanikel, Saman Salike, Rohith Krishna, Jason Yim, Samuel Pellock, Anna Lauko, Indrek Kalvet, Donald Hilvert and David Baker.
The Nature Methods paper, titled “Atom-level enzyme active site scaffolding using RFdiffusion2,” was authored by Woody Ahern, Jason Yim, Doug Tischer, Saman Salike, Seth Woodbury, Donghyo Kim, Indrek Kalvet, Yakov Kipnis, Brian Coventry, Han Raut Altae-Tran, Magnus Bauer, Regina Barzilay, Tommi Jaakkola, Rohith Krishna and David Baker.
