A deep learning model based on artificial intelligence, ProtGPS, can predict how proteins are arranged within cells. This breakthrough not only reveals previously hidden layers of cellular organization but also offers new opportunities in drug development and biotechnology.
The spatial arrangement of proteins within cells plays a critical role in their function. Until now, applications of artificial intelligence in biology have primarily focused on predicting protein structures. The Nobel Prize-winning AI model AlphaFold was able to determine the three-dimensional shape of proteins. However, a protein’s structure alone is not always sufficient to fully understand its function within the cell.
ProtGPS bridges this gap: it can predict not only a protein’s structure but also its precise location within the cell. This capability allows scientists to better target and position proteins, which could represent a major advancement in drug discovery.
A new piece of the cellular map puzzle
Researchers have long known that proteins destined for specific cellular compartments, such as the nucleus or mitochondria, carry special tags. These tiny molecular markers serve as guides, ensuring that proteins reach the correct location. However, a significant portion of the cell functions as an open space, where proteins organize themselves into biomolecular condensates based on more subtle signals. These dynamic, fluid-like clusters regulate gene activity, help cells cope with stress, and play a role in the development of certain diseases.
ProtGPS can detect hidden amino acid sequence patterns that direct proteins to their proper cellular destinations. This capability enables the design of proteins that do not naturally exist but have specific, engineered localizations.
How is AI taught the language of proteins?
ProtGPS is a so-called protein language model, functioning similarly to AI-based language models like ChatGPT. Instead of analyzing words and sentences, it learns from the amino acid sequences of proteins, where each amino acid is represented by a specific letter combination. Therefore, rather than being a generative language model like ChatGPT, ProtGPS is a generative biology model.
The model utilizes a deep learning framework called Evolutionary Scale Modeling (ESM), originally developed by Meta to predict protein structure and function. The uniqueness of ESM lies in its approach: while AlphaFold performs detailed physical calculations, ESM relies on sequence-based learning, allowing it to operate much faster and on larger datasets. This has enabled ProtGPS to rapidly and efficiently decode the principles governing protein localization within cells.
A new tool for drug development and disease research
One of the most exciting applications of ProtGPS lies in disease research and drug development. The model can predict how specific mutations affect the compartmentalization, or localization, of proteins within the cell. This capability is particularly valuable in understanding diseases such as cancer and genetic disorders, where protein mislocalization plays a critical role.
The biotechnology company Dewpoint Therapeutics has already integrated ProtGPS into its drug discovery processes, aiming to develop new therapies for diseases in which proteins aggregate into abnormal condensates. Other researchers also see great potential in this tool, particularly in fields where targeted protein modifications could help combat disease.
A new perspective in Biology
ProtGPS is not just a new biotechnological tool—it represents a shift in scientific perspective. For decades, biology has primarily focused on molecular structures, but it is becoming increasingly clear that spatial organization within the cell is equally important. Just as the mere presence of furniture is not enough to create a well-designed home—its placement also matters—precise molecular organization is essential within cells.
The hidden patterns uncovered by ProtGPS open up new possibilities in biology and drug development. For the first time, scientists can manipulate and precisely target proteins within cells, potentially leading to the development of new drugs and therapies. In the future, artificial intelligence could provide even deeper insights into cellular processes, revolutionizing our understanding of life.