One of the biggest challenges in drug development is getting lead compounds through clinical trials: roughly 90% of candidates fail beyond phase 1. In this context, TxGemma, a collection of open models built on Google DeepMind's Gemma family of modern, lightweight open models, represents a breakthrough. TxGemma aims to harness the power of large language models to improve the efficiency of therapeutic discovery, from identifying promising targets to predicting clinical trial outcomes.
TxGemma is the Successor to Tx-LLM
Launched last October, Tx-LLM was trained to perform a range of therapeutic tasks across the drug development process. The model generated significant interest from people who wanted to use and fine-tune it, which led the developers to build TxGemma as its open successor. TxGemma is available in three sizes (2B, 9B, and 27B), each with a "predict" version optimized for narrow therapeutic tasks, such as predicting a molecule's toxicity or its ability to cross the blood-brain barrier.
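To make the idea of a narrow "predict" task concrete, here is a minimal Python sketch of how a blood-brain barrier question might be phrased as a text prompt and how a short reply might be read back. The template wording, the answer labels, and the function names `build_bbb_prompt` and `parse_answer` are assumptions made for illustration, not TxGemma's official prompt format. The example molecule is caffeine, written as a SMILES string.

```python
def build_bbb_prompt(smiles: str) -> str:
    """Phrase a blood-brain barrier question as a single text prompt.

    The wording is a hypothetical template, not TxGemma's official format.
    """
    smiles = smiles.strip()
    if not smiles:
        raise ValueError("SMILES string must not be empty")
    return (
        "Instructions: Answer the following question about drug properties.\n"
        "Question: Does the following molecule cross the blood-brain barrier?\n"
        "(A) It does not cross  (B) It crosses\n"
        f"Drug SMILES: {smiles}\n"
        "Answer:"
    )


def parse_answer(text: str) -> bool:
    """Map a reply such as '(B)' or 'b' to True (crosses) or False."""
    label = text.strip().strip("()").upper()[:1]
    if label not in ("A", "B"):
        raise ValueError(f"unexpected answer: {text!r}")
    return label == "B"


# Caffeine as a SMILES string; the prompt would be sent to a predict model.
caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
prompt = build_bbb_prompt(caffeine)
print(prompt)
```

Keeping prompt construction and answer parsing in separate functions makes it easy to swap in the real template from the model documentation without touching the rest of a pipeline.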
TxGemma was trained on millions of practical examples, which enable the model to handle classification, regression, and generation tasks. The largest predict version, at 27B, outperforms or at least keeps pace with the previous Tx-LLM model on almost all tasks tested, and even outperforms many models optimized for a single task. According to the detailed performance data, it produced similar or better results than Tx-LLM on 64 out of 66 tasks and outright better results on 45 of them.
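The reported figures can be broken down with a little arithmetic. The short Python sketch below assumes that the 45 "better" tasks are a subset of the 64 "similar or better" tasks, which is how the numbers read but is not stated explicitly. The variable names are illustrative.

```python
TOTAL_TASKS = 66
SIMILAR_OR_BETTER = 64  # tasks where results were similar or better than Tx-LLM
BETTER = 45             # tasks where results were strictly better

# Assumption: the 45 "better" tasks are counted within the 64.
comparable = SIMILAR_OR_BETTER - BETTER
worse = TOTAL_TASKS - SIMILAR_OR_BETTER


def share(count: int) -> float:
    """Return count as a percentage of all tasks, rounded to one decimal."""
    return round(100 * count / TOTAL_TASKS, 1)


print(f"better:     {BETTER} tasks ({share(BETTER)}%)")
print(f"comparable: {comparable} tasks ({share(comparable)}%)")
print(f"worse:      {worse} tasks ({share(worse)}%)")
```

Under that assumption, the model is better on about two thirds of the tasks, comparable on 19, and behind on only 2.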
Chat Capabilities and Further Fine-Tuning Options
The developers focused not only on raw predictive capabilities but also integrated chat features into the models. This allows the models to answer complex questions, justify their decisions, and provide feedback in multi-step conversations. For example, a researcher can ask why a particular molecule was classified as toxic, and the model can justify its answer by referring to the molecule’s structure.
The release of TxGemma offers not only a finished product but also a customizable platform for developers and researchers. The included Colab notebook makes it easy to fine-tune the model on your own therapeutic data and tasks, such as predicting adverse events in clinical trials. TxGemma can also be integrated with the Agentic-Tx system, which includes 18 advanced tools, such as PubMed and Wikipedia search, as well as molecular, gene, and protein tools. This combination brings the multi-step reasoning capabilities of agent systems into everyday research workflows.
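How an agent system routes work to its tools can be illustrated with a toy dispatcher. The Python sketch below is not Agentic-Tx's implementation: the tool names, the stub functions, and `run_plan` are hypothetical stand-ins, and the plan is fixed in advance, whereas a real agent would let the model choose each step based on earlier observations.

```python
from typing import Callable, Dict, List, Tuple


# Stand-in tools: real ones would query PubMed, Wikipedia, or chemistry libraries.
def search_literature(query: str) -> str:
    return f"[stub] literature hits for '{query}'"


def describe_molecule(smiles: str) -> str:
    return f"[stub] molecule with {len(smiles)} SMILES characters"


# Registry mapping tool names to callables (hypothetical names).
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_literature": search_literature,
    "describe_molecule": describe_molecule,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute (tool_name, argument) steps in order and collect observations."""
    observations: List[str] = []
    for tool_name, argument in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Record the problem instead of crashing so later steps still run.
            observations.append(f"unknown tool: {tool_name}")
            continue
        observations.append(tool(argument))
    return observations


plan = [("describe_molecule", "CCO"), ("search_literature", "ethanol toxicity")]
observations = run_plan(plan)
for line in observations:
    print(line)
```

A name-to-function registry keeps the set of tools easy to extend: adding another tool is a matter of registering one more callable.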
Availability
TxGemma is available in the Vertex AI Model Garden and on the Hugging Face platform, so anyone interested can explore how the system works, try out its inference and fine-tuning features, and experiment with the complex workflows offered by Agentic-Tx. As an open model, TxGemma also offers the possibility for further development, since researchers can tailor it to their specific therapeutic development needs using their own data.
The advent of TxGemma could open a new chapter in drug development, significantly shortening the process from the laboratory to the patient’s bedside and reducing development costs.