MedGemma: Our most capable open models for health AI development
July 9, 2025
Daniel Golden, Engineering Manager, and Rory Pilgrim, Product Manager, Google Research
We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.
Healthcare is increasingly embracing AI to improve workflow management, patient communication, and diagnostic and treatment support. It’s critical that these AI-based systems are not only high-performing, but also efficient and privacy-preserving. It’s with these considerations in mind that we built and recently released Health AI Developer Foundations (HAI-DEF). HAI-DEF is a collection of lightweight open models designed to offer developers robust starting points for their own health research and application development. Because HAI-DEF models are open, developers retain full control over privacy, infrastructure, and modifications to the models. In May of this year, we expanded the HAI-DEF collection with MedGemma, a collection of generative models based on Gemma 3 that are designed to accelerate healthcare and life sciences AI development.
Today, we’re proud to announce two new models in this collection. The first is MedGemma 27B Multimodal, which complements the previously-released 4B Multimodal and 27B text-only models by adding support for complex multimodal and longitudinal electronic health record interpretation. The second new model is MedSigLIP, a lightweight image and text encoder for classification, search, and related tasks. MedSigLIP is based on the same image encoder that powers the 4B and 27B MedGemma models.
MedGemma and MedSigLIP are strong starting points for medical research and product development. MedGemma is useful for medical text or imaging tasks that require generating free text, like report generation or visual question answering. MedSigLIP is recommended for imaging tasks that involve structured outputs like classification or retrieval. All of the above models can be run on a single GPU, and MedGemma 4B and MedSigLIP can even be adapted to run on mobile hardware.
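As a minimal sketch of what local inference can look like, the snippet below loads a MedGemma multimodal checkpoint through the Hugging Face transformers image-text-to-text pipeline and asks a question about a chest X-ray. The model ID, image path, and prompt are illustrative; consult the model card on Hugging Face for exact identifiers, prompt formats, and usage terms.

```python
# Minimal sketch: local inference with a MedGemma multimodal checkpoint via
# the Hugging Face transformers pipeline API. Model ID, image path, and prompt
# are illustrative placeholders.
from transformers import pipeline
from PIL import Image

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # assumed model ID; verify on Hugging Face
    device_map="auto",
)

image = Image.open("chest_xray.png")  # example input image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe the key findings in this chest X-ray."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])  # assistant's reply
```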
Full details of MedGemma and MedSigLIP development and evaluation can be found in the MedGemma technical report.
The MedGemma collection includes variants in 4B and 27B sizes, both of which now accept image and text inputs and produce text outputs.
On MedQA, MedGemma 4B and 27B are among the best performing models of their size. Note that in this plot, cost estimates are made based on legacy.lmarena.ai price analysis and together.ai/pricing. For models not present on the leaderboard, we used price data from the models from which they were derived.
Based on review by a US board-certified cardiothoracic radiologist, we found that 81% of MedGemma chest X-ray reports would lead to similar patient management compared to the original radiologist reports.
We developed these models by training a medically optimized image encoder (independently released as MedSigLIP, described below), followed by training the corresponding 4B and 27B versions of the Gemma 3 model on medical data. We took care to retain the general (non-medical) capabilities of Gemma throughout this process, allowing MedGemma to perform well on tasks that mix medical and non-medical information while preserving instruction following and non-English language capabilities.
A key aspect of these models is their adaptability. For instance, after fine-tuning, MedGemma 4B achieves state-of-the-art performance on chest X-ray report generation, with a RadGraph F1 score of 30.3. The ease with which performance can be improved on a target application highlights the value of MedGemma as a starting point for developers looking to build AI for healthcare.
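As a rough, hypothetical illustration of that adaptation path (not the recipe behind the RadGraph result above), one could attach LoRA adapters to a MedGemma checkpoint with the peft library and train only the adapter weights on task-specific data; the official fine-tuning notebooks on GitHub show the full recommended workflow.

```python
# Illustrative LoRA setup for adapting MedGemma with the peft library.
# Model ID, hyperparameters, and target modules are placeholders, not the
# recipe used for the reported results.
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import LoraConfig, get_peft_model

model_id = "google/medgemma-4b-it"  # assumed model ID; verify on Hugging Face
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```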
MedSigLIP is a lightweight image encoder of only 400M parameters that uses the Sigmoid loss for Language Image Pre-training (SigLIP) architecture. MedSigLIP was adapted from SigLIP via tuning with diverse medical imaging data, including chest X-rays, histopathology patches, dermatology images, and fundus images, allowing the model to learn nuanced features specific to these modalities. Importantly, we also took care to ensure that MedSigLIP retains strong performance on the natural images on which the original SigLIP model was trained, maintaining its versatility.
MedSigLIP is designed to bridge the gap between medical images and medical text by encoding them into a common embedding space. MedSigLIP achieves similar or improved classification performance compared to task-specific vision embedding models while being far more versatile across medical imaging domains.
MedSigLIP is ideal for structured imaging tasks such as traditional image classification, zero-shot classification, and semantic image retrieval.
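A minimal sketch of that shared embedding space in use, assuming a SigLIP-style interface through Hugging Face transformers (the model ID, image, and label prompts are illustrative; see the MedSigLIP model card for exact usage):

```python
# Minimal sketch: scoring candidate text labels against a medical image with a
# SigLIP-style encoder. Model ID, image path, and labels are illustrative.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "google/medsiglip-448"  # assumed model ID; verify on Hugging Face
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("skin_lesion.png")  # example input image
labels = [
    "a dermatology photo of melanoma",
    "a dermatology photo of a benign nevus",
]

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# For SigLIP-style models, a sigmoid over the image-text logits gives an
# independent score per label rather than a softmax over all labels.
scores = torch.sigmoid(outputs.logits_per_image)[0]
for label, score in zip(labels, scores.tolist()):
    print(f"{label}: {score:.3f}")
```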
Because the MedGemma collection is open, the models can be downloaded, built upon, and fine-tuned to support developers’ specific needs. Particularly in the medical space, this open approach offers several distinct advantages over API-based models, including full control over data privacy, the choice of infrastructure on which the models run, the stability of a fixed model snapshot, and the freedom to modify and fine-tune the models for specific use cases.
To ensure broad accessibility and ease of use, our Hugging Face collection offers MedSigLIP and MedGemma in the popular Hugging Face safetensors format.
Researchers and developers have been exploring the MedGemma models for their use cases and have found the models adept at solving some crucial problems. Developers at DeepHealth in Massachusetts, USA, have been exploring MedSigLIP to improve their chest X-ray triaging and nodule detection. Researchers at Chang Gung Memorial Hospital in Taiwan noted that MedGemma works well with traditional Chinese-language medical literature and can respond well to medical staff questions. Developers at Tap Health in Gurgaon, India, remarked on MedGemma’s superior medical grounding, noting its reliability on tasks that require sensitivity to clinical context, such as summarizing progress notes or suggesting guideline-aligned nudges.
We’re excited to continue to learn about these and other use cases from developers as they create the next generation of Health AI tools with MedGemma and MedSigLIP.
To help developers get started, we’ve provided detailed notebooks on GitHub for MedGemma and MedSigLIP that demonstrate how to create instances of MedSigLIP and MedGemma for both inference and fine-tuning on Hugging Face. When developers are ready to scale, MedGemma and MedSigLIP can be seamlessly deployed in Vertex AI as dedicated endpoints, and we provide examples in GitHub of how to run inference on these endpoints. We’ve also added a new demo to our HAI-DEF Hugging Face demo collection that shows how MedGemma can be built into an application to streamline pre-visit information gathering ahead of a patient appointment.
Code for the pre-visit demo is available on its Hugging Face site.
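When scaling on Vertex AI, a call to a deployed endpoint with the google-cloud-aiplatform SDK might look roughly like the sketch below; the project, region, endpoint ID, and request payload schema are placeholders, and the GitHub examples show the exact format each deployment expects.

```python
# Rough sketch: querying a MedGemma model deployed as a Vertex AI endpoint.
# Project, location, endpoint ID, and the instance schema are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # placeholders

endpoint = aiplatform.Endpoint("1234567890")  # placeholder endpoint ID

response = endpoint.predict(
    instances=[
        {
            "prompt": "Summarize the key findings in this radiology report: ...",
            "max_tokens": 256,
        }
    ]
)
print(response.predictions[0])
```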
Refer to the following table to understand which model from the MedGemma family is ideal for your use case.
* For pathology-specific applications that do not require language alignment, Path Foundation provides high performance for data-efficient classification and lower compute requirements.
** Fast Healthcare Interoperability Resources (FHIR) records are text-based, but have a unique structure. Electronic health record data was included in the training of the MedGemma 27B multimodal model only.
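For context, a FHIR record is structured JSON; the fragment below is a hypothetical, heavily simplified Observation resource of the kind that, serialized as text, can appear in EHR-style model inputs.

```python
# Hypothetical, heavily simplified FHIR Observation resource shown as a Python
# dict. Real FHIR records contain many more fields and cross-references.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [
            {
                "system": "http://loinc.org",
                "code": "718-7",
                "display": "Hemoglobin [Mass/volume] in Blood",
            }
        ]
    },
    "valueQuantity": {"value": 13.2, "unit": "g/dL"},
    "effectiveDateTime": "2024-11-05",
}
```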
Please visit the HAI-DEF site for these resources and to learn more about the MedGemma collection and other Health AI Developer Foundations models. The HAI-DEF forum is available for questions or feedback.
Models were trained on a mix of public and private de-identified datasets. Google and its partners utilize datasets that have been rigorously anonymized or de-identified to ensure the protection of individual research participants and patient privacy.
MedGemma and MedSigLIP are intended to be used as starting points that enable efficient development of downstream healthcare applications involving medical text and images. MedGemma and MedSigLIP are not intended to be used without appropriate validation, adaptation, and/or meaningful modification by developers for their specific use case. The outputs generated by these models are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications. Reported performance highlights baseline capabilities on relevant benchmarks, but even for image and text domains that constitute a substantial portion of the training data, inaccurate model output is possible. All model outputs should be considered preliminary and require independent verification, clinical correlation, and further investigation through established research and development methodologies.