Google today announced the launch of Gemma, a new family of lightweight open-weight models, barely a week after launching the latest version of its Gemini models. Gemma 2B and Gemma 7B are the first of these new models, which were “inspired by Gemini” and are available for commercial and research use.
Google didn’t share a detailed paper on how these models compare to similar models from Meta and Mistral, for example, saying only that they are “state-of-the-art.” The company did note, though, that these are dense decoder-only models, the same architecture it used for its Gemini models (and its earlier PaLM models), and that benchmarks will appear later today on Hugging Face’s leaderboard.
To get started with Gemma, developers can access ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia’s NeMo. Once pre-trained and tuned, these models can run anywhere.
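For the Hugging Face route, loading Gemma should look like any other causal language model in the transformers library. Here is a minimal sketch, assuming the weights land on the Hub under a repo ID like google/gemma-7b (a hypothetical identifier; check the actual model card for the real name):

```python
# Minimal sketch of running Gemma through the Hugging Face integration.
# The repo ID "google/gemma-7b" is an assumption, not confirmed by Google.
# Requires: pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the difference between open-weight and open-source models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```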
While Google emphasizes that these are open models, it’s important to note that they are not open source. In fact, in a press briefing before today’s announcement, Google’s Janine Banks highlighted the company’s commitment to open source but also noted that Google is very careful about how it refers to the Gemma models.
Banks said, “[Open models] are now very common in the industry.” She continued, “And it often means open weights models, where there is wide access for developers and researchers to customize and fine-tune models but, at the same time, the terms of use — things like redistribution, as well as ownership of those variants that are developed — depend on the model’s own specific terms of use. And so we see some difference between what we would normally call open source and we decided that it made the most sense to call our Gemma models open models.”
In practice, that means developers can run inference with the models and fine-tune them as they wish, and Google’s team argues that these model sizes are a good fit for many use cases.
Google DeepMind product management director Tris Warkentin said, “The generation quality has improved significantly in the last year.” He added, “Things that previously would have required extremely large models are now possible with state-of-the-art smaller models. This opens up completely new ways of developing AI applications that we’re very excited about, including being able to run inference and do tuning on your local developer desktop or laptop with your RTX GPU or on a single host in GCP with Cloud TPUs, as well.”
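As a rough illustration of what tuning on a single consumer GPU could look like, here is a minimal sketch using 4-bit quantization and LoRA adapters via the bitsandbytes and peft libraries; the repo ID and attention-module names are assumptions, and this is one common parameter-efficient recipe rather than a workflow Google has prescribed:

```python
# Minimal sketch of parameter-efficient tuning on a single consumer GPU.
# The repo ID "google/gemma-2b" and the projection-layer names below are
# assumptions. Requires: pip install transformers peft bitsandbytes
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",  # assumed Hub repo ID for the smaller model
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # fit in consumer VRAM
    device_map="auto",
)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```

From here, the wrapped model can be handed to a standard transformers Trainer; because only the adapter weights update, tuning on a single RTX-class GPU becomes plausible, which is the scenario Warkentin describes.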
Local inference and tuning are also possible with the open models from Google’s rivals in this space, though, so we’ll have to see how the Gemma models perform in real-world situations.
In addition to the new models, Google is also launching a Responsible Generative AI Toolkit to provide “guidance and essential tools for creating safer AI applications with Gemma,” as well as a debugging tool.