Google DeepMind is reminding the industry of the enduring potential of the encoder-decoder large language model (LLM) architecture. While most recent attention has gravitated toward decoder-only models, the classic encoder-decoder structure, exemplified by T5 (Text-to-Text Transfer Transformer), remains highly relevant for many real-world applications. Encoder-decoder models frequently demonstrate superior performance in tasks such as summarization, translation, and question answering, thanks to efficient inference, a flexible design, and the richer input representations the encoder produces. Despite these advantages, the architecture has historically received less of the spotlight.

Today, Google DeepMind introduces T5Gemma, a new suite of encoder-decoder LLMs developed by adapting pre-trained decoder-only models. T5Gemma is built on the Gemma 2 framework and includes adapted Gemma 2 2B and 9B models, alongside new T5-sized variants: Small, Base, Large, and XL. Google is releasing both pre-trained and instruction-tuned T5Gemma models to the community, aiming to open new avenues for research and development.

The T5Gemma initiative specifically investigates building advanced encoder-decoder models from pre-trained decoder-only models through an adaptation technique: the parameters of the new model are initialized with the weights of an existing pre-trained decoder-only model, then fine-tuned. This approach allows businesses to explore the benefits of encoder-decoder architectures without starting from scratch, potentially accelerating development for specific text-generation and understanding tasks. The availability of these models, particularly the instruction-tuned versions, means businesses can more readily experiment with advanced summarization or translation tools tailored to their specific needs.
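The adaptation idea described above can be sketched in a few lines. This is a minimal illustration, not the actual T5Gemma implementation: the state-dict keys, the helper function, and the treatment of cross-attention are all assumptions made for the example. The gist is that self-attention and feed-forward weights from the decoder-only checkpoint seed both the new encoder and the new decoder, while parameters with no decoder-only counterpart (such as cross-attention) start fresh, and the whole model is then fine-tuned.

```python
def adapt_decoder_only_to_encoder_decoder(decoder_state: dict) -> dict:
    """Initialize encoder-decoder weights from a decoder-only checkpoint.

    Illustrative sketch only: real adaptation operates on per-layer
    tensors and framework-specific parameter names.
    """
    enc_dec_state = {}
    for name, weight in decoder_state.items():
        # Reuse each pre-trained weight to seed both towers.
        enc_dec_state[f"encoder.{name}"] = weight
        enc_dec_state[f"decoder.{name}"] = weight
    # Cross-attention has no decoder-only counterpart, so it is
    # initialized anew rather than copied (hypothetical key name).
    enc_dec_state["decoder.cross_attention"] = "freshly-initialized"
    return enc_dec_state

# Toy "checkpoint" standing in for real tensors:
pretrained = {"self_attention": "w_attn", "mlp": "w_mlp"}
adapted = adapt_decoder_only_to_encoder_decoder(pretrained)
# The adapted model would then be fine-tuned before use.
```

The key design point the sketch captures is that adaptation is cheap relative to pre-training: most parameters arrive already trained, and fine-tuning only has to reconcile the two towers and learn the newly initialized pieces.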
This move by Google DeepMind suggests a strategic effort to broaden the utility of their LLM offerings beyond the current decoder-only trend, providing developers with more architectural choices. The practical implication for enterprises is the potential to enhance existing NLP pipelines or build new ones that capitalize on the distinct strengths of encoder-decoder models, possibly achieving better efficiency and accuracy in complex language processing scenarios.
© The Value Engineering 2026
Google DeepMind Revives Encoder-Decoder LLMs with T5Gemma
Google DeepMind reintroduces encoder-decoder LLMs with T5Gemma, adapting Gemma 2 models for superior summarization, translation, and Q&A. Explore new architectural choices for enhanced NLP.