Google DeepMind has introduced Gemini Robotics-ER 1.6, an updated embodied reasoning model designed to serve as a high-level cognitive layer for robotic systems. The DeepMind team focused on enabling fully autonomous task execution: robots can draw on a toolkit that ranges from Google Search to specialized vision-language-action (VLA) models in order to understand context and plan actions. According to DeepMind's testing, the new architecture significantly improves spatial navigation and a robot's ability to accurately assess the results of its own manipulations.

The primary technological breakthrough lies in perception accuracy. According to DeepMind's report, version 1.6 substantially outperforms Gemini Robotics-ER 1.5 and Gemini 1.5 Flash in object recognition, counting, and verifying successful task completion. A standout feature, developed in collaboration with Boston Dynamics, is the model's ability to read precision instruments such as pressure gauges and level meters.

The model supports agentic data processing: the robot can autonomously zoom in on images, invoke object-pointing functions, and execute code to calculate scales and proportions. In the final stage, the system applies general world knowledge to interpret the readings. Boston Dynamics' Spot robot already uses these capabilities to conduct technical inspections.
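To make the "point, zoom, interpret" pipeline concrete, here is a minimal sketch of that loop using the public Gemini API Python SDK (google-genai). The model ID, the prompt wording, the `[y, x]` point schema normalized to 0-1000, and the fixed crop size are assumptions modeled on DeepMind's published Robotics-ER examples, not a confirmed 1.6 interface.

```python
# Sketch of the agentic gauge-reading loop: point at the instrument,
# crop ("zoom") around it, then ask the model to interpret the reading.
import json

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment
MODEL = "gemini-robotics-er-1.5"  # hypothetical; substitute the released model ID


def locate(image: Image.Image, label: str) -> tuple[int, int]:
    """Ask the model to point at an object.

    Assumes the response is JSON like [{"point": [y, x], "label": ...}]
    with coordinates normalized to 0-1000, as in DeepMind's ER examples.
    """
    resp = client.models.generate_content(
        model=MODEL,
        contents=[
            image,
            f'Point to the {label}. Answer as JSON: '
            f'[{{"point": [y, x], "label": "{label}"}}]',
        ],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    y, x = json.loads(resp.text)[0]["point"]
    # Convert normalized 0-1000 coordinates to pixel coordinates.
    return int(x / 1000 * image.width), int(y / 1000 * image.height)


def read_gauge(image_path: str) -> str:
    """Locate the gauge, zoom in on it, and interpret the reading."""
    image = Image.open(image_path)
    cx, cy = locate(image, "pressure gauge")
    # "Zoom" by cropping a fixed window around the pointed location.
    crop = image.crop((max(cx - 200, 0), max(cy - 200, 0),
                       min(cx + 200, image.width), min(cy + 200, image.height)))
    resp = client.models.generate_content(
        model=MODEL,
        contents=[crop, "Read the gauge. Report the value and its units."],
    )
    return resp.text


print(read_gauge("inspection_photo.jpg"))
```

In the full agentic setting described above, the model would also be able to emit and execute code of its own (for example, to compute scales and proportions from the crop) rather than relying on a hand-written crop window as in this sketch.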

By making the model available via the Gemini API and Google AI Studio, and providing examples in Colab, Google is giving developers the tools to build advanced intelligent systems. This marks a significant shift in the industry: the integration of high-level cognitive abilities allows robots to independently plan complex tasks and interpret ambiguous visual data in real-world environments.
