OpenAI Launches WebSockets in Responses API for Faster Agents

The standard HTTP request cycle has become the primary bottleneck for the next generation of autonomous agents. According to Brian Yu and Ashwin Nathan of OpenAI’s technical team, the industry has reached a tipping point where model inference is no longer the slowest link in the chain. With the release of GPT-5.3-Codex-Spark, which generates over 1,000 tokens per second on Cerebras hardware, the overhead of repeated network handshakes has become too costly to ignore. When an agent like Codex performs a complex task (scanning a codebase, running tests, and fixing bugs), it traditionally goes through dozens of request-response round trips. This structural inefficiency has forced users to wait minutes for results that the hardware can produce in mere seconds.

From Discrete Requests to Persistent Streams

To break this cycle, OpenAI has introduced persistent WebSocket connections in its Responses API, moving agent loops away from one synchronous HTTP call per step. Previously, at GPT-5 speeds (roughly 65 tokens per second), API latency could be masked by generation time. With the sharp rise in inference speed, however, the cumulative cost of validating every request and reprocessing context at every step became impossible to hide. Shifting to WebSockets allows OpenAI to cache conversation state in memory for the duration of a session, so there is no longer a need to re-transmit and re-process the entire chat history for every subsequent action the agent takes.
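To see why fixed overhead stops being maskable as generation speeds up, consider a rough back-of-the-envelope sketch. The two generation rates come from the figures above; the 200 tokens per turn and the 0.3 seconds of per-request overhead are purely illustrative assumptions, not measurements from OpenAI.

```python
def overhead_share(tokens_per_turn: int, tokens_per_second: float, overhead_s: float) -> float:
    """Fraction of one turn's wall-clock time spent on fixed per-request overhead."""
    generation_s = tokens_per_turn / tokens_per_second
    return overhead_s / (overhead_s + generation_s)


# Assumed workload (hypothetical numbers, chosen only for illustration).
TOKENS_PER_TURN = 200   # output tokens produced in one agent step
OVERHEAD_S = 0.3        # connection setup, validation, context handling per request

for label, rate in [("~65 tok/s", 65.0), ("~1,000 tok/s", 1000.0)]:
    share = overhead_share(TOKENS_PER_TURN, rate, OVERHEAD_S)
    print(f"{label}: overhead is {share:.0%} of each turn")
```

Under these assumptions, the same 0.3 seconds accounts for about 9% of a turn at 65 tokens per second but 60% of a turn at 1,000 tokens per second. The exact figures depend entirely on the assumed workload; the point is only that a constant cost grows in relative weight as generation gets faster.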

This architectural change has delivered a 40% speedup in agentic loops. The move to WebSockets was driven by the need for incremental data transmission: sending only what is new at each step rather than the full history. In our view, this isn't just a cosmetic upgrade; it is the foundation of 'Agentomics', the infrastructure required to make high-speed models like Codex-Spark feel instantaneous rather than iterative.
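The difference between resending everything and sending only the increment can be sketched with a simple token count. This is a toy model, not OpenAI's implementation: the function names, the 2,000-token starting context, and the 20 steps of 500 tokens each are all hypothetical.

```python
def tokens_sent_stateless(initial_context: int, step_sizes: list[int]) -> int:
    """Each request re-sends the full history accumulated so far."""
    total = 0
    history = initial_context
    for step in step_sizes:
        history += step      # the new tool output or message joins the history
        total += history     # and the whole history goes over the wire again
    return total


def tokens_sent_incremental(initial_context: int, step_sizes: list[int]) -> int:
    """Context is sent once; each later message carries only the new items."""
    return initial_context + sum(step_sizes)


# Hypothetical agent run: 2,000-token starting context, 20 steps of 500 tokens.
steps = [500] * 20
print("stateless:  ", tokens_sent_stateless(2000, steps))    # 145000
print("incremental:", tokens_sent_incremental(2000, steps))  # 12000
```

In the stateless case the total grows roughly with the square of the number of steps, while the incremental case grows linearly. The gap therefore widens the longer an agent loop runs.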

By cutting network hops and bypassing intermediate service calls, the OpenAI team is directly improving agent responsiveness in real-world environments.

The Infrastructure of Agentomics

Betting on persistent connections is also a strategic move to deepen client reliance on OpenAI’s optimized stack. While the interface may feel familiar, architects will need to overhaul their middleware to handle streaming states and asynchronous locking. As part of this technological marathon launched in November 2025, OpenAI has also deployed accelerated safety classifiers to ensure that performance gains do not compromise security. For businesses, this marks a paradigm shift: a lagging agent that leaves a user staring at a loading indicator is now a sign of technical debt, not the industry norm.
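What "streaming states and asynchronous locking" might look like in middleware can be sketched with Python's standard asyncio library. Everything here is hypothetical: the AgentSession class, its fields, and the simulated reply stand in for whatever a real orchestration layer would hold per connection, and no actual network call is made.

```python
import asyncio


class AgentSession:
    """Hypothetical per-connection state kept by middleware (not an OpenAI API)."""

    def __init__(self) -> None:
        self.items: list[str] = []    # conversation items seen on this connection
        self._lock = asyncio.Lock()   # allows only one in-flight turn at a time

    async def run_turn(self, new_input: str) -> str:
        # Serialize turns so two callers cannot interleave writes to shared state.
        async with self._lock:
            self.items.append(new_input)
            await asyncio.sleep(0)    # stand-in for awaiting a streamed response
            reply = f"reply to {new_input!r} (history={len(self.items)})"
            self.items.append(reply)
            return reply


async def main() -> list[str]:
    session = AgentSession()
    # Two callers share one session; the lock keeps their turns in order.
    return await asyncio.gather(
        session.run_turn("run tests"),
        session.run_turn("fix bug"),
    )


replies = asyncio.run(main())
print(replies)
```

Without the lock, the second caller could append its input while the first turn is still waiting on its response, leaving the session history out of order. A real system would also need reconnection and error handling, which this sketch leaves out.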

CTOs and tech leads should immediately re-evaluate their current orchestration layers. If your system still forces a full context reload at every step, you have already lost the race for speed. Migrating high-frequency loops to streaming protocols is becoming a baseline requirement for survival in a world where users expect the instant feedback of GPT-5.3-Codex-Spark.

Source: OpenAI Blog
