While the rest of the industry chases the next 'revolutionary' NoSQL mirage, OpenAI is doubling down on the classics. The company has successfully scaled PostgreSQL to support 800 million ChatGPT users and millions of queries per second (QPS). According to Bohan Zhang, a member of the technical staff at OpenAI, the platform’s load has surged over 10x in a year. Yet, instead of an expensive migration to a new paradigm, the engineering team relies on a single primary Azure PostgreSQL flexible server instance, flanked by nearly 50 global read replicas.
The Engineering Reality: Optimization Over Migration
PostgreSQL’s design comes under the most strain during write-heavy spikes, which Zhang notes can trigger a 'vicious cycle': requests time out, clients retry, and the retries pile on more load until resources saturate. To prevent a total meltdown, OpenAI implemented a strict regime of connection pooling, workload isolation, and aggressive rate limiting. By isolating critical workloads, they ensure that a spike in one feature doesn't bring down the entire primary node. This isn't just about survival; it’s a masterclass in extracting maximum performance from a monolithic setup before giving in to the complexity of sharding.
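To make the idea concrete, here is a minimal sketch of two of those defenses: per-workload rate limiting (so one feature cannot starve another) and jittered retry backoff (so retries do not amplify a spike). This is not OpenAI's implementation, which the source does not describe at this level. The token-bucket and full-jitter techniques, and every name below (`TokenBucket`, `WorkloadLimiter`, `backoff_delay`, the workload labels and limits), are illustrative assumptions.

```python
import random


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = 0.0         # timestamp of the previous check

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class WorkloadLimiter:
    """One independent bucket per workload, so a spike in one cannot drain another."""

    def __init__(self, limits: dict[str, tuple[float, float]]) -> None:
        # limits maps a hypothetical workload name to (rate, capacity).
        self.buckets = {
            name: TokenBucket(rate, capacity)
            for name, (rate, capacity) in limits.items()
        }

    def allow(self, workload: str, now: float) -> bool:
        return self.buckets[workload].allow(now)


def backoff_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


# Hypothetical limits: a critical path gets a generous budget, a batch job a tight one.
limiter = WorkloadLimiter({"chat": (100.0, 2.0), "analytics": (1.0, 1.0)})
```

A caller would check `limiter.allow("analytics", time.monotonic())` before issuing a query and, on rejection or timeout, sleep for `backoff_delay(attempt)` instead of retrying immediately. The clock is passed in explicitly so the behavior is easy to test. The random jitter spreads retries out over time, which is what keeps a brief stall from turning into a synchronized retry storm.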
"The real story isn't about finding a silver-bullet database, but about extracting maximum value from existing stacks through single-point-of-failure mitigation."
For technical leads, the lesson is clear: your TCO (Total Cost of Ownership) is often lower when you fix what you have rather than jumping on a 'distributed' hype train. OpenAI’s decision to remain unsharded suggests that even for the world’s most famous AI product, the primary bottleneck isn't the relational model; it's how you manage the schema and traffic. Before greenlighting a migration to a new database, audit your primary node. You likely have more runway than the vendor marketing teams want you to believe. Scalability is earned through rigorous performance tuning, not bought through new licenses.