OpenAI Scales Single-Primary PostgreSQL to Millions of Queries per Second for ChatGPT

By News Room | Published 12 February 2026 | Last updated 12 February 2026, 10:06 AM

OpenAI outlined how it scaled PostgreSQL to handle millions of queries per second for ChatGPT and its API platform, serving hundreds of millions of users globally. The effort highlights how far a single-primary PostgreSQL instance can be pushed before write-intensive workloads require additional distributed solutions, emphasizing design trade-offs and operational guardrails needed for a low-latency, globally available service.

As PostgreSQL load grew more than tenfold in the past year, OpenAI worked with Azure to optimize its deployment on Azure Database for PostgreSQL, enabling the system to serve 800 million ChatGPT users while maintaining a single-primary instance with sufficient headroom. Optimizations spanned both the application and database layers, including scaling up instance size, refining query patterns, and scaling out with additional read replicas. Redundant writes were reduced through application-level tuning, and new write-heavy workloads were directed to sharded systems such as Azure Cosmos DB, reserving PostgreSQL for relational workloads requiring strong consistency.
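
None of this code appears in the original post, but a typical application-level way to cut redundant writes is to skip UPDATEs whose values have not actually changed, so the primary never creates a new row version for a no-op write. A minimal sketch in Python, assuming psycopg2 and a hypothetical user_settings table:

    # Illustrative only: skip an UPDATE when the stored value already matches,
    # so no new MVCC row version is created for a no-op write.
    import psycopg2  # assumed client library; OpenAI's stack is not specified

    def update_user_settings(conn, user_id, new_settings):
        with conn.cursor() as cur:
            # Push the comparison into SQL: the row is only rewritten when the
            # value actually differs, otherwise the UPDATE touches zero rows.
            cur.execute(
                """
                UPDATE user_settings
                   SET settings = %s
                 WHERE user_id = %s
                   AND settings IS DISTINCT FROM %s
                """,
                (new_settings, user_id, new_settings),
            )
        conn.commit()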

The primary PostgreSQL instance is supported by nearly 50 geo-distributed read replicas on Azure Database for PostgreSQL. Reads are distributed across replicas to maintain p99 latency in the low double-digit milliseconds, while writes remain centralized with measures to limit unnecessary load. Lazy writes and application-level optimizations further reduce pressure on the primary instance, ensuring consistent performance even under global traffic spikes.

PostgreSQL cascading replication (Source: OpenAI Blog Post)
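
The post does not include application code, but read/write splitting of this kind is usually implemented by keeping separate connection pools for the primary and the replicas. The sketch below illustrates the idea, assuming psycopg2 and placeholder hostnames; it is not OpenAI's actual topology or routing logic:

    # Illustrative read/write split: reads fan out across regional read
    # replicas, writes always go to the single primary. Hostnames, pool
    # sizes, and the random replica choice are placeholders.
    import random
    from psycopg2.pool import SimpleConnectionPool

    PRIMARY_DSN = "host=pg-primary.example.internal dbname=app"
    REPLICA_DSNS = [
        "host=pg-replica-eastus.example.internal dbname=app",
        "host=pg-replica-westeurope.example.internal dbname=app",
    ]

    primary_pool = SimpleConnectionPool(1, 20, dsn=PRIMARY_DSN)
    replica_pools = [SimpleConnectionPool(1, 50, dsn=dsn) for dsn in REPLICA_DSNS]

    def run_read(query, params=()):
        """Route read-only queries to a randomly chosen replica."""
        pool = random.choice(replica_pools)
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
                return cur.fetchall()
        finally:
            pool.putconn(conn)

    def run_write(query, params=()):
        """Send all writes to the single primary."""
        conn = primary_pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(query, params)
            conn.commit()
        finally:
            primary_pool.putconn(conn)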

Operational challenges emerged as traffic scaled. Cache-miss storms, multi-table join patterns often generated by ORMs, and service-wide retry loops were identified as common failure modes. To address these, OpenAI moved some computation to the application layer, enforced stricter timeouts on idle and long-running transactions, and refined query structures to reduce interference with autovacuum processes.
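
The transaction timeouts mentioned above map to ordinary PostgreSQL settings such as statement_timeout and idle_in_transaction_session_timeout, which can be applied per session, per role, or cluster-wide. The values below are hypothetical and only illustrate the mechanism:

    # Illustrative session-level guardrails; the timeout values are made up,
    # not the ones OpenAI runs in production.
    import psycopg2

    conn = psycopg2.connect("host=pg-primary.example.internal dbname=app")
    with conn.cursor() as cur:
        # Abort any single statement that runs longer than 5 seconds.
        cur.execute("SET statement_timeout = '5s'")
        # Terminate sessions that sit idle inside an open transaction for 30
        # seconds; open transactions keep vacuum from reclaiming old row versions.
        cur.execute("SET idle_in_transaction_session_timeout = '30s'")
    conn.commit()

The same settings can also be attached to an application role with ALTER ROLE ... SET so that every new connection inherits them.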

Reducing write pressure was a key strategy. PostgreSQL’s MVCC model increases CPU and storage overhead under heavy updates due to version churn and vacuum costs. OpenAI mitigated this by migrating shardable workloads to distributed systems, rate-limiting backfills and high-volume updates, and maintaining disciplined operational policies to avoid cascading overloads.
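
Rate-limiting a backfill typically means updating rows in small batches with a pause between batches, so the write rate stays within a budget and autovacuum can keep up with the dead tuples each batch leaves behind. A hedged sketch of that pattern; the table, column, batch size, and sleep interval are illustrative, not values from the post:

    # Illustrative throttled backfill: bounded batch size per iteration plus a
    # sleep between batches to cap write pressure on the primary.
    import time
    import psycopg2

    BATCH_SIZE = 1000        # placeholder batch size
    PAUSE_SECONDS = 0.5      # placeholder pause between batches

    conn = psycopg2.connect("host=pg-primary.example.internal dbname=app")
    while True:
        with conn.cursor() as cur:
            cur.execute(
                """
                WITH batch AS (
                    SELECT id FROM events
                     WHERE normalized_payload IS NULL
                     ORDER BY id
                     LIMIT %s
                       FOR UPDATE SKIP LOCKED
                )
                UPDATE events e
                   SET normalized_payload = lower(e.raw_payload)
                  FROM batch
                 WHERE e.id = batch.id
                """,
                (BATCH_SIZE,),
            )
            updated = cur.rowcount
        conn.commit()
        if updated == 0:
            break                      # backfill finished
        time.sleep(PAUSE_SECONDS)      # throttle so autovacuum can keep pace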

In a LinkedIn post, Microsoft Corporate Vice President Shireesh Thota noted:

Every database is optimized differently and needs the right tuning to get it to work at scale.

Connection pooling and workload isolation were also critical. PostgreSQL’s connection limits were managed by PgBouncer in transaction-pooling mode, reducing connection setup latency and preventing spikes in client connections. Critical and non-critical workloads were isolated to avoid noisy neighbor effects during peak demand.

Kubernetes deployment running multiple PgBouncer pods (Source: OpenAI Blog Post)
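
Transaction pooling is configured in pgbouncer.ini. The excerpt below shows the relevant settings with placeholder hosts and limits; it is an illustration of the mechanism, not OpenAI's configuration:

    ; Illustrative pgbouncer.ini excerpt (placeholder values).
    [databases]
    ; Clients connect to PgBouncer, which forwards to the actual primary.
    app = host=pg-primary.example.internal port=5432 dbname=app

    [pgbouncer]
    listen_port = 6432
    ; Transaction pooling: a server connection is returned to the pool as soon
    ; as each client transaction finishes, so many clients can share a small
    ; number of PostgreSQL connections.
    pool_mode = transaction
    ; Maximum client connections PgBouncer will accept (placeholder).
    max_client_conn = 10000
    ; Server connections kept per database/user pair (placeholder).
    default_pool_size = 100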

Scalability constraints also arise from read replication. As the number of replicas increases, the primary must stream the WAL to each replica, adding CPU and network overhead. OpenAI is experimenting with cascading replication, where intermediate replicas relay WAL downstream, reducing load on the primary while supporting future growth. These strategies allow PostgreSQL to sustain extremely large-scale, read-heavy AI workloads across geo-distributed regions, while sharded systems handle write-intensive operations to maintain stability and performance.
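
In stock PostgreSQL, cascading replication only requires pointing a downstream standby's primary_conninfo at another standby instead of at the primary. The sketch below uses placeholder hostnames; managed offerings such as Azure Database for PostgreSQL wire this up through their own tooling, so treat it as an illustration of the mechanism rather than OpenAI's setup:

    # Illustrative postgresql.conf fragments (placeholder hostnames).

    # On the intermediate replica: stream WAL from the primary and allow it to
    # be relayed onward to downstream standbys.
    primary_conninfo = 'host=pg-primary.example.internal port=5432 user=replicator'
    max_wal_senders = 10
    hot_standby = on

    # On a leaf replica: stream WAL from the intermediate replica instead of the
    # primary, removing that standby's CPU and network cost from the primary.
    primary_conninfo = 'host=pg-intermediate.example.internal port=5432 user=replicator'
    hot_standby = on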

OpenAI has indicated it continues to evaluate ways to extend PostgreSQL’s scalability envelope, including sharded PostgreSQL deployments and alternative distributed systems, to balance strong consistency guarantees with rising global traffic and increasingly diverse workloads as the platform grows.
