Reddit has completed a major rebuild of its comment backend, migrating from a legacy Python system to a domain-specific Go microservice to improve performance and reliability. The change addresses long-standing challenges with latency and scalability in one of Reddit’s highest-write systems, while laying the groundwork for modernizing other core models.
The migration followed a multi-phase strategy designed to preserve correctness and minimize user disruption. Reddit engineers first implemented all comment read endpoints in Go and validated them using a tap-compare testing approach. In this method, a portion of live traffic is sent to the new service, and its responses are compared against the legacy Python system. In contrast, only the original responses are returned to users. This allowed engineers to detect discrepancies safely in production before fully switching traffic.
Writes were more complex because comment creation touches multiple datastores: PostgreSQL for persistence, Memcached for caching, and Redis for change data capture (CDC) events. To prevent conflicts with production data, Reddit deployed sister data stores for the Go service during tap-compare tests. These stores mirrored the production schema but handled only test writes, enabling the team to validate behavior without risking live data corruption. In total, Reddit tested three write endpoints across three datastores, creating 18 separate validation paths.
Dual write during tap comparison for comment backend( Source: Reddit Engineering Post)
The migration uncovered several edge cases, for eg, early serialization mismatches meant Python services could not initially deserialize data written by Go, which was resolved by validating CDC event consumers against the legacy service. Differences in data access also caused database pressure: the Python monolith used an ORM, whereas Go wrote directly, producing higher load under write amplification, which was mitigated with query-level optimizations. Additionally, race conditions arose during tap-compare when Python writes to production occurred between Go writes and compare reads. Engineers addressed these by improving local testing with production-derived data before running tap-compare.

Tap compare validation after dual writes ( Source: Reddit Engineering Post)
According to the Reddit team, new architecture simplifies the comment system’s dependency chain while maintaining full event delivery guarantees for downstream systems. Moving to a domain-specific microservice also positions the platform for further decomposition of other core services, out of four core models that power almost all use cases: Comments, Accounts, Posts, and Subreddits. As of this writing, two of these models, Comments and Accounts, have been fully migrated from the Python monolith, while migrations for Posts and Subreddits are in progress. Once complete, all four core models will be modernized under the new microservice architecture.
Katie Shannon, senior software engineer at Reddit, summarized the outcomes of the rewrite:
p99 latency for critical write operations, create, update, and increment endpoints, was halved compared to the legacy Python system, which had previously experienced spikes of up to 15 seconds.
Community feedback highlighted faster comment creation and reduced downtime during peak traffic. Shannon noted careful management of data consistency and schema evolution, addressing concurrency and Go-specific questions. The Reddit infrastructure team added that Go’s concurrency allowed fewer pods to achieve higher throughput than Python, making it the preferred choice.
