At QCon San Francisco 2025, Jeremy Edberg and Qian Li from DBOS presented an unconventional architectural approach to workflow orchestration: treating PostgreSQL not just as a data store, but as the orchestration layer itself. Their talk addressed a persistent problem in distributed systems: workflows frequently fail, recovery mechanisms are complex, and visibility into workflow state remains challenging.
According to the speakers, modern applications routinely depend on workflows, but current solutions struggle with fundamental challenges. Failures are frequent, existing orchestration tools offer limited visibility into what is happening, and coordination logic ends up scattered across multiple systems. “Your database is all you need,” Li noted, arguing that most teams already have most of the infrastructure needed to support workflows themselves, given an application-level “workflow wrapper” library.
The challenges of external orchestration
The DBOS Transact approach, available as an open-source library under an MIT license for Python, TypeScript, Go, and Java, inverts the traditional architecture stack. Instead of building orchestration layers on top of databases, workflows translate directly into database operations.
The Transact approach – an App Server with a library performing distributed orchestration using the database
The library relies on a checkpoint system. Before executing any workflow step, the system records the input to the database, and after each step execution, it checkpoints the output. When interruptions occur, workflows can resume from the last successful checkpoint rather than restarting from scratch. This approach leverages PostgreSQL’s ACID properties to guarantee exactly-once execution semantics without requiring separate orchestration infrastructure.
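The checkpoint-and-resume idea can be illustrated with a minimal sketch. This is not the DBOS Transact implementation: the table `step_checkpoints` and the helpers `open_store` and `run_step` are hypothetical names, SQLite stands in for PostgreSQL so the example runs without a server, and only step outputs are recorded (the talk describes recording inputs as well).

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a checkpoint store (SQLite here as a stand-in for PostgreSQL)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS step_checkpoints (
               workflow_id TEXT    NOT NULL,
               step_index  INTEGER NOT NULL,
               output      TEXT    NOT NULL,
               PRIMARY KEY (workflow_id, step_index)
           )"""
    )
    return conn


def run_step(conn, workflow_id, step_index, func, *args):
    """Run one step, or return its checkpointed output if it already completed."""
    row = conn.execute(
        "SELECT output FROM step_checkpoints"
        " WHERE workflow_id = ? AND step_index = ?",
        (workflow_id, step_index),
    ).fetchone()
    if row is not None:
        # The step finished in an earlier attempt: skip re-execution.
        return json.loads(row[0])

    result = func(*args)

    # Checkpoint the output in its own transaction before moving on.
    with conn:
        conn.execute(
            "INSERT INTO step_checkpoints (workflow_id, step_index, output)"
            " VALUES (?, ?, ?)",
            (workflow_id, step_index, json.dumps(result)),
        )
    return result
```

A workflow written as a sequence of `run_step` calls can simply be re-invoked with the same workflow ID after a crash: completed steps return their stored outputs and execution continues from the first step without a checkpoint. In this simplified sketch, a step that crashes after running but before its checkpoint commits would run again on recovery, so steps with external side effects should be idempotent.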
The database-backed approach enables several practical capabilities that address everyday operational challenges. Workflow management becomes possible through standard SQL queries, allowing teams to list, search, cancel, and resume workflows directly through database operations. The presenters also demonstrated a “fork” capability for debugging production issues: teams can restart a workflow from a specific step by copying the original inputs and outputs up to that step into a new workflow, and then replay execution with updated code. This makes it much easier to fix a bug and then replay the affected events against the corrected code.
Fixing Bugs with Forks on Transact
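As a rough illustration of the fork idea, the sketch below copies the checkpoints that precede a chosen step into a new workflow ID, so that a re-run under the new ID reuses those results and executes only the later steps with the current code. The `step_checkpoints` table and the `fork_workflow` helper are assumptions made for this example, not the library's schema or API, and SQLite again stands in for PostgreSQL.

```python
import sqlite3

# Hypothetical checkpoint table, one row per completed step.
SCHEMA = """CREATE TABLE IF NOT EXISTS step_checkpoints (
    workflow_id TEXT    NOT NULL,
    step_index  INTEGER NOT NULL,
    output      TEXT    NOT NULL,
    PRIMARY KEY (workflow_id, step_index)
)"""


def fork_workflow(conn, source_id, new_id, start_step):
    """Copy checkpoints of steps before start_step from source_id to new_id.

    Returns the number of checkpoints copied. Steps at or after start_step
    have no checkpoint under new_id, so they run again when new_id is executed.
    """
    with conn:
        cur = conn.execute(
            """INSERT INTO step_checkpoints (workflow_id, step_index, output)
               SELECT ?, step_index, output
               FROM step_checkpoints
               WHERE workflow_id = ? AND step_index < ?""",
            (new_id, source_id, start_step),
        )
    return cur.rowcount
```

Because the copy is a single `INSERT ... SELECT` inside one transaction, the forked workflow either gets all of the earlier checkpoints or none of them, and the original workflow's history is left untouched.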
The speakers also acknowledged several challenges inherent to this approach. Lock contention emerges as a primary concern when multiple workers pull tasks from the same queue, potentially degrading performance. The team addresses this through PostgreSQL’s “FOR UPDATE SKIP LOCKED” clause, which allows each worker to select and lock only unlocked workflow rows, enabling efficient concurrent processing.
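The dequeue pattern described above might look like the following sketch. The table `workflow_queue`, its columns, the status values, and the `claim_workflows` helper are all hypothetical; the SQL is PostgreSQL syntax with `%s` placeholders, and `conn` is assumed to be a DB-API connection such as one from psycopg.

```python
# Claim up to N enqueued workflows without waiting on rows that another
# worker has already locked (PostgreSQL syntax; names are illustrative).
CLAIM_SQL = """
UPDATE workflow_queue
SET status = 'RUNNING', worker_id = %s
WHERE workflow_id IN (
    SELECT workflow_id
    FROM workflow_queue
    WHERE status = 'ENQUEUED'
    ORDER BY created_at
    LIMIT %s
    FOR UPDATE SKIP LOCKED
)
RETURNING workflow_id
"""


def claim_workflows(conn, worker_id, batch_size=10):
    """Atomically claim a batch of workflows for this worker."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (worker_id, batch_size))
        claimed = [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
    # Committing releases the row locks; the rows are now marked RUNNING.
    conn.commit()
    return claimed
```

With `SKIP LOCKED`, two workers running this query at the same moment lock disjoint sets of rows instead of one blocking until the other commits, which is what removes the contention on the head of the queue.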
Decentralized cron scheduling presents another challenge. Rather than maintaining a central scheduler, each worker runs the same cron scheduler and uses the scheduled time as a unique workflow identifier, so that a given scheduled run executes only once no matter how many workers attempt it. To prevent thundering herd problems when workers wake simultaneously, the library applies random jitter to sleep intervals to spread the load, and checks whether the workflow already exists before executing it.
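A minimal sketch of this scheme follows, under stated assumptions: the `scheduled_runs` table and the helper names are invented for illustration, and SQLite stands in for PostgreSQL (where `INSERT ... ON CONFLICT DO NOTHING` would be the more usual way to express the same check). The primary key on the workflow ID is what makes the start idempotent.

```python
import random
import sqlite3
from datetime import datetime, timezone


def open_schedule_store(path=":memory:"):
    """Create a table whose primary key is the scheduled workflow ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scheduled_runs (workflow_id TEXT PRIMARY KEY)"
    )
    return conn


def scheduled_workflow_id(name, scheduled_at):
    """Derive a deterministic ID from the job name and its scheduled time."""
    return f"{name}-{scheduled_at.isoformat()}"


def try_start_scheduled(conn, name, scheduled_at):
    """Return True only for the first worker to register this scheduled run."""
    workflow_id = scheduled_workflow_id(name, scheduled_at)
    try:
        with conn:
            conn.execute(
                "INSERT INTO scheduled_runs (workflow_id) VALUES (?)",
                (workflow_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Another worker already created this run; do nothing.
        return False


def jittered_delay(base_seconds, max_jitter_seconds):
    """Add random jitter so workers do not all wake at the same instant."""
    return base_seconds + random.uniform(0, max_jitter_seconds)
```

Every worker can call `try_start_scheduled` for the same tick; only the one whose insert succeeds goes on to run the workflow, and sleeping for `jittered_delay(...)` between ticks spreads those attempts out over time.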
Testing workflows also becomes more straightforward in this model. The presenters emphasized that workflows work identically in local development and production environments. Unit testing workflows is simpler because the checkpoint mechanism allows for easier mocking and state management.
This architectural pattern has historical precedents. Windows Workflow Foundation, introduced nearly two decades ago with .NET Framework 3.0, similarly used SQL Server persistence to maintain workflow state across failures. However, that approach relied on DSL-based workflow definitions and required substantial configuration overhead, limiting adoption primarily to Microsoft’s own products. The DBOS approach differs in that it uses lightweight code annotations in mainstream languages rather than a separate workflow definition language.
