Alex Seaton, Staff Software Engineer at Man Group, presented How to Build a Database Without a Server at QCon London 2025, where he discussed the challenges of replacing a high-maintenance MongoDB server farm in a hedge fund trading application with serverless object storage. Seaton stated that object storage has been consistently improving with the ability to use various concurrency models.
Seaton kicked off his presentation with a graphical representation of a typical hedge fund trading system.
The Man Group’s original implementation of the above hedge fund trading system was 100% Python, using Pandas DataFrames as a data interchange format, a MongoDB cluster farm for data storage, and versioned data because vendors would send data corrections after having initially delivered it. Despite the success of this implementation, the cluster farm had grown over time and started getting out of control. Also, a cascade effect with the versioned data required tracking modifications to maintain calculations.
To address these issues, the team questioned the use of the MongoDB cluster farm. Preferences for an improved hedge fund trading system without the cluster farm included: managing storage as opposed to storing it; the ability to easily expand and upgrade; and read speeds limited by the storage bandwidth.
A search on serverless databases yields numerous definitions, the most general of which states that a serverless database is one that adjusts database capacity based on the demands of an application. Other definitions may be found from MongoDB and Cockroach Labs. However, Seaton preferred a simpler definition: a software library and object storage. For the Man Group hedge fund application, this meant using ArcticDB with a combination of Python and C++, connecting to object storage. Seaton provided code examples on how to create a database and query for stock information.
The data in object storage is structured as a tree of objects that all use the same file format. Only the very top layer is mutable. In terms of performance, processing in the native layer (C++) and mapping to Python can be expensive. Seaton provided multiple graphical examples of a data hierarchy with stock data.
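To make the "immutable tree, mutable top layer" idea concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the talk or from ArcticDB: the `ObjectStore` class, the `write_version` and `read_version` helpers, the `ACME` symbol and the prices are all hypothetical, and an in-memory dictionary stands in for real object storage.

```python
import hashlib
import json


class ObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}

    def put_immutable(self, payload):
        # Content-addressed key: a stored object is never overwritten.
        data = json.dumps(payload, sort_keys=True).encode()
        key = hashlib.sha256(data).hexdigest()[:16]
        self._objects.setdefault(key, data)
        return key

    def get(self, key):
        return json.loads(self._objects[key])

    def set_ref(self, name, key):
        # The only mutable operation: repoint a top-level reference.
        self._objects["ref/" + name] = json.dumps(key).encode()

    def get_ref(self, name):
        raw = self._objects.get("ref/" + name)
        return json.loads(raw) if raw is not None else None


def write_version(store, symbol, rows):
    # Write the data object, then a version object pointing at the data
    # and at the previous version, then move the mutable reference.
    data_key = store.put_immutable({"rows": rows})
    parent = store.get_ref(symbol)
    version_key = store.put_immutable({"data": data_key, "parent": parent})
    store.set_ref(symbol, version_key)
    return version_key


def read_version(store, symbol, versions_back=0):
    # Start at the mutable reference and walk back through parent links.
    key = store.get_ref(symbol)
    for _ in range(versions_back):
        key = store.get(key)["parent"]
    return store.get(store.get(key)["data"])["rows"]


store = ObjectStore()
v1 = write_version(store, "ACME", [["2025-04-07", 100.0]])
v2 = write_version(store, "ACME", [["2025-04-07", 101.5]])  # vendor correction

latest = read_version(store, "ACME")
original = read_version(store, "ACME", versions_back=1)
```

Because a correction only adds new objects and repoints one reference, earlier versions stay readable, which is what makes versioned data cheap to keep in this kind of layout.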
Even with the simplest approach to managing the global state, there are still problems. Clock drift is real and can easily be tens of seconds. Seaton suggested using either the system clock or the storage clock. However, the system clock suffers from drift, and the storage clock suffers from latency effects.
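The two failure modes can be illustrated with a small Python simulation. This is a sketch under stated assumptions rather than anything shown in the talk: the `Writer` class, the `last_write_wins` and `stamp_at_storage` functions, and all of the drift and latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    name: str
    drift_seconds: float  # how far this machine's clock is from true time

    def stamp(self, true_time, value):
        # The writer can only read its own (drifted) system clock.
        return (true_time + self.drift_seconds, self.name, value)


def last_write_wins(writes):
    # Pick the write with the largest timestamp; writer name breaks ties.
    return max(writes, key=lambda w: (w[0], w[1]))


# System clock: a 20-second drift in each direction reorders two writes
# that were really made 10 seconds apart.
fast = Writer("fast-clock", drift_seconds=20.0)
slow = Writer("slow-clock", drift_seconds=-20.0)
first = fast.stamp(true_time=1000.0, value="stale price")
second = slow.stamp(true_time=1010.0, value="corrected price")
winner = last_write_wins([first, second])  # the stale write wins


def stamp_at_storage(sent_at, latency, name, value):
    # Storage clock: the write is stamped when it arrives, not when it was sent.
    return (sent_at + latency, name, value)


# Storage clock: there is no drift, but unequal network latency can still
# reorder two writes that were sent close together.
early = stamp_at_storage(sent_at=1000.0, latency=0.5, name="writer-a", value="sent first")
late = stamp_at_storage(sent_at=1000.1, latency=0.05, name="writer-b", value="sent second")
storage_winner = last_write_wins([early, late])  # the write sent first wins
```

In both cases the "last" write according to the timestamps is not the last write in real time, so neither clock can be treated as a source of truth on its own.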
A Conflict-Free Replicated Data Type (CRDT) is a data structure replicated across multiple nodes in which any replica can be updated without any coordination, with inconsistencies that are automatically resolved. There must be convergence, and operations must be commutative. Examples of some known CRDTs include: a Grow-Only Set (G-Set), a set that only allows data to be added, never deleted; a Two-Phase Set (2P-Set), a G-Set that allows data to be deleted; a Last-Write-Wins-Element-Set (LWW-Element-Set), a 2P-Set with “add sets” and “remove sets” with timestamps for the data; and an Observed-Remove Set (OR-Set), a LWW-Element-Set with unique tags instead of timestamps. For managing the global state, Seaton maintained that we should respect the consistency of the LWW-Element-Set.
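The first three set types can be sketched in a few lines of Python. This is a textbook-style illustration, not code from the talk: the class names, the `merge` and `value` methods, the integer timestamps and the `ACME` and `GLOBEX` symbols are assumptions made for the example, and the LWW set here resolves ties in favor of the add.

```python
class GSet:
    """Grow-only set: elements can be added but never removed."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Set union is commutative, associative and idempotent.
        return GSet(self.items | other.items)

    def value(self):
        return set(self.items)


class TwoPhaseSet:
    """Two G-Sets: one for adds, one for removes. A removed item cannot return."""

    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        if item in self.added:
            self.removed.add(item)

    def merge(self, other):
        merged = TwoPhaseSet()
        merged.added = self.added | other.added
        merged.removed = self.removed | other.removed
        return merged

    def value(self):
        return self.added - self.removed


class LWWElementSet:
    """Add and remove sets carry timestamps; the latest operation wins."""

    def __init__(self):
        self.add_times = {}
        self.remove_times = {}

    def add(self, item, ts):
        self.add_times[item] = max(ts, self.add_times.get(item, ts))

    def remove(self, item, ts):
        self.remove_times[item] = max(ts, self.remove_times.get(item, ts))

    def merge(self, other):
        # Keep the largest timestamp seen for every add and every remove.
        merged = LWWElementSet()
        for source in (self, other):
            for item, ts in source.add_times.items():
                merged.add(item, ts)
            for item, ts in source.remove_times.items():
                merged.remove(item, ts)
        return merged

    def value(self):
        # An item is present if its latest add is not older than its latest remove.
        return {
            item
            for item, ts in self.add_times.items()
            if ts >= self.remove_times.get(item, float("-inf"))
        }


# Two replicas updated independently, then merged in either order.
a, b = LWWElementSet(), LWWElementSet()
a.add("ACME", 1)
b.add("ACME", 1)
b.remove("ACME", 2)
b.add("GLOBEX", 3)
merged_ab = a.merge(b).value()
merged_ba = b.merge(a).value()
```

Merging in either order yields the same set, which is the convergence property. Note that the LWW-Element-Set relies on timestamps, so its results are only as trustworthy as the clocks that produced them.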
Using the same stock data, Seaton provided multiple graphical examples using CRDTs.
Despite their advantages, Seaton explained that there are practical challenges with CRDTs, such as: garbage collection in terms of an accumulation of tiny objects; and compaction where there is a need to compact while keeping a consistent view for clients.
Seaton also discussed distributed locking in terms of transactions, garbage collection and compaction.
In closing, Seaton provided these key takeaways: object storage is consistently improving (improved concurrency models are now possible); distributed locking is subtle; it is possible to build a useful system, but be aware that clocks don’t act like clocks, sets don’t act like sets and locks do not provide mutual exclusion; and CRDTs are useful, but are very subtle things as developers can never correctly model a set.