The free embedded database LMDB (Lightning Memory-Mapped Database) has reached version 1.0. With this first major release, the developers have revised the documentation of the library's API and behavior. A separate migration guide covers existing applications moving from the 0.9 series.
LMDB is an application-embedded key/value database library for local data storage. It is aimed at developers who need a transaction-safe database without a separate server process, for example for directory services, caches or metadata. Unlike classic database systems, LMDB does not load data into its own buffer cache, but maps the entire database into the process's virtual address space using memory mapping. Reads access the mapped memory directly, without additional memory allocations or copy operations.
Memory mapping instead of database cache
The basic principle of LMDB remains unchanged in version 1.0. The library uses a B-tree as its data structure and leaves caching entirely to the operating system. Since records are read directly from the memory image, intermediate steps such as malloc() or memcpy() are not needed when reading data. This reduces administrative overhead and can increase performance, especially for read-intensive workloads. The basis for this is the operating system's memory mapping facility, which transparently maps file contents into virtual memory.
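The zero-copy read path can be illustrated without LMDB itself. The following is a minimal POSIX sketch, not LMDB code: it maps a file read-only with mmap() and hands out pointers straight into the mapping. The fixed 16-byte record layout and the names `map_file`, `record_at` and `REC_SIZE` are hypothetical. (In the 0.9-series C API, mdb_get() similarly returns an MDB_val whose mv_data points into the map; check the 1.0 migration guide for any changes.)

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read-only view of a file mapped into the process's address space. */
struct mapped_file {
    const unsigned char *data; /* points directly into the mapping */
    size_t size;
};

/* Map the whole file read-only. Returns 0 on success, -1 on error. */
static int map_file(const char *path, struct mapped_file *mf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    mf->data = p;
    mf->size = (size_t)st.st_size;
    return 0;
}

static void unmap_file(struct mapped_file *mf)
{
    munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}

/* Hypothetical fixed-size record layout: 4-byte key + 12-byte value. */
#define REC_SIZE 16

/* Return a pointer to record i inside the mapping: no malloc(), no memcpy().
   The OS faults pages in on first access and keeps them in its page cache. */
static const unsigned char *record_at(const struct mapped_file *mf, size_t i)
{
    if ((i + 1) * REC_SIZE > mf->size)
        return NULL;
    return mf->data + i * REC_SIZE;
}
```

The returned pointer is only valid while the file stays mapped, which is why LMDB ties the lifetime of returned data to the enclosing transaction.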
The transaction model also stays the same. LMDB offers ACID properties and relies on Multi-Version Concurrency Control (MVCC). New data is written using copy-on-write, so pages belonging to the last committed state are never overwritten in place. As a result, a reader always sees a consistent snapshot of the database, even while a write transaction is being prepared in parallel. A typical example is a service that continuously reads configuration data while a management tool writes changes: the readers keep working without locks and are not blocked by the writer.
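To make the copy-on-write idea concrete, here is a deliberately simplified, single-threaded sketch, assuming a "database" that consists of just one root page. It is not LMDB's B-tree code; `struct page`, `begin_read` and `cow_put` are hypothetical names. A reader's snapshot is the root it saw at start; a writer copies the page, modifies the copy and publishes it by swapping the root pointer.

```c
#include <stdlib.h>
#include <string.h>

#define NKEYS 4

/* An immutable "page": once published, it is never modified in place. */
struct page {
    unsigned long txnid; /* transaction that wrote this version */
    int values[NKEYS];
};

/* The toy database is just a pointer to the current root page. */
struct db {
    const struct page *root;
};

/* A read transaction's snapshot is simply the root it saw when it began. */
static const struct page *begin_read(const struct db *db)
{
    return db->root;
}

/* Copy-on-write update: copy the current page, change the copy, then
   "commit" by publishing the copy as the new root. Returns the previous
   root, which existing readers may still be using, or NULL on error. */
static const struct page *cow_put(struct db *db, int key, int value)
{
    if (key < 0 || key >= NKEYS)
        return NULL;
    struct page *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, db->root, sizeof *copy);
    copy->values[key] = value;
    copy->txnid = db->root->txnid + 1;
    const struct page *old = db->root;
    db->root = copy; /* new readers see the new version from now on */
    return old;      /* the old version is left untouched */
}
```

In a multi-threaded setting the root swap would have to be atomic; the point here is only that the old version survives unchanged until nobody needs it any more.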
One writer, many readers
LMDB's concurrency model differs from that of many other databases. Any number of processes or threads can read at the same time, but write transactions are fully serialized: only one write transaction can be active at any time. This rules out deadlocks between competing writers. Readers do not block writers, and readers do not have to wait for ongoing writes either.
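The single-writer rule can be modelled with one mutex that every write transaction must hold, while readers take no lock at all. This is a simplified in-process sketch using pthreads and C11 atomics, not LMDB's implementation (LMDB has to coordinate writers across processes, which it does through its lock file); `struct store`, `write_txn_add` and `read_txn` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared state: readers only ever load the published value atomically. */
struct store {
    pthread_mutex_t writer_lock; /* serializes write transactions */
    _Atomic long committed;      /* last committed value ("snapshot") */
};

/* Write transaction: only one thread can hold writer_lock at a time, so
   writers queue up and, with a single lock, cannot deadlock each other. */
static void write_txn_add(struct store *s, long delta)
{
    pthread_mutex_lock(&s->writer_lock);
    long v = atomic_load(&s->committed); /* start from the committed state */
    v += delta;                          /* prepare the change privately */
    atomic_store(&s->committed, v);      /* commit: publish in one step */
    pthread_mutex_unlock(&s->writer_lock);
}

/* Read transaction: takes no lock, so it never blocks a writer and never
   waits for one; it sees either the old or the new committed value. */
static long read_txn(struct store *s)
{
    return atomic_load(&s->committed);
}

/* Example worker performing many small write transactions. */
static void *writer_thread(void *arg)
{
    struct store *s = arg;
    for (int i = 0; i < 1000; i++)
        write_txn_add(s, 1);
    return NULL;
}
```

Without the writer lock, the load/add/store sequence of two concurrent writers could interleave and lose updates; serializing writers avoids that without ever making readers wait.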
The developers also deliberately do without a write-ahead log or an append-only design. Instead of periodically merging log files or compacting the database, LMDB keeps track of free pages within the database file itself and reuses them for later writes. As a result, the database does not grow indefinitely during normal operation, as can happen with log-based designs that are not regularly compacted.
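The following toy page allocator sketches the idea of reusing freed pages instead of appending, under simplifying assumptions: the free list is a small in-memory array (LMDB keeps this bookkeeping inside the database itself), and a page released by write transaction T counts as reusable once the oldest active reader's snapshot is at least T. The struct and function names are hypothetical, and LMDB's exact reuse rule may be more conservative.

```c
#include <stddef.h>

#define MAX_FREE 64

/* Hypothetical page allocator for a single database file. */
struct pager {
    size_t next_pgno;                 /* high-water mark: file size in pages */
    size_t free_pgno[MAX_FREE];       /* pages released by earlier commits */
    unsigned long freed_by[MAX_FREE]; /* transaction that released each page */
    size_t nfree;
};

/* Record that write transaction `txnid` no longer references page `pgno`. */
static int pager_free(struct pager *pg, size_t pgno, unsigned long txnid)
{
    if (pg->nfree == MAX_FREE)
        return -1;
    pg->free_pgno[pg->nfree] = pgno;
    pg->freed_by[pg->nfree] = txnid;
    pg->nfree++;
    return 0;
}

/* Allocate a page. A freed page is reused only if no active reader can still
   see it, i.e. the oldest reader's snapshot is at least as new as the
   transaction that freed it. Otherwise the file has to grow. */
static size_t pager_alloc(struct pager *pg, unsigned long oldest_reader_snapshot)
{
    for (size_t i = 0; i < pg->nfree; i++) {
        if (pg->freed_by[i] <= oldest_reader_snapshot) {
            size_t pgno = pg->free_pgno[i];
            /* remove entry i by moving the last entry into its place */
            pg->free_pgno[i] = pg->free_pgno[pg->nfree - 1];
            pg->freed_by[i] = pg->freed_by[pg->nfree - 1];
            pg->nfree--;
            return pgno;
        }
    }
    return pg->next_pgno++; /* nothing reusable yet: extend the file */
}
```

The same gate also explains the caveat about long-running readers: as long as an old snapshot is open, pages freed after it cannot be handed out again, so the file grows.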
Notes on operation
The release notes on GitHub and the documentation point out some limitations: long-running read transactions can prevent pages that have already been released from being reused, which can make the database file grow unnecessarily. Developers should therefore avoid long-running read transactions and regularly check for orphaned reader entries left behind by processes that terminated abnormally. The function mdb_reader_check and the tool mdb_stat, among others, are available for this purpose.
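Why orphaned reader entries matter can be shown with a toy reader table. This sketch is a conceptual model only: the table layout and the names `reader_register`, `clear_stale_readers` and `oldest_snapshot` are hypothetical, and the liveness probe via kill(pid, 0) is a simplified stand-in; LMDB's own mdb_reader_check works differently (it relies on its lock file). A slot left behind by a crashed process keeps pinning an old snapshot until it is cleared.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_READERS 8

/* Hypothetical reader table: one slot per open read transaction. */
struct reader_slot {
    pid_t pid;              /* 0 = slot unused */
    unsigned long snapshot; /* transaction id the reader is looking at */
};

/* Register the calling process as a reader of `snapshot`.
   Returns the slot index, or -1 if the table is full. */
static int reader_register(struct reader_slot *tab, size_t n, unsigned long snapshot)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].pid == 0) {
            tab[i].pid = getpid();
            tab[i].snapshot = snapshot;
            return (int)i;
        }
    }
    return -1;
}

/* Signal 0 only checks whether the process exists; EPERM means it exists
   but belongs to another user, so it still counts as alive. */
static int pid_alive(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Clear slots whose owning process no longer exists; return how many. */
static int clear_stale_readers(struct reader_slot *tab, size_t n)
{
    int cleared = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].pid != 0 && !pid_alive(tab[i].pid)) {
            tab[i].pid = 0;
            tab[i].snapshot = 0;
            cleared++;
        }
    }
    return cleared;
}

/* Oldest snapshot still in use; freed pages newer than this stay pinned. */
static unsigned long oldest_snapshot(const struct reader_slot *tab, size_t n,
                                     unsigned long last_committed)
{
    unsigned long oldest = last_committed;
    for (size_t i = 0; i < n; i++)
        if (tab[i].pid != 0 && tab[i].snapshot < oldest)
            oldest = tab[i].snapshot;
    return oldest;
}
```

After clearing a stale slot, the oldest snapshot moves forward, and pages released since the crashed reader's snapshot become reusable again.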
Using LMDB on network file systems is also not recommended: since LMDB relies on the operating system's memory mapping and file locking, synchronization problems can occur there. According to the documentation, opening the same database more than once within a single process is also problematic.
(fo)
