A recent article reports that an Oracle patent on a fast sorting method has expired, allowing open source databases to use it freely. Mark Callaghan, the inventor behind the sorting algorithm, shows how this 20-year-old approach can speed up sorting similar data and could make database systems faster and more efficient.
Granted in 2010 to Oracle Corporation, patent US7680791B2 covers a method for sorting data using common prefix bytes and a sort algorithm, which the inventor suggests calling “Orasort”, that addresses the inefficiency caused by repeatedly comparing the same leading parts of similar keys during sorting.
The algorithm combines several techniques: skipping common prefixes when comparing keys, adaptively switching between quicksort and radix sort, caching key substrings to reduce cache misses, and producing results before the sort completes.
Because the data is split into smaller groups during sorting, the keys often share longer prefixes. The algorithm remembers these shared parts, skips them during comparisons, switches to a faster method when helpful, and preloads the next needed bytes to reduce wasted work and speed things up. Callaghan, a database expert formerly at Oracle, Google, and Facebook, explains how his patent came to life and why it matters now:
I invented this while at Oracle and it landed in 10gR2 with claims of ~5X better performance vs the previous sort algorithm used by Oracle. I hope for an open-source implementation one day. The patent has a good description of the algorithm, it is much easier to read than your typical patent. Thankfully the IP lawyer made good use of the functional and design docs that I wrote.
The community has shown immediate interest in implementing it and in improving databases such as MySQL and PostgreSQL. Varun Singh, CTO at Flooid.in and formerly founder at ScaleArc, suggests:
That is so detailed, you could drop it into an AI agent, along with the patent document, and start implementing. Mark is epic.
In a separate thread, Hannu Krosing, database engineer at Google, tried to implement it in Python, C, and C++ using Gemini. According to the article, the in-memory sort algorithm at Oracle delivered about 5x the performance of the previous approach and earned a thank-you email from Larry Ellison. Callaghan remembers:
Once I had this implemented within the Oracle DBMS I was able to compare it with the old sort. The new sort was often about 5 times faster than the old sort. I then compared it with SyncSort. I don’t remember whether they had a DeWitt Clause so I won’t share the results but I will say that the new sort in Oracle looked great in comparison.
Charles Thayer comments:
I’ve never considered the complexity of when a sort algorithm can ship the first result item, to start the response stream and minimize latency. (Quicksort should be relatively good at this). Interesting work.
With over 52000 patents overall, Oracle still holds numerous patents related to database technology, including self-managing database architectures and methods for optimizing database performance, covering various aspects of database management such as automatic tuning and efficient data storage techniques.
