Introduction
COBOL (Common Business Oriented Language) still plays a significant role in the world’s codebase, particularly in critical business and financial systems. While exact figures vary, recent estimates suggest that the amount of COBOL code still in use is substantial:
- There are 775–850 billion lines of COBOL code in daily use worldwide.
- COBOL systems handle approximately $3 trillion worth of daily transactions.
- 43% of banking systems still use COBOL.
- 95% of ATM swipes in the US and 80% of in-person credit card transactions are processed using COBOL systems.
- Approximately 70% to 80% of the world’s business transactions are processed in COBOL.
COBOL has been all over the news these days for various reasons. Here are some recent links for you to enjoy before we dive into the meat of what is happening.
While COBOL may not be as prevalent in new development, it remains a critical component of many legacy systems, particularly in banking, finance, insurance, and government. Despite its age, COBOL continues to run mission-critical workloads reliably and at enormous scale.
What is the COBOL Streamhouse?
For decades, COBOL has been the backbone of critical systems in industries like finance, insurance, and government. Despite COBOL’s reputation for reliability, its resistance to modernization has left many organizations grappling with a paradox: how do you preserve the value of battle-tested legacy code while unlocking the agility and scalability of today’s data-driven world? The COBOL Streamhouse project is our answer—a bold initiative to integrate the modern data stack with COBOL, starting with streaming and data lakehouse capabilities that promise to breathe new life into these stalwart systems.
The purpose of COBOL Streamhouse is straightforward yet ambitious: to bring the benefits of real-time data processing and advanced analytics to applications that have, until now, been marooned in a batch-processing past. We’re not here to rip and replace—COBOL’s enduring presence is proof of its worth—but to extend its capabilities, ensuring it can thrive alongside cloud-native technologies. By meeting COBOL on its own terms, we’re crafting a bridge between yesterday’s code and tomorrow’s possibilities, empowering organizations to leverage their existing investments without the disruption of a complete rewrite.
Our journey begins with two foundational phases that tackle the most pressing needs in modern data architectures: streaming and data management. The first phase brings streaming to COBOL via CobKa, an Apache Kafka implementation tailored specifically to this legacy language. Kafka, renowned for its ability to handle high-throughput, real-time data streams, is a cornerstone of the modern data stack. Bringing it to COBOL means enabling these systems to process transactions, logs, and events as they happen—not just in nightly batches. Imagine a banking system that can analyze real-time payment flows or an insurance platform that adjusts risk models on the fly. Our CobKa implementation retains the language’s native strengths—reliability, precision, and compatibility—while embedding the low-latency, event-driven power of streaming.
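To make the idea concrete, here is a minimal sketch of what publishing a payment event through CobKa might look like from a COBOL program. Everything here is an assumption for illustration: the `COBKA-PRODUCE` entry point, its parameter order, the status convention, and the `PAYMENTS` topic name are hypothetical, not the project's actual API.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAYMENT-PRODUCER.
      * Hypothetical sketch: 'COBKA-PRODUCE' and its parameter
      * layout are assumptions; the real CobKa API may differ.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  KAFKA-TOPIC         PIC X(32) VALUE 'PAYMENTS'.
       01  PAYMENT-EVENT.
           05  PAYMENT-ID      PIC 9(12).
           05  PAYMENT-AMOUNT  PIC S9(13)V99 COMP-3.
           05  PAYMENT-TSTAMP  PIC X(21).
       01  COBKA-STATUS        PIC 9(4) COMP.
       PROCEDURE DIVISION.
           MOVE 100000000001 TO PAYMENT-ID
           MOVE 2500.00 TO PAYMENT-AMOUNT
           MOVE FUNCTION CURRENT-DATE TO PAYMENT-TSTAMP
      *    Publish the record as the transaction happens,
      *    rather than queuing it for a nightly batch job.
           CALL 'COBKA-PRODUCE' USING KAFKA-TOPIC
                                      PAYMENT-EVENT
                                      COBKA-STATUS
           IF COBKA-STATUS NOT = ZERO
               DISPLAY 'COBKA PRODUCE FAILED: ' COBKA-STATUS
           END-IF
           STOP RUN.
```

The point of the sketch is the shape of the integration: the event is an ordinary COBOL record with familiar PIC clauses, and publishing is a plain CALL—no departure from the paradigms COBOL programmers already know.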
Complementing this, the second phase introduces Apache Iceberg, reimagined in COBOL as CobBerg, to create a robust data lakehouse framework. Iceberg’s appeal lies in its ability to manage massive datasets with features like schema evolution, partitioning, and ACID transactions—capabilities that are light-years ahead of the flat-file approaches common in legacy COBOL environments. By implementing Iceberg in COBOL, we’re equipping these systems to handle structured and semi-structured data at scale, all while maintaining transactional integrity. This isn’t just about storing data; it’s about making it queryable, governable, and ready for modern analytics tools, from SQL engines to machine learning frameworks. Together, Kafka and Iceberg in COBOL lay the groundwork for a data ecosystem that’s both legacy-friendly and future-ready.
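A CobBerg interaction could follow the same pattern: stage rows against an Iceberg table, then commit them as a single atomic snapshot. As above, this is a hypothetical sketch—`COBBERG-APPEND`, `COBBERG-COMMIT`, the table name, and the record layout are all assumed for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. POLICY-APPEND.
      * Hypothetical sketch: 'COBBERG-APPEND' and 'COBBERG-COMMIT'
      * are assumed entry points; the real CobBerg API may differ.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ICEBERG-TABLE       PIC X(40) VALUE 'WAREHOUSE.POLICIES'.
       01  POLICY-ROW.
           05  POLICY-ID       PIC 9(10).
           05  POLICY-PREMIUM  PIC S9(11)V99 COMP-3.
           05  POLICY-REGION   PIC X(8).
       01  SNAPSHOT-ID         PIC 9(18) COMP.
       01  COBBERG-STATUS      PIC 9(4) COMP.
       PROCEDURE DIVISION.
           MOVE 4200013377 TO POLICY-ID
           MOVE 1250.50 TO POLICY-PREMIUM
           MOVE 'US-EAST' TO POLICY-REGION
      *    Stage the row, then commit: the commit either produces
      *    a new table snapshot or leaves the table unchanged,
      *    mirroring Iceberg's ACID snapshot semantics.
           CALL 'COBBERG-APPEND' USING ICEBERG-TABLE POLICY-ROW
                                       COBBERG-STATUS
           IF COBBERG-STATUS = ZERO
               CALL 'COBBERG-COMMIT' USING ICEBERG-TABLE
                                           SNAPSHOT-ID
                                           COBBERG-STATUS
           END-IF
           STOP RUN.
```

The two-step append/commit shape mirrors how Iceberg writers work in other languages: data files are staged first, and a commit swaps in a new table snapshot atomically, which is what makes the writes transactional.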
These initial phases are just the beginning. We’re addressing the most immediate gaps between COBOL and contemporary architectures by starting with streaming and data lakehouse functionality. The Kafka implementation unlocks real-time insights, while Iceberg provides a foundation for scalable, flexible data management. Moreover, we’re doing this without forcing developers to abandon COBOL’s syntax or paradigms—our implementations are designed to feel native, not bolted-on. This approach preserves the expertise of COBOL programmers, many of whom have spent decades mastering the language, while inviting a new generation to see its potential in a modern context.
COBOL Streamhouse isn’t about rewriting history; it’s about rewriting the future of legacy systems. As we roll out these capabilities, we’re proving that modernization doesn’t have to mean migration. With streaming and data lakehouse technologies now within reach, COBOL can step confidently into the era of real-time, data-centric computing—without losing the reliability that made it indispensable in the first place.
Summary
If you want to learn more about COBOL in the context of a language you understand, I wrote
Make sure to check out the COBOL Streamhouse