Here’s a look at 10 open-source software tools – including software for building AI applications or managing huge volumes of data – that are already widely used or are gaining in popularity.
Open To New Ideas
Open-source software tools continue to increase in popularity because of the advantages they provide, including lower upfront software and hardware costs, lower total cost of ownership, lack of vendor lock-in, simpler license management and support from active communities.
In the following slides, as part of the CRN 2024 Year In Review project, we take a look at some of the most popular open-source software products that have caught our attention this year. Some of these have been around for some time and are already widely used while others are relatively new – a couple just making their debut in the last year or so – but show early signs of momentum.
Not surprisingly, the wave of AI and generative AI application development is a major driver for open-source software adoption. Some of the products on this list are in the software development space or help answer the need to manage the huge volumes of data that feed AI systems.
These products are available under open-source licenses such as the MIT License, Apache 2.0 License, GNU GPL and others. Some are products developed by startups that have received financial investments from Y Combinator, the startup accelerator and venture capital firm.
Airbyte
Airbyte is a fast-growing data integration and data movement platform for ETL/ELT data pipelines that connect applications, APIs, databases and files to data warehouses, data lakes and other destinations. Airbyte can also be used to move unstructured and semi-structured data into vector databases and large language model frameworks for AI applications.
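To make the ELT pattern concrete, here is a minimal sketch that uses only the Python standard library. It does not use Airbyte or any of its connectors: the CSV text, the table names and the function names are all hypothetical, and an in-memory SQLite database stands in for a data warehouse. The point is only the order of operations, in which records are loaded unchanged and then transformed inside the destination.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an application export.
SOURCE_CSV = """order_id,customer,amount
1,acme,120.50
2,globex,80.00
3,acme,42.25
"""


def extract(csv_text):
    """Read the source records as a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load(conn, rows):
    """Load the records unchanged into a raw staging table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount)", rows
    )


def transform(conn):
    """Transform inside the destination: cast types, aggregate per customer."""
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_pipeline(csv_text):
    """Run extract, load and transform against an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    load(conn, extract(csv_text))
    transform(conn)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return dict(conn.execute(query))
```

Calling run_pipeline(SOURCE_CSV) returns the per-customer totals as a dictionary. In an ETL variant, the cast and aggregation would instead happen in Python before the load step.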
The core Airbyte Open Source is already used by more than 40,000 companies. The software is available under multiple open-source licenses including the MIT License and Elastic License 2.0.
Airbyte’s namesake developer, headquartered in San Francisco, also provides a number of commercial products and services around the platform. The company launched a partner program in May, including a certification course, to help technology service providers and resellers work with the Airbyte software.
Apache DataFusion
The Apache Software Foundation describes DataFusion as “a fast, extensible query engine for building high quality, data-centric systems” such as databases, dataframe libraries, machine learning systems and streaming applications.
DataFusion can be used as an embedded SQL engine or customized and used as a foundation for building new systems with a focus on high-throughput, low-latency analytical, streaming and transaction workloads.
DataFusion leverages the technology capabilities of Apache Arrow, a language-agnostic framework for building data analytics applications that process columnar data, and the Rust programming language.
In June the Apache Software Foundation, which has been developing DataFusion since 2019 as part of the Apache Arrow project, said DataFusion is now designated as a Top-Level Project “to provide more focused governance capacity for continued growth.”
DataFusion is available for download from the Apache Software Foundation website, GitHub, and other sites under the Apache 2.0 License. The latest source release is 41.0.0.
Danswer
Danswer offers an open-source AI assistant and enterprise search application that connects all of a company’s tools, applications and documents, making it easier to find information throughout an organization, according to the company’s website.
Danswer says one way to think about its software is as ChatGPT with access to an organization’s own information, data and documents, and therefore without hallucinations. The software already offers more than 40 turnkey integrations, such as with Slack and Google Docs, “with more being built every day,” according to the company.
Danswer software is self-hosted either within a company’s data center or on a cloud platform.
Founded in 2023, Danswer is backed by Y Combinator. The software is licensed under the MIT License and is available from the company and on GitHub.
DuckDB
DuckDB is a high-performance, in-process database that’s designed to support online analytical processing (OLAP) query workloads.
The relational (table-oriented) database supports SQL and utilizes a columnar-vectorized query execution engine that can process large batches of values in one operation as a vector, according to the Database of Databases website. The database is designed to run embedded within a host process – there is no server database to install.
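The idea of columnar, vectorized execution can be illustrated with a short standard-library Python sketch. This is not DuckDB code and does not reflect its internals: the column data, the batch size and the function names are assumptions chosen for illustration. Each column is stored separately, and the filter and aggregation steps work on a batch of values at a time rather than on one row at a time.

```python
from array import array

# Hypothetical table stored column by column (not DuckDB's storage format).
prices = array("d", [9.5, 20.0, 3.25, 14.0, 7.75, 11.0])
quantities = array("i", [2, 1, 4, 3, 2, 5])

BATCH_SIZE = 4  # illustrative vector size, chosen arbitrarily


def batches(column, size):
    """Yield successive fixed-size slices (vectors) of one column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def total_revenue(price_col, qty_col, min_price):
    """Sum price * quantity for rows priced at or above min_price, batch by batch."""
    total = 0.0
    price_batches = batches(price_col, BATCH_SIZE)
    qty_batches = batches(qty_col, BATCH_SIZE)
    for price_vec, qty_vec in zip(price_batches, qty_batches):
        # Filter step: build a selection mask over the whole vector.
        mask = [p >= min_price for p in price_vec]
        # Aggregation step: consume only the selected values in the vector.
        total += sum(
            p * q for p, q, keep in zip(price_vec, qty_vec, mask) if keep
        )
    return total
```

Here total_revenue(prices, quantities, 8.0) plays the role of a query such as SELECT SUM(price * quantity) with a filter on price. A real engine applies the same pattern with typed, fixed-size vectors and compiled operators, which is where the performance benefit comes from.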
DuckDB was originally developed at the Centrum Wiskunde & Informatica, the national research institute for mathematics and computer science in the Netherlands, in 2018.
DuckDB and its core extensions are open sourced under the MIT License and the entire source code is freely available on GitHub. DuckDB version 1.0.0 was just released in June and is available through the DuckDB.org website and GitHub.
One reason DuckDB has been gaining attention is the cloud analytics software developed by startup MotherDuck that runs on DuckDB.
Grafana Observability Tools
Grafana is an open-source observability and data visualization platform used to collect and visualize metric, trace and log data from many data sources. It is frequently used as a component in IT/OT monitoring systems.
Grafana is developed by Grafana Labs and is available under the AGPL-3.0 open-source license. In April the company debuted Grafana 11.0 with a new Explore Metrics root cause analysis feature, improved visualizations, simpler alerting and support for additional data sources.
In addition to its flagship software, Grafana Labs develops additional open-source software including Grafana Loki, a multi-tenant log aggregation system; Grafana Tempo, back-end software for high-scale distributed tracing; and Grafana Mimir, a scalable backend metrics storage and analysis tool. Grafana Labs also sells commercial enterprise editions of its software.
LangChain
LangChain is an open-source orchestration framework for developing generative AI applications powered by large language models (LLMs) that connect with external data sources, according to the Python.Langchain.com website and a description on IBM’s website.
Businesses and organizations can derive more value from GenAI if they have a way to load their own proprietary data into the LLMs, a potentially difficult task due to data preparation and LLM tuning complexities and data security concerns.
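One common way to put proprietary data in front of an LLM is to retrieve relevant documents and place them in the prompt. The sketch below shows that step in plain Python. It does not use LangChain or call any model: the documents, the function names and the word-overlap scoring (a simple stand-in for vector search) are all hypothetical.

```python
import re

# Hypothetical in-house documents standing in for proprietary data.
DOCUMENTS = {
    "vacation-policy": "Employees accrue 1.5 vacation days per month of service.",
    "expense-policy": "Expense reports must be filed within 30 days of purchase.",
    "security-policy": "Laptops must use full-disk encryption at all times.",
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda name: len(question_words & tokenize(documents[name])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Assemble a prompt that combines retrieved context with the question."""
    names = retrieve(question, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The string returned by build_prompt would then be sent to a model of the developer’s choice. An orchestration framework packages steps like these, along with the model call itself, into reusable components.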
LangChain simplifies every stage of the LLM application lifecycle including development and deploying applications into production. Specific tools include LangGraph for building stateful agents, LangSmith for inspecting and monitoring chains, and open-source building blocks, components and third-party integrations.
Specific LangChain tools are available at GitHub, including the framework itself under the MIT License.
MindsDB
MindsDB is an open-source virtual database and development platform that automates workflows that connect real-time data to AI systems. The software makes it easier to build, train and deploy machine learning models using SQL queries.
MindsDB, the software’s developer, was founded in 2017 and is based in San Francisco. The company says its mission with its open-source software is to democratize machine learning, according to the company’s website. With that goal in mind, in September 2023 the company launched the MindsDB AI Collective, a network of AI startups and developers that are advancing open-source machine learning and AI projects and providing connections to investors, technical assistance and talent.
The company is one of many open-source technology startups funded by Y Combinator, including several on this list.
The MindsDB software is available under the open-source MIT License while MindsDB Core, the core component of the software, specifically uses the Elastic License v2.
OpenFoundry
The OpenFoundry platform provides developer infrastructure for open-source AI projects. The technology helps engineers build, deploy and scale their open-source AI “stack” 10-times faster and ship open-source, AI-powered products more quickly, according to the company’s website.
OpenFoundry was just launched this year by CEO Tyler Lehman, previously a product manager at Meta, and CTO Arthur Chi, a software engineer at Slack. The company is another open-source technology startup funded by Y Combinator.
The OpenFoundry page on the Y Combinator website pitches the startup as an open-source alternative to the Hugging Face machine learning and data science platform. OpenFoundry is available on GitHub under the MIT License.
OpenZiti
OpenZiti is a free and open-source project focused on bringing zero trust networking principles directly into any application, according to the www.openziti.io website. The platform provides all the components needed to implement a zero trust overlay network and all the tools that developers need to integrate zero trust into their applications.
The OpenZiti project “believes the principles of zero trust shouldn’t stop at your network, those ideas belong in your application,” according to the site.
OpenZiti is available under the Apache 2.0 license and can be downloaded through the OpenZiti.io web site and GitHub.
The components of OpenZiti include The Fabric, a scalable overlay networking mesh with built-in smart routing; The Edge, components that provide secure entry points into the overlay network; SDKs that developers use to embed zero trust principles into applications; and Tunneling, a bridge for applications that can’t have zero trust built in.
Twenty
Startup Twenty is pursuing the audacious task of developing an open-source, SaaS-based CRM application that’s designed to offer a modern alternative to application giant Salesforce.
On its website Twenty says its software provides an operating system for managing customer data along with all the features of a leading CRM system including tasks and “kanbans views” workflow visualizations.
The application is still in early “alpha” development, but is available (under the GNU Affero General Public License) from the company and GitHub for those wanting to check it out.
The latest iteration, version 0.32.0, was released on Nov. 3 with a number of additions and enhancements including more powerful search, a webhooks filter and webhooks multi-object filtering, advanced settings and a new settings layout, a soft delete feature, and a new array field type to store non-predefined values.
Founded in 2023 and headquartered in San Francisco, Twenty received funding from Y Combinator.