This Is How DuckDB | Computer Week

Query data with DuckDB

Once you’ve loaded data into DuckDB, you can query it using SQL expressions. The format does not differ from “normal” queries:

SELECT * FROM users WHERE ID>1000 ORDER BY Name DESC LIMIT 5;

If you want to use a client API for DuckDB queries, there are two options: You can pass SQL strings via the API – or use the client’s relational interface to build data queries programmatically. Specifically, this could look like this in Python with a JSON file:

import duckdb

file = duckdb.read_json("users.json")

file.select("*").filter("ID>1000").order("Name").limit(5)

In the case of Python, you also have the option of using the PySpark API to query DuckDB directly. It should be noted that the implementation does not (yet) support the full range of functions.

DuckDB’s SQL dialect also contains some analytics-related additional elements. For example, if you only want to query a subset of data within a table, you can do this using the SAMPLE clause. The resulting query runs significantly faster, but may also be less accurate. In addition, DuckDB also supports, among other things:

the PIVOT-Keyword to create corresponding tables,
Window functions as well
QUALIFY-Clauses to filter these.

DuckDB-Extensions

As mentioned earlier, DuckDB is not limited to the built-in data formats and behaviors. It is also possible to integrate various addons via the Extension API. For example, extensions are available for:

This article originally appeared at our sister publication Infoworld.com.

This is how DuckDB | Computer Week

Query data with DuckDB

DuckDB-Extensions

Leave a Reply

Query data with DuckDB

DuckDB-Extensions

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.

Leave a Reply Cancel reply

Leave a Reply