Frequently Asked Questions

Overview

Who makes DuckDB?

DuckDB is maintained by Dr. Mark Raasveldt & Prof. Dr. Hannes Mühleisen along with many other contributors from all over the world. Mark and Hannes have set up the DuckDB Foundation that collects donations and funds development and maintenance of DuckDB. Mark and Hannes are also co-founders of DuckDB Labs, which provides commercial services around DuckDB. Several other DuckDB contributors are also affiliated with DuckDB Labs.

DuckDB's initial development took place at the Database Architectures Group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands.

Why call it DuckDB?

Ducks are amazing animals. They can fly, walk and swim. They can also live off pretty much everything. They are quite resilient to environmental challenges. A duck's song will bring people back from the dead and inspires database research. They are thus the perfect mascot for a versatile and resilient data management system. Also the logo designs itself.

Is DuckDB open-source?

DuckDB is fully open-source under the MIT license and its development takes place on GitHub in the duckdb/duckdb repository. All components of DuckDB are available in the free version under this license: there is no “enterprise version” of DuckDB.

How are DuckDB, the DuckDB Foundation, DuckDB Labs, and MotherDuck related?

DuckDB is the name of the MIT-licensed open-source project.

The [DuckDB Foundation]({% link foundation/index.html %}) is a non-profit organization that holds the intellectual property of the DuckDB project. Its statutes also ensure DuckDB remains open-source under the MIT license in perpetuity. Donations to the DuckDB Foundation directly fund DuckDB development.

DuckDB Labs is a company based in Amsterdam that provides commercial support services for DuckDB. DuckDB Labs employs the core contributors of the DuckDB project.

MotherDuck is a venture-backed company creating a hybrid cloud/local platform using DuckDB. MotherDuck contracts with DuckDB Labs for development services, and DuckDB Labs owns a portion of MotherDuck. See the partnership announcement for details. To learn more about MotherDuck, see the CIDR 2024 paper on MotherDuck and the MotherDuck documentation.

Where do I find the DuckDB logo?

You can download the DuckDB Logo here:

Inverted variants for dark backgrounds:

The DuckDB logo & website were designed by Jonathan Auch & Max Wohlleber.

Where do I find DuckDB trademark use guidelines?

Please consult the [trademark guidelines for DuckDB™]({% link trademark_guidelines.md %}).

I would like feature X to be implemented in DuckDB. How do I proceed?

Features in DuckDB can be implemented in different ways: in the main DuckDB project, as a [core extension]({% link docs/extensions/core_extensions.md %}) or a [community extension]({% link docs/extensions/community_extensions.md %}). We recommend following these guidelines for feature requests:

  • If you would like a feature to be implemented in DuckDB, please raise an issue in the Ideas section of DuckDB's GitHub Discussions forum. The DuckDB team monitors these ideas and, over time, implements the frequently requested features. For example, we recently published the [Avro Community Extension]({% link community_extensions/extensions/avro.md %}) to support reading Avro files, which was the most requested feature in the issue tracker.
  • If you would like to implement a feature in the main DuckDB project, please discuss it with the DuckDB team on GitHub Discussions or on our Discord server. The team can verify whether the idea and the proposed implementation line up with the project's long-term vision.
  • If you would like to implement a feature as an extension, consider submitting it to the [Community Extensions repository]({% link docs/extensions/community_extensions.md %}).

Please note that DuckDB Labs, the company that employs the main DuckDB contributors, provides consultancy services for DuckDB, which can include implementing features in DuckDB or as DuckDB extensions.

Working with DuckDB

Can DuckDB save data to disk?

DuckDB supports [persistent storage]({% link docs/connect/overview.md %}#persistent-database) and stores the database as a single file, which includes all tables, views, indexes, macros, etc. present in the database. DuckDB's [storage format]({% link docs/internals/storage.md %}) uses a compressed columnar representation, which is compact but allows for efficient bulk updates. DuckDB can also run in [in-memory mode]({% link docs/connect/overview.md %}#in-memory-database), where no data is persisted to disk.
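
As a minimal sketch of persistent storage, the statements below attach a database file, create a table and a view in it, and detach it again. The file name `example.duckdb` and the table and view names are hypothetical and only serve as an illustration.

```sql
-- Attach a database file (created if it does not exist yet);
-- 'example.duckdb' is a hypothetical file name
ATTACH 'example.duckdb' AS example;

-- Tables and views created in the attached database are stored in that single file
CREATE TABLE example.measurements (id INTEGER, reading DOUBLE);
INSERT INTO example.measurements VALUES (1, 0.5), (2, 1.5);
CREATE VIEW example.large_readings AS
    SELECT * FROM example.measurements WHERE reading > 1.0;

-- Detach the file; the data remains on disk and can be attached again later
DETACH example;
```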

What type of storage should I run DuckDB on (e.g., local disks, network-attached storage)?

The type of storage used to run DuckDB has a [significant performance impact]({% link docs/guides/performance/environment.md %}#disk). In general, using SSDs (SATA or NVMe SSDs) leads to superior performance compared to HDDs.

The recommended location of the storage depends heavily on the workload:

  • For read-only workloads, the DuckDB database can be stored on local disks and remote endpoints such as [HTTPS]({% link docs/extensions/httpfs/https.md %}) and cloud object storage such as [AWS S3]({% link docs/extensions/httpfs/s3api.md %}) and similar providers.
  • For read-write workloads, storing the database on instance-attached storage yields the best performance. Network-attached cloud storage such as AWS EBS also works and its performance can be fine-tuned with the guaranteed IOPS settings. Based on our experience, we advise against running read-write DuckDB workloads on on-premises network-attached storage (NAS). These setups are often slow and result in spurious failures that are difficult to troubleshoot.
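
For the read-only case, a remote database file can be attached directly. The sketch below assumes a hypothetical URL and table name, and relies on the `httpfs` extension being available for remote access.

```sql
-- Hypothetical URL: replace it with the location of your own database file
ATTACH 'https://example.com/path/to/database.duckdb' AS remote_db (READ_ONLY);

-- 'some_table' is an assumed table name inside the remote database
SELECT count(*) FROM remote_db.some_table;
```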

Is DuckDB an in-memory database?

It is a common misconception that DuckDB is an in-memory database. While DuckDB can work in-memory, it is not an in-memory database. DuckDB can make use of available memory for caching, but it also fully supports disk-based persistence and [offloading larger-than-memory operations]({% link docs/guides/performance/how_to_tune_workloads.md %}#larger-than-memory-workloads-out-of-core-processing) to disk.
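
As a rough illustration, the settings below limit the memory DuckDB may use and point it to a directory for offloading intermediate data to disk. The limit of 4 GB and the directory path are arbitrary example values, not recommendations.

```sql
-- Cap the memory DuckDB may use (example value)
SET memory_limit = '4GB';

-- Directory for temporary files used by larger-than-memory operations
-- (hypothetical path)
SET temp_directory = '/tmp/duckdb_spill';

-- Check the currently active limit
SELECT current_setting('memory_limit') AS memory_limit;
```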

Is DuckDB built on Arrow?

DuckDB does not use the Apache Arrow format internally. However, DuckDB supports reading from / writing to Arrow using the [arrow extension]({% link docs/extensions/arrow.md %}). It can also run SQL queries directly on Arrow using [pyarrow]({% link docs/guides/python/sql_on_arrow.md %}).

Are DuckDB's database files portable between different DuckDB versions and clients?

Since version 0.10.0 (released in February 2024), DuckDB is backwards-compatible when reading database files, i.e., newer versions of DuckDB are always able to read database files created with an older version of DuckDB. DuckDB also provides partial forwards-compatibility on a best-effort basis. See the [storage page]({% link docs/internals/storage.md %}) for more details. Compatibility is also guaranteed between different DuckDB clients (e.g., Python and R): a database file created with one client can be read with other clients.

How does DuckDB handle concurrency? Can multiple processes write to DuckDB?

See the documentation on [handling concurrency]({% link docs/connect/concurrency.md %}#handling-concurrency) and the section on [“Writing to DuckDB from Multiple Processes”]({% link docs/connect/concurrency.md %}#writing-to-duckdb-from-multiple-processes).

Is there an official DuckDB Docker image available?

There is no official DuckDB Docker image available. DuckDB uses an [in-process deployment model]({% link why_duckdb.md %}#simple), where the client application and DuckDB are running in the same process. In addition to the DuckDB clients for Python, R, and other programming languages, DuckDB is also available as a standalone command-line client. This client is available on a [wide range of platforms]({% link docs/installation/index.html %}?version=stable&environment=cli) and is portable without containerization, making it unnecessary to containerize the process for most deployments.

Performance

Does DuckDB use SIMD?

DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example of why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture.

I would like to benchmark DuckDB against another system. How do I proceed?

We welcome experiments comparing DuckDB's performance to other systems. To ensure a fair comparison, we have a few recommendations. First, try to use the [latest DuckDB version available as a nightly build]({% link docs/installation/index.html %}), which often has significant performance improvements compared to the last stable release. Second, consider consulting our DBTest 2018 paper Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing for guidelines on how to avoid common issues in benchmarks. Third, study the DuckDB [Performance Guide]({% link docs/guides/performance/overview.md %}), which has best practices for ensuring optimal performance. Finally, please report the DuckDB version (for stable versions, the version number; for nightly builds, the commit hash).
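
One way to look up the version information to report is sketched below. The query returns the version number and the identifier of the source commit the build was made from.

```sql
-- library_version holds the version number, source_id identifies the source commit
SELECT library_version, source_id
FROM pragma_version();
```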

Use Cases for DuckDB

Is DuckDB intended for data science or data engineering workloads?

DuckDB was designed with both data science and data engineering workloads in mind. Therefore, DuckDB's SQL syntax can be used in a highly flexible or a very precise manner, depending on your needs.

For data science users, who often run queries in an interactive fashion, DuckDB offers several mechanisms for quickly exploring data sets. For example, CSV files can be loaded by [auto-inferring their schema]({% link docs/data/csv/auto_detection.md %}) using CREATE TABLE tbl AS FROM 'input.csv'. Moreover, there are numerous SQL shorthands known as [“friendly SQL”]({% link docs/sql/dialect/friendly_sql.md %}) for more concise expressions, e.g., the [GROUP BY ALL clause]({% link docs/sql/query_syntax/groupby.md %}#group-by-all).
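
A minimal sketch of this interactive style is shown below. It assumes that `input.csv` contains the columns `category`, `region`, and `amount`; these column names are hypothetical and only serve as an illustration.

```sql
-- Load the CSV file; column names and types are inferred automatically
CREATE TABLE tbl AS FROM 'input.csv';

-- Inspect the inferred schema
DESCRIBE tbl;

-- GROUP BY ALL groups by every non-aggregated column in the SELECT list
-- (category, region, and amount are assumed column names)
SELECT category, region, sum(amount) AS total_amount
FROM tbl
GROUP BY ALL;
```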

For data engineering use cases, DuckDB allows full control over the loading process, so it is possible to define the precise schema using a CREATE TABLE tbl ⟨schema⟩ statement and populate it using a [COPY statement]({% link docs/sql/statements/copy.md %}) that specifies the CSV's dialect (delimiter, quotes, etc.). Most friendly SQL extensions are simple to rewrite to SQL queries that are fully compatible with PostgreSQL. For example, the GROUP BY ALL clause can be replaced with a GROUP BY clause and an explicit list of columns.
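
The more explicit style could look as follows. This is a sketch under assumptions: the columns `category`, `region`, and `amount`, their types, and the CSV dialect options are hypothetical and need to be adapted to the actual file.

```sql
-- Define the schema explicitly (hypothetical columns and types)
CREATE TABLE tbl (
    category VARCHAR,
    region   VARCHAR,
    amount   DECIMAL(18, 2)
);

-- Populate the table, spelling out the CSV dialect instead of relying on auto-detection
COPY tbl FROM 'input.csv' (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"');

-- Explicit column list in GROUP BY, replacing GROUP BY ALL
SELECT category, region, sum(amount) AS total_amount
FROM tbl
GROUP BY category, region;
```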

What are typical use cases for DuckDB?

DuckDB's use cases can be split into roughly three major categories. Namely, DuckDB can be used for interactive data analysis by a user (“data science”) and as a pipeline component for automated data processing (“data engineering”). DuckDB can also be deployed in novel architectures, where one traditionally couldn't run an analytical database management system but DuckDB is available thanks to its portability. These architectures include running DuckDB in browsers (using the WebAssembly client) and on smartphones. Additionally, DuckDB's extensions unlock use cases such as geospatial analysis and deep integration with other database systems. And finally, in some cases, DuckDB doesn't even need data to be a database.

Releases and Development

When is the next version going to be released?

Please check the [release calendar]({% link docs/dev/release_calendar.md %}) for the planned release date of the next stable version of DuckDB.

Is there a development roadmap for DuckDB?

Currently, we do not maintain a public development roadmap. We discuss planned developments at DuckCon events (typically held twice a year). See the most recent overview talk at DuckCon #5.

How can I contribute to the DuckDB documentation?

The DuckDB website is hosted by GitHub Pages and its repository is at duckdb/duckdb-web. When the documentation is browsed from a desktop computer, every page has a “Page Source” button at the top that navigates you to its Markdown source file. Pull requests to fix issues or to expand the documentation section on DuckDB's features are very welcome. Before opening a pull request, please consult our Contributor Guide.