Feature: Show external table URI when materialized in S3 buckets with dbt-duckdb #532

Open
01100100 opened this issue Oct 18, 2024 · 1 comment
@01100100

Describe the feature

I would like dbt-docs to display the S3 URI for externally materialized tables in the "Relation" field, similar to how relations are shown for other adapters.

For example, given a model models/user.sql with the following profile and model configuration, the data will be written to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this URI to be visible in the docs, ideally within the "relation" section, for quick reference.

Example Configuration

profiles.yml:

factory:
  target: dev
  outputs:
    dev:
      threads: 4
      type: duckdb
      extensions: ['httpfs']
      path: dbt.duckdb
      secrets:
        - type: s3
          region: "{{ env_var('AWS_REGION') }}"
          key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
          secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
          endpoint: "{{ env_var('AWS_ENDPOINT_URL_S3') | replace('https://', '') }}"
      external_root: s3://bucket-xxx/modelled

Environment:

export AWS_ENDPOINT_URL_S3=fly.storage.tigris.dev

dbt_project.yml:

models:
  factory:
    +materialized: external
    user:
      +format: json

In this case, the model models/user.sql will write the external table to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this path to be included in the docs.

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

This feature is specific to the dbt-duckdb adapter and applies when writing to external files.

The external location path is set in this macro in dbt-duckdb. From the dbt-duckdb README:

If the location argument is specified, it must be a filename (or S3 bucket/path), and dbt-duckdb will attempt to infer the format argument from the file extension of the location if the format argument is unspecified (this functionality was added in version 1.4.1).

If the location argument is not specified, then the external file will be named after the model.sql (or model.py) file that defined it with an extension that matches the format argument (parquet, csv, or json). By default, the external files are created relative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the external_root setting in your DuckDB profile.
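
To make that naming rule concrete, here is a minimal Python sketch of how the final path comes together. The helper name and signature are hypothetical illustrations, not dbt-duckdb's actual macro:

# Hypothetical sketch of the naming rule quoted above; not dbt-duckdb's real code.
def external_location(model_name, fmt, location=None, external_root="."):
    # An explicit location (filename or s3:// path) is used as-is.
    if location is not None:
        return location
    # Otherwise: <external_root>/<model name>.<format>
    return f"{external_root.rstrip('/')}/{model_name}.{fmt}"

# With the example profile above (external_root: s3://bucket-xxx/modelled) and +format: json:
print(external_location("user", "json", external_root="s3://bucket-xxx/modelled"))
# s3://bucket-xxx/modelled/user.json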

Who will this benefit?

This feature will be valuable for:

  • Developers who need to quickly query external data without manually looking up the S3 URI.
    Example: the URI can be pasted straight into an in-memory DuckDB instance (see the sketch after this list).
  • App builders who want to integrate external table locations into their applications.
    Example: developers building web applications with plots or data visualizations can read the external table URI directly from the generated docs.
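
As a sketch of the first benefit, assuming the duckdb Python package and the example URI from this issue (in practice the URI would be copied from the model's "Relation" field in dbt docs):

import duckdb

con = duckdb.connect()  # in-memory DuckDB instance
con.execute("INSTALL httpfs; LOAD httpfs;")  # needed to read over http(s)/S3

# Example URI from this issue; ideally copied straight from dbt docs.
uri = "https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json"
rows = con.execute(f"SELECT * FROM read_json_auto('{uri}') LIMIT 5").fetchall()
print(rows)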

Additionally, this could pave the way for more interactive exploration of model data directly within the dbt docs by linking to the external data location. 🤔 CLOUD-NATIVE DATA FORMATS + WASM IN-MEMORY DATABASE ⚡

Are you interested in contributing this feature?

Yes 🧔‍♂️

@01100100
Author

@jtcohen6 Tagging you here as maintainer. 👨‍⚕️

I think I did something wrong because I got notified that this is failing: https://github.com/dbt-labs/dbt-docs/actions/runs/11401080110
