From bb3231020eba01e859b057ea37c145348153213f Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 11:55:23 +0100 Subject: [PATCH 01/33] add new spec to spanpshot page --- website/docs/docs/build/snapshots.md | 633 +++++++++++++++------------ 1 file changed, 354 insertions(+), 279 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 82b5104fcef..a7484a3c53d 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -18,215 +18,195 @@ Snapshots implement [type-2 Slowly Changing Dimensions](https://en.wikipedia.org | id | status | updated_at | | -- | ------ | ---------- | -| 1 | pending | 2019-01-01 | +| 1 | pending | 2024-01-01 | Now, imagine that the order goes from "pending" to "shipped". That same record will now look like: | id | status | updated_at | | -- | ------ | ---------- | -| 1 | shipped | 2019-01-02 | +| 1 | shipped | 2024-01-02 | This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example: | id | status | updated_at | dbt_valid_from | dbt_valid_to | | -- | ------ | ---------- | -------------- | ------------ | -| 1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02 | -| 1 | shipped | 2019-01-02 | 2019-01-02 | `null` | +| 1 | pending | 2024-01-01 | 2024-01-01 | 2024-01-02 | +| 1 | shipped | 2024-01-02 | 2024-01-02 | `null` | -In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. 
- +## Configuring snapshots - +:::info Previewing or compiling snapshots in IDE not supported -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', +It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, [run the `dbt snapshot` command](#how-snapshots-work) in the IDE. - strategy='timestamp', - updated_at='updated_at', - ) -}} +::: -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` +- To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. +- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core). - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - unique_key='id', - schema='snapshots', - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} +In dbt versions 1.9 and later, snapshots are configurations defined in YAML files (typically in your snapshots directory). You'll configure your snapshot to tell dbt how to detect record changes. + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at ``` - - -:::info Preview or Compile Snapshots in IDE - -It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. 
Instead, run the `dbt snapshot` command in the IDE by completing the following steps. - -::: - -When you run the [`dbt snapshot` command](/reference/commands/snapshot): -* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. -* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: - - The `dbt_valid_to` column will be updated for any existing records that have changed - - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` +The following table outlines the configurations available for snapshots: -Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. - -## Example +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | +| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | +| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
Valid values: `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | -To add a snapshot to your project: +- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into across users and environments. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. +- A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). +- You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. 
`snapshots/orders.sql` -2. Use a `snapshot` block to define the start and end of a snapshot: - +### Add a snapshot to your project -```sql -{% snapshot orders_snapshot %} +To add a snapshot to your project, follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). -{% endsnapshot %} -``` +1. Create a YAML file in your `snapshots` directory: `snapshots/orders_snapshot.yml` and add your configuration details. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). - + -3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + ```yaml + snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at - + ``` + -```sql -{% snapshot orders_snapshot %} +2. Since snapshots focus on configuration, the transformation logic is minimal. Typically, you'd select all data from the source. If you need to apply transformations (like filters or deduplication), it's best practice to define an ephemeral model and reference it in your snapshot configuration. -select * from {{ source('jaffle_shop', 'orders') }} + ```sql + -- models/ephemeral_orders.sql + {{ config(materialized='ephemeral') }} -{% endsnapshot %} -``` + select * from {{ source('jaffle_shop', 'orders') }} + ``` - +3. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. 
If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. -4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. +4. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. -5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + ``` + $ dbt snapshot + Running with dbt=1.9.0 - + 15:07:36 | Concurrency: 8 threads (target='dev') + 15:07:36 | + 15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] + 15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] + 15:07:36 | + 15:07:36 | Finished running 1 snapshots in 0.68s. - + Completed successfully -```sql -{% snapshot orders_snapshot %} + Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 + ``` -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', +5. Inspect the results by selecting from the table dbt created (`analytics.snapshots.orders_snapshot`). After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described later on. - strategy='timestamp', - updated_at='updated_at', - ) -}} +6. Run the `dbt snapshot` command again and inspect the results. If any records have been updated, the snapshot should reflect this. 
-select * from {{ source('jaffle_shop', 'orders') }} +7. Select from the `snapshot` in downstream models using the `ref` function. -{% endsnapshot %} -``` + - + ```sql + select * from {{ ref('orders_snapshot') }} + ``` + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. +8. Snapshots are only useful if you run them frequently — schedule the `dbt snapshot` command to run regularly. - - - +### Configuration best practices -```sql -{% snapshot orders_snapshot %} + -{{ - config( - schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} +This strategy handles column additions and deletions better than the `check` strategy. -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` + - +The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + - + -``` -$ dbt snapshot -Running with dbt=1.8.0 +Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. 
From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. -15:07:36 | Concurrency: 8 threads (target='dev') -15:07:36 | -15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] -15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] -15:07:36 | -15:07:36 | Finished running 1 snapshots in 0.68s. + + -Completed successfully + -Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 -``` + -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described below. +Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. -8. Run the `snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + -9. Select from the `snapshot` in downstream models using the `ref` function. + - + If you need to clean or transform your data before snapshotting, create an ephemeral model (or a staging model) that applies the necessary transformations. Then, reference this model in your snapshot configuration. This approach keeps your snapshot definitions clean and allows you to test and run transformations separately. -```sql -select * from {{ ref('orders_snapshot') }} -``` + + - +### How snapshots work -10. Schedule the `snapshot` command to run regularly — snapshots are only useful if you run them frequently. 
+When you run the [`dbt snapshot` command](/reference/commands/snapshot): +* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. +* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: + - The `dbt_valid_to` column will be updated for any existing records that have changed + - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` +Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. ## Detecting row changes -Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt — `timestamp` and `check`. +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt: +- [Timestamp](#timestamp-strategy-recommended) — Uses an `updated_at` column to determine if a row has changed. +- [Check](#check-strategy) — Compares a list of columns between their current and historical values to determine if a row has changed. ### Timestamp strategy (recommended) The `timestamp` strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. 
@@ -266,27 +246,19 @@ The `timestamp` strategy requires the following configurations: - - -```sql -{% snapshot orders_snapshot_timestamp %} - - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_timestamp + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at ``` - - ### Check strategy @@ -298,15 +270,12 @@ The `check` strategy requires the following configurations: | ------ | ----------- | ------- | | check_cols | A list of columns to check for changes, or `all` to check all columns | `["name", "email"]` | - - :::caution check_cols = 'all' The `check` snapshot strategy can be configured to track changes to _all_ columns by supplying `check_cols = 'all'`. It is better to explicitly enumerate the columns that you want to check. Consider using a to condense many columns into a single column. 
::: - **Example Usage** @@ -336,23 +305,19 @@ The `check` snapshot strategy can be configured to track changes to _all_ column - - -```sql -{% snapshot orders_snapshot_check %} - - {{ - config( - schema='snapshots', - strategy='check', - unique_key='id', - check_cols=['status', 'is_cancelled'], - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled ``` @@ -397,112 +362,42 @@ For this configuration to work with the `timestamp` strategy, the configured `up - - -```sql -{% snapshot orders_snapshot_hard_delete %} - - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - invalidate_hard_deletes=True, - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_hard_delete + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + invalidate_hard_deletes: true ``` -## Configuring snapshots -### Snapshot configurations -There are a number of snapshot-specific configurations: - - - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | -| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
One of `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - -A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - -Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - -Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. - - - - - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | -| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | -| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
Valid values: `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | - -In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into across users and environments. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, support was added for environment-aware snapshots by making `target_schema` optional. Snapshots, by default with no `target_schema` or `target_database` config defined, now resolve the schema or database to build the snapshot into using the `generate_schema_name` or `generate_database_name` macros. Developers can optionally define a custom location for snapshots to build to with the [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, as is consistent with other resource types. - -A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). - -You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). 
- - - -### Configuration best practices -#### Use the `timestamp` strategy where possible -This strategy handles column additions and deletions better than the `check` strategy. - -#### Ensure your unique key is really unique -The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). - - - -#### Use a `target_schema` that is separate to your analytics schema -Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. - - - - - -#### Use a schema that is separate to your models' schema -Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. - - - ## Snapshot query best practices -#### Snapshot source data. -Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data +This section outlines some best practices for writing snapshot queries: + +- #### Snapshot source data + Your models should then select from these snapshots, treating them like regular data sources. 
As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data -#### Use the `source` function in your query. -This helps when understanding data lineage in your project. +- #### Use the `source` function in your query + This helps when understanding data lineage in your project. -#### Include as many columns as possible. -In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. +- #### Include as many columns as possible + In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. -#### Avoid joins in your snapshot query. -Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. +- #### Avoid joins in your snapshot query + Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. -#### Limit the amount of transformation in your query. -If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. +- #### Limit the amount of transformation in your query + If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include: * Selecting specific columns if the table is wide. 
@@ -526,30 +421,30 @@ For the `timestamp` strategy, the configured `updated_at` column is used to popu
Details for the timestamp strategy -Snapshot query results at `2019-01-01 11:00` +Snapshot query results at `2024-01-01 11:00` | id | status | updated_at | | -- | ------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | +| 1 | pending | 2024-01-01 10:47 | Snapshot results (note that `11:00` is not used anywhere): | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | | 2019-01-01 10:47 | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 | -Query results at `2019-01-01 11:30`: +Query results at `2024-01-01 11:30`: | id | status | updated_at | | -- | ------- | ---------------- | -| 1 | shipped | 2019-01-01 11:05 | +| 1 | shipped | 2024-01-01 11:05 | Snapshot results (note that `11:30` is not used anywhere): | id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | 2019-01-01 11:05 | 2019-01-01 10:47 | -| 1 | shipped | 2019-01-01 11:05 | 2019-01-01 11:05 | | 2019-01-01 11:05 | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | +| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 |
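As a hedged illustration of how these meta-fields can be used, the following sketch looks up the version of each record that was valid at a chosen moment. It assumes the `orders_snapshot` table from the earlier example; the timestamp literal is illustrative, and the cast syntax may need adjusting for your warehouse.

```sql
-- Hypothetical downstream query: the state of each order as of a chosen moment.
select
    id,
    status,
    dbt_valid_from,
    dbt_valid_to
from {{ ref('orders_snapshot') }}
where dbt_valid_from <= cast('2024-01-01 11:00:00' as timestamp)
  and (
      dbt_valid_to > cast('2024-01-01 11:00:00' as timestamp)
      or dbt_valid_to is null  -- the current version of a record has no end date
  )
```

Against the snapshot results shown above, this would return the `pending` row for order `1`, because that version was valid from `10:47` until `11:05`.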
@@ -560,7 +455,7 @@ For the `check` strategy, the current timestamp is used to populate each column.
Details for the check strategy -Snapshot query results at `2019-01-01 11:00` +Snapshot query results at `2024-01-01 11:00` | id | status | | -- | ------- | @@ -570,9 +465,9 @@ Snapshot results: | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | | -- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | | 2019-01-01 11:00 | +| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 | -Query results at `2019-01-01 11:30`: +Query results at `2024-01-01 11:30`: | id | status | | -- | ------- | @@ -582,11 +477,191 @@ Snapshot results: | id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | | --- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | 2019-01-01 11:30 | 2019-01-01 11:00 | -| 1 | shipped | 2019-01-01 11:30 | | 2019-01-01 11:30 | +| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | +| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 |
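Because the `check` strategy stamps rows with the time the snapshot ran, each validity window is only as precise as your snapshot schedule. The sketch below, which assumes the `orders_snapshot_check` snapshot from the earlier example, lists every version of each record and labels whether it is still current. The `record_state` column name is illustrative and is not a dbt meta-field.

```sql
-- Hypothetical downstream query: one row per recorded version of each order.
select
    id,
    status,
    dbt_valid_from,
    dbt_valid_to,
    case
        when dbt_valid_to is null then 'current'  -- open-ended window
        else 'historical'                         -- superseded by a later version
    end as record_state
from {{ ref('orders_snapshot_check') }}
```

With the results shown above, the `pending` row would be labeled `historical` and the `shipped` row `current`.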
+## Configure snapshots in versions 1.8 and earlier + + + +This section is for users on dbt versions 1.8 and earlier. To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. + + + + + +- In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. +- The earlier dbt versions use an older syntax that allows for defining multiple resources in a single file. This syntax can significantly slow down parsing and compilation. +- For faster and more efficient management, consider [upgrading to Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. + +The following example shows how to configure a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + +### Configuration example + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. 
For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy (more details on this below). + +5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. Changing the `target_database` configuration, the `target_schema` configuration, or the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. + + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + + + +``` +$ dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. 
PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. + +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +9. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. + + + ## FAQs From 49e0c87ab2357dc23cc6087e6da01fb685b01422 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 11:57:22 +0100 Subject: [PATCH 02/33] add space --- .../faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md | 1 + 1 file changed, 1 insertion(+) diff --git a/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md index 5ce8f380008..0175588bf6f 100644 --- a/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md +++ b/website/docs/faqs/Snapshots/snapshot-target-is-not-a-snapshot-table.md @@ -27,3 +27,4 @@ A snapshot must have a materialized value of 'snapshot' This tells you to change your `materialized` config to `snapshot`. But when you make that change, you might encounter an error message saying that certain fields like `dbt_scd_id` are missing. This error happens because, previously, when dbt treated snapshots as tables, it didn't include the necessary [snapshot meta-fields](/docs/build/snapshots#snapshot-meta-fields) in your target table. Since those meta-fields don't exist, dbt correctly identifies that you're trying to create a snapshot in a table that isn't actually a snapshot. 
When this happens, you have to start from scratch, re-snapshotting your source data as if it was the first time by dropping your "snapshot" which isn't a real snapshot table. Then dbt snapshot will create a new snapshot and insert the snapshot meta-fields as expected. + From 3e52968e0847b0c8d2a980a114b6d111419eaacb Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 12:09:43 +0100 Subject: [PATCH 03/33] add snapshot rn --- website/docs/docs/dbt-versions/release-notes.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 7c2614b2c10..cef9020b579 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -20,6 +20,9 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## September 2024 +- **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. + - Who does this affect? New users on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots need to migrate their snapshot definitions to YAML. + - Users on dbt 1.8 and earlier: No action needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. - **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. 
- **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). - **New**: [Data health tile](/docs/collaborate/data-tile) is now generally available in dbt Explorer. Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. From c5a1522422368c5e1289f7050491da4e1e401adf Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 12:32:18 +0100 Subject: [PATCH 04/33] Update website/docs/docs/build/snapshots.md --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index a7484a3c53d..3145dfcce62 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -45,7 +45,7 @@ It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud - To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. 
-- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core). +- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). From b70d801a553bf122a4387ab4c9a157b238074fcf Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 12:42:59 +0100 Subject: [PATCH 05/33] update configs --- website/docs/reference/snapshot-configs.md | 236 ++++++++++++++------- 1 file changed, 155 insertions(+), 81 deletions(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 5afe429cfb4..2c3e6b665b1 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -20,7 +20,6 @@ Parts of a snapshot: --> ## Available configurations -### Snapshot-specific configurations @@ -80,8 +79,36 @@ snapshots: + + **Note:** Required snapshot properties _will not_ work when defined in `config` YAML blocks. We recommend that you define these in `dbt_project.yml` or a `config()` block within the snapshot `.sql` file. +For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core). + + + + + +Refer to [configuring snapshots](/docs/build/snapshots#configuring-snapshots) for the available configurations. 
+ + +```yaml +snapshots: + - name: snapshot_name + relation: source('my_source', 'my_table') + config: + schema: string + database: string + unique_key: column_name_or_expression + strategy: timestamp | check + updated_at: column_name # Required if strategy is 'timestamp' + check_cols: [column_name] | 'all' # Required if strategy is 'check' + invalidate_hard_deletes: true | false +``` + + + @@ -160,6 +187,7 @@ snapshots: + ```yaml @@ -178,6 +206,29 @@ snapshots: ``` + + + + + +```yaml +version: 2 + +snapshots: + - name: [] + relation: source('my_source', 'my_table') + config: + [enabled](/reference/resource-configs/enabled): true | false + [tags](/reference/resource-configs/tags): | [] + [alias](/reference/resource-configs/alias): + [pre-hook](/reference/resource-configs/pre-hook-post-hook): | [] + [post-hook](/reference/resource-configs/pre-hook-post-hook): | [] + [persist_docs](/reference/resource-configs/persist_docs): {} + [grants](/reference/resource-configs/grants): {} +``` + + + @@ -206,98 +257,121 @@ snapshots: ## Configuring snapshots Snapshots can be configured in one of four ways: -1. Using a `config` block within a snapshot -2. Using a `config` [resource property](/reference/model-properties) in a `.yml` file -3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. +1. Defined in YAML files, typically in your snapshots directory (available in [Versionless](/docs/dbt-versions/versionless-cloud) or dbt Core v1.9 and higher). +2. Using a `config` block within a snapshot +3. Using a `config` [resource property](/reference/model-properties) in a `.yml` file +4. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. Snapshot configurations are applied hierarchically in the order above. 
### Examples -#### Apply configurations to all snapshots -To apply a configuration to all snapshots, including those in any installed [packages](/docs/build/packages), nest the configuration directly under the `snapshots` key: +The following examples demonstrate how to configure snapshots using the `dbt_project.yml` file, a `config` block within a snapshot, and a `.yml` file. - - -```yml - -snapshots: - +unique_key: id -``` - - +- #### Apply configurations to all snapshots + To apply a configuration to all snapshots, including those in any installed [packages](/docs/build/packages), nest the configuration directly under the `snapshots` key: + -#### Apply configurations to all snapshots in your project -To apply a configuration to all snapshots in your project only (for example, _excluding_ any snapshots in installed packages), provide your project name as part of the resource path. + ```yml -For a project named `jaffle_shop`: + snapshots: + +unique_key: id + ``` - + -```yml +- #### Apply configurations to all snapshots in your project + To apply a configuration to all snapshots in your project only (for example, _excluding_ any snapshots in installed packages), provide your project name as part of the resource path. -snapshots: - jaffle_shop: - +unique_key: id -``` + For a project named `jaffle_shop`: - - -Similarly, you can use the name of an installed package to configure snapshots in that package. - -#### Apply configurations to one snapshot only - -We recommend using `config` blocks if you need to apply a configuration to one snapshot only. - - - -```sql -{% snapshot orders_snapshot %} - {{ - config( - unique_key='id', - strategy='timestamp', - updated_at='updated_at' - ) - }} - -- Pro-Tip: Use sources in snapshots! - select * from {{ source('jaffle_shop', 'orders') }} -{% endsnapshot %} -``` - - + -You can also use the full resource path (including the project name, and subdirectories) to configure an individual snapshot from your `dbt_project.yml` file. 
+ ```yml -For a project named `jaffle_shop`, with a snapshot file within the `snapshots/postgres_app/` directory, where the snapshot is named `orders_snapshot` (as above), this would look like: - - - -```yml -snapshots: - jaffle_shop: - postgres_app: - orders_snapshot: + snapshots: + jaffle_shop: +unique_key: id - +strategy: timestamp - +updated_at: updated_at -``` - - - -You can also define some common configs in a snapshot's `config` block. We don't recommend this for a snapshot's required configuration, however. - - - -```yml -version: 2 - -snapshots: - - name: orders_snapshot - config: - persist_docs: - relation: true - columns: true -``` - - + ``` + + + + Similarly, you can use the name of an installed package to configure snapshots in that package. + +- #### Apply configurations to one snapshot only + + + We recommend using `config` blocks if you need to apply a configuration to one snapshot only. + + + + ```sql + {% snapshot orders_snapshot %} + {{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at' + ) + }} + -- Pro-Tip: Use sources in snapshots! + select * from {{ source('jaffle_shop', 'orders') }} + {% endsnapshot %} + ``` + + + + + + + + ```yaml + snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + unique_key: id + strategy: timestamp + updated_at: updated_at + persist_docs: + relation: true + columns: true + ``` + + Pro-tip: Use sources in snapshots: `select * from {{ source('jaffle_shop', 'orders') }}` + + + You can also use the full resource path (including the project name, and subdirectories) to configure an individual snapshot from your `dbt_project.yml` file. 
+ + For a project named `jaffle_shop`, with a snapshot file within the `snapshots/postgres_app/` directory, where the snapshot is named `orders_snapshot` (as above), this would look like: + + + + ```yml + snapshots: + jaffle_shop: + postgres_app: + orders_snapshot: + +unique_key: id + +strategy: timestamp + +updated_at: updated_at + ``` + + + + You can also define some common configs in a snapshot's `config` block. We don't recommend this for a snapshot's required configuration, however. + + + + ```yml + version: 2 + + snapshots: + - name: orders_snapshot + config: + persist_docs: + relation: true + columns: true + ``` + + From 66f65b5191a80fddf52578aaa015631264e39fd7 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 12:51:30 +0100 Subject: [PATCH 06/33] add properites adn configs --- website/docs/reference/snapshot-configs.md | 4 +- website/docs/reference/snapshot-properties.md | 51 ++++++++++++++++++- 2 files changed, 52 insertions(+), 3 deletions(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 2c3e6b665b1..f7005021940 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -83,7 +83,7 @@ snapshots: **Note:** Required snapshot properties _will not_ work when defined in `config` YAML blocks. We recommend that you define these in `dbt_project.yml` or a `config()` block within the snapshot `.sql` file. -For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or the [latest version of dbt Core](/docs/dbt-versions/core). +For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt v1.9 and later](/docs/dbt-versions/core).
@@ -208,7 +208,7 @@ snapshots:
- + ```yaml diff --git a/website/docs/reference/snapshot-properties.md b/website/docs/reference/snapshot-properties.md index 49769af8f6d..54c1083e4b4 100644 --- a/website/docs/reference/snapshot-properties.md +++ b/website/docs/reference/snapshot-properties.md @@ -3,12 +3,60 @@ title: Snapshot properties description: "Read this guide to learn about using source properties in dbt." --- + + +In Versionless and dbt v1.9 and later, snapshots are defined and configured in YAML files within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). Snapshot properties are declared within these YAML files, allowing you to define both the snapshot configurations and properties in one place. + + + + + Snapshots properties can be declared in `.yml` files in: -- your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)) +- your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). - your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)) + + We recommend that you put them in the `snapshots/` directory. You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `snapshots/` or `models/` directory. + + + + +```yml +version: 2 + +snapshots: + - name: + [description](/reference/resource-properties/description): + [meta](/reference/resource-configs/meta): {} + [docs](/reference/resource-configs/docs): + show: true | false + node_color: # Use name (such as node_color: purple) or hex code with quotes (such as node_color: "#cd7f32") + [config](/reference/resource-properties/config): + [](/reference/snapshot-configs): + [tests](/reference/resource-properties/data-tests): + - + - ... 
+ columns: + - name: + [description](/reference/resource-properties/description): + [meta](/reference/resource-configs/meta): {} + [quote](/reference/resource-properties/quote): true | false + [tags](/reference/resource-configs/tags): [] + [tests](/reference/resource-properties/data-tests): + - + - ... # declare additional tests + - ... # declare properties of additional columns + + - name: ... # declare properties of additional snapshots + +``` + + + + + ```yml @@ -41,3 +89,4 @@ snapshots: ``` + From c6996093fc272cb1d969399d2055e1018fbc64fc Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 14:17:28 +0100 Subject: [PATCH 07/33] update all content under snapshots --- .../docs/reference/configs-and-properties.md | 11 +- .../reference/resource-configs/check_cols.md | 76 ++++++++++++- .../invalidate_hard_deletes.md | 48 ++++++++ .../resource-configs/pre-hook-post-hook.md | 6 +- .../resource-configs/snapshot_name.md | 51 ++++++++- .../reference/resource-configs/strategy.md | 105 ++++++++++++++++++ .../reference/resource-configs/unique_key.md | 101 ++++++++++++++++- .../reference/resource-configs/updated_at.md | 89 ++++++++++++++- website/docs/reference/snapshot-configs.md | 2 +- 9 files changed, 482 insertions(+), 7 deletions(-) diff --git a/website/docs/reference/configs-and-properties.md b/website/docs/reference/configs-and-properties.md index 20d762b7462..b3f23584a4a 100644 --- a/website/docs/reference/configs-and-properties.md +++ b/website/docs/reference/configs-and-properties.md @@ -26,9 +26,18 @@ Whereas you can use **configurations** to: Depending on the resource type, configurations can be defined in the dbt project and also in an installed package by: + + +1. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, or `tests/` directory +2. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) + + + + 1. 
Using a [`config()` Jinja macro](/reference/dbt-jinja-functions/config) within a `model`, `snapshot`, or `test` SQL file -2. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file +2. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, or `tests/` directory. 3. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) + ### Config inheritance diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index bd187409379..b8e7ae8398f 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -3,6 +3,31 @@ resource_types: [snapshots] description: "Read this guide to understand the check_cols configuration in dbt." datatype: "[column_name] | all" --- + + + + + ```yml + snapshots: + - name: snapshot_name + relation: source('jaffle_shop', 'orders') + config: + schema: string + unique_key: column_name_or_expression + strategy: check + check_cols: + - column_name + ``` + + + + + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: ```jinja2 @@ -14,7 +39,7 @@ datatype: "[column_name] | all" ``` - + @@ -42,6 +67,30 @@ No default is provided. 
### Check a list of columns for changes + + + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled +``` + + +To select from this snapshot in a downstream model: `select * from {{ ref('orders_snapshot_check') }}` + + + + + ```sql {% snapshot orders_snapshot_check %} @@ -58,8 +107,32 @@ No default is provided. {% endsnapshot %} ``` + + ### Check all columns for changes + + + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - all + ``` + + +To select from this snapshot in a downstream model: `select * from {{ ref('orders_snapshot_check') }}` + + + + ```sql {% snapshot orders_snapshot_check %} @@ -75,3 +148,4 @@ No default is provided. {% endsnapshot %} ``` + diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md index ba5b37c5d71..94fa40ade9d 100644 --- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md +++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md @@ -4,6 +4,32 @@ description: "Invalidate_hard_deletes - Read this in-depth guide to learn about datatype: column_name --- + + + + + +```yaml +snapshots: + - name: snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + strategy: timestamp + invalidate_hard_deletes: true | false +``` + + + + + + + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). 
For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + ```jinja2 @@ -17,6 +43,7 @@ datatype: column_name ``` + @@ -39,6 +66,26 @@ By default the feature is disabled. ## Example + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at + invalidate_hard_deletes: true + ``` + + + + + ```sql @@ -60,3 +107,4 @@ By default the feature is disabled. ``` + diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index e1e7d67f02e..cde914fd639 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -109,6 +109,8 @@ snapshots: + + ```sql @@ -125,13 +127,15 @@ select ... ``` + ```yml snapshots: - name: [] - config: + [config](/reference/resource-properties/config): + [](/reference/snapshot-configs): [pre_hook](/reference/resource-configs/pre-hook-post-hook): | [] [post_hook](/reference/resource-configs/pre-hook-post-hook): | [] ``` diff --git a/website/docs/reference/resource-configs/snapshot_name.md b/website/docs/reference/resource-configs/snapshot_name.md index bb4826a116b..a3ce6cbd63b 100644 --- a/website/docs/reference/resource-configs/snapshot_name.md +++ b/website/docs/reference/resource-configs/snapshot_name.md @@ -2,6 +2,27 @@ description: "Snapshot-name - Read this in-depth guide to learn about configurations in dbt." 
--- + + + +```yaml +snapshots: + - name: snapshot_name + relation: source('my_source', 'my_table') + config: + schema: string + database: string + unique_key: column_name_or_expression + strategy: timestamp | check + updated_at: column_name # Required if strategy is 'timestamp' + +``` + + + + + + ```jinja2 @@ -13,9 +34,16 @@ description: "Snapshot-name - Read this in-depth guide to learn about configurat +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + + + ## Description -The name of a snapshot, as defined in the `{% snapshot %}` block header. This name is used when selecting from a snapshot using the [`ref` function](/reference/dbt-jinja-functions/ref) +The name of a snapshot, which is used when selecting from a snapshot using the [`ref` function](/reference/dbt-jinja-functions/ref) This name must not conflict with the name of any other "refable" resource (models, seeds, other snapshots) defined in this project or package. @@ -24,6 +52,26 @@ The name does not need to match the file name. As a result, snapshot filenames d ## Examples ### Name a snapshot `order_snapshot` + + + + +```yaml +snapshots: + - name: order_snapshot + relation: source('my_source', 'my_table') + config: + schema: string + database: string + unique_key: column_name_or_expression + strategy: timestamp | check + updated_at: column_name # Required if strategy is 'timestamp' +``` + + + + + ```jinja2 @@ -35,6 +83,7 @@ The name does not need to match the file name. 
As a result, snapshot filenames d + To select from this snapshot in a downstream model: diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index b67feb64fbd..f55b29703f9 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -4,6 +4,14 @@ description: "Strategy - Read this in-depth guide to learn about configurations datatype: timestamp | check --- + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + + + + + + + ```yaml + snapshots: + - [name: snapshot_name](/reference/resource-configs/snapshot_name): + relation: source('my_source', 'my_table') + config: + strategy: timestamp + updated_at: column_name + ``` + + + + + ```jinja2 @@ -30,6 +55,7 @@ select ... ``` + @@ -47,6 +73,23 @@ snapshots: + + + + + ```yaml + snapshots: + - [name: snapshot_name](/reference/resource-configs/snapshot_name): + relation: source('my_source', 'my_table') + config: + strategy: check + check_cols: + - [column_name] | "all" + ``` + + + + ```jinja2 @@ -62,6 +105,7 @@ snapshots: ``` + @@ -88,7 +132,25 @@ This is a **required configuration**. There is no default value. ## Examples ### Use the timestamp strategy + + + +```yaml +snapshots: + - name: orders_snapshot_timestamp + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + strategy: timestamp + unique_key: id + updated_at: updated_at + +``` + + + + ```sql @@ -109,10 +171,33 @@ This is a **required configuration**. There is no default value. 
``` + ### Use the check strategy + + + +```yaml +# snapshots/check_example.yml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled + +``` + + + + + ```sql {% snapshot orders_snapshot_check %} @@ -129,6 +214,7 @@ This is a **required configuration**. There is no default value. {% endsnapshot %} ``` + ### Advanced: define and use custom snapshot strategy Behind the scenes, snapshot strategies are implemented as macros, named `snapshot__strategy` @@ -140,6 +226,24 @@ It's possible to implement your own snapshot strategy by adding a macro with the 1. Create a macro named `snapshot_timestamp_with_deletes_strategy`. Use the existing code as a guide and adjust as needed. 2. Use this strategy via the `strategy` configuration: + + + +```yaml +snapshots: + - name: my_custom_snapshot + relation: source('my_source', 'my_table') + config: + strategy: timestamp_with_deletes + updated_at: updated_at_column + unique_key: id + schema: snapshots +``` + + + + + ```jinja2 @@ -155,3 +259,4 @@ It's possible to implement your own snapshot strategy by adding a macro with the ``` + diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md index 9ad3417fd5e..ac2e08ec61a 100644 --- a/website/docs/reference/resource-configs/unique_key.md +++ b/website/docs/reference/resource-configs/unique_key.md @@ -4,6 +4,30 @@ description: "Unique_key - Read this in-depth guide to learn about configuration datatype: column_name_or_expression --- + + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + unique_key: id + +``` + + + + + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as 
defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + ```jinja2 @@ -12,8 +36,8 @@ datatype: column_name_or_expression ) }} ``` - + @@ -29,6 +53,8 @@ snapshots: ## Description A column name or expression that is unique for the inputs of a snapshot. dbt uses this to match records between a result set and an existing snapshot, so that changes can be captured correctly. +In Versionless and dbt v1.9 and later, [snapshots](/docs/build/snapshots) are defined and configured in YAML files within your `snapshots/` directory. The `unique_key` is specified within the `config` block of your snapshot YAML file. + :::caution Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](/blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique. @@ -41,6 +67,26 @@ This is a **required parameter**. No default is provided. ## Examples ### Use an `id` column as a unique key + + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + +``` + + + + ```jinja2 @@ -55,7 +101,9 @@ This is a **required parameter**. No default is provided. You can also write this in yaml. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). + +You can also specify configurations in your `dbt_project.yml` file if multiple snapshots share the same `unique_key`: ```yml @@ -70,6 +118,25 @@ snapshots: ### Use a combination of two columns as a unique key This configuration accepts a valid column expression. 
As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (e.g. `'-'`) to ensure uniqueness. + + + + +```yaml +snapshots: + - name: transaction_items_snapshot + relation: source('erp', 'transactions') + config: + schema: snapshots + unique_key: "transaction_id || '-' || line_item_id" + strategy: timestamp + updated_at: updated_at + +``` + + + + @@ -93,10 +160,41 @@ from {{ source('erp', 'transactions') }} ``` + Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: + + + + +```yaml +snapshots: + - name: transaction_items_snapshot + relation: ref('transaction_items_ephemeral') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + +# models/transaction_items_ephemeral.sql +{{ config(materialized='ephemeral') }} + +select + transaction_id || '-' || line_item_id as id, + * +from {{ source('erp', 'transactions') }} + +``` + + + +In this example, we create an ephemeral model `transaction_items_ephemeral` that builds the unique key `id`, and then reference it in our snapshot. + + + ```jinja2 @@ -121,3 +219,4 @@ from {{ source('erp', 'transactions') }} ``` + diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md index c61b04264be..9c15e99c512 100644 --- a/website/docs/reference/resource-configs/updated_at.md +++ b/website/docs/reference/resource-configs/updated_at.md @@ -3,6 +3,30 @@ resource_types: [snapshots] description: "Updated_at - Read this in-depth guide to learn about configurations in dbt."
datatype: column_name --- + + + + + + +```yaml +snapshots: + - name: snapshot + relation: source('my_source', 'my_table') + [config](/reference/snapshot-configs): + strategy: timestamp + updated_at: column_name +``` + + + + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + ```jinja2 @@ -14,6 +38,7 @@ datatype: column_name ``` + @@ -37,7 +62,6 @@ You will get a warning if the data type of the `updated_at` column does not matc - ## Description A column within the results of your snapshot query that represents when the record row was last updated. @@ -50,6 +74,25 @@ No default is provided. ## Examples ### Use a column name `updated_at` + + + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + +``` + + + + ```sql @@ -72,12 +115,55 @@ select * from {{ source('jaffle_shop', 'orders') }} ``` + ### Coalesce two columns to create a reliable `updated_at` column Consider a data source that only has an `updated_at` column filled in when a record is updated (so a `null` value indicates that the record hasn't been updated after it was created). Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + + + +1. Create an staging model to perform the transformation. 
+ In your `models/` directory, create a SQL file that configures a staging model to coalesce the `updated_at` and `created_at` columns into a new column `updated_at_for_snapshot`. + + + + ```sql + select *, coalesce(updated_at, created_at) as updated_at_for_snapshot + from {{ source('jaffle_shop', 'orders') }} + + ``` + + +2. Define the snapshot configuration in a YAML file. + In your `snapshots/` directory, create a YAML file that defines your snapshot and references the `staging_orders` staging model you just created. + + + + ```yaml + snapshots: + - name: orders_snapshot + relation: ref('staging_orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at_for_snapshot + + ``` + + +3. Run `dbt snapshot` to execute the snapshot. + +Alternatively, you can create an ephemeral model to perform the required transformations, and then reference this model in your snapshot's `relation` key. + + + + + + ```sql @@ -104,3 +190,4 @@ from {{ source('jaffle_shop', 'orders') }} ``` + diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index f7005021940..bda6da5a26e 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -323,7 +323,7 @@ The following examples demonstrate how to configure snapshots using the `dbt_pro - + ```yaml snapshots: From b81da1590078beab6235372a4521e04707cfa67b Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 14:25:42 +0100 Subject: [PATCH 08/33] Update check_cols.md --- website/docs/reference/resource-configs/check_cols.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index b8e7ae8398f..44aeef96c37 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++
b/website/docs/reference/resource-configs/check_cols.md @@ -22,12 +22,6 @@ datatype: "[column_name] | all" - - -:::info Use the latest snapshot syntax - -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: ```jinja2 @@ -39,7 +33,6 @@ In Versionless and dbt v1.9 and later, snapshots are defined in an updated synta ``` - @@ -90,6 +83,10 @@ To select from this snapshot in a downstream model: `select * from {{ source('ja +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). 
+::: ```sql {% snapshot orders_snapshot_check %} From dc6b25e1bb6e5c8d543e84e16d629f24c3199c35 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 14:35:50 +0100 Subject: [PATCH 09/33] Update check_cols.md --- .../docs/reference/resource-configs/check_cols.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index 44aeef96c37..8a44c58a26a 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -22,6 +22,13 @@ datatype: "[column_name] | all" + ```jinja2 @@ -33,6 +40,7 @@ datatype: "[column_name] | all" ``` + @@ -83,11 +91,6 @@ To select from this snapshot in a downstream model: `select * from {{ source('ja -:::info Use the latest snapshot syntax - -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). 
-::: - ```sql {% snapshot orders_snapshot_check %} From 966cf41cd7ed103216b54a3b5f0262d696b3e215 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 14:45:11 +0100 Subject: [PATCH 10/33] Update check_cols.md --- website/docs/reference/resource-configs/check_cols.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index 8a44c58a26a..f1fa75d6a46 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -22,7 +22,7 @@ datatype: "[column_name] | all" - :::info Use the latest snapshot syntax From 78706d6ffc06534612a7de5b209927f2a49f7563 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 14:53:58 +0100 Subject: [PATCH 11/33] Update pre-hook-post-hook.md fix spacing --- website/docs/reference/resource-configs/pre-hook-post-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index cde914fd639..36964538191 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -134,7 +134,7 @@ select ... 
```yml snapshots: - name: [] - [config](/reference/resource-properties/config): + [config](/reference/resource-properties/config): [](/reference/snapshot-configs): [pre_hook](/reference/resource-configs/pre-hook-post-hook): | [] [post_hook](/reference/resource-configs/pre-hook-post-hook): | [] From dcba81a45a86e1ee8ac904e022bf31c64d7900c8 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 15:46:34 +0100 Subject: [PATCH 12/33] Update website/docs/docs/dbt-versions/release-notes.md --- website/docs/docs/dbt-versions/release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index cef9020b579..fb81a4777de 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -21,7 +21,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## September 2024 - **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. - - Who does this affect? New user on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots need to migrate their snapshot definitions to YAML. + - Who does this affect? New user on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots can keep their existing configuration or can choose to migrate their snapshot definitions to YAML. - Users on dbt 1.8 and earlier: No action needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. 
- **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. - **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). From 396c1709cb70dae8265eeea5ba350fa47c519cd1 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 15:46:54 +0100 Subject: [PATCH 13/33] Update website/docs/docs/dbt-versions/release-notes.md --- website/docs/docs/dbt-versions/release-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index fb81a4777de..d9f71121c8e 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -22,7 +22,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo - **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. 
- Who does this affect? New user on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots can keep their existing configuration or can choose to migrate their snapshot definitions to YAML. - - Users on dbt 1.8 and earlier: No action needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. + - Users on dbt 1.8 and earlier: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. - **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. - **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). - **New**: [Data health tile](/docs/collaborate/data-tile) is now generally available in dbt Explorer. Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. 
You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. From d93d65a8115ae52ca9e21f462d939fee8497d2a9 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 18:02:09 +0100 Subject: [PATCH 14/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index bda6da5a26e..3cff857d1b6 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -255,7 +255,7 @@ snapshots: ## Configuring snapshots -Snapshots can be configured in one of three ways: +Snapshots can be configured in multiple ways: 1. Defined in YAML files, typically in your snapshots directory (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). 2. Using a `config` block within a snapshot From 5ef956ada2e50fce98d145e880336c857cffd970 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 18:04:08 +0100 Subject: [PATCH 15/33] Update website/docs/reference/snapshot-configs.md --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 3cff857d1b6..179c1d52aed 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -257,7 +257,7 @@ snapshots: ## Configuring snapshots Snapshots can be configured in multiple ways: -1. Defined in YAML files, typically in your snapshots directory (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). +1. 
Defined in YAML files, typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). 2. Using a `config` block within a snapshot 3. Using a `config` [resource property](/reference/model-properties) in a `.yml` file 4. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. From 7f600bac72660d0e948a43204d01599c5a288fc3 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 20:15:53 +0100 Subject: [PATCH 16/33] update and triple check --- .../docs/reference/configs-and-properties.md | 4 +- .../project-configs/snapshot-paths.md | 11 ++- .../reference/resource-configs/check_cols.md | 2 +- .../reference/resource-configs/strategy.md | 1 - website/docs/reference/snapshot-configs.md | 73 ++++++++++++------- website/docs/reference/snapshot-properties.md | 2 + 6 files changed, 60 insertions(+), 33 deletions(-) diff --git a/website/docs/reference/configs-and-properties.md b/website/docs/reference/configs-and-properties.md index b3f23584a4a..20892898a51 100644 --- a/website/docs/reference/configs-and-properties.md +++ b/website/docs/reference/configs-and-properties.md @@ -28,14 +28,14 @@ Depending on the resource type, configurations can be defined in the dbt project -1. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, or `tests/` directory +1. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, `seeds/`, `analyses`, or `tests/` directory 2. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) 1. Using a [`config()` Jinja macro](/reference/dbt-jinja-functions/config) within a `model`, `snapshot`, or `test` SQL file -2. 
Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, or `tests/` directory. +2. Using a [`config` property](/reference/resource-properties/config) in a `.yml` file in the `models/`, `snapshots/`, `seeds/`, `analyses/`, or `tests/` directory. 3. From the [`dbt_project.yml` file](dbt_project.yml), under the corresponding resource key (`models:`, `snapshots:`, `tests:`, etc) diff --git a/website/docs/reference/project-configs/snapshot-paths.md b/website/docs/reference/project-configs/snapshot-paths.md index 81b2759609d..8319833f1e6 100644 --- a/website/docs/reference/project-configs/snapshot-paths.md +++ b/website/docs/reference/project-configs/snapshot-paths.md @@ -12,7 +12,16 @@ snapshot-paths: [directorypath] ## Definition -Optionally specify a custom list of directories where [snapshots](/docs/build/snapshots) are located. Note that you cannot co-locate models and snapshots. + +Optionally specify a custom list of directories where [snapshots](/docs/build/snapshots) are located. + + +In [Versionless](/docs/dbt-versions/versionless-cloud) and on dbt v1.9 and higher, you can co-locate your snapshots with models if they are [defined using the latest YAML syntax](/docs/build/snapshots). + + + +Note that you cannot co-locate models and snapshots. However, in [Versionless](/docs/dbt-versions/versionless-cloud) and on dbt v1.9 and higher, you can co-locate your snapshots with models if they are [defined using the latest YAML syntax](/docs/build/snapshots). + ## Default By default, dbt will search for snapshots in the `snapshots` directory, i.e. 
`snapshot-paths: ["snapshots"]` diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index f1fa75d6a46..b9be47fc2f7 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -10,7 +10,7 @@ datatype: "[column_name] | all" ```yml snapshots: - name: snapshot_name - relation: source('jaffle_shop', 'orders') + relation: source('my_source', 'my_table') config: schema: string unique_key: column_name_or_expression diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index f55b29703f9..1aa06a29fab 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -237,7 +237,6 @@ snapshots: strategy: timestamp_with_deletes updated_at: updated_at_column unique_key: id - schema: snapshots ``` diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 179c1d52aed..98843c01dac 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -23,15 +23,24 @@ Parts of a snapshot: + + +:::info Use the latest snapshot syntax + +In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). +::: + + + @@ -83,7 +92,7 @@ snapshots: **Note:** Required snapshot properties _will not_ work when defined in `config` YAML blocks. We recommend that you define these in `dbt_project.yml` or a `config()` block within the snapshot `.sql` file. 
-For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt v1.9 and later](/docs/dbt-versions/core). +For faster and more efficient management, consider the [updated snapshot YAML syntax](/docs/build/snapshots), [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt v1.9 and later](/docs/dbt-versions/core). @@ -111,33 +120,21 @@ snapshots: - + - + -```jinja - -{{ config( - [target_schema](/reference/resource-configs/target_schema)="", - [target_database](/reference/resource-configs/target_database)="", - [unique_key](/reference/resource-configs/unique_key)="", - [strategy](/reference/resource-configs/strategy)="timestamp" | "check", - [updated_at](/reference/resource-configs/updated_at)="", - [check_cols](/reference/resource-configs/check_cols)=[""] | "all" -) }} - -``` +Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in a YAML file in your `snapshots` directory.
- + ```jinja {{ config( - [schema](/reference/resource-configs/schema)="", - [database](/reference/resource-configs/database)="", - [alias](/reference/resource-configs/alias)="", + [target_schema](/reference/resource-configs/target_schema)="", + [target_database](/reference/resource-configs/target_database)="", [unique_key](/reference/resource-configs/unique_key)="", [strategy](/reference/resource-configs/strategy)="timestamp" | "check", [updated_at](/reference/resource-configs/updated_at)="", @@ -145,7 +142,6 @@ snapshots: ) }} ``` - @@ -162,7 +158,7 @@ snapshots: defaultValue="project-yaml" values={[ { label: 'Project file', value: 'project-yaml', }, - { label: 'Property file', value: 'property-yaml', }, + { label: 'YAML file', value: 'property-yaml', }, { label: 'Config block', value: 'config', }, ] }> @@ -188,6 +184,7 @@ snapshots: + ```yaml @@ -209,6 +206,7 @@ snapshots: + ```yaml @@ -234,6 +232,13 @@ snapshots: + + +Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in a YAML file in your `snapshots` directory. + + + + ```jinja @@ -249,18 +254,30 @@ snapshots: ``` + + - ## Configuring snapshots Snapshots can be configured in multiple ways: -1. Defined in YAML files, typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). + + +1. Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or dbt Core v1.9 and higher). +2. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. + + + + +1.
Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or dbt Core v1.9 and higher). 2. Using a `config` block within a snapshot -3. Using a `config` [resource property](/reference/model-properties) in a `.yml` file -4. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. +3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. + +Note that in Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). + + Snapshot configurations are applied hierarchically in the order above. @@ -301,7 +318,7 @@ The following examples demonstrate how to configure snapshots using the `dbt_pro - #### Apply configurations to one snapshot only - We recommend using `config` blocks if you need to apply a configuration to one snapshot only. + Use `config` blocks if you need to apply a configuration to one snapshot only.
diff --git a/website/docs/reference/snapshot-properties.md b/website/docs/reference/snapshot-properties.md index 54c1083e4b4..d940a9f344c 100644 --- a/website/docs/reference/snapshot-properties.md +++ b/website/docs/reference/snapshot-properties.md @@ -15,6 +15,8 @@ Snapshots properties can be declared in `.yml` files in: - your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). - your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)) +Note, in Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). + We recommend that you put them in the `snapshots/` directory. You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `snapshots/` or `models/` directory. From b9c2937968ceb053920f76d1a89d8df12ea22240 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 20:20:03 +0100 Subject: [PATCH 17/33] update --- website/docs/reference/resource-configs/pre-hook-post-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index 36964538191..4e8e470be54 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -129,7 +129,7 @@ select ... 
- + ```yml snapshots: From 73a13b045e0d52c1f0db1505e317f5845019f094 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 1 Oct 2024 20:27:10 +0100 Subject: [PATCH 18/33] update ref to downstream --- website/docs/reference/resource-configs/check_cols.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index b9be47fc2f7..9230a0b10ed 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -86,7 +86,7 @@ snapshots: ``` -To select from this snapshot in a downstream model: `select * from {{ source('jaffle_shop', 'orders') }}` +To select from this snapshot in a downstream model: `select * from {{ ref('orders_snapshot_check') }}` @@ -128,7 +128,7 @@ snapshots: ``` -To select from this snapshot in a downstream model: `select * from {{ source('jaffle_shop', 'orders') }}` +To select from this snapshot in a downstream model: `select * from {{ ref('orders_snapshot_check') }}` From 6de9454295fa73ddf2567ef46e554504dc67a90d Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 1 Oct 2024 21:42:33 +0100 Subject: [PATCH 19/33] Update release-notes.md --- website/docs/docs/dbt-versions/release-notes.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index d9f71121c8e..5c160ac5d93 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -18,11 +18,14 @@ Release notes are grouped by month for both multi-tenant and virtual private clo \* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability.
-## September 2024 +## October 2024 - **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. - Who does this affect? New users on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots can keep their existing configuration or can choose to migrate their snapshot definitions to YAML. - Users on dbt 1.8 and earlier: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features. + +## September 2024 + - **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. - **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). - **New**: [Data health tile](/docs/collaborate/data-tile) is now generally available in dbt Explorer.
Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. From 98ad8ac3f45783d2f094556ea85bd8fc641d3acb Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:42:49 +0100 Subject: [PATCH 20/33] Update website/docs/reference/resource-configs/strategy.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/resource-configs/strategy.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index 1aa06a29fab..daffb4a6eef 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -83,8 +83,7 @@ snapshots: relation: source('my_source', 'my_table') config: strategy: check - check_cols: - - [column_name] | "all" + check_cols: [column_name] | "all" ``` From b0191c6e2212fe0255fccf9981096fa161e6ae38 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:43:30 +0100 Subject: [PATCH 21/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 1bd89fc3ff8..1f062a1a8f0 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -230,7 +230,7 @@ snapshots: - + Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), aavilable in Versionless and dbt v1.9 and higher, in the the `snapshot` 
directory file. From 2076f2690ebf5aa12abd2b619e061a1a1aeb0452 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:44:09 +0100 Subject: [PATCH 22/33] Update website/docs/reference/resource-configs/unique_key.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/resource-configs/unique_key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md index ac2e08ec61a..c27ec2edd75 100644 --- a/website/docs/reference/resource-configs/unique_key.md +++ b/website/docs/reference/resource-configs/unique_key.md @@ -190,7 +190,7 @@ from {{ source('erp', 'transactions') }} -In this example, we create an ephemeral model `transaction_items_ephemeral` that creates the unique key id, and then references it in our snapshot. +In this example, we create an ephemeral model `transaction_items_ephemeral` that creates an `id` column that can be used as the `unique_key` in our snapshot configuration.
From 18826b78c44cf02dbd2580eba0d740aeea08677c Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:44:44 +0100 Subject: [PATCH 23/33] Update website/docs/reference/resource-configs/updated_at.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/resource-configs/updated_at.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md index 9c15e99c512..0e6ff7c1c79 100644 --- a/website/docs/reference/resource-configs/updated_at.md +++ b/website/docs/reference/resource-configs/updated_at.md @@ -145,7 +145,7 @@ Since the `updated_at` configuration only takes a column name, rather than an ex ```yaml snapshots: - name: orders_snapshot - relation: {{ ref('staging_orders') }} + relation: ref('staging_orders') config: schema: snapshots unique_key: id From d99fb72cbb0b953a74444449f2e1460a69d39aed Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:47:19 +0100 Subject: [PATCH 24/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index ba924422eb3..f85fad52cb3 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -383,10 +383,9 @@ The following examples demonstrate how to configure snapshots using the `dbt_pro snapshots: - name: orders_snapshot - config: - persist_docs: - relation: true - columns: true + +persist_docs: + relation: true + columns: true ``` From 23bb87a215f5ce7f2969287a99ed0f7e31b7824a Mon Sep 17 00:00:00 2001 From: Mirna Wong 
<89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:50:16 +0100 Subject: [PATCH 25/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index f85fad52cb3..6e5dcee0443 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -277,7 +277,7 @@ Note that in Versionless and dbt v1.9 and later, snapshots are defined in an upd -Snapshot configurations are applied hierarchically in the order above. +Snapshot configurations are applied hierarchically in the order above with higher taking precedence. ### Examples The following examples demonstrate how to configure snapshots using the `dbt_project.yml` file, a `config` block within a snapshot, and a `.yml` file. From 2dbdeedc5d8550f934c4dca1c1e3b1607b72e7f5 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:50:56 +0100 Subject: [PATCH 26/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 6e5dcee0443..c61a94026cd 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -270,7 +270,7 @@ Snapshots can be configured in multiple ways: 1. 
Defined in YAML files using a `config` [resource property](/reference/model-properties), typically in your [snapshots directory](/reference/project-configs/snapshot-paths) (available in [Versionless](/docs/dbt-versions/versionless-cloud) or and dbt Core v1.9 and higher). -2. Using a `config` block within a snapshot +2. Using a `config` block within a snapshot defined in Jinja SQL 3. From the `dbt_project.yml` file, under the `snapshots:` key. To apply a configuration to a snapshot, or directory of snapshots, define the resource path as nested dictionary keys. Note that in Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). From 4882c6682733923cfff7e7d1dde4edd0fd21dbcf Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:51:11 +0100 Subject: [PATCH 27/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index c61a94026cd..559ddc35fac 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -232,7 +232,7 @@ snapshots: -Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), aavilable in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. 
+Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the `snapshot` directory file. From 7906a09e743202d46faa259005a618416dab9ff4 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:55:48 +0100 Subject: [PATCH 28/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index 559ddc35fac..ff8d522075e 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -120,7 +120,7 @@ snapshots: - + Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), aavilable in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. From ffc8b3f21577f72e05d3fe796182a22cecae613a Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 15:56:05 +0100 Subject: [PATCH 29/33] Update website/docs/reference/snapshot-configs.md Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com> --- website/docs/reference/snapshot-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index ff8d522075e..fed2441cc69 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -122,7 +122,7 @@ snapshots: -Configurations can be applied to snapshots using the more performant [YAML syntax](/docs/build/snapshots), aavilable in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file.
+Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the `snapshot` directory file. From dfa67f9fbc7d358dc7b272f17e0581bf95daedd5 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 3 Oct 2024 16:58:22 +0100 Subject: [PATCH 30/33] doug's feedback --- website/docs/reference/resource-configs/check_cols.md | 5 ++--- .../resource-configs/invalidate_hard_deletes.md | 5 ++--- .../reference/resource-configs/pre-hook-post-hook.md | 1 - .../docs/reference/resource-configs/snapshot_name.md | 5 ++--- website/docs/reference/resource-configs/strategy.md | 6 ++---- website/docs/reference/resource-configs/unique_key.md | 11 +++++++---- website/docs/reference/resource-configs/updated_at.md | 5 ++--- website/docs/reference/snapshot-configs.md | 7 ++++--- website/snippets/_snapshot-yaml-spec.md | 4 ++++ 9 files changed, 25 insertions(+), 24 deletions(-) create mode 100644 website/snippets/_snapshot-yaml-spec.md diff --git a/website/docs/reference/resource-configs/check_cols.md b/website/docs/reference/resource-configs/check_cols.md index 9230a0b10ed..f7c6b85d372 100644 --- a/website/docs/reference/resource-configs/check_cols.md +++ b/website/docs/reference/resource-configs/check_cols.md @@ -24,10 +24,9 @@ datatype: "[column_name] | all" -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core).
-::: + diff --git a/website/docs/reference/resource-configs/invalidate_hard_deletes.md b/website/docs/reference/resource-configs/invalidate_hard_deletes.md index 94fa40ade9d..bdaec7e33a9 100644 --- a/website/docs/reference/resource-configs/invalidate_hard_deletes.md +++ b/website/docs/reference/resource-configs/invalidate_hard_deletes.md @@ -25,10 +25,9 @@ snapshots: -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: + diff --git a/website/docs/reference/resource-configs/pre-hook-post-hook.md b/website/docs/reference/resource-configs/pre-hook-post-hook.md index 4e8e470be54..ce818768134 100644 --- a/website/docs/reference/resource-configs/pre-hook-post-hook.md +++ b/website/docs/reference/resource-configs/pre-hook-post-hook.md @@ -135,7 +135,6 @@ select ... 
snapshots: - name: [] [config](/reference/resource-properties/config): - [](/reference/snapshot-configs): [pre_hook](/reference/resource-configs/pre-hook-post-hook): | [] [post_hook](/reference/resource-configs/pre-hook-post-hook): | [] ``` diff --git a/website/docs/reference/resource-configs/snapshot_name.md b/website/docs/reference/resource-configs/snapshot_name.md index a3ce6cbd63b..62480ac3f84 100644 --- a/website/docs/reference/resource-configs/snapshot_name.md +++ b/website/docs/reference/resource-configs/snapshot_name.md @@ -34,10 +34,9 @@ snapshots: -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: + diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index daffb4a6eef..e2b2cac1c59 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -6,10 +6,9 @@ datatype: timestamp | check -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). 
-::: + ```yaml -# snapshots/check_example.yml snapshots: - name: orders_snapshot_check relation: source('jaffle_shop', 'orders') diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md index c27ec2edd75..996e7148292 100644 --- a/website/docs/reference/resource-configs/unique_key.md +++ b/website/docs/reference/resource-configs/unique_key.md @@ -23,10 +23,9 @@ snapshots: -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: + @@ -177,8 +176,12 @@ snapshots: unique_key: id strategy: timestamp updated_at: updated_at +``` + + + -# models/transaction_items_ephemeral.sql +```sql {{ config(materialized='ephemeral') }} select diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md index 0e6ff7c1c79..09122859e43 100644 --- a/website/docs/reference/resource-configs/updated_at.md +++ b/website/docs/reference/resource-configs/updated_at.md @@ -22,10 +22,9 @@ snapshots: -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). 
For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: + diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index fed2441cc69..87063ce592e 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -20,15 +20,16 @@ Parts of a snapshot: --> ## Available configurations +### Snapshot-specific configurations -:::info Use the latest snapshot syntax +import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md'; + + -In Versionless and dbt v1.9 and later, snapshots are defined in an updated syntax using a YAML file within your `snapshots/` directory (as defined by the [`snapshot-paths` config](/reference/project-configs/snapshot-paths)). For faster and more efficient management, consider the updated snapshot YAML syntax, [available in Versionless](/docs/dbt-versions/versionless-cloud) or [dbt Core v1.9 and later](/docs/dbt-versions/core). -::: Date: Thu, 3 Oct 2024 17:02:11 +0100 Subject: [PATCH 31/33] add file name --- website/docs/docs/build/snapshots.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 3145dfcce62..6b839387e80 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -112,12 +112,14 @@ To add a snapshot to your project follow these steps. For users on versions 1.8 2. Since snapshots focus on configuration, the transformation logic is minimal. Typically, you'd select all data from the source. If you need to apply transformations (like filters, deduplication), it's best practice to define an ephemeral model and reference it in your snapshot configuration. 
+ + ```yaml - -- models/ephemeral_orders.sql {{ config(materialized='ephemeral') }} select * from {{ source('jaffle_shop', 'orders') }} ``` + 3. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. From ab15fe13ca7eb46c3abe288351c57444c4dc2b87 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 3 Oct 2024 17:09:07 +0100 Subject: [PATCH 32/33] fix close tag --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 6b839387e80..94b722dfe29 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -112,7 +112,7 @@ To add a snapshot to your project follow these steps. For users on versions 1.8 2. Since snapshots focus on configuration, the transformation logic is minimal. Typically, you'd select all data from the source. If you need to apply transformations (like filters, deduplication), it's best practice to define an ephemeral model and reference it in your snapshot configuration. 
- + ```yaml {{ config(materialized='ephemeral') }} From 271eb62c37114ffa8d08db1f481d2d3a4d3900e7 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 3 Oct 2024 18:15:03 +0100 Subject: [PATCH 33/33] Update website/docs/docs/build/snapshots.md --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 94b722dfe29..bcb65bd7810 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -52,7 +52,7 @@ It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud -In dbt versions 1.9 and later, snapshots are configurations defined in YAML files (typically in your snapshots directory). You'll configure your snapshot to tell dbt how to detect record changes. +In dbt Cloud Versionless and dbt Core v1.9 and later, snapshots are configurations defined in YAML files (typically in your snapshots directory). You'll configure your snapshot to tell dbt how to detect record changes.