Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] hard_deletes: new_record not working as expected #1275

Open
2 tasks done
jeremyyeo opened this issue Dec 13, 2024 · 0 comments
Open
2 tasks done

[Bug] hard_deletes: new_record not working as expected #1275

jeremyyeo opened this issue Dec 13, 2024 · 0 comments
Assignees
Labels
bug Something isn't working triage

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Dec 13, 2024

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

The new hard_deletes: new_record config does not appear to be working on Snowflake.

Expected Behavior

The new hard_deletes: new_record config works as expected.

Steps To Reproduce

  1. Make sure snapshot doesn't exist yet.
drop table development_jyeo.dbt_jyeo.snappy;
  1. dbt project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table

# models/snapshots.yml
snapshots:
  - name: snappy
    relation: ref('foo')
    config:
      strategy: check
      unique_key: id
      check_cols: all
      hard_deletes: new_record
-- models/foo.sql
{{ config(materialized='table') }}
select 1 id, 'alice' as first_name
  1. Build
$ dbt build
18:36:43  Running with dbt=1.9.0-rc2
18:36:44  Registered adapter: snowflake=1.9.0-rc1
18:36:44  Unable to do partial parsing because profile has changed
18:36:44  Unable to do partial parsing because a project dependency has been added
18:36:44  Found 1 model, 1 snapshot, 468 macros
18:36:44  
18:36:44  Concurrency: 1 threads (target='sf')
18:36:44  
18:36:46  1 of 2 START sql table model dbt_jyeo.foo ...................................... [RUN]
18:36:49  1 of 2 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.44s]
18:36:49  2 of 2 START snapshot dbt_jyeo.snappy .......................................... [RUN]
18:36:51  2 of 2 OK snapshotted dbt_jyeo.snappy .......................................... [SUCCESS 1 in 2.67s]
18:36:52  
18:36:52  Finished running 1 snapshot, 1 table model in 0 hours 0 minutes and 7.61 seconds (7.61s).
18:36:52  
18:36:52  Completed successfully
18:36:52  
18:36:52  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. Update foo and "hard delete" record with id = 1:
-- models/foo.sql
{{ config(materialized='table') }}
select 2 id, 'bob' as first_name
  1. Build
$ dbt --debug build
...
18:38:22  2 of 2 START snapshot dbt_jyeo.snappy .......................................... [RUN]
18:38:22  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now snapshot.my_dbt_project.snappy)
18:38:22  Began compiling node snapshot.my_dbt_project.snappy
18:38:22  Began executing node snapshot.my_dbt_project.snappy
18:38:22  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:22  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        select * from development_jyeo.dbt_jyeo.foo
    ) as __dbt_sbq
    where false
    limit 0
18:38:23  SQL status: SUCCESS 0 in 0.267 seconds
18:38:23  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:23  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
18:38:23  SQL status: SUCCESS 7 in 0.215 seconds
18:38:23  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:23  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
18:38:23  SQL status: SUCCESS 7 in 0.235 seconds
18:38:23  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:23  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
18:38:23  SQL status: SUCCESS 7 in 0.217 seconds
18:38:23  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:23  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
/* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        select * from development_jyeo.dbt_jyeo.foo
    ) as __dbt_sbq
    where false
    limit 0
18:38:24  SQL status: SUCCESS 0 in 0.227 seconds
18:38:24  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:24  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
/* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        select * from development_jyeo.dbt_jyeo.foo
    ) as __dbt_sbq
    where false
    limit 0
18:38:24  SQL status: SUCCESS 0 in 0.241 seconds
18:38:24  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:24  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
create or replace temporary table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY__dbt_tmp"
         as
        (

    with snapshot_query as (

        select * from development_jyeo.dbt_jyeo.foo

    ),

    snapshotted_data as (

        select *, 
    
        id as dbt_unique_key
    

        from "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
        where
            
                dbt_valid_to is null
            

    ),

    insertions_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            
  
  coalesce(nullif(to_timestamp_ntz(convert_timezone('UTC', current_timestamp())), to_timestamp_ntz(convert_timezone('UTC', current_timestamp()))), null)
  as dbt_valid_to
,
            md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as varchar ), '')
        ) as dbt_scd_id

        from snapshot_query
    ),

    updates_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_to

        from snapshot_query
    ),

    deletes_source_data as (

        select *, 
    
        id as dbt_unique_key
    

        from snapshot_query
    ),
    

    insertions as (

        select
            'insert' as dbt_change_type,
            source_data.*,'False' as dbt_is_deleted

        from insertions_source_data as source_data
        left outer join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        snapshotted_data.dbt_unique_key is null
    

            or (
    
        snapshotted_data.dbt_unique_key is not null
    
 and ((snapshotted_data."ID" != source_data."ID"
        or
        (
            ((snapshotted_data."ID" is null) and not (source_data."ID" is null))
            or
            ((not snapshotted_data."ID" is null) and (source_data."ID" is null))
        ) or snapshotted_data."FIRST_NAME" != source_data."FIRST_NAME"
        or
        (
            ((snapshotted_data."FIRST_NAME" is null) and not (source_data."FIRST_NAME" is null))
            or
            ((not snapshotted_data."FIRST_NAME" is null) and (source_data."FIRST_NAME" is null))
        )))

        )

    ),

    updates as (

        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted

        from updates_source_data as source_data
        join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

        where (
            (snapshotted_data."ID" != source_data."ID"
        or
        (
            ((snapshotted_data."ID" is null) and not (source_data."ID" is null))
            or
            ((not snapshotted_data."ID" is null) and (source_data."ID" is null))
        ) or snapshotted_data."FIRST_NAME" != source_data."FIRST_NAME"
        or
        (
            ((snapshotted_data."FIRST_NAME" is null) and not (source_data."FIRST_NAME" is null))
            or
            ((not snapshotted_data."FIRST_NAME" is null) and (source_data."FIRST_NAME" is null))
        ))
        )
    )
    ,
    deletes as (

        select
            'delete' as dbt_change_type,
            source_data.*,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_to,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )
        
    ,
    deletion_records as (

        select
            'insert' as dbt_change_type,snapshotted_data."ID",
            snapshotted_data."FIRST_NAME",
            snapshotted_data.dbt_unique_key as dbt_unique_key,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            snapshotted_data.dbt_valid_to as dbt_valid_to,
            snapshotted_data.dbt_scd_id,
            'True' as dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )

    select * from insertions
    union all
    select * from updates
    union all
    select * from deletes
    union all
    select * from deletion_records

        );
18:38:26  SQL status: SUCCESS 1 in 2.562 seconds
18:38:26  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:26  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY__dbt_tmp"
18:38:27  SQL status: SUCCESS 9 in 0.233 seconds
18:38:27  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:27  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
18:38:27  SQL status: SUCCESS 7 in 0.219 seconds
18:38:27  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:27  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY__dbt_tmp"
18:38:27  SQL status: SUCCESS 9 in 0.254 seconds
18:38:27  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:27  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
18:38:27  SQL status: SUCCESS 7 in 0.219 seconds
18:38:27  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:27  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
describe table "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY__dbt_tmp"
18:38:28  SQL status: SUCCESS 9 in 0.225 seconds
18:38:28  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:28  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
/* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        

    with snapshot_query as (

        select * from development_jyeo.dbt_jyeo.foo

    ),

    snapshotted_data as (

        select *, 
    
        id as dbt_unique_key
    

        from "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY"
        where
            
                dbt_valid_to is null
            

    ),

    insertions_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            
  
  coalesce(nullif(to_timestamp_ntz(convert_timezone('UTC', current_timestamp())), to_timestamp_ntz(convert_timezone('UTC', current_timestamp()))), null)
  as dbt_valid_to
,
            md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as varchar ), '')
        ) as dbt_scd_id

        from snapshot_query
    ),

    updates_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_to

        from snapshot_query
    ),

    deletes_source_data as (

        select *, 
    
        id as dbt_unique_key
    

        from snapshot_query
    ),
    

    insertions as (

        select
            'insert' as dbt_change_type,
            source_data.*,'False' as dbt_is_deleted

        from insertions_source_data as source_data
        left outer join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        snapshotted_data.dbt_unique_key is null
    

            or (
    
        snapshotted_data.dbt_unique_key is not null
    
 and ((snapshotted_data."ID" != source_data."ID"
        or
        (
            ((snapshotted_data."ID" is null) and not (source_data."ID" is null))
            or
            ((not snapshotted_data."ID" is null) and (source_data."ID" is null))
        ) or snapshotted_data."FIRST_NAME" != source_data."FIRST_NAME"
        or
        (
            ((snapshotted_data."FIRST_NAME" is null) and not (source_data."FIRST_NAME" is null))
            or
            ((not snapshotted_data."FIRST_NAME" is null) and (source_data."FIRST_NAME" is null))
        )))

        )

    ),

    updates as (

        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted

        from updates_source_data as source_data
        join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

        where (
            (snapshotted_data."ID" != source_data."ID"
        or
        (
            ((snapshotted_data."ID" is null) and not (source_data."ID" is null))
            or
            ((not snapshotted_data."ID" is null) and (source_data."ID" is null))
        ) or snapshotted_data."FIRST_NAME" != source_data."FIRST_NAME"
        or
        (
            ((snapshotted_data."FIRST_NAME" is null) and not (source_data."FIRST_NAME" is null))
            or
            ((not snapshotted_data."FIRST_NAME" is null) and (source_data."FIRST_NAME" is null))
        ))
        )
    )
    ,
    deletes as (

        select
            'delete' as dbt_change_type,
            source_data.*,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_to,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )
        
    ,
    deletion_records as (

        select
            'insert' as dbt_change_type,snapshotted_data."ID",
            snapshotted_data."FIRST_NAME",
            snapshotted_data.dbt_unique_key as dbt_unique_key,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_valid_from,
            to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_updated_at,
            snapshotted_data.dbt_valid_to as dbt_valid_to,
            snapshotted_data.dbt_scd_id,
            'True' as dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )

    select * from insertions
    union all
    select * from updates
    union all
    select * from deletes
    union all
    select * from deletion_records

    ) as __dbt_sbq
    where false
    limit 0
18:38:28  SQL status: SUCCESS 0 in 0.301 seconds
18:38:28  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:28  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
/* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        select to_timestamp_ntz(convert_timezone('UTC', current_timestamp())) as dbt_snapshot_time
    ) as __dbt_sbq
    where false
    limit 0
18:38:28  SQL status: SUCCESS 0 in 0.256 seconds
18:38:28  Writing runtime sql for node "snapshot.my_dbt_project.snappy"
18:38:28  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:28  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
BEGIN
18:38:28  SQL status: SUCCESS 1 in 0.263 seconds
18:38:28  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:28  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
merge into "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY" as DBT_INTERNAL_DEST
    using "DEVELOPMENT_JYEO"."DBT_JYEO"."SNAPPY__dbt_tmp" as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     
       and DBT_INTERNAL_DEST.dbt_valid_to is null
     
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
        then insert ("ID", "FIRST_NAME", "DBT_UPDATED_AT", "DBT_VALID_FROM", "DBT_VALID_TO", "DBT_SCD_ID", "DBT_IS_DELETED")
        values ("ID", "FIRST_NAME", "DBT_UPDATED_AT", "DBT_VALID_FROM", "DBT_VALID_TO", "DBT_SCD_ID", "DBT_IS_DELETED")

;
18:38:29  SQL status: SUCCESS 2 in 0.764 seconds
18:38:29  Using snowflake connection "snapshot.my_dbt_project.snappy"
18:38:29  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "sf", "node_id": "snapshot.my_dbt_project.snappy"} */
COMMIT
18:38:30  SQL status: SUCCESS 1 in 0.399 seconds
18:38:30  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'f25f0a3c-a78f-44b4-bcbb-41ccf218a6d2', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13bf684d0>]}
18:38:30  2 of 2 OK snapshotted dbt_jyeo.snappy .......................................... [SUCCESS 2 in 7.32s]

Truncated logs to only the snapshot.

Check snapshot:

image

^ We can see that there is no new row for id = 1 added with the dbt_is_deleted col set to True - if things worked as expected (as it does on postgres) - then that would be the case.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt-core: 1.9.0-rc2
- dbt-snowflake: 1.9.0rc1

Additional Context

This works as expected on dbt-postgres:

image
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

2 participants