feature/databricks-delta-incremental-support #130
Conversation
The SQL Server Buildkite test is currently failing, but that is due to a permission issue which should hopefully be resolved soon. I will kick off the integration tests again once it is resolved. I don't imagine SQL Server would fail because of any of these changes, so this should be good to review even with the failing Buildkite test.
integration_tests/tests/consistency/consistency__audit_table.sql
Changes lgtm, and I was able to give it a full refresh and incremental run in Databricks. All looks good there, so approved!
Co-authored-by: Avinash Kunnath <[email protected]>
@fivetran-joemarkiewicz Looks like you handled all the minor tweaks already, so just one question on the new audit table config before approving.
@@ -1,13 +1,13 @@
 {{ config(
-    materialized='table' if is_databricks_sql_warehouse(target) else 'incremental',
+    materialized='incremental' if is_databricks_all_purpose(target) else 'table',
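For context, the detection macro itself is not shown in this diff. A rough sketch of how a macro like is_databricks_all_purpose(target) might work, assuming it keys off the connection's http_path to tell SQL warehouses apart from all-purpose clusters (illustrative only, not the package's actual implementation):

{# Sketch: assumes SQL warehouse connections are recognizable by a
   'warehouses' segment in the profile's http_path. #}
{% macro is_databricks_all_purpose(target) %}
    {% if target.type == 'databricks' %}
        {% set match = modules.re.search("sql/.+/warehouses/", target.http_path) %}
        {# A match means a SQL warehouse; anything else is treated as all-purpose. #}
        {{ return(false if match else true) }}
    {% else %}
        {{ return(false) }}
    {% endif %}
{% endmacro %}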
Just double checking this logic, since the conditions have been flipped:
If is_databricks_all_purpose(target) is true, then it'll be materialized as incremental. If it's false, then it'll be a table. That makes sense for Databricks.
However, what would this entail for the other warehouses? Looking at the macro logic, it seems it would only be true when the Databricks runtime is an all-purpose cluster. It would be false for every other warehouse, so they would now be materialized as tables instead of incremental. Is that the intention?
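Spelling out how the flipped condition evaluates per target (the warehouse names are just examples):

{# target.type = 'databricks', all-purpose cluster -> is_databricks_all_purpose(target) = true  -> incremental #}
{# target.type = 'databricks', SQL warehouse       -> false -> table (intended)                               #}
{# target.type = 'bigquery', 'snowflake', etc.     -> false -> table (previously these were incremental!)     #}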
@fivetran-avinash BRILLIANT catch! You are exactly correct, and this was not the intention. As written, this sets the Databricks all-purpose cluster to use the incremental strategy and turns it off for non all-purpose clusters... BUT it ALSO turns the incremental strategy off for all other warehouses 😱.
I'm extremely thankful you reviewed this and caught this gap. Let me revisit the code and account for all other warehouses.
I just made some code updates to account for the above issue. @fivetran-avinash @fivetran-catfritz would you be able to review and let me know if you have any questions or there are any other considerations to take into account?
See below for validations that the materializations are working as expected on each platform.
@fivetran-joemarkiewicz New updates look good!
The only small call-out: if we feel any warehouse we set up in the future might not support incremental materialization, we might want to explicitly do an elif target.type in ('bigquery', 'snowflake', etc.) --> true, else false, just for full coverage. But that is not a present concern and can be revisited if we add more destinations.
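A rough sketch of that explicit-allowlist approach (the macro name and warehouse list here are illustrative, not the package's actual code):

{% macro is_incremental_compatible() %}
    {% if target.type == 'databricks' %}
        {# On Databricks, only all-purpose clusters should run incrementally. #}
        {{ return(is_databricks_all_purpose(target)) }}
    {% elif target.type in ('bigquery', 'snowflake', 'redshift', 'postgres') %}
        {# Explicitly allow-listed warehouses keep the incremental strategy. #}
        {{ return(true) }}
    {% else %}
        {# Anything unrecognized falls back to a full table build. #}
        {{ return(false) }}
    {% endif %}
{% endmacro %}

The model config would then read materialized='incremental' if is_incremental_compatible() else 'table'.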
I have a few additional recommended edits in the Changelog, but otherwise lgtm.
@fivetran-avinash Really, really good catch. I ran this only in Databricks, so I definitely didn't catch that! @fivetran-joemarkiewicz Tagging on to Avinash's comments, a more future-proof way to handle the logic update might be:
...
materialized='table' if target.type == 'databricks'
    and not is_databricks_all_purpose(target) else 'incremental'
...
(or whatever the macro ends up being called). That way we don't have to list out the other warehouses. What do you think?
@fivetran-catfritz I like that idea, but the benefit of listing out each of the warehouses is that we are explicitly only using the incremental strategy if we know the destination is supported. If it is not in our supported list, then we fall back to the table materialization.
@fivetran-joemarkiewicz Makes sense! In that case, approved on my end!
PR Overview
This PR will address the following Issue/Feature: Internal tickets and Issue #128
This PR will result in the following new package version:
v1.8.0
When I tested this locally for Databricks, there was actually no error when running without a full refresh; however, the table format did not change. Therefore, this should be released as a breaking change to ensure a full refresh is run and the delta table format is applied.
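For consumers, the breaking-change upgrade would look something like the following (a sketch; --full-refresh is dbt's standard flag for rebuilding incremental models from scratch):

# One-time full refresh so existing incremental models are rebuilt with the delta file format
dbt run --full-refresh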
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
To validate these changes the validation tests were included and you can see they were successful for the following destinations:
BigQuery
Databricks All Purpose Cluster
Databricks SQL Warehouse
Additionally, I validated that the All Purpose Cluster appropriately runs an incremental strategy and that the non All Purpose compute (a SQL Warehouse in this case) does not.
Databricks All Purpose Cluster
Databricks SQL Warehouse
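One way to reproduce this check is simply to run the package twice against each connection type and compare the second run's behavior (a sketch; model selection flags are omitted):

# First run builds all models from scratch on both connection types
dbt run
# Second run: on the all-purpose cluster the incremental models should only process
# new/changed records, while on the SQL warehouse every model is rebuilt as a table
dbt run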
Finally, I confirmed that the Delta format runs as expected and without issue on the Databricks All Purpose cluster on incremental runs.
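On the all-purpose cluster, the effective model config pairs the incremental materialization with the delta format, roughly along these lines (a sketch: the unique_key below is a hypothetical placeholder, and the strategy shown is an assumption, though file_format and incremental_strategy are standard dbt-databricks configs):

{# Sketch only: the key column name is hypothetical #}
{{ config(
    materialized='incremental' if is_databricks_all_purpose(target) else 'table',
    file_format='delta',
    incremental_strategy='merge',
    unique_key='unique_id'
) }}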
If you had to summarize this PR in an emoji, which would it be?
🌳