Implement Log Replay for Change Data Feed (#540)
## What changes are proposed in this pull request?

This PR introduces the path for replaying the log for TableChanges and resolving `cdc`, `add`, and `remove` actions.

At the top level, we introduce `TableChangesScan::scan_data`, which returns the `TableChangesScanData` used to read CDF. Producing the stream of scan data requires a log replay. To perform log replay, the `table_changes::LogReplayScanner` is introduced, which processes a single commit. It is responsible for two things:

1. Producing `TableChangesScanData`, which is made up of transformed `EngineData`, a selection vector, and a map `remove_dvs: HashMap<String, DvInfo>`. `remove_dvs` maps from a remove action's path to its deletion vector.
2. Performing schema, protocol, and table property validation to ensure that the Change Data Feed can be processed.

The `LogReplayScanner` performs two passes over the actions for each commit, in `try_new` and `into_scan_batches` respectively. To perform the operations above, two new visitors are added: `PreparePhaseVisitor` and `FileActionSelectionVisitor`.

To test the changes, a new `LocalMockTable` struct is created. This struct is used to write batches of actions into commits, which in turn is used to verify that LogReplay produces correct output.

The physical schema is added to `TableChangesScan`.

## How was this change tested?

The following cases are tested:

- Valid metadata and protocol processing.
- Failure due to `delta.enableChangeDataFeed` not being enabled.
- Failure due to incompatible schema.
- Simple add and remove case where there are no shared paths among the actions.
- A `cdc` action is present and all other actions must be filtered.
- A remove and add action with the same path are resolved: the remove action is not selected, but it is registered in the `remove_dvs` map. The add action must be selected.
- Failure due to incompatible protocol update.
- Correctly using the default timestamp from the file modification time.
- Data skipping works during log replay.

The following schema validation cases are tested:

- Adding a non-nullable column.
- Adding a nullable column.
- Commit has a wider type than the cdf schema.
- Type widening (will eventually be supported).
- cdf column is nullable while the commit schema is non-nullable (will eventually be supported).
- cdf schema and commit schema have completely incompatible types.
- cdf schema has an extra nullable column.
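
The action resolution rules listed in the test cases above (a `cdc` action filters out everything else; a remove that shares a path with an add is not selected but is recorded in `remove_dvs`) can be illustrated with a small two-pass sketch. This is not the kernel's implementation: `Action`, `Resolved`, and `resolve_commit` are hypothetical names invented for the example, the deletion vector is reduced to an `Option<String>` instead of `DvInfo`, and the real scanner works on `EngineData` batches through visitors and not on a slice of enums.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-in for the file actions of one commit.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Add { path: String },
    Remove { path: String, dv: Option<String> },
    Cdc { path: String },
}

/// Hypothetical result: one selection flag per action, plus the map from a
/// remove action's path to its deletion vector (a plain string here).
#[derive(Debug, Default, PartialEq)]
struct Resolved {
    selection: Vec<bool>,
    remove_dvs: HashMap<String, Option<String>>,
}

/// Resolve the actions of a single commit in two passes.
fn resolve_commit(actions: &[Action]) -> Resolved {
    // Pass 1: find out whether the commit has any cdc action and which
    // paths are touched by add actions.
    let mut has_cdc = false;
    let mut add_paths: HashSet<&str> = HashSet::new();
    for action in actions {
        match action {
            Action::Cdc { .. } => has_cdc = true,
            Action::Add { path } => {
                add_paths.insert(path.as_str());
            }
            Action::Remove { .. } => {}
        }
    }

    // Pass 2: build the selection vector and the remove_dvs map.
    let mut resolved = Resolved::default();
    for action in actions {
        let selected = match action {
            // cdc actions are always selected.
            Action::Cdc { .. } => true,
            // If the commit has a cdc action, all other actions are filtered.
            _ if has_cdc => false,
            Action::Add { .. } => true,
            Action::Remove { path, dv } => {
                if add_paths.contains(path.as_str()) {
                    // Same path as an add: do not select the remove, but
                    // remember its deletion vector for the add's path.
                    resolved.remove_dvs.insert(path.clone(), dv.clone());
                    false
                } else {
                    true
                }
            }
        };
        resolved.selection.push(selected);
    }
    resolved
}
```

Calling `resolve_commit` on a commit holding a remove and an add for the same path, plus an unrelated remove, selects the add and the unrelated remove and leaves one entry in `remove_dvs`. Splitting the work into two passes mirrors the description above: the decision for any single action depends on what else the commit contains, so the whole commit has to be seen once before anything can be selected.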
1 parent eb95c5b, commit 3b456e4
Showing 13 changed files with 1,271 additions and 24 deletions.