-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add typed table configuration #309
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #309 +/- ##
==========================================
+ Coverage 72.51% 74.06% +1.54%
==========================================
Files 43 45 +2
Lines 7783 8217 +434
Branches 7783 8217 +434
==========================================
+ Hits 5644 6086 +442
+ Misses 1768 1754 -14
- Partials 371 377 +6 ☔ View full report in Codecov by Sentry. |
e5500be
to
7cfe7ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey robert thanks so much I'm flushing some comments now and will have another pass soon (after i do some more thinking)! major items: (1) consider naming for the configuration - since these are effectively specifying table properties i'm advocating for (strawman): TableProperties
. (2) We have an interesting pattern here of typed table properties stored in a string/string map and I'm wondering if we can keep the best of both worlds (keep strongly-typed keys but allow for arbitrary key/value pairs) if we do something like introduce a Custom
type in the table property enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe rename to table_properties.rs
? thoughts on clarity?
use std::collections::HashMap; | ||
use std::time::Duration; | ||
|
||
use lazy_static::lazy_static; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: using LazyLock
now instead. (should be a simple change)
/// true for Delta Lake to automatically optimize the layout of the files for this Delta table. | ||
#[strum(serialize = "delta.autoOptimize.autoCompact")] | ||
#[serde(rename = "delta.autoOptimize.autoCompact")] | ||
AutoOptimizeAutoCompact, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AutoOptimizeAutoCompact, | |
AutoCompact, |
/// true for Delta Lake to automatically optimize the layout of the files for this Delta table during writes. | ||
#[strum(serialize = "delta.autoOptimize.optimizeWrite")] | ||
#[serde(rename = "delta.autoOptimize.optimizeWrite")] | ||
AutoOptimizeOptimizeWrite, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AutoOptimizeOptimizeWrite, | |
OptimizeWrite, |
Hash, | ||
)] | ||
#[non_exhaustive] | ||
pub enum DeltaConfigKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we call this DeltaTableProperties
?
pub enum DeltaConfigKey { | |
pub enum DeltaTableProperties { |
pub fn isolation_level(&self) -> IsolationLevel { | ||
self.0 | ||
.get(DeltaConfigKey::IsolationLevel.as_ref()) | ||
.and_then(|v| v.parse().ok()) | ||
.unwrap_or_default() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can these use the macro above?
|
||
impl Default for CheckpointPolicy { | ||
fn default() -> Self { | ||
Self::Classic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default to V2 since classic is being deprecated
Self::Classic | |
Self::V2 |
/// The full path to the field. | ||
pub name: String, | ||
/// The SQL string that must always evaluate to true. | ||
pub expr: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm I need to think about how we will integrate with this (perhaps parsing to Expression
here but that sounds terrible or passing off opaque to engine?)
pub fn config(&self) -> TableConfig<'_> { | ||
self.metadata.config() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flag for rename
@@ -0,0 +1,802 @@ | |||
//! Delta Table configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! Delta Table configuration | |
//! Delta Table properties. Note this module implements per-table | |
//! configuration which governs how table-level capabilities/properties | |
//! are configured (turned on/off etc.). This is orthogonal to protocol-level | |
//! 'table features' which enable or disable reader/writer features (which then | |
//! usually must be enabled/configured by table properties) | |
//! For example (from delta's protocol.md): | |
//! A feature being supported does not imply that it is active. For example, | |
//! a table may have the `appendOnly` feature listed in writerFeatures, but | |
//! it does not have a table property delta.appendOnly that is set to `true`. | |
//! In such a case the table is not append-only, and writers are allowed to | |
//! change, remove, and rearrange data. However, writers must know that | |
//! the table property delta.appendOnly should be checked before writing | |
//! the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question: are all of these table properties being used in the code today? I doubt that delta.autoOptimize.autoCompact
is, for example.
IMO - we should only add table properties as they come and as they are used. I'm open and curious what the argument would be to include non-used table properties?
/// Interval (number of commits) after which a new checkpoint should be created | ||
#[strum(serialize = "delta.checkpointInterval")] | ||
#[serde(rename = "delta.checkpointInterval")] | ||
CheckpointInterval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a rust guy, so apologies if this is a naive or unapplicable question: is there a way to define the validation here, too? e.g. for this table property, delta-kernel-java will require and validate that the specified value is greater than 0.
No description provided.