Inconsistency in the last checkpoint metadata field naming: snake_case used instead of camelCase #326
Comments
- Added serde alias for 'size_in_bytes' field in CheckpointMetadata struct
- This allows deserialization of both camelCase and snake_case variants
- Addresses issue with inconsistent field naming in _last_checkpoint file
This is a temporary workaround for the issue described in delta-io#326. The long-term solution will involve aligning the checkpoint writing logic with the Delta protocol specification to use camelCase field names consistently. See delta-io#326 for full details.
My guess: rust clippy enforces snake_case for identifiers, and the json serde will automatically follow that convention unless specifically instructed to convert it to camelCase.
Confirmed: https://github.com/delta-io/delta-rs/blob/main/crates/core/src/table/mod.rs#L42-L46
/// Metadata for a checkpoint file
#[derive(Serialize, Deserialize, Debug, Default, Clone, Copy)]
pub struct CheckPoint {
...
#[serde(skip_serializing_if = "Option::is_none")]
/// The number of bytes of the checkpoint. This field is optional.
pub(crate) size_in_bytes: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")]
/// The number of AddFile actions in the checkpoint. This field is optional.
pub(crate) num_of_add_files: Option<i64>,
}
Corresponding PR on the
Given this is fixed in delta-rs now, and the alias never landed in kernel, seems we can close this issue and only read the field as
While investigating snapshot creation for earlier versions (PR #322), I uncovered an inconsistency in how checkpoint metadata fields are named in the _last_checkpoint file.
Current Behavior:
The _last_checkpoint file contains fields in snake_case format. For example:
Expected Behavior:
According to the Delta Protocol specification, these fields should be in camelCase. The expected format should be:
Impact:
This inconsistency causes issues with deserialization when the CheckpointMetadata struct is annotated with #[serde(rename_all = "camelCase")], leading to some fields (like size_in_bytes) being incorrectly set to None.
Current Workaround:
We've temporarily addressed this in PR #322 by adding an alias to the affected field:
Proposed Long-term Solution:
- delta-rs is writing checkpoint metadata in snake_case instead of camelCase.
- CheckpointMetadata struct to expect camelCase without needing aliases.
Questions to Address: