v0.6.0 (2024-12-17)
API Changes
Breaking
- Scan::execute takes an Arc<dyn EngineData> now (#553)
- StructField::physical_name no longer takes a ColumnMapping argument (#543)
- Removed the Default implementation for ColumnMappingMode (#562)
- Removed the lifetime requirement on Scan::execute (#588)
- scan::Scan::predicate renamed as physical_predicate to eliminate ambiguity (#512)
- scan::log_replay::scan_action_iter now takes fewer (and different) params (#512)
- Expression::Unary, Expression::Binary, and Expression::Variadic now wrap a struct of the same name containing their fields (#530)
- Moved the delta_kernel::engine::parquet_stats_skipping module to delta_kernel::predicate::parquet_stats_skipping (#602)
- New Error variants: Error::ChangeDataFeedIncompatibleSchema and Error::InvalidCheckpoint (#593)
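For #530, code that builds or matches these variants now goes through a wrapper struct instead of inline fields. A minimal sketch of that shape, using hypothetical, heavily simplified types (BinaryExpression, BinaryOperator and count_binary are illustrative names, not delta_kernel's actual definitions):

```rust
// Hypothetical stand-ins illustrating the #530 shape; NOT delta_kernel's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum BinaryOperator {
    Plus,
    LessThan,
}

/// The variant's former inline fields now live in a named struct.
#[derive(Debug, Clone, PartialEq)]
pub struct BinaryExpression {
    pub op: BinaryOperator,
    pub left: Box<Expression>,
    pub right: Box<Expression>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Literal(i64),
    // Before: Binary { op, left, right }. After: Binary(BinaryExpression).
    Binary(BinaryExpression),
}

/// Counts binary nodes; matching now destructures the wrapped struct.
pub fn count_binary(expr: &Expression) -> usize {
    match expr {
        Expression::Literal(_) => 0,
        Expression::Binary(BinaryExpression { left, right, .. }) => {
            1 + count_binary(left) + count_binary(right)
        }
    }
}
```

One plausible benefit of a named struct is that the whole binary node can be handed around as a single value, for example to a method of the ExpressionTransform trait introduced in the same PR.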
Additions
- Ability to read a table's change data feed with the new TableChanges API! See the new table_changes module as well as the 'read-table-changes' example (#597). Changes include:
  - Implement Log Replay for Change Data Feed (#540)
  - ScanFile expression and visitor for CDF (#546)
  - Resolve deletion vectors to find inserted and removed rows for CDF (#568)
  - Helper methods for CDF Physical to Logical Transformation (#579)
  - TableChangesScan::execute and end-to-end testing for CDF (#580)
  - TableChangesScan::schema method to get logical schema (#589)
- Enable relaying log events via FFI (#542)
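Resolving deletion vectors for CDF (#568) boils down to comparing the rows a file's old DV deletes with the rows its new DV deletes. A minimal sketch of that comparison, assuming both DVs are already materialized as sets of deleted row indexes; resolve_dv_change is a hypothetical helper, not the kernel's implementation:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: compares the deleted-row indexes of a file's old DV
/// (from a remove action) and new DV (from an add action) in one commit.
/// Returns (rows removed by this commit, rows re-inserted by this commit).
pub fn resolve_dv_change(
    old_deleted: &BTreeSet<u64>,
    new_deleted: &BTreeSet<u64>,
) -> (Vec<u64>, Vec<u64>) {
    // Deleted now but not before: these rows were removed in this commit.
    let removed = new_deleted.difference(old_deleted).copied().collect();
    // Deleted before but not now: these rows reappear, i.e. count as inserted.
    let inserted = old_deleted.difference(new_deleted).copied().collect();
    (removed, inserted)
}
```

Because BTreeSet iterates in sorted order, both returned lists come out sorted by row index.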
Implemented enhancements:
- Define an ExpressionTransform trait (#530)
- [chore] appease clippy in rustc 1.83 (#557)
- Simplify column mapping mode handling (#543)
- Adding some more miri tests (#503)
- Data skipping correctly handles nested columns and column mapping (#512)
- Engines now return FileMeta with correct millisecond timestamps (#565)
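A minimal sketch of producing a millisecond modification timestamp from a SystemTime, assuming the target field holds milliseconds since the Unix epoch as an i64; to_epoch_millis is a hypothetical helper, not a kernel API:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: milliseconds since the Unix epoch, keeping
/// sub-second precision (as_millis rather than as_secs).
pub fn to_epoch_millis(t: SystemTime) -> i64 {
    match t.duration_since(UNIX_EPOCH) {
        Ok(d) => d.as_millis() as i64,
        // For times before the epoch, the error carries the (positive) distance.
        Err(e) => -(e.duration().as_millis() as i64),
    }
}
```

In an engine, the input would typically come from std::fs::Metadata::modified() or an object store's listing metadata.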
Fixed bugs:
- don't use std abs_diff, put it in test_utils instead, and run tests with MSRV in a CI action (#596)
- (CDF) Add fix for sv extension (#591)
- minimal CI fixes in arrow integration test and semver check (#548)
v0.5.0 (2024-11-26)
API Changes
Breaking
- Expression::Column(String) is now Expression::Column(ColumnName) #400
- delta_kernel_ffi::expressions moved into two modules: delta_kernel_ffi::expressions::engine and delta_kernel_ffi::expressions::kernel #363
- FFI: removed (hazardous) impl From for KernelStringSlice and added an unsafe constructor instead #441
- Moved LogSegment into its own module (log_segment::LogSegment) #438
- Renamed EngineData::length as EngineData::len #471
- New AsAny trait: AsAny: Any + Send + Sync required bound on all engine traits #450
- Renamed mod features to mod table_features #454
- LogSegment fields renamed: commit_files -> ascending_commit_files and checkpoint_files -> checkpoint_parts #495
- Added minimum-supported Rust version: currently Rust 1.80 #504
- Improved row visitor API: renamed EngineData::extract as EngineData::visit_rows, and the DataVisitor trait renamed as RowVisitor #481
- FFI: New mod engine_data and mod error (moved Error to error::Error) #537
- New error types: InvalidProtocol, InvalidCommitInfo, MissingCommitInfo, FileAlreadyExists, Unsupported, ParseIntervalError, ChangeDataFeedUnsupported
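An AsAny: Any + Send + Sync bound (#450) lets code recover the concrete type behind an engine trait object. A minimal sketch of the common Rust pattern for such a trait; MyEngineData, DemoData and as_demo are hypothetical names, and the kernel's exact method signatures may differ:

```rust
use std::any::Any;
use std::sync::Arc;

/// AsAny-style trait: upcast `self` to `dyn Any` so callers can downcast.
pub trait AsAny: Any + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Blanket impl: every sized `Any + Send + Sync` type gets `as_any` for free.
impl<T: Any + Send + Sync> AsAny for T {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical engine trait carrying the AsAny bound.
pub trait MyEngineData: AsAny {
    fn len(&self) -> usize;
}

pub struct DemoData(pub Vec<i32>);

impl MyEngineData for DemoData {
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Takes `&dyn MyEngineData`, not `&Arc<dyn MyEngineData>`: calling `as_any()`
/// on the Arc itself would upcast the Arc (which also satisfies the blanket
/// impl), and the downcast to DemoData would then fail.
pub fn as_demo(data: &dyn MyEngineData) -> Option<&DemoData> {
    data.as_any().downcast_ref::<DemoData>()
}
```

With an Arc<dyn MyEngineData> named d, the call is as_demo(&*d).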
Additions
- New ColumnName, column_name!, column_expr! for structured column name parsing. #400 #467
- New Engine API write_json_file() for atomically writing JSON #370
- New Transaction API for creating transactions, adding commit info and write metadata, and committing the transaction to the table. Includes Table.new_transaction(), Transaction.write_context(), Transaction.with_commit_info, Transaction.with_operation(), Transaction.with_write_metadata(), and Transaction.commit() #370 #393
- FFI: Visitor for converting kernel expressions to engine expressions. See the new example at ffi/examples/visit-expression/ #363
- FFI: New TryFromStringSlice trait and kernel_string_slice macro #441
- New DefaultEngine engine implementation for writing parquet: write_parquet_file() #393
- Added support for parsing comma-separated column name lists: ColumnName::parse_column_name_list() #458
- New VacuumProtocolCheck table feature #454
- DvInfo now implements Clone, PartialEq, and Eq #468
- Stats now implements Debug, Clone, PartialEq, and Eq #468
- Added Cdc action support #506
- (early CDF read support) New TableChanges type to read CDF from a table between versions #505
- (early CDF read support) Builder for scans on TableChanges #521
- New TableProperties struct which can parse tables' metadata.configuration #453 #536
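ColumnName stores a nested column as a path of field names rather than one dotted string, and #458 adds parsing of comma-separated lists. A deliberately simplified sketch of that idea; ColumnPath and parse_column_list are hypothetical, and unlike the kernel's parser this one has no support for backtick-quoted names that contain dots or commas:

```rust
/// Hypothetical, simplified structured column name: "a.b.c" -> ["a", "b", "c"].
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnPath(pub Vec<String>);

impl ColumnPath {
    /// Splits on '.', rejecting empty segments such as in "a..b".
    pub fn parse(s: &str) -> Result<Self, String> {
        let parts: Vec<String> = s.trim().split('.').map(str::to_string).collect();
        if parts.iter().any(|p| p.is_empty()) {
            return Err(format!("invalid column name: {s:?}"));
        }
        Ok(ColumnPath(parts))
    }
}

/// Parses a comma-separated list such as "id, a.b". No quoting support.
pub fn parse_column_list(s: &str) -> Result<Vec<ColumnPath>, String> {
    s.split(',').map(ColumnPath::parse).collect()
}
```

Keeping the path as separate segments means a field literally named "a.b" can, in a full implementation with quoting, be told apart from field b nested inside a.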
Implemented enhancements:
- FFI examples now use AddressSanitizer #447
- ColumnName now tracks a path of field names instead of a simple string #445
- use ParsedLogPaths for files in LogSegment #472
- FFI: added Miri support for tests #470
- check table URI has trailing slash #432
- build cargo docs in CI #479
- new test-utils crate #477
- added proper protocol validation (both parsing correctness and semantic correctness) #454 #493
- harmonize predicate evaluation between delta stats and parquet footer stats #420
- more log path tests #485
- ensure_read_supported and ensure_write_supported APIs #518
- include NOTICE and LICENSE in published crates #520
- FFI: factored out read_table kernel utils into kernel_utils.h/c #539
- simplified log replay visitor and avoid materializing Add/Remove actions #494
- simplified schema transform API #531
- support arrow view types in conversion from ArrowDataType to kernel's DataType #533
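Typed log path parsing classifies files in _delta_log by name, where commit and checkpoint versions are zero-padded to 20 digits. A minimal sketch of that classification; LogFile and parse_log_file_name are hypothetical, and the kernel's ParsedLogPath covers more cases (for example multi-part checkpoints):

```rust
/// Hypothetical, simplified classification of a Delta log file name.
#[derive(Debug, PartialEq, Eq)]
pub enum LogFile {
    Commit(u64),
    Checkpoint(u64),
}

pub fn parse_log_file_name(name: &str) -> Option<LogFile> {
    // Split "<version>.<rest>" at the first dot.
    let (version_str, rest) = name.split_once('.')?;
    // Versions are zero-padded to exactly 20 decimal digits.
    if version_str.len() != 20 || !version_str.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let version: u64 = version_str.parse().ok()?;
    match rest {
        "json" => Some(LogFile::Commit(version)),
        "checkpoint.parquet" => Some(LogFile::Checkpoint(version)),
        _ => None, // multi-part checkpoints and other files are not handled here
    }
}
```

Parsing the version numerically, rather than comparing raw file names, gives a reliable key for ordering commit files (as in ascending_commit_files).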
Fixed bugs:
- Disabled missing-column row group skipping: The optimization to treat a physically missing column as all-null is unsound if the schema was not already verified to prove that the table's logical schema actually includes the missing column. We disable it until we can add the necessary validation. #435
- fixed leaks in read_table FFI example #449
- fixed read_table compilation on windows #455
- fixed various predicate eval bugs #420
v0.4.1 (2024-10-28)
API Changes
None.
Fixed bugs:
- Disabled missing-column row group skipping: The optimization to treat a physically missing column as all-null is unsound if the schema was not already verified to prove that the table's logical schema actually includes the missing column. We disable it until we can add the necessary validation. #435
v0.4.0 (2024-10-23)
API Changes
Breaking
- pub ScanResult.mask field made private and only accessible as ScanResult.raw_mask() method #374
- new ReaderFeatures enum variants: TypeWidening and TypeWideningPreview #335
- new WriterFeatures enum variants: TypeWidening and TypeWideningPreview #335
- new Error enum variant: InvalidLogPath when kernel is unable to parse the name of a log path #347
- Module moved: mod delta_kernel::transaction -> mod delta_kernel::actions::set_transaction #386
- change default-feature to be none (removed sync-engine by default. If downstream users relied on this, turn on the sync-engine feature or specific arrow-related feature flags to pull in the pieces needed) #339
- Scan's execute(..) method now returns a lazy iterator instead of materializing a Vec<ScanResult>. You can trivially migrate to the new API (and force eager materialization by using .collect() or the like on the returned iterator) #340
- schema and expression FFI moved to their own mod delta_kernel_ffi::schema and mod delta_kernel_ffi::expressions #360
- Parquet and JSON readers in the Engine trait now take Arc<Expression> (aliased to ExpressionRef) instead of Expression #364
- StructType::new(..) now takes an impl IntoIterator<Item = StructField> instead of Vec<StructField> #385
- DataType::struct_type(..) now takes an impl IntoIterator<Item = StructField> instead of Vec<StructField> #385
- removed DataType::array_type(..) API: there is already an impl From<ArrayType> for DataType #385
- Expression::struct_expr(..) renamed to Expression::struct_from(..) #399
- lots of expressions take impl Into<Self> or impl Into<Expression> instead of just Self/Expression now #399
- remove log_replay_iter and process_batch APIs in scan::log_replay #402
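The #340 migration relies on a standard Rust idiom: an iterator of Result items can be collected into Result<Vec<_>, _>, which stops at the first error. A minimal sketch with hypothetical stand-ins (execute_lazy, execute_eager, u32 items and String errors) rather than the kernel's real signature:

```rust
/// Hypothetical stand-in for a lazily produced sequence of fallible results.
/// No item is computed until the caller pulls it from the iterator.
pub fn execute_lazy(batches: u32) -> impl Iterator<Item = Result<u32, String>> {
    (0..batches).map(|i| {
        if i == 3 {
            Err(format!("failed to read batch {i}"))
        } else {
            Ok(i * 10)
        }
    })
}

/// Eager migration path: materialize everything, short-circuiting on the first Err.
pub fn execute_eager(batches: u32) -> Result<Vec<u32>, String> {
    execute_lazy(batches).collect()
}
```

Consuming the iterator directly (for example `for res in execute_lazy(n) { let batch = res?; ... }`) lets a caller process results as they arrive instead of holding them all in memory.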
Additions
- remove feature flag requirement for impl GetData on () #334
- new full_mask() method on ScanResult #374
- StructType::try_new(fields: impl IntoIterator<Item = StructField>) #385
- DataType::try_struct_type(fields: impl IntoIterator<Item = StructField>) #385
- StructField.metadata_with_string_values(&self) -> HashMap<String, String> to materialize and return our metadata into a hashmap #331
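Taking impl IntoIterator<Item = ...> instead of a Vec (#385) means callers can pass arrays, vectors, or iterator chains without allocating a Vec first. A minimal sketch of the pattern with hypothetical, simplified Field and Struct types (not the kernel's StructField/StructType):

```rust
/// Hypothetical, simplified field and struct types.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct Struct {
    pub fields: Vec<Field>,
}

impl Struct {
    /// Accepts anything that yields Fields: an array, a Vec, or a lazy map/filter chain.
    pub fn new(fields: impl IntoIterator<Item = Field>) -> Self {
        Struct { fields: fields.into_iter().collect() }
    }
}

/// Small helper to build a field from a name.
pub fn named(name: &str) -> Field {
    Field { name: name.to_string() }
}
```

Existing callers that already pass a Vec keep compiling, since Vec<T> implements IntoIterator<Item = T>.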
Implemented enhancements:
- support reading tables with type widening in default engine #335
- add predicate to protocol and metadata log replay for pushdown #336 and #343
- support annotation (macro) for nullable values in a container (for #[derive(Schema)]) #342
- new ParsedLogPath type for better log path parsing #347
- implemented row group skipping for default engine parquet readers and new utility trait for stats-based skipping logic #357, #362, #381
- depend on wider arrow versions and add arrow integration testing #366 and #413
- added semver testing to CI #369, #383, #384
- new SchemaTransform trait and usage in column mapping and data skipping #395 and #398
- arrow expression evaluation improvements #401
- replace panics with to_compiler_error in macros #409
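Stats-based row group skipping checks whether any value in a row group's [min, max] range could satisfy a predicate. A minimal sketch for simple comparisons, assuming non-null integer min/max statistics are available; ColumnStats, Pred and can_skip are hypothetical names, and the kernel's skipping trait is more general:

```rust
/// Hypothetical min/max statistics for one column in one row group.
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Simple comparisons of the column against a literal.
pub enum Pred {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

/// True if no value in [min, max] can satisfy the predicate, so the row group
/// can be skipped. False only means "might match", never "does match".
pub fn can_skip(stats: &ColumnStats, pred: &Pred) -> bool {
    match *pred {
        Pred::Eq(v) => v < stats.min || v > stats.max,
        Pred::Lt(v) => stats.min >= v,
        Pred::Gt(v) => stats.max <= v,
    }
}
```

Skipping is conservative by design: a wrong "skip" drops rows, while a wrong "might match" only costs reading extra data.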
Fixed bugs:
- output of arrow expression evaluation now applies/validates output schema in default arrow expression handler #331
- add arrow-buffer to arrow-expression feature #332
- fix bug with out-of-date last checkpoint #354
- fixed broken sync engine json parsing and harmonized sync/async json parsing #373
- filesystem client now always returns a sorted list #344
v0.3.1 (2024-09-10)
API Changes
Additions
- Two new binary expressions: In and NotIn, as well as a new Scalar::Array variant to represent arrays in the expression framework #270. NOTE: the exact API for these expressions is still evolving.
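Since the exact API for In and NotIn is still evolving, the following is only a sketch of the usual semantics, assuming standard SQL three-valued logic with None standing for NULL and a plain slice standing in for Scalar::Array; eval_in and eval_not_in are hypothetical, not the kernel's evaluator:

```rust
/// `value IN list` with SQL three-valued logic; `None` represents NULL.
pub fn eval_in(value: Option<i64>, list: &[Option<i64>]) -> Option<bool> {
    let v = value?; // NULL IN (...) is NULL
    if list.iter().any(|item| *item == Some(v)) {
        Some(true)
    } else if list.iter().any(|item| item.is_none()) {
        None // no match, but a NULL in the list makes the result unknown
    } else {
        Some(false)
    }
}

/// `value NOT IN list` negates IN; NOT of NULL stays NULL.
pub fn eval_not_in(value: Option<i64>, list: &[Option<i64>]) -> Option<bool> {
    eval_in(value, list).map(|b| !b)
}
```

The NULL-in-list case is the usual surprise: under these rules `3 NOT IN (1, NULL)` is NULL rather than true, so a filter on it keeps no rows.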
Implemented enhancements:
- Enabled more golden table tests #301
Fixed bugs:
- Allow kernel to read tables with invalid _last_checkpoint #311
- List log files with checkpoint hint when constructing latest snapshot (when version requested is None) #312
- Fix incorrect offset value when computing list offsets #327
- Fix metadata string conversion in default engine arrow conversion #328
v0.3.0 (2024-08-07)
API Changes
Breaking
- delta_kernel::column_mapping module moved to delta_kernel::features::column_mapping #222
Additions
- New deletion vector API row_indexes (and accompanying FFI) to get row indexes instead of a selection vector of deleted rows. This can be more efficient for sparse DVs. #215
- Typed table features: ReaderFeatures, WriterFeatures enums and has_reader_feature/has_writer_feature API #222
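A selection vector has one entry per row, while a row-index list has one entry per deleted row, which is why the index form is cheaper for sparse DVs. A minimal sketch converting between the two, assuming the convention that false in a selection vector marks a deleted row; both helpers are hypothetical, not kernel APIs:

```rust
/// Hypothetical helper: selection vector (true = kept, false = deleted)
/// to the sorted indexes of deleted rows.
pub fn deleted_row_indexes(selection: &[bool]) -> Vec<u64> {
    selection
        .iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { None } else { Some(i as u64) })
        .collect()
}

/// The reverse needs the total row count, because the index list alone
/// does not say how many rows the file has. Out-of-range indexes are ignored.
pub fn selection_from_indexes(deleted: &[u64], num_rows: usize) -> Vec<bool> {
    let mut selection = vec![true; num_rows];
    for &i in deleted {
        if let Some(slot) = selection.get_mut(i as usize) {
            *slot = false;
        }
    }
    selection
}
```

For a file with a million rows and three deletions, the index form holds three numbers instead of a million booleans.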
Implemented enhancements:
- Add --limit option to example read-table-multi-threaded #297
- FFI now built with cmake. Moved to using the read-test example as an ffi-test, and added building on macOS. #288
- Golden table tests migrated from delta-spark/delta-kernel java #295
- Code coverage implemented via cargo-llvm-cov and reported with codecov #287
- All tests enabled to run in CI #284
- Updated DAT to 0.3 #290
Fixed bugs:
- Evaluate timestamps as "UTC" instead of "+00:00" for timezone #295
- Make Map arrow type field naming consistent with parquet field naming #299
v0.2.0 (2024-07-17)
API Changes
Breaking
- The scan callback, if using visit_scan_files, now takes an extra Option<Stats> argument holding top-level stats for the associated scan file. You will need to add this argument to your callback. Likewise, the callback in the FFI code also needs to take a new argument, which is a pointer to a Stats struct and which can be null if no stats are present.
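Migrating the callback amounts to adding one parameter and handling the None case. A sketch with hypothetical types (Stats, ScanCallback, count_rows and visit are illustrative, and the kernel's real callback takes a different argument list); only the new Option<Stats> parameter mirrors the change:

```rust
/// Hypothetical stand-in for per-file top-level stats.
#[derive(Debug, Clone)]
pub struct Stats {
    pub num_records: u64,
}

/// Hypothetical callback shape after the change: the trailing Option<Stats>
/// is None when the log carries no stats for that file.
pub type ScanCallback<T> = fn(context: &mut T, path: &str, size: i64, stats: Option<Stats>);

/// Example callback: sums record counts and counts files lacking stats.
pub fn count_rows(total: &mut (u64, usize), _path: &str, _size: i64, stats: Option<Stats>) {
    match stats {
        Some(s) => total.0 += s.num_records,
        None => total.1 += 1,
    }
}

/// Drives a callback over (path, size, stats) tuples, as a scan-file visitor would.
pub fn visit<T>(ctx: &mut T, files: Vec<(&str, i64, Option<Stats>)>, cb: ScanCallback<T>) {
    for (path, size, stats) in files {
        cb(&mut *ctx, path, size, stats);
    }
}
```

On the FFI side the same idea appears as a nullable pointer, so C callers need an explicit null check where Rust callers match on None.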
Additions
- You can call scan_builder() directly on a snapshot, for more convenience.
- You can pass a URL starting with "hdfs" or "viewfs" to the default client to read using hdfs_native_store
Implemented enhancements:
- Handle nested structs in schemaString (allows reading iceberg compat tables) #257
- Expose top level stats in scans #227
- Hugely expanded C-FFI example #203
- Add scan_builder function to Snapshot #273
- Add hdfs_native_store support #273
- Proper reading of Parquet files, including only reading requested leaves, type casting, and reordering #271
- Allow building the package if you are behind an https proxy #282
Fixed bugs:
- Don't error if more fields exist than expected in a struct expression #267
- Handle cases where the deletion vector length is less than the total number of rows in the chunk #276
- Fix partition map indexing if column mapping is in effect #278
v0.1.1 (2024-06-03)
Implemented enhancements:
- Support unary NOT and IsNull for data skipping #231
- Add unary visitors to C FFI #247
- Minor other QOL improvements
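Data skipping for IsNull can rely on per-file numRecords and nullCount statistics, and NOT needs care: skip(NOT p) is not the same as !skip(p). A minimal sketch under those assumptions; FileStats, NullPred and can_skip are hypothetical names, not the kernel's implementation:

```rust
/// Hypothetical per-file stats for one column (cf. numRecords / nullCount).
pub struct FileStats {
    pub num_records: u64,
    pub null_count: u64,
}

/// Hypothetical predicate subset: `col IS NULL`, optionally wrapped in NOT.
pub enum NullPred {
    IsNull,
    Not(Box<NullPred>),
}

/// True if the file can be skipped. NOT is pushed down by flipping which
/// question is asked, instead of negating the skip decision (unsound).
pub fn can_skip(stats: &FileStats, pred: &NullPred) -> bool {
    // Could some row satisfy `pred` (or its negation, when `negated`)?
    fn might_match(stats: &FileStats, pred: &NullPred, negated: bool) -> bool {
        match pred {
            NullPred::Not(inner) => might_match(stats, inner, !negated),
            // IS NULL can match only if the file has at least one null.
            NullPred::IsNull if !negated => stats.null_count > 0,
            // IS NOT NULL can match only if some value is non-null.
            NullPred::IsNull => stats.null_count < stats.num_records,
        }
    }
    !might_match(stats, pred, false)
}
```

For a file with some nulls, neither IS NULL nor IS NOT NULL can skip it, whereas naively negating skip(IS NULL) = false would wrongly skip it for IS NOT NULL.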
v0.1.0 (2024-06-12)
Initial public release