allow typeLength to come from opts.column when decoding FIXED_LEN_BYTE_ARRAY (#108)

Problem
=======

`typeLength` is present in the column options (nested under `column`),
but decoding throws an error:

`thrown: "missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)"`

Options object for reference:
```
    {
      type: 'FIXED_LEN_BYTE_ARRAY',
      rLevelMax: 0,
      dLevelMax: 1,
      compression: 'SNAPPY',
      column: {
        name: 'BLOCK_NUMBER',
        primitiveType: 'FIXED_LEN_BYTE_ARRAY',
        originalType: 'DECIMAL',
        path: [ 'BLOCK_NUMBER' ],
        repetitionType: 'OPTIONAL',
        encoding: 'PLAIN',
        statistics: undefined,
        compression: 'UNCOMPRESSED',
        precision: 38,
        scale: 0,
        typeLength: 16,
        rLevelMax: 0,
        dLevelMax: 1
      },
      num_values: { buffer: <Buffer 00 00 00 00 00 00 27 10>, offset: 0 }
    }
```

According to `parquet-tools schema`, the schema for this column is:
```
optional fixed_len_byte_array(16) BLOCK_NUMBER (DECIMAL(38,0))
```

The Parquet file is a direct export from Snowflake, and the data type of
the column is `NUMBER(38,0)`.
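
For context on the schema above: the Parquet format stores a DECIMAL held in a `FIXED_LEN_BYTE_ARRAY` as the unscaled value in big-endian two's complement. The sketch below shows how one 16-byte value from such a column could be interpreted. The helper name `fixedLenBytesToBigInt` is hypothetical and is not part of the library. With a scale of 0, as here, the unscaled value is the column value itself.

```typescript
// Hypothetical helper (not part of parquetjs): interpret a big-endian,
// two's-complement byte array as a signed bigint.
function fixedLenBytesToBigInt(bytes: Uint8Array): bigint {
  let value = BigInt(0);
  // Accumulate the bytes from most significant to least significant.
  for (let i = 0; i < bytes.length; ++i) {
    value = (value << BigInt(8)) | BigInt(bytes[i]);
  }
  // If the sign bit of the first byte is set, the value is negative:
  // subtract 2^(8 * length) to undo the two's-complement wrap-around.
  if (bytes.length > 0 && (bytes[0] & 0x80) !== 0) {
    value -= BigInt(1) << BigInt(8 * bytes.length);
  }
  return value;
}

// A 16-byte value whose last two bytes are 0x27 0x10 (decimal 10000).
const sampleDecimalBytes = new Uint8Array(16);
sampleDecimalBytes[14] = 0x27;
sampleDecimalBytes[15] = 0x10;
const sampleDecimal = fixedLenBytesToBigInt(sampleDecimalBytes);
```

Reading a value this way depends on knowing the fixed width up front, which is why the decoder needs `typeLength`.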

Solution
========
I traced through the code to find where decoding failed and added a
fallback that reads `typeLength` from `column` in the column options
when it is not present at the top level.
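
The fallback can be illustrated with a minimal, self-contained sketch. The trimmed-down `DecodeOptions` and `ByteCursor` shapes and the function names below are assumptions made for illustration; the real `Options` type in the library has more fields. Unlike the actual change, this sketch throws an `Error` object rather than a string and uses `Uint8Array` instead of `Buffer` so it runs without Node typings.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ColumnInfo {
  typeLength?: number;
}
interface DecodeOptions {
  typeLength?: number;
  column?: ColumnInfo;
}
interface ByteCursor {
  buffer: Uint8Array;
  offset: number;
}

// Prefer the top-level typeLength; fall back to the one nested under `column`.
function resolveTypeLength(opts: DecodeOptions): number {
  const typeLength = opts.typeLength ?? opts.column?.typeLength;
  if (!typeLength) {
    throw new Error(
      "missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)"
    );
  }
  return typeLength;
}

// Read `count` fixed-width values, advancing the cursor by typeLength each time.
function decodeFixedLenByteArrays(
  cursor: ByteCursor,
  count: number,
  opts: DecodeOptions
): Uint8Array[] {
  const typeLength = resolveTypeLength(opts);
  const values: Uint8Array[] = [];
  for (let i = 0; i < count; ++i) {
    values.push(
      cursor.buffer.subarray(cursor.offset, cursor.offset + typeLength)
    );
    cursor.offset += typeLength;
  }
  return values;
}
```

Resolving the length once before the loop keeps the missing-option check and the slicing logic consistent, which mirrors the structure of the change in `lib/codec/plain.ts`.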

Change summary:
---------------
see above

Steps to Verify:
----------------
Decode a Parquet file that contains a field of this type (for example, a
`FIXED_LEN_BYTE_ARRAY` DECIMAL column).

---------

Co-authored-by: Wil Wade <[email protected]>
3 people authored Jan 19, 2024
1 parent 8d34ac1 commit 2622ff1
Showing 2 changed files with 5 additions and 6 deletions.
9 changes: 5 additions & 4 deletions lib/codec/plain.ts
@@ -264,16 +264,17 @@ function decodeValues_FIXED_LEN_BYTE_ARRAY(
   opts: Options
 ) {
   let values = [];
-
-  if (!opts.typeLength) {
+  const typeLength =
+    opts.typeLength ?? (opts.column ? opts.column.typeLength : undefined);
+  if (!typeLength) {
     throw "missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)";
   }

   for (let i = 0; i < count; ++i) {
     values.push(
-      cursor.buffer.slice(cursor.offset, cursor.offset + opts.typeLength)
+      cursor.buffer.slice(cursor.offset, cursor.offset + typeLength)
     );
-    cursor.offset += opts.typeLength;
+    cursor.offset += typeLength;
   }

   return values;
2 changes: 0 additions & 2 deletions test/reference-test/read-all.test.ts
@@ -24,8 +24,6 @@ const unsupported = [
   'delta_encoding_optional_column.parquet', // DELTA_BINARY_PACKED unsupported
   'delta_encoding_required_column.parquet', // DELTA_BINARY_PACKED unsupported
   'delta_length_byte_array.parquet', // ZSTD unsupported, DELTA_BINARY_PACKED unsupported
-  'float16_nonzeros_and_nans.parquet', // missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)
-  'float16_zeros_and_nans.parquet', // missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)
   'large_string_map.brotli.parquet', // BUG?
 ];