Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datafusion downcasts decimal loosing precision #13492

Open
himadripal opened this issue Nov 19, 2024 · 5 comments
Open

Datafusion downcasts decimal loosing precision #13492

himadripal opened this issue Nov 19, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@himadripal
Copy link

Describe the bug

This cast throws an error

select  cast(12345.6789 as decimal(6,2));
Arrow error: Invalid argument error: 1234568 is too large to store in a Decimal128 of precision 6. Max is 999999

But if we cast it to a larger decimal and then cast it down to (6,2), then DataFusion cast works fine and looses precision. As shown below

> select cast( cast(12345.6789 as decimal(24,2)) as  decimal(6,2));
+---------------------+
| Float64(12345.6789) |
+---------------------+
| 1234.56             |
+---------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.

To Reproduce

try the above select query in datafusion-cli

Expected behavior

It should return null or throw an error as in this example

select  cast(12345.6789 as decimal(6,2));
Arrow error: Invalid argument error: 1234568 is too large to store in a Decimal128 of precision 6. Max is 999999

Additional context

while trying to enable decimal support in comet, Spark returns null in this cast but DataFusion returns a lowered precision value loosing the value of the decimal.

@himadripal himadripal added the bug Something isn't working label Nov 19, 2024
@himadripal
Copy link
Author

In Postgres the following query resulted in an error

SELECT CAST(CAST(12345.6789 AS NUMERIC(24, 2)) AS NUMERIC(6, 2));
[22003] ERROR: numeric field overflow Detail: A field with precision 6, scale 2 must round to an absolute value less than 10^4.

@findepi
Copy link
Member

findepi commented Nov 20, 2024

But if we cast it to a larger decimal and then cast it down to (6,2), then DataFusion cast works fine and looses precision. As shown below

> select cast( cast(12345.6789 as decimal(24,2)) as  decimal(6,2));
+---------------------+
| Float64(12345.6789) |
+---------------------+
| 1234.56             |
+---------------------+

Losing decimal digits (rounding) in a cast is OK.

However here the cast changes value fundamentally, losing digits before the decimal dot, which is not OK.
The original value was ≈12345, the cast result is ≈1234.

@andygrove
Copy link
Member

I experimented with some test code in arrow-rs and see that there are validity checks when casting f64 to decimal but not when casting decimal to decimal.

Setup code:

        let array = vec![Some(123456789)];
        let array = create_decimal_array(array, 24, 2).unwrap();
        println!("{:?}", array);
        // PrimitiveArray<Decimal128(24, 2)>
        // [
        //     123456789,
        // ]

        let input_type = DataType::Decimal128(24, 2);
        let output_type = DataType::Decimal128(6, 2);
        assert!(can_cast_types(&input_type, &output_type));

        let mut options = CastOptions::default();
        options.safe = false;

Casting to decimal with smaller precision works when it should not:

        let result = cast_with_options(&array, &output_type, &options).unwrap();
        println!("{:?}", result);
        // PrimitiveArray<Decimal128(6, 2)>
        // [
        //     123456789,
        // ]

Casting from f64 to decimal fails as expected:

        let array = Float64Array::from(vec![1234567.89]);
        let result = cast_with_options(&array, &output_type, &options).unwrap();
        // InvalidArgumentError("123456789 is too large to store in a Decimal128 of precision 6. Max is 999999")

@himadripal
Copy link
Author

working on it.

@himadripal
Copy link
Author

once the arrow release is out, we can create a Datafusion release and use that here to fix this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants