Improve ability of FlightDataEncoder to respect max_flight_data_size for certain data types (strings, dictionaries, etc) #3478
Comments
In reviewing the arrow IPC writer code, it does appear to be clever about using offsets when actually writing (thanks @viirya in #2040 ❤️): arrow-rs/arrow-ipc/src/writer.rs, lines 1094 to 1260 in acefeef
However, I am not sure exactly how this will translate to flight data size -- I am writing some more tests now
PR with tests showing how far from optimal the current splitting logic is: #3481
take
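For readers unfamiliar with the offset handling mentioned in the comment above, here is a minimal std-only sketch of the general idea: when a variable-length string column is sliced, only the referenced value bytes need to be written, with the offsets rebased to start at zero. The function `slice_strings` and its signature are hypothetical and are not taken from `arrow-ipc`; the real writer code may differ in its details.

```rust
/// Hypothetical illustration (not the arrow-ipc implementation):
/// given the offsets and value bytes of a string column, return the
/// rebased offsets and only the bytes needed for rows
/// `start..start + len`.
fn slice_strings(
    offsets: &[i32],
    values: &[u8],
    start: usize,
    len: usize,
) -> (Vec<i32>, Vec<u8>) {
    // A slice of `len` rows needs `len + 1` offsets.
    let window = &offsets[start..=start + len];
    let first = window[0];
    let last = window[len];

    // Rebase so the first offset of the slice is zero.
    let rebased: Vec<i32> = window.iter().map(|o| o - first).collect();

    // Copy only the bytes that the sliced rows actually reference.
    let bytes = values[first as usize..last as usize].to_vec();
    (rebased, bytes)
}
```

With the values `helloworldfoo` and offsets `[0, 5, 10, 13]`, slicing two rows starting at row 1 yields offsets `[0, 5, 8]` and the bytes `worldfoo`, so the bytes of the first row are not written.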
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Some implementations of gRPC, such as golang, have a default max message size that is "relatively small" (4MB), and the clients will generate errors if they receive larger messages.
The FlightDataEncoder has a mechanism (link) to try and avoid this problem by heuristically slicing each RecordBatch into smaller parts to limit their size. This works well for primitive arrays but does not work well for other cases, as we have found upstream in IOx.
Lists, structs, and other nested types probably suffer from similar issues with maximum message sizes.
Of course, the smallest message possible is a single row, which can still be significantly larger than whatever the max_flight_data_size limit is for variable length columns (e.g. several large string columns).
Describe the solution you'd like
I would like to improve the situation and handle nested types and more effectively reduce the FlightDataSize
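One possible direction, sketched below with std only, is to split on actual per-row sizes instead of on row counts. The function `split_by_size` is hypothetical, and the sketch assumes the encoded size of each row is already known, which for nested types is the difficult part. A row that is larger than the limit on its own is still emitted as a single-row range, matching the single-row caveat above.

```rust
/// Hypothetical size-aware splitter: returns `(start, len)` row ranges
/// whose total size stays within `max_bytes` whenever that is possible.
/// `row_bytes[i]` is the (assumed known) encoded size of row `i`.
fn split_by_size(row_bytes: &[usize], max_bytes: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start = 0; // first row of the range being built
    let mut current = 0usize; // bytes accumulated in that range

    for (i, &size) in row_bytes.iter().enumerate() {
        // Close the range if adding this row would exceed the limit.
        // A range always keeps at least one row, even an oversized one.
        if i > start && current + size > max_bytes {
            ranges.push((start, i - start));
            start = i;
            current = 0;
        }
        current += size;
    }

    // Emit whatever rows remain.
    if start < row_bytes.len() {
        ranges.push((start, row_bytes.len() - start));
    }
    ranges
}
```

For row sizes `[1000, 10, 10, 10]` and a limit of 515, this yields the ranges `(0, 1)` and `(1, 3)`: the oversized row travels alone and the remaining rows fit comfortably in one message.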
Describe alternatives you've considered
Additional context
See #3347