rust: ADBC driver panics when consuming Arrow batches without properly aligned buffers #2526
Comments
While this should definitely be done in ADBC Rust to allow arbitrary drivers to work properly, should we also have the Snowflake driver ensure that it always outputs aligned buffers?
Does Go not align buffers? Also, please provide a backtrace if possible.
It should, but for ADBC we're using the mallocator, so all the memory we use for Arrow is allocated in C. So, we're at the mercy of malloc.
Attached one now. I didn't have it when I filed the issue. And I think I also figured out how to fix this. But the fix will be in the I think this is the place where the https://github.com/apache/arrow-rs/blob/main/arrow-array/src/ffi_stream.rs#L367 Very easy if I make it a requirement without the ability to opt out for Rust callers like adbc-core.
FWIW, the allocator in the C# implementation of Arrow will overallocate by 64 bytes and then adjust the starting location to be on a 64-byte boundary.
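For readers unfamiliar with the overallocation trick described above, here is a minimal sketch in Rust using only the standard library. The `OverAligned` type and the `ALIGN` constant are hypothetical names made up for illustration; this is not the C# allocator's code nor anything from arrow-go's mallocator, just one way the idea could look.

```rust
/// Alignment target in bytes (64, matching the boundary mentioned above).
pub const ALIGN: usize = 64;

/// A byte buffer whose usable region starts on an `ALIGN`-byte boundary.
/// Hypothetical type for illustration only.
pub struct OverAligned {
    backing: Vec<u8>, // owns the over-allocated storage
    offset: usize,    // bytes skipped to reach the aligned start
    len: usize,       // usable length requested by the caller
}

impl OverAligned {
    pub fn new(len: usize) -> Self {
        // Ask for ALIGN extra bytes so an aligned start always fits.
        let backing = vec![0u8; len + ALIGN];
        let addr = backing.as_ptr() as usize;
        // Distance from the allocation start to the next ALIGN boundary.
        let offset = (ALIGN - addr % ALIGN) % ALIGN;
        OverAligned { backing, offset, len }
    }

    /// The aligned, usable region.
    pub fn as_slice(&self) -> &[u8] {
        &self.backing[self.offset..self.offset + self.len]
    }

    /// Mutable access to the aligned, usable region.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        let (start, end) = (self.offset, self.offset + self.len);
        &mut self.backing[start..end]
    }
}
```

The cost is at most 64 wasted bytes per allocation. The original pointer must be kept around (here, inside `backing`) so that the memory can be released correctly, which is the part that makes this less trivial across an FFI boundary with a release callback.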
Which is great. That means that when
The C++ impl does the overallocating trick too (or maybe it was the Java one). I think it wouldn't be hard to have mallocator do the same thing, right Matt?
It would be nice to know which data type Rust is complaining about (are there any data types that require an alignment >8 bytes anywhere in the spec?).
FWIW I wonder if this is the same issue as apache/arrow#32276, e.g. if Snowflake is using Flight under the hood. The C Data interface has the following to say on this topic
IMO copying unaligned buffers is a band-aid, and Arrow implementations should avoid creating such buffers.
IIRC i128 as used by Decimal128 requires 16-byte alignment on some architectures. I'm not aware of any types with alignment requirements greater than their size.
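One way to check what the compiler actually requires on a given target is to ask it directly. The sketch below uses only `std::mem`; the `layout_of` and `report` helpers are hypothetical names for illustration. The alignment reported for `i128` depends on the target and compiler version, so no output is shown here.

```rust
use std::mem::{align_of, size_of};

/// Returns (size, alignment) in bytes for `T` on the current target.
pub fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Size and alignment of a few primitive types that back Arrow arrays.
pub fn report() -> Vec<(&'static str, usize, usize)> {
    let (s32, a32) = layout_of::<i32>();
    let (s64, a64) = layout_of::<i64>();
    let (sf64, af64) = layout_of::<f64>();
    let (s128, a128) = layout_of::<i128>();
    vec![
        ("i32", s32, a32),
        ("i64", s64, a64),
        ("f64", sf64, af64),
        ("i128", s128, a128), // alignment is target-dependent
    ]
}
```

Printing the result of `report()` on the machine where the panic occurs would show whether `i128` needs 8 or 16 bytes there.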
The integers coming from the Snowflake driver are actually [1] https://github.com/apache/arrow-rs/blob/main/arrow-data/src/data.rs#L1591
I created apache/arrow-go#282
Ah, but Go directly uses
The best thing I can think of is to wrap
It does seem unrealistic to achieve 16-byte alignment under most real-world circumstances (I don't think anything coming zero-copy from IPC, for example, can ever guarantee more than 8).
That actually might be the issue with the Snowflake driver. The IPC reader is gonna try to zero-copy where applicable, so it'll depend on how the network buffers are allocated when reading data from the Snowflake HTTPS requests.
How does Go support buffers sent over FFI / from IPC, which require using a release callback / ref count? Perhaps it is worth raising on the mailing list. IMO either FFI should use aligned buffers, or we should remove the language from the spec and highlight instead that FFI may not be zero-copy. I suspect this will depend on how widespread the alignment challenges are; if it is just Go, it might make sense to just fix it there.
FWIW arrow-rs copies to align on IPC for this reason, but given that network/disk transfers are involved, there is less of an expectation of zero-copy here.
Sorry, I was a bit imprecise with that - the allocator interface in arrow-go uses For the problem at hand, Matt and I were talking about adjusting
It would be a little unfortunate to lose zero-copy for in-process FFI, IMO. It takes away from one of the main selling points (even if I suspect that many applications would not notice in practice). I don't think the problem is unsolvable, it just isn't as trivial a change as I first thought.
What happened?
Running SQL queries through the Snowflake ADBC driver, from Rust, can cause a crash due to buffer alignment issues.
This is a known problem for Rust [1] when it's consuming buffers from systems that don't care as much about alignment as rustc does. It's been fixed in the FFI integration with Python [2]. The ADBC Rust driver wrapper should do the same. How can it trigger the ArrayData::align_buffers() call that makes the Python FFI integration work without alignment issues?
[1] apache/arrow#43552
[2] apache/arrow-rs#6472
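To illustrate the general idea of aligning on import, here is a minimal standard-library sketch: borrow the incoming bytes when they are already aligned for the element type, and copy into freshly allocated (and therefore aligned) storage only when they are not. `view_i64` is a hypothetical helper written for this illustration; it is not how arrow-rs implements `ArrayData::align_buffers()`, and it assumes native-endian 8-byte integers.

```rust
use std::borrow::Cow;

/// View `bytes` as native-endian i64 values.
/// Borrows when the slice is suitably aligned and a whole number of
/// elements long; otherwise copies into a new, aligned Vec.
/// Trailing bytes that do not fill a full element are dropped.
pub fn view_i64<'a>(bytes: &'a [u8]) -> Cow<'a, [i64]> {
    // SAFETY: every bit pattern is a valid i64, so reinterpreting
    // the aligned middle part of the byte slice is sound.
    let (prefix, mid, suffix) = unsafe { bytes.align_to::<i64>() };
    if prefix.is_empty() && suffix.is_empty() {
        // Zero-copy path: the producer handed us aligned memory.
        Cow::Borrowed(mid)
    } else {
        // Fallback path: rebuild each value from its 8 bytes.
        let copied = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(chunk);
                i64::from_ne_bytes(raw)
            })
            .collect();
        Cow::Owned(copied)
    }
}
```

The design trade-off is the one debated in the comments: the copy keeps the consumer from panicking, but it gives up zero-copy for every producer that hands over misaligned buffers.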
Stack Trace
How can we reproduce the bug?
No response
Environment/Setup
No response