[C++][FlightRPC] Flight generates misaligned buffers #32276
Comments
Antoine Pitrou / @pitrou: |
David Li / @lidavidm:
so an SSE instruction. Backtrace is
|
Yifei Yang: |
Antoine Pitrou / @pitrou: |
Yifei Yang: |
Yifei Yang: |
Weston Pace / @westonpace: |
David Li / @lidavidm: |
Apache Arrow JIRA Bot: |
A number of users are running into this, perhaps in lieu of a proper solution we might be able to provide a workaround? Perhaps a kernel that properly aligns input buffers? Perhaps such a kernel already exists? Whilst we could patch around this in arrow-rs FFI it undermines the zero-copy expectation of FFI, and of course doesn't help with all the other arrow implementations that require alignment. |
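As a rough illustration of the kind of workaround being asked about, the following sketch checks a buffer's address and copies it into 8-byte-aligned storage only when it is misaligned. It uses plain standard C++ rather than any Arrow API; `IsAligned`, `AlignedView`, and `EnsureAligned8` are hypothetical names, and the 8-byte target is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if `ptr` is a multiple of `alignment` (which must be a power of two).
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical holder: either borrows the caller's bytes (zero-copy) or owns
// an 8-byte-aligned copy of them.
struct AlignedView {
  const std::uint8_t* borrowed = nullptr;
  std::vector<std::uint64_t> owned;  // uint64_t elements give 8-byte alignment
  std::size_t size = 0;
  bool copied = false;

  const std::uint8_t* data() const {
    return copied ? reinterpret_cast<const std::uint8_t*>(owned.data())
                  : borrowed;
  }
};

// Returns a view whose data pointer is 8-byte aligned, copying only if the
// input pointer is misaligned.
inline AlignedView EnsureAligned8(const std::uint8_t* data, std::size_t size) {
  AlignedView view;
  view.size = size;
  if (IsAligned(data, 8)) {
    view.borrowed = data;  // already aligned: stay zero-copy
    return view;
  }
  view.owned.resize((size + 7) / 8);  // round up to whole 8-byte words
  if (size > 0) std::memcpy(view.owned.data(), data, size);
  view.copied = true;
  return view;
}
```

In the borrowed case the caller still has to keep the original bytes alive; only the misaligned case pays for a copy.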
Although the solution in #14758 is not zero-copy if the buffers are unaligned. |
Hi all, just following up on this. I see there are various workarounds being implemented in client/downstream code, but is the alignment issue likely to be addressed, or should we assume that receiving unaligned buffers from Flight is normal/expected and handle it accordingly? If so, perhaps the spec could be clarified: my original understanding from reading it was that Flight/IPC data is always supposed to be aligned (whereas Arrow recommends alignment), but if that's wrong then please correct me ;) We've been trying to use Flight internally (at work), feeding the resulting data into Polars (which I contribute to, and which uses Rust's arrow2 implementation), but we cannot, as it consistently results in errors. Thanks!
|
IIRC, it's due to gRPC or protobuf that there's never a guarantee of Flight data having the buffers aligned. (I have had issues regarding the misaligned buffers for months at work, see pola-rs/polars#6315. I haven't found any zero-copy solutions yet.) |
#35679 was the fix, but it was not accepted. There is no zero-copy solution because of Protobuf indeed, but gRPC effectively forces a copy on you anyways in C++; it may be possible to do one copy instead of two, but I don't believe you can get to zero-copy except in unusual circumstances. It may be possible to have servers pad the Protobuf message in a way to align the data, but that would also not be a universal fix (and would not be backwards compatible, since some implementations unfortunately reject unknown field tags instead of ignoring them as they should) |
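To make the padding idea concrete, here is a small sketch of the arithmetic involved. In the Protobuf wire format, a length-delimited field is encoded as a varint tag, a varint length, and then the payload, so the payload's offset depends on how many bytes those two varints take. The helper names are hypothetical, and the field number 1000 used in the usage note is an assumption about `FlightData.data_body` that should be checked against Flight.proto.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies when encoded as a Protobuf base-128 varint.
inline std::size_t VarintSize(std::uint64_t value) {
  std::size_t n = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++n;
  }
  return n;
}

// Size of the header (tag varint + length varint) that precedes the payload of
// a length-delimited field (wire type 2).
inline std::size_t LengthDelimitedHeaderSize(std::uint32_t field_number,
                                             std::uint64_t payload_len) {
  const std::uint64_t tag = (static_cast<std::uint64_t>(field_number) << 3) | 2;
  return VarintSize(tag) + VarintSize(payload_len);
}

// Offset of the payload from the start of the message, given how many encoded
// bytes come before this field.
inline std::size_t PayloadOffset(std::size_t bytes_before,
                                 std::uint32_t field_number,
                                 std::uint64_t payload_len) {
  return bytes_before + LengthDelimitedHeaderSize(field_number, payload_len);
}

// Extra bytes needed in front of the field so that the payload offset becomes
// a multiple of `alignment` (assumed nonzero).
inline std::size_t PaddingNeeded(std::size_t offset, std::size_t alignment) {
  return (alignment - offset % alignment) % alignment;
}
```

For example, with 10 bytes of earlier fields, field number 1000 (assumed), and a 4096-byte body, the header is 4 bytes, the payload starts at offset 14, and 2 bytes of padding would be needed for 8-byte alignment relative to the start of the message. A padding field has a header of its own (at least 2 bytes), so a server would have to count that header as part of the padding, and the message itself must start at an aligned address for any of this to help.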
I'll see if I can find the time to take a second look, given that I'm digging around here again. At the very least, I want to fix C++ rejecting unknown tags (#36975) |
Thanks all; if a general fix becomes available that would be ideal, otherwise looks like it'll probably require an extra copy to get it working. |
@alexander-beedie Arrow only recommends alignment for internal processes; it requires it for IPC.
If Flight is handing out misaligned buffers to IPC it is non-compliant and must be fixed. |
That was also my understanding. Some clarification of the docs, the spec, or a fix would be great, otherwise all downstream consumers expecting aligned data are going to discover (and have to solve) the issue individually. |
It was never the intention that Flight data on the receiving side must be aligned. The intention some of us always had was that if you need something aligned and it isn't, you pay for the copy then, not before. It'd be great for individual Flight clients to have an option to ensure alignment, with clear messaging that this may incur a copy, but forcing Flight to always be aligned because some people rely on alignment just punishes those who don't. Presented as an option, this allows individual client implementations to choose to optimize away copies now, later, or never. (As a note, the documentation cited above says nothing about the receiving side, afaict.) With regard to unknown-tag padding: we could simply add one more dummy field (I don't believe there is any requirement for tags to be in order in messages), or simply restate a field, with the first occurrence being arbitrary padding. If I recall correctly, the Protobuf spec says that the last value for a specific tag within a message wins and previous ones should be ignored (this comes from streaming, where something can be overridden by being expressed a second time). |
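The "restate a field" idea relies on parsers keeping only the last occurrence of a non-repeated field. The following minimal sketch illustrates that behavior with a toy decoder that only understands length-delimited fields. `ReadVarint` and `LastWins` are hypothetical names; this is not the Protobuf library's parser, and it ignores every other wire type.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string_view>

// Decodes a base-128 varint starting at `pos` and advances `pos` past it.
// Returns nullopt if the input is truncated or the varint is too long.
inline std::optional<std::uint64_t> ReadVarint(std::string_view buf,
                                               std::size_t& pos) {
  std::uint64_t result = 0;
  for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(buf[pos++]);
    result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Scans a message made only of length-delimited (wire type 2) fields and
// returns, per field number, the payload of its LAST occurrence. Earlier
// occurrences (for example, a padding copy) are overwritten.
inline std::optional<std::map<std::uint32_t, std::string_view>> LastWins(
    std::string_view msg) {
  std::map<std::uint32_t, std::string_view> fields;
  std::size_t pos = 0;
  while (pos < msg.size()) {
    const auto tag = ReadVarint(msg, pos);
    if (!tag || (*tag & 7) != 2) return std::nullopt;  // sketch: LEN only
    const auto len = ReadVarint(msg, pos);
    if (!len || *len > msg.size() - pos) return std::nullopt;
    const std::size_t n = static_cast<std::size_t>(*len);
    fields[static_cast<std::uint32_t>(*tag >> 3)] = msg.substr(pos, n);
    pos += n;
  }
  return fields;
}
```

Feeding it a message where field 1 appears twice, first as three zero bytes and then as the real payload, yields only the second payload. Whether every Flight implementation follows this rule for its own deserializer is a separate question that would need checking.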
I still intend to look at the options here when I get a chance. I need to do some refactoring in C++ first to be able to pass down options to the deserializer. Also, see #34485 where we may want to evaluate a new framing format for Flight to support this and also things like messages that don't fit within a single gRPC/Protobuf message, and #37900 where we may want to evaluate if we can get more control over memory usage from Flight/gRPC. |
Protobuf's wire format design + our zero-copy serializer/deserializer mean that buffers can end up misaligned. On some Arrow versions, this can cause segfaults in kernels assuming alignment (and generally violates expectations).
We should:
Possibly include buffer alignment in array validation
See if we can adjust the serializer to somehow pad things properly
See if we can do anything about this in the deserializer
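The first item, checking alignment during validation, could look roughly like the sketch below. It is written against a hypothetical `BufferRef` struct rather than Arrow's actual buffer or validation classes, and the default of 8 bytes is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a buffer: a pointer and a length in bytes.
struct BufferRef {
  const void* data;
  std::size_t size;
};

// Returns the index of the first non-empty buffer whose address is not a
// multiple of `alignment` (which must be nonzero), or nullopt if all pass.
inline std::optional<std::size_t> FirstMisalignedBuffer(
    const std::vector<BufferRef>& buffers, std::size_t alignment = 8) {
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    const auto addr = reinterpret_cast<std::uintptr_t>(buffers[i].data);
    if (buffers[i].size > 0 && addr % alignment != 0) return i;
  }
  return std::nullopt;
}
```

A validator built this way can report which buffer is misaligned instead of letting a kernel crash later on an aligned load.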
Example:
On Arrow 8
On Arrow 7
Reporter: David Li / @lidavidm
Note: This issue was originally created as ARROW-16958. Please see the migration documentation for further details.