Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-32276: [C++][FlightRPC] Align buffers from Flight #35679

Closed
wants to merge 1 commit into from

Conversation

lidavidm
Copy link
Member

@lidavidm lidavidm commented May 18, 2023

Rationale for this change

Ensure buffers from Flight are at least 8-byte aligned to meet general expectations of downstream users.

What changes are included in this PR?

Manually align buffers after deserialization.

Are these changes tested?

Yes

Are there any user-facing changes?

Flight data will be aligned, at the cost of an additional copy.

@github-actions
Copy link

@lidavidm
Copy link
Member Author

lidavidm commented May 18, 2023

I inlined some stuff from @westonpace's PR but I'd rather wait and be able to use those utilities directly. That said it demonstrates the problem and the solution.

Given more time we might be able to avoid the extra copies, but I'd need to dig deeper into Protobuf & do some benchmarking. (In particular: instead of going gRPC slices -> CodedInputStream -> Arrow buffers, I'd rather directly go from gRPC slices -> Arrow.)

@lidavidm lidavidm force-pushed the gh-32276-flight-align branch from cda6048 to c9d1b82 Compare May 30, 2023 12:29
@lidavidm lidavidm marked this pull request as ready for review May 30, 2023 12:30
@lidavidm lidavidm requested a review from pitrou May 30, 2023 17:17
// XXX: due to where we sit, we can't use a custom allocator
// XXX: any error here will likely crash or hang gRPC!
auto status =
util::EnsureAlignment(std::move(out->body), 64, default_memory_pool())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 64 a bit heavy-handed? Basically, and assuming the distribution of gRPC slice alignments is uniform, we will reallocate almost all incoming buffers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable as not all code using the C++ flight client is sensitive to alignment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misaligned data pointers can be a significant hurdle when working with client libraries that require proper alignment. If these libraries offered better built-in support for data alignment, it would greatly reduce the challenges for developers.

Ensuring data is aligned by default leads to a smoother, more efficient developer experience.

@@ -380,6 +382,14 @@ ::grpc::Status FlightDataDeserialize(ByteBuffer* buffer,
return ::grpc::Status(::grpc::StatusCode::INTERNAL,
"Unable to read FlightData body");
}
// XXX: due to where we sit, we can't use a custom allocator
// XXX: any error here will likely crash or hang gRPC!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a problem and could actually be a regression for current users of the Flight C++ server, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true of any error returned here. Last I checked, gRPC doesn't actually handle errors despite this being a fallible API, instead choosing to crash.

@pitrou
Copy link
Member

pitrou commented May 31, 2023

I'm skeptical that we should force this onto users of Flight C++, many of whom may not be concerned with misaligned buffers.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 31, 2023
@lidavidm
Copy link
Member Author

Given the nature of the gRPC API there's not a great way to customize it at runtime. If someone complains about this in the future we can point them here then. Possibly we can make it a compile-time toggle, though that's effectively useless to most users.

@westonpace @rtpsw were the most recent to mention this. If there's not a real desire for this then I'll close the PR.

@pitrou
Copy link
Member

pitrou commented May 31, 2023

This could be settable using an env var. But I don't think we should realign buffers by default. This can decrease performance and increase memory fragmentation for unsuspecting users.

@westonpace
Copy link
Member

Now that we have a workaround enabled in Acero I don't have any urgent need for this. I'm fine with waiting for it to become a problem (and it may never do so).

@orlp
Copy link

orlp commented Aug 30, 2023

@pitrou

But I don't think we should realign buffers by default.

The arrow standard requires it.

Implementations are recommended to allocate memory on aligned addresses (multiple of 8- or 64-bytes) and pad (overallocate) to a length that is a multiple of 8 or 64 bytes. When serializing Arrow data for interprocess communication, these alignment and padding requirements are enforced.

@pitrou
Copy link
Member

pitrou commented Aug 30, 2023

Hmm, the language in our columnar spec is weird. Alignment has no defined meaning in an IPC stream (but padding has).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][FlightRPC] Flight generates misaligned buffers
6 participants