Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer from external memory doesn't match ScalarBuffer alignment requirement #4431

Closed
viirya opened this issue Jun 17, 2023 · 4 comments
Closed
Labels

Comments

@viirya
Copy link
Member

viirya commented Jun 17, 2023

Describe the bug

Recently some memory alignment error was found while passing arrays from Java arrow to Rust arrow. The error comes from From<Buffer> for ScalarBuffer<T>. I inspected the error further and am wondering if the alignment requirement doesn't make sense (at least for working with external allocated memory).

The idea of alignment check in From<Buffer> is making sure that the buffer pointer is aligned with scalar type T. The existing test shows that:

fn test_unaligned() {
  let expected = [0_i32, 1, 2];
  let buffer = Buffer::from_iter(expected.iter().cloned());
  let buffer = buffer.slice(1);
  ScalarBuffer::<i32>::new(buffer, 0, 2);
}

So the buffer slice is not aligned anymore with i32, so converting it to i32 ScalarBuffer panics with alignment error. But the assumption here is the memory region of the buffer is allocated with type. In Java Arrow, it allocates buffer simply with required number of bytes without type information, it's basically a contiguous memory region with n bytes. So obviously it doesn't necessarily have alignment with some type T there. When we try to pass array backed by the buffer to Rust arrow, the alignment check in From<Buffer> will fail occasionally (sometimes it passes the check if the allocated buffer happens to align with T).

To Reproduce

Expected behavior

Additional context

@viirya viirya added the bug label Jun 17, 2023
@viirya
Copy link
Member Author

viirya commented Jun 17, 2023

I made some change to skip alignment check in From<Buffer> and let Deref panic when trying to deref a ScalarBuffer as [T]. However, currently a ScalarBuffer is assumed to be access as a slice using slice APIs, e.g., getting offset through OffsetBuffer. And we already expose these scalar buffers as slice, e.g. GenericByteArray.value_offsets. To get rid of these slice assumption seems some breaking changes.

@tustvold
Copy link
Contributor

This is a fairly hard-coded assumption that boils down to Rust's fairly strict alignment rules, there isn't really a way to circumvent this. FWIW the arrow specification explicitly calls out that unaligned buffers are problematic, the Java implementation should probably be "fixed", as this will also cause issues for other implementations

@viirya
Copy link
Member Author

viirya commented Jun 20, 2023

Yea, I think I agree with you that this is Rust's alignment rules so seems no good way to avoid it.

What I am not sure about is Arrow spec. I found there is also similar issue on C++ apache/arrow#33336. And it points out that the spec does not require alignment but recommends it.

As this seems not possible to avoid in Rust, I will look into Java Arrow to see if it could be fixed there.

@tustvold
Copy link
Contributor

Closed by #4681

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants