-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add offset_to_first_elem method for Array #994
Conversation
src/impl_owned_array.rs
Outdated
return None; | ||
} | ||
let offset = if std::mem::size_of::<A>() == 0 { | ||
-dimension::offset_from_ptr_to_memory(&self.dim, &self.strides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we use this dimension function in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because the shape and strides are not sufficient information. Consider, for example:
let a = Array2::<f64>::zeros([3, 4]).slice_move(s![1.., ..]);
let b = Array2::<f64>::zeros([2, 4]);
assert_eq!(a.shape(), b.shape());
assert_eq!(a.strides(), b.strides());
assert_ne!(
unsafe { a.as_ptr().offset_from(a.into_raw_vec().as_ptr()) },
unsafe { b.as_ptr().offset_from(b.into_raw_vec().as_ptr()) },
);
a
and b
have the same shape and strides but different offsets.
The only reason for calling dimension::offset_from_ptr_to_memory
here is to get an offset such that indices computed with offset + index[0]*strides[0] + index[1]*strides[1] + ...
will be in-bounds for the result of .into_raw_vec()
. We could return value return None
or Some(0)
for the zero-sized element case, but then it's more likely that the caller will have to explicitly deal with the zero-sized element case too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right. Thanks! That's the three "starts" of the array in memory again and I mixed up (1) and (2)...
- start of memory allocation
- lowest address of an element in the array
- address of logically first element in the array..
I guess if we try and find good names for these (again). I wouldn't mind allocation head, memory head, logical head or something along those lines. Maybe "memory" is too vague to be useful(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like "allocation ptr" and "logical ptr/offset". "Memory head" to refer to the lowest address of any element could be easily confused with the "allocation head". I think "low addr ptr/offset" is a little less likely to be confused with "allocation ptr".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mixed up (1) and (2)...
Yeah, this makes me realize that we should really rename dimension::offset_from_ptr_to_memory
to dimension::offset_from_logical_ptr_to_low_addr_ptr
. (Actually, I'd prefer to have a dimension::offset_from_low_addr_ptr_to_logical_ptr
function instead so that the return value is always nonnegative.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "logical ptr" is very descriptive either. Offset >= 0 sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind "logical ptr", but we can think of another name. Maybe "zero idx ptr" / "zero index ptr"?
src/impl_owned_array.rs
Outdated
/// let offset = arr.offset_to_first_elem().unwrap(); | ||
/// assert_eq!(offset, 0); | ||
/// ``` | ||
pub fn offset_to_first_elem(&self) -> Option<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use the name logic "offset from memory to first elem" again? (Edit: So maybe offset from allocation to first elem?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, offset_from_memory_to_ptr
, offset_from_memory_to_first
, offset_from_memory_to_first_elem
, offset_from_alloc_to_ptr
, offset_from_alloc_to_first
, and offset_from_alloc_to_first_elem
are clearer than offset_to_first_elem
. Another, perhaps better, option is to change .into_raw_vec()
to return (vec, offset)
instead of adding a new method. This would make it harder to accidentally misuse .into_raw_vec()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if the offset is enough information, maybe that's good. Maybe we should just offer an easier to use interface? Something that copies to an in-order vec without trickery (but I think it already exists).
Now it doesn't seem to be happening quickly, but there will come a time when we have more widespread of custom allocation or non-Vec allocation. So into raw vec won't be useful forever, unless one restricts to only Vec-allocated arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the discussion in the other comment chain, I'm thinking that a good name would be offset_from_alloc_to_logical_ptr
. How about making it an internal-only method for now and changing .into_raw_vec()
to return (vec, offset)
as I suggested above?
Edit: For some reason, GitHub is not automatically updating the discussion, so I missed the message above. It seems useful to be able to reuse the existing allocation if possible, like the numpy
crate does, so I don't think that we should remove into_raw_vec
entirely. Even with custom allocation, I'd think that we could return a Vec
parameterized on the custom allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, now that I think about it, I wonder if std
will add a trait for Vec
-like objects to minimize the proliferation of type parameters. The current ndarray
API based on ArrayBase
instead of traits can be inconvenient sometimes. Once custom (non-global) allocators are supported, Vec
will be similar to ArrayBase
, since it'll be generic over the allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
I've long thought that we might parameterize the Array/ArrayBase not just by the Vec's allocator but by wholly new kinds of owned arrays - let's say some other allocation system, like always 64-byte aligned or what it could be, or maybe always using some external specific allocation function. In those cases we don't use Vec or std allocation traits at all, maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an ArrayVec
-backed n-D array type would be useful too. Even with additional owned types/parameters, though, we could provide .into_raw_vec()
for only the type parameters that make sense. Long-term, I'd like to move away from ArrayBase
providing the majority of the functionality to a set of traits describing n-D array functionality, implemented for the various array types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agreed. It just seems to be a lot of work getting to the traitification. Maybe we can start it small and do it piece by piece, somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we can do it piece-by-piece. We can introduce traits one-by-one and implement them for ArrayBase
. For example, we can start with an unsafe trait to get the ptr, dim, and strides, and then add a trait using it for readonly indexing/slicing (with methods returning element references and ArrayView
s). I expect it'll take multiple iterations to fully refine the set of traits we need.
While it's a breaking change, we wait with merging this until the 0.15.x releases are done |
pub fn offset_from_ptr_to_memory<D: Dimension>(dim: &D, strides: &D) -> isize { | ||
/// This function computes the offset from the lowest address element to the | ||
/// logically first element. The result is always >= 0. | ||
pub fn offset_from_low_addr_ptr_to_logical_ptr<D: Dimension>(dim: &D, strides: &D) -> usize { | ||
let offset = izip!(dim.slice(), strides.slice()).fold(0, |_offset, (d, s)| { | ||
if (*s as isize) < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about whether it would be better to change the condition to if (*s as isize) < 0 && *d != 0
. In this way, we can allow the stride to be negative when the Axis value is 0 (theoretically, the stride can be any value at this time), and it can avoid pointer out of bounds and cause panic in the following case:
use ndarray::{Array, ShapeBuilder};
fn main() {
let s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13];
let a= Array::from_shape_vec((2, 0, 2).strides((1, -4isize as usize, 2)),s[0..12].to_vec()).unwrap();
println!("{:?}", a);
}
Output:
[[[]]], shape=[2, 0, 2], strides=[1, -4, 2], layout=Cc (0x5), const ndim=3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this seems like a bug that should be fixed in 0.15.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good catch, @SparrowLii. I've created #998 to fix this in 0.15.2.
This adds a method to get the offset from the start of the allocation to the first element. Without this method, correctly reinterpreting the results of
.into_raw_vec()
as an n-D array is tricky.I came across a bug in the
numpy
crate (PyO3/rust-numpy#182) due to not handling this properly, so the the primary purpose of this PR is to help properly fix that bug.The zero-sized-element case is a bit weird, because for many use cases, a return value of
Some(0)
would be fine. However, if we want the user to be able to compute in-bounds indices into theVec
using the offset, shape, and strides, we need to compute an offset that will work based on the shape and strides.I've fixed/suppressed a couple of clippy warnings unrelated to
offset_to_first_elem
which have appeared with the new release of Rust.By the way, is there a better name for this method?