Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add offset_to_first_elem method for Array #994

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions src/arrayformat.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,8 @@ impl FormatOptions {
self.axis_collapse_limit = std::usize::MAX;
self.axis_collapse_limit_next_last = std::usize::MAX;
self.axis_collapse_limit_last = std::usize::MAX;
self
} else {
self
}
self
}

/// Axis length collapse limit before ellipsizing, where `axis_rindex` is
Expand Down
1 change: 1 addition & 0 deletions src/data_traits.rs
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,7 @@ unsafe impl<A> Data for OwnedArcRepr<A> {
}
}

#[allow(clippy::wrong_self_convention)]
fn to_shared<D>(self_: &ArrayBase<Self, D>) -> ArrayBase<OwnedArcRepr<Self::Elem>, D>
where
Self::Elem: Clone,
Expand Down
58 changes: 51 additions & 7 deletions src/impl_owned_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,55 @@ impl<A, D> Array<A, D>
where
D: Dimension,
{
/// Returns the offset (in units of `A`) from the start of the raw data to
/// the first element, or `None` if the array is empty.
///
/// In other words, after converting the array to `Vec<A>` with
/// [`.into_raw_vec()`](Self::into_raw_vec), this is the offset from the
/// start of the `Vec` to the first element of the array.
///
/// ```
/// use ndarray::{Array2, Axis};
///
/// let mut arr: Array2<f64> = Array2::zeros([3, 4]);
/// arr.slice_axis_inplace(Axis(0), (1..).into());
/// arr[[0, 0]] = 5.;
///
/// let offset = arr.offset_to_first_elem().unwrap();
/// assert_eq!(offset, 4);
///
/// let v = arr.into_raw_vec();
/// assert_eq!(v[offset], 5.);
/// ```
///
/// In the case of zero-sized elements, the offset is somewhat meaningless.
/// For convenience, an offset will be returned such that all indices
/// computed using the offset, shape, and strides will be in-bounds for the
/// `Vec<A>` returned by [`.into_raw_vec()`](Self::into_raw_vec). Note that
/// this offset won't necessarily be the same as the offset for an array of
/// nonzero-sized elements sliced in the same way.
///
/// ```
/// use ndarray::{Array2, Axis};
///
/// let mut arr: Array2<()> = Array2::from_elem([3, 4], ());
/// arr.slice_axis_inplace(Axis(0), (1..).into());
/// let offset = arr.offset_to_first_elem().unwrap();
/// assert_eq!(offset, 0);
/// ```
pub fn offset_to_first_elem(&self) -> Option<usize> {
Copy link
Member

@bluss bluss May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use the name logic "offset from memory to first elem" again? (Edit: So maybe offset from allocation to first elem?)

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, offset_from_memory_to_ptr, offset_from_memory_to_first, offset_from_memory_to_first_elem, offset_from_alloc_to_ptr, offset_from_alloc_to_first, and offset_from_alloc_to_first_elem are clearer than offset_to_first_elem. Another, perhaps better, option is to change .into_raw_vec() to return (vec, offset) instead of adding a new method. This would make it harder to accidentally misuse .into_raw_vec().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the offset is enough information, maybe that's good. Maybe we should just offer an easier to use interface? Something that copies to an in-order vec without trickery (but I think it already exists).

Now it doesn't seem to be happening quickly, but there will come a time when we have more widespread of custom allocation or non-Vec allocation. So into raw vec won't be useful forever, unless one restricts to only Vec-allocated arrays.

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the discussion in the other comment chain, I'm thinking that a good name would be offset_from_alloc_to_logical_ptr. How about making it an internal-only method for now and changing .into_raw_vec() to return (vec, offset) as I suggested above?

Edit: For some reason, GitHub is not automatically updating the discussion, so I missed the message above. It seems useful to be able to reuse the existing allocation if possible, like the numpy crate does, so I don't think that we should remove into_raw_vec entirely. Even with custom allocation, I'd think that we could return a Vec parameterized on the custom allocator.

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, now that I think about it, I wonder if std will add a trait for Vec-like objects to minimize the proliferation of type parameters. The current ndarray API based on ArrayBase instead of traits can be inconvenient sometimes. Once custom (non-global) allocators are supported, Vec will be similar to ArrayBase, since it'll be generic over the allocator.

Copy link
Member

@bluss bluss May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

I've long thought that we might parameterize the Array/ArrayBase not just by the Vec's allocator but by wholly new kinds of owned arrays - let's say some other allocation system, like always 64-byte aligned or what it could be, or maybe always using some external specific allocation function. In those cases we don't use Vec or std allocation traits at all, maybe.

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, an ArrayVec-backed n-D array type would be useful too. Even with additional owned types/parameters, though, we could provide .into_raw_vec() for only the type parameters that make sense. Long-term, I'd like to move away from ArrayBase providing the majority of the functionality to a set of traits describing n-D array functionality, implemented for the various array types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully agreed. It just seems to be a lot of work getting to the traitification. Maybe we can start it small and do it piece by piece, somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we can do it piece-by-piece. We can introduce traits one-by-one and implement them for ArrayBase. For example, we can start with an unsafe trait to get the ptr, dim, and strides, and then add a trait using it for readonly indexing/slicing (with methods returning element references and ArrayViews). I expect it'll take multiple iterations to fully refine the set of traits we need.

if self.is_empty() {
return None;
}
let offset = if std::mem::size_of::<A>() == 0 {
-dimension::offset_from_ptr_to_memory(&self.dim, &self.strides)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we use this dimension function in both cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the shape and strides are not sufficient information. Consider, for example:

let a = Array2::<f64>::zeros([3, 4]).slice_move(s![1.., ..]);
let b = Array2::<f64>::zeros([2, 4]);
assert_eq!(a.shape(), b.shape());
assert_eq!(a.strides(), b.strides());
assert_ne!(
    unsafe { a.as_ptr().offset_from(a.into_raw_vec().as_ptr()) },
    unsafe { b.as_ptr().offset_from(b.into_raw_vec().as_ptr()) },
);

a and b have the same shape and strides but different offsets.

The only reason for calling dimension::offset_from_ptr_to_memory here is to get an offset such that indices computed with offset + index[0]*strides[0] + index[1]*strides[1] + ... will be in-bounds for the result of .into_raw_vec(). We could return value return None or Some(0) for the zero-sized element case, but then it's more likely that the caller will have to explicitly deal with the zero-sized element case too.

Copy link
Member

@bluss bluss May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. Thanks! That's the three "starts" of the array in memory again and I mixed up (1) and (2)...

  1. start of memory allocation
  2. lowest address of an element in the array
  3. address of logically first element in the array..

I guess if we try and find good names for these (again). I wouldn't mind allocation head, memory head, logical head or something along those lines. Maybe "memory" is too vague to be useful(?)

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like "allocation ptr" and "logical ptr/offset". "Memory head" to refer to the lowest address of any element could be easily confused with the "allocation head". I think "low addr ptr/offset" is a little less likely to be confused with "allocation ptr".

Copy link
Member Author

@jturner314 jturner314 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mixed up (1) and (2)...

Yeah, this makes me realize that we should really rename dimension::offset_from_ptr_to_memory to dimension::offset_from_logical_ptr_to_low_addr_ptr. (Actually, I'd prefer to have a dimension::offset_from_low_addr_ptr_to_logical_ptr function instead so that the return value is always nonnegative.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "logical ptr" is very descriptive either. Offset >= 0 sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind "logical ptr", but we can think of another name. Maybe "zero idx ptr" / "zero index ptr"?

} else {
unsafe { self.as_ptr().offset_from(self.data.as_ptr()) }
};
debug_assert!(offset >= 0);
Some(offset as usize)
}

/// Return a vector of the elements in the array, in the way they are
/// stored internally.
///
Expand Down Expand Up @@ -491,13 +540,8 @@ impl<A, D> Array<A, D>

unsafe {
// grow backing storage and update head ptr
let data_to_array_offset = if std::mem::size_of::<A>() != 0 {
self.as_ptr().offset_from(self.data.as_ptr())
} else {
0
};
debug_assert!(data_to_array_offset >= 0);
self.ptr = self.data.reserve(len_to_append).offset(data_to_array_offset);
let data_to_array_offset = self.offset_to_first_elem().unwrap_or(0);
self.ptr = self.data.reserve(len_to_append).add(data_to_array_offset);

// clone elements from view to the array now
//
Expand Down