diff --git a/docs/_toc.yml b/docs/_toc.yml index 5a13d5e92a..9f50ff13ce 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -99,9 +99,9 @@ subtrees: - file: user-guide/how-to-filter-cut-mask title: "Cuts vs. masks [todo]" - file: user-guide/how-to-filter-ragged - title: "Slicing lists within arrays" + title: "Using ragged arrays" - file: user-guide/how-to-filter-masked - title: "Slices with missing values [todo]" + title: "Using arrays with missing values" - file: user-guide/how-to-restructure title: "Restructuring data" diff --git a/docs/user-guide/how-to-filter-masked.md b/docs/user-guide/how-to-filter-masked.md index c40cc613a1..9305374a82 100644 --- a/docs/user-guide/how-to-filter-masked.md +++ b/docs/user-guide/how-to-filter-masked.md @@ -4,20 +4,128 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.10.3 + jupytext_version: 1.14.4 kernelspec: - display_name: Python 3 + display_name: Python 3 (ipykernel) language: python name: python3 --- -How to use slices that have missing values -========================================== +How to filter with arrays containing missing values +=================================================== -**This is a stub:** I intend to write this article, but haven't yet. +```{code-cell} ipython3 +import awkward as ak +import numpy as np +``` -If you need it soon, create an issue saying so and I'll make it a higher priority. +(how-to-filter-ragged:indexing-with-missing-values)= +## Indexing with missing values +In {ref}`how-to-filter-masked:building-an-awkward-index`, we looked at building arrays of integers to perform awkward indexing using {func}`ak.argmin` and {func}`ak.argmax`. In particular, the `keepdims` argument of {func}`ak.argmin` and {func}`ak.argmax` is very useful for creating arrays that can be used to index into the original array. However, reducers such as {func}`ak.argmax` behave differently when they are asked to operate upon empty lists. 
-[![](../image/github-issues-documentation.png)](https://github.com/scikit-hep/awkward-1.0/issues/new?assignees=&labels=docs&template=documentation.md&title=) +Let's first create an array that contains empty sublists: -The text of your issue doesn't have to be much more than a link to this page, so I can be sure which page you're referring to. If you add details about how and why you need it, however, I may be able to tailor the text to help you more. +```{code-cell} ipython3 +array = ak.Array( + [ + [], + [10, 3, 2, 9], + [4, 5, 5, 12, 6], + [], + [8, 9, -1], + ] +) +array +``` + +Awkward reducers accept a `mask_identity` argument, which changes the {attr}`ak.Array.type` and the values of the result: + +```{code-cell} ipython3 +ak.argmax(array, keepdims=True, axis=-1, mask_identity=False) +``` + +```{code-cell} ipython3 +ak.argmax(array, keepdims=True, axis=-1, mask_identity=True) +``` + +Setting `mask_identity=False` yields the identity value for the reducer instead of `None` when reducing empty lists. From the above examples of {func}`ak.argmax`, we can see that the identity for {func}`ak.argmax` is `-1`. What happens if we try to use the array produced with `mask_identity=False` to index into `array`? + ++++ + +As discussed in {ref}`how-to-filter-ragged:indexing-with-argmin-and-argmax`, we first need to convert _at least_ one dimension to a ragged dimension: + +```{code-cell} ipython3 +index = ak.from_regular( + ak.argmax(array, keepdims=True, axis=-1, mask_identity=False) +) +``` + +Now, if we try to index into `array` with `index`, it will raise an exception: + +```{code-cell} ipython3 +:tags: [raises-exception] + +array[index] +``` + +From the error message, it is clear that for some sublist(s) the index `-1` is out of range. This makes sense; some of our sublists are empty, meaning that there is no valid integer to index into them. + +Now let's look at the result of indexing with `mask_identity=True`. 
+ +```{code-cell} ipython3 +index = ak.argmax(array, keepdims=True, axis=-1, mask_identity=True) +``` + +Because it contains an option type, `index` already satisfies rule (2) in {ref}`how-to-filter-masked:building-an-awkward-index`, and we do not need to convert it to a ragged array. We can see that this index succeeds: + +```{code-cell} ipython3 +array[index] +``` + +Here, the missing values in the index array correspond to missing values _in the output array_. + ++++ + +## Indexing with missing sublists + +Awkward indexing also supports using `None` in place of _empty sublists_ within an index. For example, given the following array + +```{code-cell} ipython3 +array = ak.Array( + [ + [10, 3, 2, 9], + [4, 5, 5, 12, 6], + [], + [8, 9, -1], + ] +) +array +``` + +let's build an awkward index to pull out some particular values. Rather than using empty lists, we can use `None` to mask out sublists that we don't care about: + +```{code-cell} ipython3 +array[ + [ + [0, 1], + None, + [], + [2], + ], +] +``` + +If we compare this with simply providing an empty sublist, + +```{code-cell} ipython3 +array[ + [ + [0, 1], + [], + [], + [2], + ], +] +``` + +we can see that the `None` value introduces an option-type into the final result. `None` values can be used at _any_ level in the index array to introduce an option-type at that depth in the result. 
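The two indexing behaviours added in this file can be modelled in plain Python. This is a hedged sketch for reviewers, not Awkward Array's implementation: `argmax_keepdims` and `take_ragged` are hypothetical helpers written only for this illustration, and the model assumes that a `None` in the index maps to a `None` in the output, while `-1` has no valid target in an empty list.

```python
def argmax_keepdims(lists, mask_identity):
    """Hypothetical model of an argmax over the last axis with keepdims=True."""
    index = []
    for sublist in lists:
        if sublist:
            # Position of the largest value in this sublist.
            best = max(range(len(sublist)), key=lambda i: sublist[i])
            index.append([best])
        else:
            # Empty sublist: either a missing value or the identity (-1).
            index.append([None] if mask_identity else [-1])
    return index


def take_ragged(lists, index):
    """Hypothetical model of indexing each sublist with its own positions."""
    out = []
    for sublist, positions in zip(lists, index):
        if positions is None:
            # A missing sublist in the index gives a missing sublist in the output.
            out.append(None)
            continue
        row = []
        for pos in positions:
            # A missing position gives a missing item; otherwise index normally
            # (this raises IndexError when the sublist is empty).
            row.append(None if pos is None else sublist[pos])
        out.append(row)
    return out


array = [[], [10, 3, 2, 9], [4, 5, 5, 12, 6], [], [8, 9, -1]]

masked_index = argmax_keepdims(array, mask_identity=True)
masked_result = take_ragged(array, masked_index)

unmasked_index = argmax_keepdims(array, mask_identity=False)
try:
    take_ragged(array, unmasked_index)
    unmasked_failed = False
except IndexError:
    unmasked_failed = True

# Missing sublists: None in place of a whole sublist of positions.
array_2 = [[10, 3, 2, 9], [4, 5, 5, 12, 6], [], [8, 9, -1]]
sublist_result = take_ragged(array_2, [[0, 1], None, [], [2]])
```

In this model the masked index selects the maximum of each non-empty sublist and leaves `None` where a sublist was empty, whereas the `-1` identity fails on the empty sublists, which mirrors the exception shown in the documentation.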
diff --git a/docs/user-guide/how-to-filter-ragged.md b/docs/user-guide/how-to-filter-ragged.md index 8442b073af..f573f29d3d 100644 --- a/docs/user-guide/how-to-filter-ragged.md +++ b/docs/user-guide/how-to-filter-ragged.md @@ -4,26 +4,24 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.14.1 + jupytext_version: 1.14.4 kernelspec: display_name: Python 3 (ipykernel) language: python name: python3 --- -How to filter lists within arrays using ragged slicing -====================================================== +How to filter with ragged arrays +================================ ```{code-cell} ipython3 import awkward as ak import numpy as np ``` -## What is ragged slicing? +## What is awkward indexing? -+++ - -One of the most powerful features of NumPy is the expressiveness of its indexing system. A NumPy array [can be sliced in many different ways](https://numpy.org/doc/stable/user/basics.indexing.html#basic-indexing), such as with a single integer, or an array of integers. Awkward Array implements most of these indexing styles, but adds an additional variant: _ragged indexing_. +One of the most powerful features of NumPy is the expressiveness of its indexing system. A NumPy array [can be sliced in many different ways](https://numpy.org/doc/stable/user/basics.indexing.html#basic-indexing), such as with a single integer, or an array of integers. Awkward Array implements most of these indexing styles, but adds an additional variant: _awkward indexing_. +++ @@ -63,17 +61,18 @@ type: 3 * var * var * float64 +++ -To produce this result, we need ragged indexing. +To produce this result, we need awkward indexing. +++ -## Building a ragged index +(how-to-filter-masked:building-an-awkward-index)= +## Building an awkward index +++ -Ragged indexing requires an index array that +Awkward indexing requires an index array that 1. has a structure matching the array being sliced **up to** (but not including) the final dimension of the index -2. 
has at _least_ one ragged (`var`) dimension. +2. has at _least_ one ragged (`var`) dimension **or** contains missing values By structure, we mean the number of sublists in each dimension, which can be seen with {func}`ak.num`: @@ -91,11 +90,11 @@ ak.num(array, axis=0) ak.num(array, axis=1) ``` -To put this more simply, the final dimension of the ragged index is used to pull items out of the array. Therefore, Awkward needs the preceeding dimensions to line up! +To put this more simply, the final dimension of the awkward index is used to pull items out of the array. Therefore, Awkward needs the preceding dimensions to line up! +++ -Recall that we wanted to pull out the following result from `array` using ragged indexing: +Recall that we wanted to pull out the following result from `array` using awkward indexing: ``` [[[], [3.3], [7.7]], [], @@ -134,7 +133,7 @@ array ak.local_index(array) ``` -To create our ragged index, all we need to do is create an array _like_ `ak.local_index(array)`, but with only the local indices that we want to keep, i.e. +To create our awkward index, all we need to do is create an array _like_ `ak.local_index(array)`, but with only the local indices that we want to keep, i.e. ```{code-cell} ipython3 index = ak.Array( @@ -152,7 +151,7 @@ We can see that this array matches the leading structure of `array`, and has at index.type.show() ``` -Let's see what slicing `array` with this ragged index looks like: +Let's see what slicing `array` with this awkward index looks like: ```{code-cell} ipython3 array[index] @@ -162,11 +161,12 @@ Clearly this index produces the result that we were aiming for! +++ +(how-to-filter-ragged:indexing-with-argmin-and-argmax)= ## Indexing with `argmin` and `argmax` +++ -Ragged indexing is especially useful when combined with the positional {func}`ak.argmin` and {func}`ak.argmax` reducers. These functions accept an `keepdims=True` argument that can be used to keep _the same number of dimensions_ as the original array. 
+Awkward indexing is especially useful when combined with the positional {func}`ak.argmin` and {func}`ak.argmax` reducers. These functions accept a `keepdims=True` argument that can be used to keep _the same number of dimensions_ as the original array. There is also a `mask_identity` argument, which is explained in {ref}`how-to-filter-ragged:indexing-with-missing-values`. For now, we will set it to `False`. ```{code-cell} ipython3 array = ak.Array( @@ -179,89 +179,55 @@ array = ak.Array( array ``` -Without `keepdims=True`, all reducers collapse a dimension of the original array +With `keepdims=False`, all reducers collapse a dimension of the original array: ```{code-cell} ipython3 -ak.argmin(array, axis=1) +ak.argmin(array, axis=1, keepdims=False, mask_identity=False) ``` If we try and use this index to slice `array`, it will likely not produce the result we might initially expect: ```{code-cell} ipython3 -array[ak.argmin(array, axis=1)] +array[ak.argmin(array, axis=1, keepdims=False, mask_identity=False)] ``` Instead of pulling out the smallest items in `array` along `axis=1`, we have simply re-arranged the sublists of `array` along `axis=0`. Our index has only a single dimension, so for each value in `ak.argmin(array, axis=-1)`, Awkward pulls out the corresponding item from `array`. We want to pull values out of the _second_ dimension, so our index array needs to be two dimensional. +++ -Let's now look at what happens with `keepdims=True`: - -```{code-cell} ipython3 -ak.argmin(array, axis=-1, keepdims=True) -``` +Let's now look at what happens with `keepdims=True`. The result is a two-dimensional, fully regular array, with no missing values: ```{code-cell} ipython3 -array[ak.argmin(array, axis=-1, keepdims=True)] +ak.argmin(array, axis=-1, keepdims=True, mask_identity=False) ``` -This now produces the expected result! - -+++ - -## Filtering with missing sublists - -+++ - -Ragged indexing supports using `None` in place of empty sublists within an index. 
For example +Before we can use this as an index array, we need to convert _at least_ one dimension to a ragged dimension. This follows from rule (2) described in {ref}`how-to-filter-masked:building-an-awkward-index`. ```{code-cell} ipython3 -array = ak.Array( - [ - [10, 3, 2, 9], - [4, 5, 5, 12, 6], - [], - [8, 9, -1], - ] +ak.from_regular( + ak.argmin(array, axis=-1, keepdims=True, mask_identity=False) ) -array ``` -Let's use build a ragged index to pull some values out of `array`. Rather than using empty lists, we can use `None` to mask out sublists that we don't care about: +We can now use this array to index into `array`: ```{code-cell} ipython3 array[ - [ - [0, 1], - None, - [], - [2], - ], + ak.from_regular( + ak.argmin(array, axis=-1, keepdims=True, mask_identity=False) + ) ] ``` -If we compare this with simply providing an empty sublist, - -```{code-cell} ipython3 -array[ - [ - [0, 1], - [], - [], - [2], - ], -] -``` - -we can see that the `None` value introduces an +it produces the expected result! +++ ## Filtering with booleans +As described in {ref}`how-to-filter-masked:building-an-awkward-index`, Awkward Array's awkward indexing is a generalisation of the advanced indexing supported by NumPy. It is therefore reasonable to ask whether Awkward supports awkward indexing with +_boolean_ values, selecting only values for which the index is `True`. -+++ - -Awkward Array's ragged indexing is a generalisation of the advanced indexing supported by NumPy. It is therefore reasonable to ask whether Awkward supports ragged indexing with boolean values, selecting only values for which the index is `True`. Let's create an array of integers: +Let's create an array of integers: ```{code-cell} ipython3 numbers = ak.Array( @@ -273,20 +239,17 @@ numbers = ak.Array( ) ``` -We can use ragged indexing to keep only the even values. Let's generate a boolean mask with the same structure as `numbers`. 
In order for there to be a single boolean value for each item in `numbers`, the filter array must have exactly the same number of elements. Ufuncs are powerful means of generating boolean masks, as they directly preserve the exact structure of the original array: +We can use awkward indexing to keep only the even values. Let's generate a boolean mask with the same structure as `numbers`. In order for there to be a single boolean value for each item in `numbers`, the filter array must have exactly the same number of elements. Ufuncs, such as {func}`np.mod`, are powerful tools for generating boolean masks, as they directly preserve the exact structure of the original array: ```{code-cell} ipython3 is_even = (numbers % 2) == 0 +is_even ``` ```{code-cell} ipython3 numbers ``` -```{code-cell} ipython3 -is_even -``` - Now we can use `is_even` to slice `numbers`: ```{code-cell} ipython3 @@ -310,3 +273,11 @@ numbers_np[(numbers_np % 2) == 0] ``` NumPy, lacking a ragged array structure, has to flatten the result whereas Awkward Array preserves the number of dimensions in the result. + +```{code-cell} ipython3 +numbers[ + [[True, False, True, False], + [False], + [False, True, False]] +] +```
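The contrast between ragged and flattened boolean filtering can also be sketched in plain Python. This is an illustrative model only, not how Awkward Array or NumPy implement masking: `filter_ragged` is a hypothetical helper, and the values in `numbers` are made up for this example because the real contents of `numbers` are elided from the hunk context.

```python
def filter_ragged(lists, mask):
    """Keep items whose mask entry is True, preserving the outer list structure."""
    if len(lists) != len(mask):
        raise ValueError("mask must have one sublist per sublist of the data")
    result = []
    for row, mask_row in zip(lists, mask):
        if len(row) != len(mask_row):
            raise ValueError("mask must have exactly one boolean per item")
        result.append([item for item, keep in zip(row, mask_row) if keep])
    return result


# Hypothetical data; the sublist lengths (4, 1, 3) match the literal mask above.
numbers = [[0, 1, 2, 3], [5], [7, 8, 9]]

# Element-wise test, analogous to `(numbers % 2) == 0` on an array.
is_even = [[value % 2 == 0 for value in row] for row in numbers]

# Ragged filtering keeps one (possibly empty) sublist per input sublist.
ragged = filter_ragged(numbers, is_even)

# Flattening discards that structure, analogous to what a rectangular
# array library must return from a boolean mask.
flat = [value for row in ragged for value in row]
```

Here `ragged` keeps three sublists, one of them empty, while `flat` holds the same selected values without any record of which sublist they came from.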