IndexError: boolean index did not match indexed array in awkward 2.x vs 1.x #2214
You're right that the indexing behaviour in Awkward is driven by the index array's structure, so that's what I'm looking at in order to identify whether this is a bug or a usage issue.
The cause of this bug is that the list that wraps the boolean array has offsets that don't start at zero. This is allowed, but some of our code paths assume that offsets start at zero, and that assumption does not always hold. |
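To see why offsets that don't start at zero matter, here is a minimal NumPy sketch. It uses a hypothetical helper, not Awkward's internal code, that turns a ragged boolean mask into ragged integer local indices. The mask is stored as offsets plus a flat content buffer. The key step is restricting the content to content[offsets[0]:offsets[-1]] before using it as a mask. Using the whole buffer instead is the kind of assumption that breaks when offsets[0] != 0.

```python
import numpy as np


def mask_to_local_indices(offsets, content):
    """Turn a ragged boolean mask (offsets + flat content) into ragged
    integer local indices, without assuming offsets[0] == 0.

    Hypothetical helper for illustration; not Awkward's implementation.
    """
    offsets = np.asarray(offsets, dtype=np.int64)
    content = np.asarray(content, dtype=bool)
    start, stop = offsets[0], offsets[-1]
    # Only content[start:stop] belongs to the lists; the rest is unreachable.
    used = content[start:stop]
    counts = np.diff(offsets)
    # Position of each reachable element within its own list (0, 1, 2, ...).
    list_starts = np.repeat(offsets[:-1] - start, counts)
    local = np.arange(stop - start) - list_starts
    # Keep the local indices where the mask is True.
    new_content = local[used]
    # Rebuild offsets from the number of True values in each list.
    list_id = np.repeat(np.arange(len(counts)), counts)
    kept = np.bincount(list_id[used], minlength=len(counts))
    new_offsets = np.concatenate([[0], np.cumsum(kept)])
    return new_offsets, new_content


# Two lists, [False, True] and [True], stored at content[2:5].
offsets, indices = mask_to_local_indices(
    [2, 4, 5], [True, True, False, True, True, False]
)
print(offsets.tolist(), indices.tolist())  # [0, 1, 2] [1, 0]
```

The first two content elements and the last one are never part of any list. A routine that masks with all six booleans would misalign the lists or raise an error.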
Thanks for the explanation and the quick fix! I will keep |
Using cbc2c4d, I believe I found another instance of this issue. This time, the array is masked to be fully empty. I managed to pickle it with
In [4]: hybrid_to_det_level_valid_matches.layout
Out[4]:
<IndexedArray len='0'>
<index><Index dtype='int64' len='0'>
[]
</Index></index>
<content><ListOffsetArray len='183'>
<offsets><Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index></offsets>
<content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
</ListOffsetArray></content>
</IndexedArray>
Based on your comments in the PR, I guess this is one of the additional bugs popping up. Could you please take a look? Thanks! |
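Here is how to read this layout. The IndexedArray's index is empty, so the array has length 0. The ListOffsetArray it points into still describes 183 lists, and its NumpyArray content still holds two booleans. The sketch below mimics that projection in NumPy to make the mismatch concrete: zero reachable elements, two stored ones. It uses abbreviated offsets (an assumption for brevity) and a hypothetical project helper.

```python
import numpy as np

# Buffers modelled on the layout above; the offsets are abbreviated
# (the real buffer has 184 entries, all 0 or 2).
index = np.array([], dtype=np.int64)                    # IndexedArray index (len 0)
offsets = np.array([0, 0, 0, 2, 2, 2], dtype=np.int64)  # ListOffsetArray offsets
content = np.array([False, False])                      # NumpyArray content (len 2)


def project(index, offsets):
    """Resolve an IndexedArray over a ListOffsetArray into (starts, stops)."""
    starts = offsets[:-1][index]
    stops = offsets[1:][index]
    return starts, stops


starts, stops = project(index, offsets)
n_lists = len(starts)                      # 0: the IndexedArray selects nothing
n_reachable = int((stops - starts).sum())  # 0: no content element is reachable
print(n_lists, n_reachable, len(content))  # 0 0 2
```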
Could you clarify what isn't working with this array? What operation are you performing that is failing, or producing unexpected results? :) |
Sorry, this wasn't my best bug report - I was moving too quickly :-) I'm doing some matching between three different collections of jets (the matching itself is done with numba). I build up a mask to remove jets, and then for the remaining jets, I want to assign the index of the matched jets to a particular field. The relevant code looks like this (included for completeness, but I'm not sure it will be meaningful):
import awkward as ak

# Convention: -1 means no match; anything else is the index of the matched jet
jets["part_level", "matching"] = matching_indices_1
jets["det_level", "matching"] = matching_indices_2
jets["hybrid", "matching"] = matching_indices_3
# Mask out events that have no jets at any level
jets_present_mask = (
    (ak.num(jets["part_level"], axis=1) > 0)
    & (ak.num(jets["det_level"], axis=1) > 0)
    & (ak.num(jets["hybrid"], axis=1) > 0)
)
jets = jets[jets_present_mask]
# Assign indices
hybrid_to_det_level_valid_matches = jets["hybrid", "matching"] > -1
det_to_part_level_valid_matches = jets["det_level", "matching"] > -1
# IndexError is raised on the next line!
hybrid_to_det_level_including_det_to_part_level_valid_matches = det_to_part_level_valid_matches[
    jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]
]
Sometimes this fails with:
File ~/software/dev/mammoth/src/mammoth/framework/analysis/jets.py:167, in jet_matching_embedding(jets, det_level_hybrid_max_matching_distance, part_level_det_level_max_matching_distance)
165 import IPython; IPython.embed()
166 #ak.to_buffers(hybrid_to_det_level_valid_matches)
--> 167 logger.warning(f'{jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]}')
168 hybrid_to_det_level_including_det_to_part_level_valid_matches = det_to_part_level_valid_matches[
169 jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]
170 ]
171 # First, restrict the hybrid level, requiring hybrid to det_level valid matches and
172 # det_level to part_level valid matches.
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/highlevel.py:951, in Array.__getitem__(self, where)
522 """
523 Args:
524 where (many types supported; see below): Index of positions to
(...)
948 have the same dimension as the array being indexed.
949 """
950 with ak._errors.SlicingErrorContext(self, where):
--> 951 out = self._layout[where]
952 if isinstance(out, ak.contents.NumpyArray):
953 array_param = out.parameter("__array__")
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:531, in Content.__getitem__(self, where)
530 def __getitem__(self, where):
--> 531 return self._getitem(where)
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:576, in Content._getitem(self, where)
573 return out._getitem_at(0)
575 elif isinstance(where, ak.highlevel.Array):
--> 576 return self._getitem(where.layout)
578 # Convert between nplikes of different backends
579 elif (
580 isinstance(where, ak.contents.Content)
581 and where.backend is not self._backend
582 ):
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:655, in Content._getitem(self, where)
652 return where.to_NumpyArray(np.int64)
654 elif isinstance(where, Content):
--> 655 return self._getitem((where,))
657 elif ak._util.is_sized_iterable(where):
658 # Do we have an array
659 nplike = ak._nplikes.nplike_of(where, default=None)
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:557, in Content._getitem(self, where)
554 return self
556 # Normalise valid indices onto well-defined basis
--> 557 items = ak._slicing.normalise_items(where, self._backend)
558 # Prepare items for advanced indexing (e.g. via broadcasting)
559 nextwhere = ak._slicing.prepare_advanced_indexing(items)
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:260, in normalise_items(where, backend)
258 common_backend = ak._backends.common_backend([backend, where_backend])
259 # First prepare items for broadcasting into like-types
--> 260 return [normalise_item(x, backend=common_backend) for x in where]
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:260, in <listcomp>(.0)
258 common_backend = ak._backends.common_backend([backend, where_backend])
259 # First prepare items for broadcasting into like-types
--> 260 return [normalise_item(x, backend=common_backend) for x in where]
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:205, in normalise_item(item, backend)
203 # Ragged indexing should be performed with integer contents
204 elif isinstance(item, ak.contents.Content):
--> 205 out = _normalise_item_bool_to_int(_normalise_item_nested(item), backend)
206 assert out.backend is backend
207 if isinstance(out, ak.contents.NumpyArray):
File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:445, in _normalise_item_bool_to_int(item, backend)
443 item = item.to_ListOffsetArray64(True)
444 localindex = ak._do.local_index(item, axis=1)
--> 445 nextcontent = localindex.content.data[item.content.data]
447 cumsum = item_backend.index_nplike.empty(
448 item.content.data.shape[0] + 1, dtype=np.int64
449 )
450 cumsum[0] = 0
IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 2
(Note that I added a log message for debugging in this stack trace while I was trying to isolate the issue.) I guess this was happening before, but I didn't see it because the IndexError that I originally reported was thrown first (and this one is somewhat rarer). I recognize that having the mask become an empty array is a bit of an edge case, but this worked with awkward 1.x, so I was expecting it to work with 2.x. Also, when I tried
In summary, this appears to be the same issue, but yeah, I don't think I included enough details in my follow-up :-) Hopefully this is clearer! |
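The final IndexError is raised by NumPy itself. In the traceback, localindex.content.data has length 0, while the boolean item.content.data used to index it has length 2. The minimal NumPy sketch below triggers the same message, using buffers modelled on this report but not extracted from it. It also shows that masking with only the slice covered by the offsets avoids the error. This is an illustration of the failure mode, not the actual fix.

```python
import numpy as np

# Flat buffers modelled on (not extracted from) the report: the projected
# ragged mask has no lists left, but its content buffer still holds two booleans.
offsets = np.array([0], dtype=np.int64)  # a single offset means zero lists
content = np.array([False, False])

# Local indices exist only for elements inside offsets[0]:offsets[-1] (none here).
localindex = np.arange(offsets[-1] - offsets[0])

# Naive step: use the whole content buffer as the mask (length 2 vs length 0).
try:
    localindex[content]
    naive_failed = False
except IndexError as err:
    naive_failed = True
    print("IndexError:", err)

# Trimmed step: only the slice covered by the offsets is used as the mask.
trimmed = content[offsets[0]:offsets[-1]]
selected = localindex[trimmed]
print(selected.tolist())  # []
```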
Although I can't reproduce your particular bug without any data, I can see that there is a bug in that region of code, so I've made a PR that I think would address it #2246. You're welcome to try out that branch, by cloning it and following the instructions for installing Awkward from the README, or wait until a release! Alternatively, feel free to provide a reproducer that I can include in our test suite :) |
Sorry, I'm a bit confused - did the pickle that I posted via
Edit: ah, do you need the array I was trying to apply it to as well? I'm happy to provide something to put into the test suite as long as I can reasonably disentangle it. Thanks for your help! |
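That's it! If this line is failing
jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]
then I just need these two arrays:
import pickle

import awkward as ak

# jets and hybrid_to_det_level_valid_matches come from the failing session
array = jets["hybrid", "matching"]
index = hybrid_to_det_level_valid_matches
# ak.to_buffers returns (form, length, container), which is enough to
# rebuild each array later with ak.from_buffers
raw_data = [
    ak.to_buffers(array),
    ak.to_buffers(index),
]
with open("debug.pickle.zip", "wb") as f:
    pickle.dump(raw_data, f) |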
Sorry that I overlooked this point, and thanks for bearing with me - I'm in a conference rush and didn't think this through :-) Here is the pickle as requested: debug.pickle.zip |
No stress! It was enough to start looking, and we can always just ask for more information when things like this happen anyway :) |
Ah, it looks like |
Excellent, that fixed it! Thanks! In terms of data for the test suite, below is the layout of the arrays, drilling down into them to expose (all of?) the data. I'm not sure if this is useful, but it's here for completeness. If there's another way to extract the arrays for tests, I'm also happy to provide it - just let me know.
In [2]: hybrid_to_det_level_valid_matches.layout
Out[2]:
<IndexedArray len='0'>
<index><Index dtype='int64' len='0'>
[]
</Index></index>
<content><ListOffsetArray len='183'>
<offsets><Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index></offsets>
<content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
</ListOffsetArray></content>
</IndexedArray>
In [3]: hybrid_to_det_level_valid_matches.layout.content
Out[3]:
<ListOffsetArray len='183'>
<offsets><Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2]
</Index></offsets>
<content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
</ListOffsetArray>
In [4]: hybrid_to_det_level_valid_matches.layout.content.offsets
Out[4]:
<Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index>
In [5]: jets["hybrid", "matching"].layout
Out[5]:
<IndexedArray len='0'>
<index><Index dtype='int64' len='0'>
[]
</Index></index>
<content><ListOffsetArray len='183'>
<offsets><Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index></offsets>
<content><NumpyArray dtype='int64' len='2'>[-1 -1]</NumpyArray></content>
</ListOffsetArray></content>
</IndexedArray>
In [6]: jets["hybrid", "matching"].layout.content.offsets
Out[6]:
<Index dtype='int64' len='184'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index> |
Version of Awkward Array
2.0.6
Description and code to reproduce
Following up on my previous report, I've found another case where the behavior differs between awkward 1.x and 2.x in an unexpected way. Roughly, I'm trying to apply a mask to select a subset of nested data. In awkward 1.x, this works fine (I did this through a behavior, a glimpse of which you can see in the trace; here it's just a convenience). In awkward 2.x, this raises an IndexError (trace below):
Unfortunately, this has been tricky to reproduce since it's deeply nested in my code, and it only seems to occur for some inputs (there's also a random input :-( ). I tried pickling the array, but when testing the pickled array, it works fine. This led me to guess a workaround: if I ak.to_packed() the data, it would work. Anecdotally, this seems to have worked (in a previous issue, I learned that pickling calls something like to_packed). I managed to reproduce it once in a debug environment (unfortunately, it's unclear whether it will reproduce consistently, since there are still some random elements), and grabbed the layouts from the initial array (failing) and from the packed array (working):
original:
packed:
to_packed seems to give the right behavior, since the original and packed jets appear to have the same values, but I think it's a bug that it's needed here. Can you please take a look?
(I'm not sure what the performance implications of ak.to_packed are here, but I need a quick fix, and it seems to work, so I'll run with it. If there are cheaper/better workarounds, I would appreciate any suggestions!)
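There is a plausible reason why ak.to_packed sidesteps the bug. A packed layout has list offsets that start at zero and content buffers with no unreachable elements. That removes exactly the situation described earlier in this thread. The hypothetical helper below sketches that idea for a single list level in NumPy. In this sketch, the cost is one slice-and-copy per buffer, so it scales with the size of the reachable data. The real ak.to_packed handles arbitrary nesting and may behave differently.

```python
import numpy as np


def pack_list_offsets(offsets, content):
    """Return equivalent (offsets, content) buffers in which the offsets start
    at zero and the content holds only reachable elements.

    Hypothetical single-level helper illustrating the idea of packing; it is
    not how ak.to_packed is implemented.
    """
    offsets = np.asarray(offsets, dtype=np.int64)
    content = np.asarray(content)
    start, stop = offsets[0], offsets[-1]
    # Shift the offsets to start at zero and copy only the reachable slice.
    return offsets - start, content[start:stop].copy()


offsets = np.array([2, 4, 5], dtype=np.int64)
content = np.array([10, 11, 12, 13, 14, 15])
packed_offsets, packed_content = pack_list_offsets(offsets, content)
# Both layouts describe the same lists: [[12, 13], [14]]
print(packed_offsets.tolist(), packed_content.tolist())  # [0, 2, 3] [12, 13, 14]
```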