IndexError: boolean index did not match indexed array in awkward 2.x vs 1.x #2214

Closed
raymondEhlers opened this issue Feb 6, 2023 · 12 comments · Fixed by #2216
Labels
bug The problem described is something that must be fixed


@raymondEhlers
Contributor

Version of Awkward Array

2.0.6

Description and code to reproduce

Following up on my previous report, I've found another case where behavior differs unexpectedly between awkward 1.x and 2.x. In outline, I'm applying a mask to select a subset of nested data. In awkward 1.x, this works fine (I implemented it as a behavior, mostly for convenience; part of it is visible in the trace). In awkward 2.x, it raises an IndexError (trace below):

Cell In [6], line 1
----> 1 jets.jet_splittings.iterative_splittings(jets.subjets)

File ~/software/dev/mammoth/src/mammoth/framework/analysis/jet_substructure.py:387, in JetSplittingArray.iterative_splittings(self, subjets)
    379 def iterative_splittings(self, subjets: SubjetArray) -> SubjetArray:
    380     """Retrieve the iterative splittings.
    381
    382     Args:
   (...)
    385         The splittings which are part of the iterative splitting chain.
    386     """
--> 387     return cast(SubjetArray, self[subjets.iterative_splitting_index])

File ~/software/dev/mammoth/src/mammoth/framework/analysis/jet_substructure.py:253, in SubjetArray.iterative_splitting_index(self)
    250 @property
    251 def iterative_splitting_index(self) -> AwkwardArray[int]:
    252     """Indices of splittings which were part of the iterative splitting chain."""
--> 253     return self.parent_splitting_index[self.part_of_iterative_splitting]

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/highlevel.py:956, in Array.__getitem__(self, where)
    527 """
    528 Args:
    529     where (many types supported; see below): Index of positions to
   (...)
    953 have the same dimension as the array being indexed.
    954 """
    955 with ak._errors.SlicingErrorContext(self, where):
--> 956     out = self._layout[where]
    957     if isinstance(out, ak.contents.NumpyArray):
    958         array_param = out.parameter("__array__")

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:509, in Content.__getitem__(self, where)
    508 def __getitem__(self, where):
--> 509     return self._getitem(where)

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:554, in Content._getitem(self, where)
    551         return out._getitem_at(0)
    553 elif isinstance(where, ak.highlevel.Array):
--> 554     return self._getitem(where.layout)
    556 elif (
    557     isinstance(where, Content)
    558     and where._parameters is not None
    559     and (where._parameters.get("__array__") in ("string", "bytestring"))
    560 ):
    561     return self._getitem_fields(ak.operations.to_list(where))

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:602, in Content._getitem(self, where)
    599         return out._getitem_at(0)
    601 elif isinstance(where, Content):
--> 602     return self._getitem((where,))
    604 elif ak._util.is_sized_iterable(where) and len(where) == 0:
    605     return self._carry(
    606         ak.index.Index64.empty(0, self._backend.index_nplike),
    607         allow_lazy=True,
    608     )

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:535, in Content._getitem(self, where)
    532     return self
    534 # Normalise valid indices onto well-defined basis
--> 535 items = ak._slicing.normalise_items(where, self._backend)
    536 # Prepare items for advanced indexing (e.g. via broadcasting)
    537 nextwhere = ak._slicing.prepare_advanced_indexing(items)

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:206, in normalise_items(where, backend)
    204 def normalise_items(where: Sequence, backend: ak._backends.Backend) -> list:
    205     # First prepare items for broadcasting into like-types
--> 206     return [normalise_item(x, backend) for x in where]

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:206, in <listcomp>(.0)
    204 def normalise_items(where: Sequence, backend: ak._backends.Backend) -> list:
    205     # First prepare items for broadcasting into like-types
--> 206     return [normalise_item(x, backend) for x in where]

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:172, in normalise_item(item, backend)
    169         return as_numpy.data
    171 elif isinstance(item, ak.contents.Content):
--> 172     out = normalise_item_bool_to_int(normalise_item_nested(item))
    173     if isinstance(out, ak.contents.NumpyArray):
    174         return out.data

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:373, in normalise_item_bool_to_int(item)
    371 if item.backend.nplike.known_data or item.backend.nplike.known_shape:
    372     localindex = ak._do.local_index(item, axis=1)
--> 373     nextcontent = localindex.content.data[item.content.data]
    375     cumsum = item.backend.index_nplike.empty(
    376         item.content.data.shape[0] + 1, np.int64
    377     )
    378     cumsum[0] = 0

IndexError: boolean index did not match indexed array along dimension 0; dimension is 14 but corresponding boolean dimension is 36

Unfortunately, this has been tricky to reproduce: it's deeply nested in my code and only seems to occur for some inputs (the input also has a random component :-( ). I tried pickling the array, but the unpickled array works fine. That led me to guess at a workaround: calling ak.to_packed() on the data first. Anecdotally, this seems to work (from a previous issue, I learned that pickling calls something like to_packed). I managed to reproduce the failure once in a debug environment (it's unclear whether it will reproduce consistently, since there are still some random elements) and grabbed the layouts of the initial array (failing) and of the packed array (working).

original:

In [15]: jets.layout
Out[15]:
<RecordArray is_tuple='false' len='1'>
    <content index='0' field='jet_pt'>
        <NumpyArray dtype='float64' len='1'>[24.54459219]</NumpyArray>
    </content>
    <content index='1' field='jet_constituents'>
        <ListOffsetArray len='1'>
            <offsets><Index dtype='int64' len='2'>
                [0 8]
            </Index></offsets>
            <content><RecordArray is_tuple='false' len='8'>
                <parameter name='__record__'>'JetConstituent'</parameter>
                <content index='0' field='pt'>
                    <NumpyArray dtype='float64' len='8'>
                        [ 1.54625939  0.76237501  0.23116202  1.25449706
                          2.66868935  5.11290739  0.70627872 12.29195148]
                    </NumpyArray>
                </content>
                <content index='1' field='eta'>
                    <NumpyArray dtype='float64' len='8'>
                        [0.53683598 0.82056333 0.84215039 0.76359227 0.59864123
                         0.69776704 0.72732497 0.69191189]
                    </NumpyArray>
                </content>
                <content index='2' field='phi'>
                    <NumpyArray dtype='float64' len='8'>
                        [-2.43178638 -2.41334926 -2.50400814 -2.41215308
                         -2.58415947 -2.55853468 -2.52816211 -2.49251164]
                    </NumpyArray>
                </content>
                <content index='3' field='id'>
                    <IndexedArray len='8'>
                        <index><Index dtype='int64' len='8'>
                            [726608 726112 726550 726268 726302 726309 726566
                             726301]
                        </Index></index>
                        <content><NumpyArray dtype='int64' len='1420975'>
                            [     0 100000 100001 ... 100664 100665 100666]
                        </NumpyArray></content>
                    </IndexedArray>
                </content>
            </RecordArray></content>
        </ListOffsetArray>
    </content>
    <content index='2' field='jet_splittings'>
        <IndexedArray len='1'>
            <index><Index dtype='int64' len='1'>
                [0]
            </Index></index>
            <content><ListOffsetArray len='1'>
                <offsets><Index dtype='int64' len='2'>
                    [0 7]
                </Index></offsets>
                <content><RecordArray is_tuple='false' len='7'>
                    <parameter name='__record__'>'JetSplitting'</parameter>
                    <content index='0' field='kt'>
                        <NumpyArray dtype='float64' len='7'>
                            [0.27029765 0.32986462 0.31952325 0.36662835
                             0.02992396 0.02485513 0.04341938]
                        </NumpyArray>
                    </content>
                    <content index='1' field='delta_R'>
                        <NumpyArray dtype='float64' len='7'>
                            [0.17571019 0.14732361 0.12001833 0.06304824
                             0.04238117 0.10773081 0.05698362]
                        </NumpyArray>
                    </content>
                    <content index='2' field='z'>
                        <NumpyArray dtype='float64' len='7'>
                            [0.06298601 0.09764745 0.12847613 0.32129356
                             0.1213707  0.10282853 0.37799871]
                        </NumpyArray>
                    </content>
                    <content index='3' field='parent_index'>
                        <NumpyArray dtype='int64' len='7'>[-1  0  1  2  3  1  5]</NumpyArray>
                    </content>
                </RecordArray></content>
            </ListOffsetArray></content>
        </IndexedArray>
    </content>
    <content index='3' field='subjets'>
        <ListOffsetArray len='1'>
            <offsets><Index dtype='int64' len='2'>
                [22 36]
            </Index></offsets>
            <content><RecordArray is_tuple='false' len='36'>
                <parameter name='__record__'>'Subjet'</parameter>
                <content index='0' field='part_of_iterative_splitting'>
                    <NumpyArray dtype='bool' len='36'>
                        [ True False  True False  True False  True False  True
                         False  True False False False False False False False
                         False False False False  True False  True False  True
                         False  True False False False False False False False]
                    </NumpyArray>
                </content>
                <content index='1' field='parent_splitting_index'>
                    <NumpyArray dtype='int64' len='36'>
                        [ 0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8
                          9  9 10 10  0  0  1  1  2  2  3  3  4  4  5  5  6  6]
                    </NumpyArray>
                </content>
                <content index='2' field='constituent_indices'>
                    <ListOffsetArray len='36'>
                        <offsets><Index dtype='int64' len='37'>
                            [ 0 11 12 21 23 29 32 35 38 40 41 42 43 44 46 47 48
                             50 51 52 53 54 55 62 63 67 70 73 74 75 77 78 79 81
                             82 83 84]
                        </Index></offsets>
                        <content><NumpyArray dtype='int64' len='84'>
                            [ 1  3  7  2  6  8  4  5  9 10 11  0  7  2  6  8  4
                              5  9 10 11  1  3  8  4  5  9 10 11  7  2  6  9 10
                             11  8  4  5 10 11  9 10 11  8  4  5  5  4  2  6  7
                              2  6  3  1  2  1  3  4  7  5  6  0  4  7  5  6  2
                              1  3  7  5  6  4  7  5  6  5  6  1  3  2  3  1]
                        </NumpyArray></content>
                    </ListOffsetArray>
                </content>
            </RecordArray></content>
        </ListOffsetArray>
    </content>
    <content index='4' field='leading_track_pt'>
        <IndexedArray len='1'>
            <index><Index dtype='int64' len='1'>
                [1]
            </Index></index>
            <content><NumpyArray dtype='float64' len='2'>[12.69099712 12.34701824]</NumpyArray></content>
        </IndexedArray>
    </content>
</RecordArray>

packed:

In [16]: jets_packed = ak.to_packed(jets)

In [17]: jets_packed.layout
Out[17]:
<RecordArray is_tuple='false' len='1'>
    <content index='0' field='jet_pt'>
        <NumpyArray dtype='float64' len='1'>[24.54459219]</NumpyArray>
    </content>
    <content index='1' field='jet_constituents'>
        <ListOffsetArray len='1'>
            <offsets><Index dtype='int64' len='2'>
                [0 8]
            </Index></offsets>
            <content><RecordArray is_tuple='false' len='8'>
                <parameter name='__record__'>'JetConstituent'</parameter>
                <content index='0' field='pt'>
                    <NumpyArray dtype='float64' len='8'>
                        [ 1.54625939  0.76237501  0.23116202  1.25449706
                          2.66868935  5.11290739  0.70627872 12.29195148]
                    </NumpyArray>
                </content>
                <content index='1' field='eta'>
                    <NumpyArray dtype='float64' len='8'>
                        [0.53683598 0.82056333 0.84215039 0.76359227 0.59864123
                         0.69776704 0.72732497 0.69191189]
                    </NumpyArray>
                </content>
                <content index='2' field='phi'>
                    <NumpyArray dtype='float64' len='8'>
                        [-2.43178638 -2.41334926 -2.50400814 -2.41215308
                         -2.58415947 -2.55853468 -2.52816211 -2.49251164]
                    </NumpyArray>
                </content>
                <content index='3' field='id'>
                    <NumpyArray dtype='int64' len='8'>
                        [100494      5 100436 100154 100188 100195 100452
                         100187]
                    </NumpyArray>
                </content>
            </RecordArray></content>
        </ListOffsetArray>
    </content>
    <content index='2' field='jet_splittings'>
        <ListOffsetArray len='1'>
            <offsets><Index dtype='int64' len='2'>[0 7]</Index></offsets>
            <content><RecordArray is_tuple='false' len='7'>
                <parameter name='__record__'>'JetSplitting'</parameter>
                <content index='0' field='kt'>
                    <NumpyArray dtype='float64' len='7'>
                        [0.27029765 0.32986462 0.31952325 0.36662835 0.02992396
                         0.02485513 0.04341938]
                    </NumpyArray>
                </content>
                <content index='1' field='delta_R'>
                    <NumpyArray dtype='float64' len='7'>
                        [0.17571019 0.14732361 0.12001833 0.06304824 0.04238117
                         0.10773081 0.05698362]
                    </NumpyArray>
                </content>
                <content index='2' field='z'>
                    <NumpyArray dtype='float64' len='7'>
                        [0.06298601 0.09764745 0.12847613 0.32129356 0.1213707
                         0.10282853 0.37799871]
                    </NumpyArray>
                </content>
                <content index='3' field='parent_index'>
                    <NumpyArray dtype='int64' len='7'>[-1  0  1  2  3  1  5]</NumpyArray>
                </content>
            </RecordArray></content>
        </ListOffsetArray>
    </content>
    <content index='3' field='subjets'>
        <ListOffsetArray len='1'>
            <offsets><Index dtype='int64' len='2'>[ 0 14]</Index></offsets>
            <content><RecordArray is_tuple='false' len='14'>
                <parameter name='__record__'>'Subjet'</parameter>
                <content index='0' field='part_of_iterative_splitting'>
                    <NumpyArray dtype='bool' len='14'>
                        [ True False  True False  True False  True False False
                         False False False False False]
                    </NumpyArray>
                </content>
                <content index='1' field='parent_splitting_index'>
                    <NumpyArray dtype='int64' len='14'>[0 0 1 1 2 2 3 3 4 4 5 5 6 6]</NumpyArray>
                </content>
                <content index='2' field='constituent_indices'>
                    <ListOffsetArray len='14'>
                        <offsets><Index dtype='int64' len='15'>
                            [ 0  7  8 12 15 18 19 20 22 23 24 26 27 28 29]
                        </Index></offsets>
                        <content><NumpyArray dtype='int64' len='29'>
                            [2 1 3 4 7 5 6 0 4 7 5 6 2 1 3 7 5 6 4 7 5 6 5 6 1
                             3 2 3 1]
                        </NumpyArray></content>
                    </ListOffsetArray>
                </content>
            </RecordArray></content>
        </ListOffsetArray>
    </content>
    <content index='4' field='leading_track_pt'>
        <NumpyArray dtype='float64' len='1'>[12.34701824]</NumpyArray>
    </content>
</RecordArray>

to_packed seems to give the right behavior, since the original and packed jets appear to have the same values, but I think it's a bug that it's needed here. Can you please take a look?

(I'm not sure what the performance implications of ak.to_packed are here, but I need a quick fix, and it seems to work, so I'll run with it. If there are cheaper/better workarounds, I would appreciate any suggestions!)

@raymondEhlers raymondEhlers added the bug (unverified) The problem described would be a bug, but needs to be triaged label Feb 6, 2023
@agoose77 agoose77 added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Feb 7, 2023
@agoose77 agoose77 self-assigned this Feb 7, 2023
@agoose77
Collaborator

agoose77 commented Feb 7, 2023

You're right that pickle.dump() would pack an array. If you want to share a reproducer that does not pass through ak.to_packed, you can pickle the result of ak.to_buffers. That function decomposes the array into atoms (buffers, form, length) that can be pickled safely.

The indexing behaviour in Awkward is driven by the index array's structure, so that's what I'm looking at in order to identify whether this is a bug or a usage issue.

The cause of this bug is that the list that wraps the boolean array has offsets that don't start at zero. This is allowed, but sometimes we make assumptions that do not hold true.
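The failing line in the trace, `localindex.content.data[item.content.data]`, can be modelled with plain NumPy. This is a simplified sketch under the assumption that the local index only covers the elements the offsets point at, while the boolean content buffer keeps its full length; it is not Awkward's actual implementation, and the helper `local_index` is hypothetical. The numbers mirror the `subjets` layout from the report (offsets `[22 36]`, content length 36).

```python
import numpy as np

# Miniature of the reported `subjets` mask: one list covering content[22:36],
# with 22 unused entries in front of it.
offsets = np.array([22, 36])
mask_content = np.zeros(36, dtype=bool)
mask_content[[22, 24, 26, 28]] = True  # the True entries inside the list

def local_index(offsets):
    """Position of each element within its own list (hypothetical helper)."""
    counts = np.diff(offsets)
    return np.concatenate([np.arange(n) for n in counts])

local = local_index(offsets)  # 14 entries: 0..13

# Naive approach: index with the full content buffer, assuming offsets[0] == 0
naive_error = None
try:
    local[mask_content]  # 36-element mask against 14-element data
except IndexError as err:
    naive_error = err

# Trimming the content to the region the offsets refer to removes the mismatch
trimmed = mask_content[offsets[0]:offsets[-1]]
selected = local[trimmed]
print(selected)
```

The selected local indices agree with the `True` positions in the packed layout's `part_of_iterative_splitting` buffer.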

@raymondEhlers
Contributor Author

Thanks for the explanation and the quick fix! I will keep to_buffers in mind for next time.

@raymondEhlers
Contributor Author

Using cbc2c4d, I believe I found another instance of this issue. This time, the array is masked to be fully empty. I managed to pickle it with ak.to_buffers (here), and the layout is below:

In [4]: hybrid_to_det_level_valid_matches.layout
Out[4]:
<IndexedArray len='0'>
    <index><Index dtype='int64' len='0'>
        []
    </Index></index>
    <content><ListOffsetArray len='183'>
        <offsets><Index dtype='int64' len='184'>
            [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             ...
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
        </Index></offsets>
        <content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
    </ListOffsetArray></content>
</IndexedArray>

Based on your comments in the PR, I guess this is an instance of additional bugs popping up. Could you please take a look? Thanks!

@agoose77
Collaborator

Could you clarify what isn't working with this array? What operation are you performing that is failing, or producing unexpected results? :)

@raymondEhlers
Contributor Author

raymondEhlers commented Feb 15, 2023

Sorry, this wasn't my best bug report - was moving too quickly :-)

I'm doing some matching between three different collections of jets (the matching itself is done with numba). I build up a mask to remove jets, and then for the remaining jets, I want to assign the index of the matched jets to a particular field. The relevant code looks like this (including for completeness, but I'm not sure it will be meaningful):

    # Convention: -1 means no match; anything else is an index
    jets["part_level", "matching"] = matching_indices_1
    jets["det_level", "matching"] = matching_indices_2
    jets["hybrid", "matching"] = matching_indices_3
   
    # Mask out event if there are no jets
    jets_present_mask = (
        (ak.num(jets["part_level"], axis=1) > 0)
        & (ak.num(jets["det_level"], axis=1) > 0)
        & (ak.num(jets["hybrid"], axis=1) > 0)
    )
    jets = jets[jets_present_mask]

    # Assign indices
    hybrid_to_det_level_valid_matches = jets["hybrid", "matching"] > -1
    det_to_part_level_valid_matches = jets["det_level", "matching"] > -1
    # Index error thrown on the next line!
    hybrid_to_det_level_including_det_to_part_level_valid_matches = det_to_part_level_valid_matches[
        jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]
    ]

Sometimes hybrid_to_det_level_valid_matches ends up being empty because there are not many jets in the first place, and they all get masked out. That was fine in awkward 1.x, but in awkward 2.x with the fix for this issue, I still receive the same IndexError:

File ~/software/dev/mammoth/src/mammoth/framework/analysis/jets.py:167, in jet_matching_embedding(jets, det_level_hybrid_max_matching_distance, part_level_det_level_max_matching_distance)
    165 import IPython; IPython.embed()
    166 #ak.to_buffers(hybrid_to_det_level_valid_matches)
--> 167 logger.warning(f'{jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]}')
    168 hybrid_to_det_level_including_det_to_part_level_valid_matches = det_to_part_level_valid_matches[
    169     jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]
    170 ]
    171 # First, restrict the hybrid level, requiring hybrid to det_level valid matches and
    172 # det_level to part_level valid matches.

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/highlevel.py:951, in Array.__getitem__(self, where)
    522 """
    523 Args:
    524     where (many types supported; see below): Index of positions to
   (...)
    948 have the same dimension as the array being indexed.
    949 """
    950 with ak._errors.SlicingErrorContext(self, where):
--> 951     out = self._layout[where]
    952     if isinstance(out, ak.contents.NumpyArray):
    953         array_param = out.parameter("__array__")

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:531, in Content.__getitem__(self, where)
    530 def __getitem__(self, where):
--> 531     return self._getitem(where)

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:576, in Content._getitem(self, where)
    573         return out._getitem_at(0)
    575 elif isinstance(where, ak.highlevel.Array):
--> 576     return self._getitem(where.layout)
    578 # Convert between nplikes of different backends
    579 elif (
    580     isinstance(where, ak.contents.Content)
    581     and where.backend is not self._backend
    582 ):

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:655, in Content._getitem(self, where)
    652     return where.to_NumpyArray(np.int64)
    654 elif isinstance(where, Content):
--> 655     return self._getitem((where,))
    657 elif ak._util.is_sized_iterable(where):
    658     # Do we have an array
    659     nplike = ak._nplikes.nplike_of(where, default=None)

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/contents/content.py:557, in Content._getitem(self, where)
    554     return self
    556 # Normalise valid indices onto well-defined basis
--> 557 items = ak._slicing.normalise_items(where, self._backend)
    558 # Prepare items for advanced indexing (e.g. via broadcasting)
    559 nextwhere = ak._slicing.prepare_advanced_indexing(items)

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:260, in normalise_items(where, backend)
    258 common_backend = ak._backends.common_backend([backend, where_backend])
    259 # First prepare items for broadcasting into like-types
--> 260 return [normalise_item(x, backend=common_backend) for x in where]

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:260, in <listcomp>(.0)
    258 common_backend = ak._backends.common_backend([backend, where_backend])
    259 # First prepare items for broadcasting into like-types
--> 260 return [normalise_item(x, backend=common_backend) for x in where]

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:205, in normalise_item(item, backend)
    203 # Ragged indexing should be performed with integer contents
    204 elif isinstance(item, ak.contents.Content):
--> 205     out = _normalise_item_bool_to_int(_normalise_item_nested(item), backend)
    206     assert out.backend is backend
    207     if isinstance(out, ak.contents.NumpyArray):

File ~/software/dev/mammoth/.venv/lib/python3.10/site-packages/awkward/_slicing.py:445, in _normalise_item_bool_to_int(item, backend)
    443 item = item.to_ListOffsetArray64(True)
    444 localindex = ak._do.local_index(item, axis=1)
--> 445 nextcontent = localindex.content.data[item.content.data]
    447 cumsum = item_backend.index_nplike.empty(
    448     item.content.data.shape[0] + 1, dtype=np.int64
    449 )
    450 cumsum[0] = 0

IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 2

(note that I added a log message for debugging in this stack trace while I was trying to isolate the issue)

I guess this was happening before, but I didn't see it because the IndexError that I originally reported was thrown first (and this case is somewhat rarer). I recognize that having the mask become an empty array is a bit of an edge case, but this worked in awkward 1.x, so I expected it to work in 2.x. Also, when I tried ak.to_packed on hybrid_to_det_level_valid_matches, it seemed to work, although I didn't follow this through entirely.

In summary, this appears to be the same issue, but yeah, I don't think I included enough details in my follow up :-) Hopefully this is clearer!
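The second error message ("dimension is 0 but corresponding boolean dimension is 2") fits the same pattern, with zero lists instead of a shifted start. The following is a simplified NumPy model, not Awkward's code: it assumes that applying the empty outer index leaves no lists behind while the two-element boolean content from the layout above stays attached.

```python
import numpy as np

# Zero lists remain, so there is a single offset and nothing to index...
offsets = np.array([0])
# ...but the unreachable two-element content buffer is still there.
mask_content = np.array([False, False])

local = np.arange(offsets[-1] - offsets[0])  # local index of length 0

try:
    local[mask_content]  # 2-element mask against 0-element data
    raised = False
except IndexError:
    raised = True

# Trimming the content to the offsets' range gives a consistent empty mask
trimmed = mask_content[offsets[0]:offsets[-1]]
selected = local[trimmed]
print(raised, selected.shape)
```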

@agoose77
Collaborator

Although I can't reproduce your particular bug without any data, I can see that there is a bug in that region of code, so I've opened a PR, #2246, that I think addresses it. You're welcome to try out that branch by cloning it and following the instructions for installing Awkward from the README, or you can wait for a release! Alternatively, feel free to provide a reproducer that I can include in our test suite :)

@raymondEhlers
Contributor Author

raymondEhlers commented Feb 15, 2023

Sorry, I'm a bit confused - did the pickle that I posted via ak.to_buffers not reproduce the issue? (linked again here). I'm also willing to test it - I'm just confused.

Edit: ah, do you need the array I was trying to apply it to as well? I'm happy to provide something to put into the test suite as long as I can reasonably disentangle it. Thanks for your help!

@agoose77
Collaborator

Edit: ah, do you need the array I was trying to apply it to as well? I'm happy to provide something to put into the test suite as long as I can reasonably disentangle it. Thanks for your help!

That's it! If this line is failing

jets["hybrid", "matching"][hybrid_to_det_level_valid_matches]

then I just need these two arrays:

import pickle

import awkward as ak

array = jets["hybrid", "matching"]
index = hybrid_to_det_level_valid_matches

# Decompose each array into picklable (form, length, container) pieces
raw_data = [
    ak.to_buffers(array),
    ak.to_buffers(index),
]
with open("debug.pickle.zip", "wb") as f:
    pickle.dump(raw_data, f)

@raymondEhlers
Contributor Author

Sorry that I overlooked this point, and thanks for bearing with me - I'm in a conference rush and didn't think this through :-)

Here is the pickle as requested: debug.pickle.zip

@agoose77
Collaborator

No stress! It was enough to start looking, and we can always just ask for more information when things like this happen anyway :)

@agoose77
Collaborator

Ah, it looks like from_buffers doesn't read the unused data, so it's not possible to debug your problem using these data. Could you try this build of Awkward?

awkward-2.0.8a0-py3-none-any.whl.zip

@raymondEhlers
Contributor Author

Excellent, that fixed it! Thanks!

In terms of data for the test suite, below are the layouts of the arrays, drilling down into them to expose (all of?) the data. I'm not sure if this is useful, but it's here for completeness. If there's another way to extract the arrays for tests, I'm also happy to provide it - just let me know.

In [2]: hybrid_to_det_level_valid_matches.layout
Out[2]:
<IndexedArray len='0'>
    <index><Index dtype='int64' len='0'>
        []
    </Index></index>
    <content><ListOffsetArray len='183'>
        <offsets><Index dtype='int64' len='184'>
            [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             ...
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
        </Index></offsets>
        <content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
    </ListOffsetArray></content>
</IndexedArray>

In [3]: hybrid_to_det_level_valid_matches.layout.content
Out[3]:
<ListOffsetArray len='183'>
    <offsets><Index dtype='int64' len='184'>
        [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
         0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
         ...
         2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
         2 2 2 2 2 2 2 2 2]
    </Index></offsets>
    <content><NumpyArray dtype='bool' len='2'>[False False]</NumpyArray></content>
</ListOffsetArray>

In [4]: hybrid_to_det_level_valid_matches.layout.content.offsets
Out[4]:
<Index dtype='int64' len='184'>
    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index>

In [5]: jets["hybrid", "matching"].layout
Out[5]:
<IndexedArray len='0'>
    <index><Index dtype='int64' len='0'>
        []
    </Index></index>
    <content><ListOffsetArray len='183'>
        <offsets><Index dtype='int64' len='184'>
            [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             ...
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
             2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
        </Index></offsets>
        <content><NumpyArray dtype='int64' len='2'>[-1 -1]</NumpyArray></content>
    </ListOffsetArray></content>
</IndexedArray>

In [6]: jets["hybrid", "matching"].layout.content.offsets
Out[6]:
<Index dtype='int64' len='184'>
    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
</Index>
