Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: re-introduce unknown-length #2229

Merged
merged 7 commits into from
Feb 9, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Feb 9, 2023

TL;DR

  • Replace use of None as "unknown length" token with unknown_length singleton.
  • Replace use of nplike.add_shape_item(length, x) with length + x

TL

The rationale for using None as a token for an unknown length was weak, but was ultimately due to the following factors:

  • Array API uses None for unknown dimension lengths
  • None easily serialises to JSON
  • None is a natural singleton

Some of this was discussed in #2220 (comment)

However, a significant problem with using None is that it is also a meaningful index value, and we would be unable to disambiguate between array[None] and array[array.layout.length] for typetracer arrays. Whilst this would only affect L2 typetracer users, it's enough to change the balance.

Thereafter, there are arguments to be made in favour of using nplike.XXX_shape_item functions to operate upon lengths vs a rich object that implements magic methods. I preferred the former, due to explicitness and the ability to do e.g. validation on known lengths (such as ensuring that they are positive). However, code is less easy to read from an expression perspective. @jpivarski and I discussed this, and we settled on re-introducing the object-oriented API.

We will maintain a strong distinction between an index: int | ArrayLike and a length: int | type[unknown_length], to ensure that unknown values map correctly between spaces.

This reverts the inability to set None as length values in RecordArray.

@agoose77 agoose77 changed the title refactor: re-introduce unknown-length feat!: re-introduce unknown-length Feb 9, 2023
@agoose77 agoose77 temporarily deployed to docs-preview February 9, 2023 11:14 — with GitHub Actions Inactive
@@ -61,7 +62,7 @@ def _to_dict_part(self, verbose, toplevel):
return self._to_dict_extra(
{
"class": "RegularArray",
"size": self._size,
"size": None if self._size is unknown_length else self._size,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we encode unknown_length as None

@@ -54,7 +53,7 @@ def from_dict(input: dict) -> Form:
elif input["class"] == "RegularArray":
return ak.forms.RegularForm(
content=from_dict(input["content"]),
size=input["size"],
size=unknown_length if input["size"] is None else input["size"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we decode None to mean unknown_length

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BitMaskedArray has length metadata that can now be unknown_length, but I guess it's not a part of the BitMaskedForm the way that size is for RegularForm. I had thought that there might be another place to do this, but apparently not.

@agoose77 agoose77 temporarily deployed to docs-preview February 9, 2023 11:22 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Feb 9, 2023

Codecov Report

Merging #2229 (53b641c) into main (15964ce) will increase coverage by 0.01%.
The diff coverage is 90.59%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_do.py 84.39% <ø> (ø)
src/awkward/_broadcasting.py 89.26% <57.14%> (ø)
src/awkward/_nplikes/shape.py 77.04% <77.04%> (ø)
src/awkward/contents/indexedoptionarray.py 88.20% <87.50%> (+0.01%) ⬆️
src/awkward/contents/recordarray.py 84.92% <93.33%> (+0.17%) ⬆️
src/awkward/contents/regulararray.py 86.92% <94.44%> (+0.15%) ⬆️
src/awkward/_nplikes/typetracer.py 73.17% <95.23%> (-0.82%) ⬇️
src/awkward/_nplikes/array_module.py 88.88% <100.00%> (+2.65%) ⬆️
src/awkward/_nplikes/numpylike.py 75.09% <100.00%> (+0.37%) ⬆️
src/awkward/_slicing.py 88.52% <100.00%> (-0.37%) ⬇️
... and 15 more

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great—mostly replacements of Noneunknown_length, and it's usually easier to go the other way (None appears in other contexts than unknown lengths). It looks good to me, including the serialization to and from JSON.

@@ -54,7 +53,7 @@ def from_dict(input: dict) -> Form:
elif input["class"] == "RegularArray":
return ak.forms.RegularForm(
content=from_dict(input["content"]),
size=input["size"],
size=unknown_length if input["size"] is None else input["size"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BitMaskedArray has length metadata that can now be unknown_length, but I guess it's not a part of the BitMaskedForm the way that size is for RegularForm. I had thought that there might be another place to do this, but apparently not.

Comment on lines -488 to +489
for x1 in [True, False, None]:
for x2 in [True, False, None]:
for x3 in [True, False, None]:
for x4 in [True, False, None]:
for x5 in [True, False, None]:
mask = [x1, x2, x3, x4, x5]
expected = [
m if m is None else x
for x, m in zip(data, mask)
if m is not False
]
array2 = ak.highlevel.Array(mask, check_valid=True).layout
assert to_list(array[array2]) == expected
assert array.to_typetracer()[array2].form == array[array2].form
for mask in itertools.repeat([True, False, None], 5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@agoose77 agoose77 merged commit cb19001 into main Feb 9, 2023
@agoose77 agoose77 deleted the agoose77/feat-restore-unknown-length branch February 9, 2023 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants