feat!: re-introduce unknown-length
#2229
@@ -61,7 +62,7 @@ def _to_dict_part(self, verbose, toplevel):
         return self._to_dict_extra(
             {
                 "class": "RegularArray",
-                "size": self._size,
+                "size": None if self._size is unknown_length else self._size,
Here we encode `unknown_length` as `None`.
@@ -54,7 +53,7 @@ def from_dict(input: dict) -> Form:
     elif input["class"] == "RegularArray":
         return ak.forms.RegularForm(
             content=from_dict(input["content"]),
-            size=input["size"],
+            size=unknown_length if input["size"] is None else input["size"],
Here we decode `None` to mean `unknown_length`.
`BitMaskedArray` has `length` metadata that can now be `unknown_length`, but I guess it's not a part of the `BitMaskedForm` the way that `size` is for `RegularForm`. I had thought that there might be another place to do this, but apparently not.
This is great: mostly replacements of `None` → `unknown_length`, and it's usually easier to go the other way (`None` appears in other contexts than unknown lengths). It looks good to me, including the serialization to and from JSON.
-    for x1 in [True, False, None]:
-        for x2 in [True, False, None]:
-            for x3 in [True, False, None]:
-                for x4 in [True, False, None]:
-                    for x5 in [True, False, None]:
-                        mask = [x1, x2, x3, x4, x5]
-                        expected = [
-                            m if m is None else x
-                            for x, m in zip(data, mask)
-                            if m is not False
-                        ]
-                        array2 = ak.highlevel.Array(mask, check_valid=True).layout
-                        assert to_list(array[array2]) == expected
-                        assert array.to_typetracer()[array2].form == array[array2].form
+    for mask in itertools.repeat([True, False, None], 5):
:)
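For reference when reading this hunk: the five nested loops enumerate every length-5 mask over `True`, `False`, and `None`. In the standard library, `itertools.product(values, repeat=5)` yields that Cartesian product, whereas `itertools.repeat(values, 5)` yields the same object five times. The sketch below only compares the three iteration patterns; it does not use Awkward Array, and the variable names are illustrative.

```python
import itertools

values = [True, False, None]

# The five nested loops, written as a single comprehension.
nested = [
    [x1, x2, x3, x4, x5]
    for x1 in values
    for x2 in values
    for x3 in values
    for x4 in values
    for x5 in values
]

# Cartesian product: every length-5 combination, in the same order.
product = [list(mask) for mask in itertools.product(values, repeat=5)]

# repeat: the same 3-element list, yielded five times.
repeated = list(itertools.repeat(values, 5))
```

Here `nested` and `product` contain the same 3**5 masks, while `repeated` contains five references to `values`.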
TL;DR
- Replace `None` as "unknown length" token with `unknown_length` singleton.
- Replace `nplike.add_shape_item(length, x)` with `length + x`.
TL
The rationale for using `None` as a token for an unknown length was weak, but was ultimately due to the following factors:
- `None` for unknown dimension lengths
- `None` easily serialises to JSON
- `None` is a natural singleton

Some of this was discussed in #2220 (comment).
However, a significant problem with using `None` is that it is also a meaningful index value, and we would be unable to disambiguate between `array[None]` and `array[array.layout.length]` for typetracer arrays. Whilst this would only affect L2 typetracer users, it's enough to change the balance.

Thereafter, there are arguments to be made in favour of using `nplike.XXX_shape_item` functions to operate upon lengths vs a rich object that implements magic methods. I preferred the former, due to explicitness and the ability to do e.g. validation on known lengths (such as ensuring that they are positive). However, code is less easy to read from an expression perspective. @jpivarski and I discussed this, and we settled on re-introducing the object-oriented API.

We will maintain a strong distinction between an `index: int | ArrayLike` and a `length: int | type[unknown_length]`, to ensure that unknown values map correctly between spaces.

This reverts the inability to set `None` as length values in `RecordArray`.
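To illustrate the trade-off described above, here is a minimal sketch of both styles: a sentinel object whose magic methods keep any arithmetic result unknown, and a free function that handles the sentinel explicitly and validates known lengths. The class `_UnknownLength` and the standalone `add_shape_item` function are hypothetical and written only for this comparison; they are not the `nplike` methods or Awkward Array's real sentinel, and the non-negativity check is an assumed example of validation.

```python
class _UnknownLength:
    """Hypothetical sentinel for a length that is not known (not Awkward's class)."""

    def _combine(self, other):
        # Arithmetic with an int (or with the sentinel itself) stays unknown.
        if isinstance(other, int) or other is self:
            return self
        return NotImplemented

    # Object-oriented style: plain expressions such as `length + x` just work.
    __add__ = __radd__ = _combine
    __sub__ = __rsub__ = _combine
    __mul__ = __rmul__ = _combine

    def __repr__(self):
        return "unknown_length"


unknown_length = _UnknownLength()


def add_shape_item(length, x):
    """Function style: explicit, with room to validate known lengths."""
    if length is unknown_length or x is unknown_length:
        return unknown_length
    result = length + x
    if result < 0:
        # Assumed validation rule, for illustration only.
        raise ValueError("a known length must not be negative")
    return result
```

With the object-oriented version, `length + x` reads the same whether `length` is an `int` or the sentinel; with the function version, every call site has to go through `add_shape_item`, which is more verbose but gives a single place to enforce invariants.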