update string equality tests for h5py 3.1 compatibility #263
Codecov Report
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
+ Coverage 45.16% 45.38% +0.22%
==========================================
Files 229 229
Lines 19079 19096 +17
Branches 2756 2761 +5
==========================================
+ Hits 8617 8667 +50
+ Misses 9945 9921 -24
+ Partials 517 508 -9
@dwpaley strangely enough, while I recognise the changes you are making here, I did not need them for h5py 3.1 to work correctly for me (I have been using this already). Much 🤔 In no sense is this a comment on the change set, which I think you will find mirrored already on several branches...
Asked @noemifrisina @benjaminhwilliams to take a look, as they have both been looking at this code closely lately.
Could have something to do with reading files in different versions of h5py than they were written with. Similar to issues we had when we were doing the Python 3 conversion.
format/nexus.py
@@ -111,7 +111,7 @@ def visitor(name, obj):
         numpy.string_("NXsubentry"),
     ]:
         if "definition" in obj:
-            if obj["definition"][()] == numpy.string_("NXmx"):
+            if numpy.string_(obj["definition"][()]) == numpy.string_("NXmx"):
If we know that these strings are returned as bytes, wouldn't
-if numpy.string_(obj["definition"][()]) == numpy.string_("NXmx"):
+if obj["definition"][()] == b"NXmx":
be more natural?
Alternatively:
-if numpy.string_(obj["definition"][()]) == numpy.string_("NXmx"):
+if obj["definition"][()] in ("NXmx", b"NXmx"):
if they can be both. This would lend itself to being wrapped in a helper function such as
def _h5match(obj, string):
    return obj[()] in (string, string.encode("utf-8"))
(...)
if _h5match(obj["definition"], "NXmx"):
(...)
By my reading
Implies that all attributes are read as str objects now, and the only time
Okay, this was me being hopelessly optimistic. It looks like, now, attribute strings can be So it'd have to be this, or the latter of Markus' suggestions (although maybe another variant - say a
These can now sometimes be numpy.string_, sometimes str, depending on how the value was written. Rather than converting everything to numpy.string_, convert to str immediately when read, so we don't need to convert literals all over the place. In the best case, this is a no-op.
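The convert-on-read approach described in this commit message might look roughly like the sketch below. The name `to_str` is hypothetical and chosen for this example; it is not taken from the prototype branch. The sketch assumes values are UTF-8 encoded, and relies on `numpy.bytes_` (of which `numpy.string_` was an alias) being a subclass of the built-in `bytes`.

```python
def to_str(value):
    """Return value as str, decoding UTF-8 bytes if needed (hypothetical helper)."""
    if isinstance(value, bytes):
        # numpy.bytes_ subclasses bytes, so such values are decoded here too.
        return value.decode("utf-8")
    # Already a str: this is the no-op best case.
    return value


# Comparisons can then use plain string literals, whichever type was read.
print(to_str(b"NXmx") == "NXmx")  # True
print(to_str("NXmx") == "NXmx")   # True
```

Converting once at the read site keeps the rest of the code free of `numpy.string_(...)` wrappers around literals.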
I've got a prototype of this (on top of the work here) in nexus_h5py3...ndevenish:nexus_h5py3, which also seems to work.
Since tests are breaking I've pulled this in now. Thanks!
h5py version 3 updated the rules for strings: https://docs.h5py.org/en/stable/strings.html. Some attributes that were previously returned as regular strings are now returned as bytes, which breaks tests like
if obj.handle["depends_on"][()] == ".":
This PR converts both sides of all string equality tests in format/nexus.py to bytes using numpy.string_.
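The failure mode can be reproduced without h5py: in Python 3 a `bytes` value never compares equal to a `str`, so the test silently evaluates to false rather than raising. The sketch below uses a hypothetical `as_bytes` helper as a pure-Python stand-in for the `numpy.string_` conversion applied in this PR, assuming UTF-8 encoding.

```python
def as_bytes(value):
    """Hypothetical stand-in for numpy.string_: coerce str or bytes to bytes."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return bytes(value)


read_back = b"."  # what a bytes-returning read would hand back

# The original style of test: bytes compared with a str literal.
print(read_back == ".")                      # False

# Converting both sides to bytes makes the test independent of the read type.
print(as_bytes(read_back) == as_bytes("."))  # True
print(as_bytes(".") == as_bytes("."))        # True
```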