TestGeoPackageHydrofabric.test_uid_1_a failure #468
So, a bit of background on what we are using. In doing some initial investigating, I came across pandas documentation that notes that the
With that in mind, we need to find a stable alternative to compute a unique identifier that does not use
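One stable alternative along these lines would be to hash the underlying values directly with `hashlib` rather than relying on pandas' internal (and potentially version-dependent) hashing. This is only a sketch; `stable_frame_hash` and the example columns are illustrative, not the actual DMOD code:

```python
import hashlib

import pandas as pd


def stable_frame_hash(df: pd.DataFrame) -> str:
    """Hash a DataFrame's contents deterministically, without depending
    on pandas' internal hashing implementation."""
    sha = hashlib.sha1()
    # Sort columns and values so ordering differences don't change the hash
    for col in sorted(df.columns):
        sha.update(col.encode("utf-8"))
        for value in sorted(df[col].astype(str)):
            sha.update(value.encode("utf-8"))
    return sha.hexdigest()


df = pd.DataFrame({"id": ["cat-1", "cat-2"], "area": [1.5, 2.25]})
uid = stable_frame_hash(df)
```

Because both the columns and the stringified values are sorted, the same data produces the same digest regardless of column or row order.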
Issue is present in
Well, it's never that easy. Looks like it is something else. Either way, we should sort that layers list before hashing to create the unique identifier (#TODO @aaraney).
I've confirmed in a quick test on two different machines (an Intel Mac and an M2 Mac) that sorting the order in which the layers are processed produces consistent results when generating the unique id. I'll leave it to you for the moment, @aaraney, but let me know if you want me to go ahead and open a PR.
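The sorted-layer fix described above could look roughly like this; `combined_uid` and the per-layer hash values are illustrative placeholders, not the actual DMOD implementation:

```python
import hashlib


def combined_uid(layer_hashes: dict[str, str]) -> str:
    """Combine per-layer hashes into a single uid, iterating layers in
    sorted order so the result doesn't depend on processing order."""
    sha = hashlib.sha1()
    for layer_name in sorted(layer_hashes):
        sha.update(layer_name.encode("utf-8"))
        sha.update(layer_hashes[layer_name].encode("utf-8"))
    return sha.hexdigest()


# Same layers supplied in different orders should yield the same uid
a = combined_uid({"divides": "abc", "flowpaths": "def"})
b = combined_uid({"flowpaths": "def", "divides": "abc"})
```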
Thanks for verifying that, @robertbartel. Yeah, if you want to put in a quick PR that would be great!
Fixing to ensure deterministic ordering of layers. Should address NOAA-OWP#468.
Update: I'm still tracking down the change that introduced this (feature) bug, but I've found the source. In short:

```python
for feature in features_lst:
    # load geometry
    if hasattr(feature, "__geo_interface__"):
        feature = feature.__geo_interface__
```

A feature looks like:

Note the absence of
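For context, `__geo_interface__` is a community protocol (implemented by shapely, geopandas, fiona, and others) through which an object exposes its geometry as a GeoJSON-like dict. A minimal illustration with a hypothetical class, showing what the unwrapping snippet above does:

```python
class Point:
    """Minimal object implementing the __geo_interface__ protocol."""

    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}


feature = Point(-77.0, 38.9)
# Unwrap the object to its plain GeoJSON-like dict form
if hasattr(feature, "__geo_interface__"):
    feature = feature.__geo_interface__
print(feature)  # {'type': 'Point', 'coordinates': (-77.0, 38.9)}
```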
The hardcoded hash that we are testing against in the test is wrong. The correct hash is the hash produced by the call to
So, what are our options? One option is to pin
The alternative would probably be to retrieve a more restricted subset of low-level data ourselves from the dataframe from which to construct a hash. I.e., manually build the hash from only the relevant (to our purposes) contents of the pandas object, rather than using

I'm fine with going with
There is actually one more option, though I don't know how feasible it is: lobby for suitable unique identifier(s) to be in the hydrofabric data. We could probably get by on just a unique hydrofabric version string and the already-included set of catchment ids.
Yeah, I really like that idea. I think it's at least worth chewing on for a bit. We might bring it up to @program-- and get his thoughts.
I think you guys seem to have it mostly figured out. If you want a (somewhat) persistent hash for a given subset, I would most likely try to take all the IDs:

On the other point,
That is really the key. We don't want a persistent hash across multiple releases of the hydrofabric. We want something that will reflect if the underlying data changes (or, more practically, if we are dealing with two different hydrofabrics in two parts of a larger operation). We just need it to be deterministic and consistent for the same hydrofabric release/data.
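The version-string-plus-catchment-ids idea from the last few comments could be sketched as follows; the function name and field names are illustrative, not actual hydrofabric fields:

```python
import hashlib


def hydrofabric_uid(version: str, catchment_ids: list[str]) -> str:
    """Deterministic uid from a hydrofabric version string and its
    catchment ids; sorting makes it independent of id ordering."""
    sha = hashlib.sha1(version.encode("utf-8"))
    for cid in sorted(catchment_ids):
        sha.update(cid.encode("utf-8"))
    return sha.hexdigest()


# Id order doesn't matter; version (i.e., underlying data release) does
u1 = hydrofabric_uid("v2.1", ["cat-3", "cat-1", "cat-2"])
u2 = hydrofabric_uid("v2.1", ["cat-1", "cat-2", "cat-3"])
```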
Possibly related to #324.
Failure: https://github.com/NOAA-OWP/DMOD/actions/runs/6510147982/job/17683206387#step:10:319
Affected code:
TestGeoPackageHydrofabric.test_uid_1_a
is failing.