Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backward compatibility of raw-converted dataset with xr.DataTree [all tests ci] #1447

Merged
merged 37 commits into from
Feb 20, 2025

Conversation

oftfrfbf
Copy link
Contributor

@oftfrfbf oftfrfbf commented Feb 5, 2025

This is for issue #1420 to reconcile backwards compatability.

Test files were added for legacy data from v0.8.4 up to v0.9.2 (technically up to version "0.9.1.dev30+gb328bf91.d20241223" since there is no tagged "0.9.2" yet). The test data was derived from an 'ek60' raw file and '.nc' and '.zarr' echodata backups were created using python 3.10.12.

I created a Pooch registry and wired the test files into the test path — if that is a problem I can rework the code and we can add the files to the google drive. In the interim that registry will eventually need to be pointed to test files living somewhere in echopype release assets and changes will need to be made to the conftest.py base_url and version values.

(Also sorry for some of the erroneous edits that got added. I am not sure why my pycharm environment was making so many changes, must be some problem with the way I configured the linter.)

oftfrfbf and others added 20 commits January 19, 2025 05:46
* init commit

* revert change to fix merge conflict

* test only one file

* use other file

* move test duplicate to test convert ek

* add extra line

* move test back to ek80 convert

* pin zarr and add check unique ping time duplicates and tests

* fix test message

* test remove zarr pin

* add back zarr pin
… updates (OSOceanAcoustics#1433)

* add .values to change water level scalar value

* add .values to fillna

* fix assign attributes

* remove test1

* update echodata properly

* remove print statement
…1436)

* assemble ad2cp timestamp at ns instead of us

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add noqa

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…anAcoustics#1434)

* use import_resources.files to open yaml files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import from importlib.resources

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…#1429)

* pin zarr and netcdf4 in requirements.txt

* change np.alltrue to np.all

* use np.argmin(da.data) instead of da.argmin()

* use S instead s for MVBS ping_time_bin

* extract single element in computing nsamples and L in tapered_chirp

* change from S to s in testing.py
* add type-extensions

* sort requirements.txt
…oustics#1445)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 5.4.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5.3.0...v5.4.0)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/PyCQA/isort: 5.13.2 → 6.0.0](PyCQA/isort@5.13.2...6.0.0)
- [github.com/psf/black: 24.10.0 → 25.1.0](psf/black@24.10.0...25.1.0)
- [github.com/codespell-project/codespell: v2.4.0 → v2.4.1](codespell-project/codespell@v2.4.0...v2.4.1)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@oftfrfbf oftfrfbf assigned oftfrfbf and leewujung and unassigned oftfrfbf Feb 5, 2025
@oftfrfbf oftfrfbf added bug Something isn't working AA SI labels Feb 5, 2025
@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.77%. Comparing base (9f56124) to head (ef015d9).
Report is 188 commits behind head on main.

Files with missing lines Patch % Lines
echopype/echodata/echodata.py 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1447      +/-   ##
==========================================
+ Coverage   83.52%   85.77%   +2.25%     
==========================================
  Files          64       72       +8     
  Lines        5686     6608     +922     
==========================================
+ Hits         4749     5668     +919     
- Misses        937      940       +3     
Flag Coverage Δ
unittests 85.77% <96.42%> (+2.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@leewujung leewujung changed the title Ep 1420 Backward compatibility of raw-converted dataset with xr.DataTree [all tests ci] Feb 5, 2025
@leewujung leewujung closed this Feb 5, 2025
@leewujung leewujung reopened this Feb 5, 2025
@leewujung
Copy link
Member

@oftfrfbf : Ah ok, this is because the channel coordinate/dimension in the parent (Sonar) group has 5 channels, and one child group has 3 channels and the other has 2 channels (i.e., some channels collect power/angle data, some channels collect complex data, see here). I think the easiest is to change channel in the Sonar group to be channel_all, so that there is no conflicts here.

The ek80 legacy dataset is now in the docker image!

@oftfrfbf
Copy link
Contributor Author

oftfrfbf commented Feb 11, 2025

The ek80 legacy dataset is now in the docker image!

@leewujung thank you for adding that!

I think the easiest is to change channel in the Sonar group to be channel_all, so that there is no conflicts here.

Is this something I should follow up on? If so, where would it need to happen? I would assume in the "from_file" method to fix legacy versions I would need to do another rename operation, but it would also need to happen in the echodata object as well elsewhere?

(I will be offline from Feb 15th to the 23rd p.s.)

@leewujung
Copy link
Member

I think the easiest is to change channel in the Sonar group to be channel_all, so that there is no conflicts here.

Is this something I should follow up on? If so, where would it need to happen? I would assume in the "from_file" method to fix legacy versions I would need to do another rename operation, but it would also need to happen in the echodata object as well elsewhere?

(I will be offline from Feb 15th to the 23rd p.s.)

Ah you're right, I somehow missed the part about the initial setting in the Echodata object.

Could you please make the change to fix legacy versions? and I'll pick up from there to update the part to set this variable when assembling the EchoData.

Also, thinking about the logic again, we can avoid opening the data twice by only reading the Sonar group using xr.open_dataset(FILENAME/Sonar, ...), and do open_datatree if it is the new data (containing channel_all) or do open_groups and the associated ops you have if it is the legacy data (containing channel). Do you think this would work?

@oftfrfbf
Copy link
Contributor Author

oftfrfbf commented Feb 14, 2025

Could you please make the change to fix legacy versions? and I'll pick up from there to update the part to set this variable when assembling the EchoData.

Made some changes to accommodate this. I noticed that the children were also changed from 'channel' to 'channel_all' when I renamed the variable, I am not entirely sure if that is what you were looking to happen.

I was having some sort of problem running the tests with versions 0.9.2 of the test files, I am not sure why that is so if you could please troubleshoot and resolve. Maybe it has to do with the test files being dev snapshot conversions that are out of date now (might have to recreate them with the latest dev 0.9.2 version of echopype).

Also, thinking about the logic again, we can avoid opening the data twice by only reading the Sonar group using xr.open_dataset(FILENAME/Sonar, ...), and do open_datatree if it is the new data (containing channel_all) or do open_groups and the associated ops you have if it is the legacy data (containing channel). Do you think this would work?

This makes sense to me. It would be nice to test it out to see what the performance differences are, I am out of time before leaving town though so this will have to wait for another day. I will be offline until the 22nd. Things are also getting very very grim with respect to university institute funding and my federal institution so I will have to see if we still exist a week from now. Good luck with continued development.

@leewujung
Copy link
Member

Maybe it has to do with the test files being dev snapshot conversions that are out of date now (might have to recreate them with the latest dev 0.9.2 version of echopype)

By "the latest dev 0.9.2 version of echopype" do you mean what's in main?

I noticed that the children were also changed from 'channel' to 'channel_all' when I renamed the variable, I am not entirely sure if that is what you were looking to happen.

I think it may need to be a roundabout to make xarray recognizing the channels are not supposed to be inherited, something like:

  • changing channel in the child groups to channel_new
  • changing channel in the parent group to channel_all
  • changing channel_new back to channel

Things are also getting very very grim with respect to university institute funding and my federal institution so I will have to see if we still exist a week from now. Good luck with continued development.

Sigh, I hope things will go the right way... 🤞 Thanks for all of your help!

@oftfrfbf
Copy link
Contributor Author

'By "the latest dev 0.9.2 version of echopype" do you mean what's in main?'

Yeah, I didn't have a lot of time to figure out what the problem was - or if there even was one. Maybe I just didn't have the latest updates synced properly.

@leewujung
Copy link
Member

leewujung commented Feb 16, 2025

So the channel to channel_all actually was not the problem for the failed tests. The problem lies in assuming all echodata files one tries to open has time1 in the Platform/NMEA group, but that part may be different depending on the sonar_model and whether it's legacy.

I'll get this fixed using the approach I proposed above to open individual group instead of open_datatree or open_groups to avoid opening the dataset twice. Let's see if it works out!

@leewujung
Copy link
Member

By "the latest dev 0.9.2 version of echopype" do you mean what's in main?

Yeah, I didn't have a lot of time to figure out what the problem was - or if there even was one. Maybe I just didn't have the latest updates synced properly.

Ok, I think we can do without this, since there isn't a v0.9.2 release!

# Update Sonar for all sonar_model if channel exists as a coordinate
sonar = temp_tree["/Sonar"]
if "channel" in sonar.coords:
# TODO: do we actually want to rename the children as well?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but everything must be renamed to ensure that the datatree enforced parent-child inheritance holds if it's read in again right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And from #1419, I think that the "channel_all" renaming should only be done to the EK80 files and not the EK60 files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or anything that goes through EK80 set groups, so EK80 and ES80.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should've also highlighted line 224.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctuguinay : I think that we should keep the variable name the same across different sonar models, so that it is easier to code things up based on those downstream. In this case that would be channel_all in the Sonar group and channel in the Sonar/Beam_groupX groups.

This does mean that we have both channel_all and channel in the Sonar/Beam_groupX groups due to the inheritance, but no variables are aligned with the inherited channel_all coordinate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leewujung Ah I see, I think I agree with you on this. Right now, the set groups for EK60 doesn't have "channel_all" in Sonar/Beam_groupX. I can add that to ensure that we have uniform naming across both EK60 and EK80 set groups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I have overlooked that missing variable... yes please add it in!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok @ctuguinay and I just talked and decided to keep it as is to not introduce confusion. We'll take on a review of all variables at another time.

@ctuguinay
Copy link
Collaborator

@leewujung I just had a few comments on the legacy checker/fixing code. Let me know if they're correct and I can go in and modify if needed.

@leewujung
Copy link
Member

@ctuguinay : Thanks for reviewing this, and making the change for .dims to .sizes. I responded to you comments above. Let me know what you think.

@leewujung leewujung added this to the v0.10.0 milestone Feb 20, 2025
@leewujung
Copy link
Member

Ok, we've resolved all points here now and I'll go ahead to merge. Thanks @oftfrfbf and @ctuguinay for your work on this!

@leewujung leewujung merged commit 616f4c7 into OSOceanAcoustics:main Feb 20, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AA SI bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants