
Drop Ping Time Duplicates #1382

Merged: 12 commits, Jan 28, 2025

Conversation

ctuguinay
Collaborator

Addresses the duplicate ping time part of #1374

@ctuguinay ctuguinay added bug Something isn't working data conversion labels Aug 27, 2024
@ctuguinay ctuguinay added this to the v0.9.1 milestone Aug 27, 2024
@ctuguinay ctuguinay self-assigned this Aug 27, 2024
@codecov-commenter

codecov-commenter commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.66%. Comparing base (9f56124) to head (a466b26).
Report is 164 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1382      +/-   ##
==========================================
- Coverage   83.52%   80.66%   -2.86%     
==========================================
  Files          64       19      -45     
  Lines        5686     3108    -2578     
==========================================
- Hits         4749     2507    -2242     
+ Misses        937      601     -336     
Flag Coverage Δ
unittests 80.66% <100.00%> (-2.86%) ⬇️

Flags with carried forward coverage won't be shown.


@ctuguinay ctuguinay marked this pull request as ready for review August 27, 2024 21:19
@ctuguinay ctuguinay requested a review from leewujung August 27, 2024 21:21
@ctuguinay
Collaborator Author

@leewujung This should be ready for review

@leewujung leewujung left a comment (Member)

@ctuguinay : Thanks for looking into this. I agree this would address the problem, but one thing I think we should check is if the entries with duplicated ping_time actually contain the same data. If not, the duplication of ping_time does not make sense, but dropping it would also mean that we lose data?

Also, same as my comment on another PR: if both files you used in the tests exhibit the same issue, let's keep just one of them to avoid our test suite getting too big.

@leewujung
Member

@ctuguinay: I just got back to this and saw my previous comments, which still stand. We can merge this once those are addressed. Thanks!

@ctuguinay
Collaborator Author

ctuguinay commented Dec 31, 2024

@leewujung So I did a quick check of the first two duplicated ping time pairs to see if .drop_duplicates(dim='ping_time') was dropping only ping times that contained duplicate data, and it turns out that was the case:

[screenshot: first two duplicated ping time pairs contain identical data]

And here's a more generalized check that shows that subsequent ping time pairs are all equal:

[screenshot: generalized check showing subsequent ping time pairs are all equal]
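The kind of check described above can be sketched on a small toy dataset (variable names and values are hypothetical, not the actual survey data): find the positions whose `ping_time` repeats an earlier entry, drop them with `drop_duplicates`, and verify each dropped entry was identical to the retained first occurrence.

```python
import numpy as np
import xarray as xr

# Toy dataset with one duplicated ping_time entry carrying identical data.
times = np.array(
    ["2021-01-01T00:00:00", "2021-01-01T00:00:00", "2021-01-01T00:00:01"],
    dtype="datetime64[ns]",
)
ds = xr.Dataset(
    {"backscatter_r": ("ping_time", [1.0, 1.0, 2.0])},
    coords={"ping_time": times},
)

# Positions that share a ping_time with an earlier entry.
dup_mask = ds.ping_time.to_index().duplicated(keep="first")

# Drop the later duplicates, then verify each dropped entry matched
# the retained first occurrence.
deduped = ds.drop_duplicates(dim="ping_time", keep="first")
all_identical = all(
    ds.isel(ping_time=i).identical(deduped.sel(ping_time=ds.ping_time[i]))
    for i in np.nonzero(dup_mask)[0]
)
```

If `all_identical` comes out `True`, dropping the duplicates loses no information.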

> Thanks for looking into this. I agree this would address the problem, but one thing I think we should check is if the entries with duplicated ping_time actually contain the same data. If not, the duplication of ping_time does not make sense, but dropping it would also mean that we lose data?

And I was thinking about this again, and I think we would either need to drop duplicate ping times even if they contained unique data, or we would need to apply some tiny time offset to them so that they are no longer duplicated, since xr.concat in set_sonar() will fail if there are any duplicates in the datasets we are concatenating.

I don't have a .raw file to test this tiny time offset idea on, since I have not seen a dataset where the ping time duplicates contain unique data.
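For reference, the "tiny time offset" idea could look something like the sketch below (the helper name is hypothetical and not part of echopype): each repeat of a timestamp is nudged forward by one nanosecond per occurrence so the coordinate becomes unique while staying effectively unchanged.

```python
import numpy as np

def offset_duplicate_times(ping_time):
    """Nudge repeated timestamps forward by 1 ns per repeat (hypothetical helper)."""
    orig = np.asarray(ping_time, dtype="datetime64[ns]")
    out = orig.copy()
    seen = {}  # original timestamp -> number of occurrences so far
    for i in range(len(orig)):
        t = orig[i]
        n = seen.get(t, 0)
        if n:
            out[i] = t + np.timedelta64(n, "ns")
        seen[t] = n + 1
    return out

times = np.array(["2021-01-01T00:00:00"] * 3, dtype="datetime64[ns]")
unique_times = offset_duplicate_times(times)  # now strictly increasing by 1 ns
```

This keeps all pings at nanosecond precision, at the cost of timestamps that no longer exactly match the raw file.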

@ctuguinay ctuguinay requested a review from leewujung December 31, 2024 01:36
@leewujung
Member

> And I was thinking about this again, and I think we would either need to drop duplicate ping times even if they contained unique data, or we would need to apply some tiny time offset to them so that they are no longer duplicated, since xr.concat in set_sonar() will fail if there are any duplicates in the datasets we are concatenating.
>
> I don't have a .raw file to test this tiny time offset idea on, since I have not seen a dataset where the ping time duplicates contain unique data.

I agree. In the grand scale of things, I kinda feel that dropping one ping is not a big deal, but perhaps that goes against the idea that the raw-converted data should be 100% lossless.

Perhaps what we can do is to drop duplicated ping_time regardless, but spit out a warning to the user if the dropped pings contain different data compared to the retained ping?

I'll shoot an email to the Simrad rep to see if he has thoughts about why the pings are duplicated in the first place, and what he thinks about your idea to add a tiny time offset. Will cc you!

@leewujung
Member

I took a look at the file you used above, and found out that in fact ALL PINGS are duplicated (!), from the dictionary fields low_date/high_date in ParseEK80:
[Screenshot 2024-12-31 8:38 AM: low_date/high_date dictionary fields in ParseEK80 showing all pings duplicated]

The data appear to be completely duplicated as well.

Maybe something was happening here with the instrument?

@ctuguinay How many files in the set you were working on encountered this problem?

@ctuguinay
Collaborator Author

@leewujung Oh huh that is truly strange. I only had 2 files from the 2021 NWFSC Pacific Hake Survey (Shimada) that displayed this problem.

I wonder if this is just a streaming issue, such that duplicate datagrams were saved sequentially to the same place instead of to two separate places. I also wonder what Echoview makes of this. Perhaps it just ignores the second ping?

@leewujung
Member

Good that it was just 2 files, so not a big hit in terms of data storage.

Yeah, it would be good to see what Echoview's handling looks like. Maybe we can ask folks with a license to help here.

@leewujung
Member

From communication with Kongsberg we know that this is likely an EK80 software issue, regardless of whether or not it was fully fixed after v1.12.2 (the files you saw having this problem were from v1.23, but there was no problem in v21.15).

@ctuguinay: Let's drop the duplicated pings as you have implemented, and send a warning to the user if the data in the dropped ping are not identical to the data in the preceding (non-dropped) ping, so that it is clear what has happened. Thanks!
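This agreed behavior might be sketched as follows (the function name is hypothetical, and the actual echopype implementation may differ in its details): keep the first occurrence of each ping_time, and warn only when a dropped ping carried data that differ from the retained one.

```python
import warnings
import numpy as np
import xarray as xr

def drop_duplicate_ping_times(ds):
    """Drop duplicated ping_time entries; warn if a dropped ping's data
    differ from the retained first occurrence (hypothetical sketch)."""
    dup_mask = ds.ping_time.to_index().duplicated(keep="first")
    if not dup_mask.any():
        return ds
    kept = ds.drop_duplicates(dim="ping_time", keep="first")
    for i in np.nonzero(dup_mask)[0]:
        dropped = ds.isel(ping_time=i)
        retained = kept.sel(ping_time=ds.ping_time[i])
        if not dropped.identical(retained):
            warnings.warn(
                f"Dropped ping at {ds.ping_time.values[i]} contained data "
                "that differ from the retained ping."
            )
    return kept

# Example: a duplicated ping with identical data is dropped silently.
ds = xr.Dataset(
    {"v": ("ping_time", [1.0, 1.0, 2.0])},
    coords={"ping_time": np.array(
        ["2021-01-01T00:00:00", "2021-01-01T00:00:00", "2021-01-01T00:00:01"],
        dtype="datetime64[ns]",
    )},
)
deduped = drop_duplicate_ping_times(ds)
```

With this shape, the dedup is unconditional (so downstream xr.concat never sees duplicate coordinates), while the warning preserves visibility into the rare lossy case.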

@ctuguinay ctuguinay marked this pull request as draft January 21, 2025 21:35
@ctuguinay ctuguinay marked this pull request as ready for review January 28, 2025 02:47
@ctuguinay
Collaborator Author

@leewujung This should be ready for review again. I think the pinning worked? Not sure.

@leewujung
Member

@ctuguinay: Could you try unpinning zarr and see if everything passes here? The failed tests in #1429 are likely in other modules and not the ones changed in this PR. (There I used the [ci tests all] PR title to trigger running all tests; here only the tests in the modules that had code changes are run.)

@leewujung leewujung left a comment (Member)

Looks good. Thanks!

@leewujung leewujung merged commit a35430f into OSOceanAcoustics:main Jan 28, 2025
5 checks passed
leewujung added a commit that referenced this pull request Feb 2, 2025
* Drop Ping Time Duplicates (#1382)

* init commit

* revert change to fix merge conflict

* test only one file

* use other file

* move test duplicate to test convert ek

* add extra line

* move test back to ek80 convert

* pin zarr and add check unique ping time duplicates and tests

* fix test message

* test remove zarr pin

* add back zarr pin

* Fix tests that fail from new Xarray variable and attribute assignment updates (#1433)

* add .values to change water level scalar value

* add .values to fillna

* fix assign attributes

* remove test1

* update echodata properly

* remove print statement

* Assemble AD2CP timestamp with nanosecond precision (#1436)

* assemble ad2cp timestamp at ns instead of us

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add noqa

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use `import_resources.files` instead of the legacy `open_text` (#1434)

* use import_resources.files to open yaml files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import from importlib.resources

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add exception for "anc"

---------

Co-authored-by: Caesar Tuguinay <87830138+ctuguinay@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
oftfrfbf pushed a commit to oftfrfbf/echopype that referenced this pull request Feb 5, 2025
@leewujung leewujung modified the milestones: v0.9.1, v0.10.0 Feb 18, 2025
leewujung added a commit that referenced this pull request Feb 20, 2025
…ll tests ci] (#1447)

* EP-1420 resolved path issue with ek60_missing_channel_power

* EP-1420 fixed path to use string concatenation

* formatting

* adding conditionals for checking platform nmea and exchange from time1 to time_nmea

* working datatree for v0.9.0 w o calibration

* Drop Ping Time Duplicates (#1382)

* bump codespell version, add exceptions (#1438)

* Pin `zarr` and `netcdf4` temporarily [all tests ci] (#1429)

* pin zarr and netcdf4 in requirements.txt

* change np.alltrue to np.all

* use np.argmin(da.data) instead of da.argmin()

* use S instead s for MVBS ping_time_bin

* extract single element in computing nsamples and L in tapered_chirp

* change from S to s in testing.py

* Add `type-extensions` to requirements.txt (#1440)

* add type-extensions

* sort requirements.txt

* add manual trigger to pypi workflow (#1442)

* update cff with ices paper data (#1443)

* add assign actual range utility function (#1435)

* chore(deps): bump actions/setup-python from 5.3.0 to 5.4.0 (#1445)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 5.4.0.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5.3.0...v5.4.0)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#1446)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/PyCQA/isort: 5.13.2 → 6.0.0](PyCQA/isort@5.13.2...6.0.0)
- [github.com/psf/black: 24.10.0 → 25.1.0](psf/black@24.10.0...25.1.0)
- [github.com/codespell-project/codespell: v2.4.0 → v2.4.1](codespell-project/codespell@v2.4.0...v2.4.1)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* added pooch and bound the test registry to the echodata test

* EP-1420 added legacy test files >=v0.8.4, created pooch registry, parameterized test

* EP-1420 checked tests

* EP-1420 debugging TestEchoData test failures, test_nbytes, test_setattr, & test_setitem

* EP-1447 troubleshooting bug still

* EP-1420 removed pooch and added normal test path resources

* EP-1420 fixed test_nbytes, commented out test_setattr

* EP-1420 changed "time_nmea" to "nmea_time", added ek80 legacy_datatree tests, fixed readme

* EP-1420 removed import

* trying to add sonar channel_all

* EP-1420 checked sonar for channel and renamed to channel_all

* check legacy data differently

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove v0.9.2 zarr test files

* remove test_setattr that is no longer needed, see #1457 that removes EchoData.__setattr__

* Update echopype/tests/echodata/test_echodata.py

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Caesar Tuguinay <87830138+ctuguinay@users.noreply.github.com>
Co-authored-by: Wu-Jung Lee <leewujung@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>