-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop Ping Time Duplicates #1382
Drop Ping Time Duplicates #1382
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1382 +/- ##
==========================================
- Coverage 83.52% 80.66% -2.86%
==========================================
Files 64 19 -45
Lines 5686 3108 -2578
==========================================
- Hits 4749 2507 -2242
+ Misses 937 601 -336
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@leewujung This should be ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctuguinay : Thanks for looking into this. I agree this would address the problem, but one thing I think we should check is if the entries with duplicated ping_time
actually contain the same data. If not, the duplication of ping_time
does not make sense, but dropping it would also mean that we lose data?
Also, the same as my comment on another PR, that if both files you used in the tests have the same issues, let's just keep one of them to avoid our test suite getting too big.
@ctuguinay: I just got back to this and saw my previous comments, which still stands. We can merge this once those are addressed. Thanks! |
@leewujung So I did a quick check of the first two duplicated ping time pairs to see if And here's a more generalized check that shows that subsequent ping time pairs are all equal:
And I was thinking about this again, and I think we would either need to drop duplicate ping times even if they contained unique data or we would need to apply some tiny time offset to them such that they did not have the same duplicate ping times, since I don't have a .raw file where I can test this tiny time offset idea on since I don't I have seen a dataset where the ping time duplicates contain unique data. |
I agree. In the grand scale of things, I kinda feel that dropping one ping is not a big deal, but perhaps that goes against the 0% lossless idea for the raw-converted data. Perhaps what we can do is to drop duplicated I'll shoot an email to Simrad rep to see if he has thoughts about why the pings are duplicated in the first place, and what he thinks about your idea to add a tiny time offset. Will cc you! |
I took a look at the file you used above, and found out that in fact ALL PINGS are duplicated (!), from the dictionary fields The data appear to be completely duplicated as well. Maybe something was happening here with the instrument? @ctuguinay How many files in the set you were working on encountered this problem? |
@leewujung Oh huh that is truly strange. I only had 2 files from the 2021 NWFSC Pacific Hake Survey (Shimada) that displayed this problem. I wonder if this is just a streaming issue so that duplicate datagrams were sequentially saved to the same place instead of two separate places. I also wonder what Echoview makes of this. Perhaps it just ignores the second ping? |
Good that that was just 2 files, so not a big hit in terms of data storage. Yeah would be good to see what Echoview handling looks like. Maybe we can ask folks with a license to help here. |
From communication with Kongsberg we know that this is likely EK80 software issue -- regardless of whether or not it was fully fixed after v1.12.2 (since the files you saw having this problem was from v1.23, but there was no problem in v21.15). @ctuguinay: Let's drop the duplicated pings as you have implemented, and send a warning to user if the data in the dropped ping are not identical to the preceding (non-dropped) ping so that it is clear what has happened. Thanks! |
@leewujung This should be ready for review again. I think the pinning worked? Not sure. |
@ctuguinay : Could you try unpinning zarr and see if everything passes here? The failed tests in #1429 are likely in other modules and not the ones changed in this PR. (There I used the [ci tests all] PR title to trigger running all tests; here only the ones in the modules that had code changes are run.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
* Drop Ping Time Duplicates (#1382) * init commit * revert change to fix merge conflict * test only one file * use other file * move test duplicate to test convert ek * add extra line * move test back to ek80 convert * pin zarr and add check unique ping time duplicates and tests * fix test message * test remove zarr pin * add back zarr pin * Fix tests that fail from new Xarray variable and attribute assignment updates (#1433) * add .values to change water level scalar value * add .values to fillna * fix assign attributes * remove test1 * update echodata properly * remove print statement * Assemble AD2CP timestamp with nanosecond precision (#1436) * assemble ad2cp timestamp at ns instead of us * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add noqa --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use `import_resources.files` instead of the legacy `open_text` (#1434) * use import_resources.files to open yaml files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import from importlib.resources * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add exception for "anc" --------- Co-authored-by: Caesar Tuguinay <87830138+ctuguinay@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* init commit * revert change to fix merge conflict * test only one file * use other file * move test duplicate to test convert ek * add extra line * move test back to ek80 convert * pin zarr and add check unique ping time duplicates and tests * fix test message * test remove zarr pin * add back zarr pin
…ll tests ci] (#1447) * EP-1420 resolved path issue with ek60_missing_channel_power * EP-1420 fixed path to use string concatenation * formatting * adding conditionals for checking platform nmea and exchange from time1 to time_nmea * working datatree for v0.9.0 w o calibration * Drop Ping Time Duplicates (#1382) * init commit * revert change to fix merge conflict * test only one file * use other file * move test duplicate to test convert ek * add extra line * move test back to ek80 convert * pin zarr and add check unique ping time duplicates and tests * fix test message * test remove zarr pin * add back zarr pin * Fix tests that fail from new Xarray variable and attribute assignment updates (#1433) * add .values to change water level scalar value * add .values to fillna * fix assign attributes * remove test1 * update echodata properly * remove print statement * Assemble AD2CP timestamp with nanosecond precision (#1436) * assemble ad2cp timestamp at ns instead of us * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add noqa --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use `import_resources.files` instead of the legacy `open_text` (#1434) * use import_resources.files to open yaml files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * import from importlib.resources * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * bump codespell version, add exceptions (#1438) * Pin `zarr` and `netcdf4` temporarily [all tests ci] (#1429) * pin zarr and netcdf4 in requirements.txt * change np.alltrue to np.all * use np.argmin(da.data) instead of da.argmin() * use S instead s for MVBS ping_time_bin * extract single element in computing nsamples and L in tapered_chirp * change from S to s in testing.py * Add `type-extensions` to requirements.txt (#1440) * add type-extensions * sort requirements.txt * add manual trigger to pypi workflow (#1442) * update cff with ices paper data (#1443) * add assign actual range utility function (#1435) * chore(deps): bump actions/setup-python from 5.3.0 to 5.4.0 (#1445) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 5.4.0. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5.3.0...v5.4.0) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#1446) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/PyCQA/isort: 5.13.2 → 6.0.0](PyCQA/isort@5.13.2...6.0.0) - [github.com/psf/black: 24.10.0 → 25.1.0](psf/black@24.10.0...25.1.0) - [github.com/codespell-project/codespell: v2.4.0 → v2.4.1](codespell-project/codespell@v2.4.0...v2.4.1) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * added pooch and bound the test registry to the echodata test * EP-1420 added legacy test files >=v0.8.4, created pooch registry, parameterized test * EP-1420 checked tests * EP-1420 debugging TestEchoData test failures, test_nbytes, test_setattr, & test_setitem * EP-1447 troubleshooting bug still * EP-1420 removed pooch and added normal test path resources * EP-1420 fixed test_nbytes, commented out test_setattr * EP-1420 changed "time_nmea" to "nmea_time", added ek80 legacy_datatree tests, fixed readme * EP-1420 removed import * trying to add sonar channel_all * EP-1420 checked sonar for channel and renamed to channel_all * check legacy data differently * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove v0.9.2 zarr test files * remove test_setattr that is no longer needed, see #1457 that removes EchoData.__setattr__ * Update echopype/tests/echodata/test_echodata.py --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Caesar Tuguinay <87830138+ctuguinay@users.noreply.github.com> Co-authored-by: Wu-Jung Lee <leewujung@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Addresses the duplicate ping time part of #1374