[MRG] emit fewer warnings about potential ANI estimation issues #2061

bluegenes · 2022-05-19T20:36:58Z

addresses #2058 by emitting fewer warnings.

- remove size accuracy warning during containment estimation
- for size accuracy and "jaccard error too high", warn at end in:
  - search
  - prefetch
  - gather
  - multigather
  - compare
- add tests

Notes and Questions:

I've removed the warnings from the underlying functions, so no warnings will show up when using the Python API functions. There are ways to check these for each comparison in the Python API, so maybe we should just recommend those for anyone getting ANI from the Python API.
For size accuracy and jaccard error, we now warn at the end of compare/search/prefetch/gather if there were any issues. We do not currently have output columns for these properties, so users who get this warning have no way to know which comparisons triggered it, other than the fact that ANI will not be estimated for those comparisons (ANI gets zeroed for both size accuracy and jaccard error issues).
Potential false negatives are a bit different. We now warn at the end of compare if there are any potential false negatives. But most of the time this won't work for search: if there are no hashes in common, no match is found during the initial search, so a SearchResult/PrefetchResult etc. will never be generated. I do currently store this as a property in *Result, but I haven't figured out test data that gets a True value out of it, so maybe it should not be included? Perhaps this is the situation where we want to warn immediately upon comparison, since it will mostly show up when the scaled value is too high or the query sketch is too small?
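
For readers following along, the "warn once at the end" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ComparisonResult`, its field names, `summarize_ani_warnings`, and the message wording are all hypothetical stand-ins rather than sourmash's actual classes or output.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ComparisonResult:
    """Hypothetical stand-in for a per-comparison result object."""
    name: str
    ani: float  # zeroed (0.0) when one of the issues below is flagged
    size_may_be_inaccurate: bool = False
    jaccard_error_too_high: bool = False


def summarize_ani_warnings(results: Iterable[ComparisonResult]) -> List[str]:
    """Return at most one message per issue type, however many results have it."""
    results = list(results)
    messages = []
    if any(r.size_may_be_inaccurate for r in results):
        # Hypothetical wording; the real message text may differ.
        messages.append(
            "WARNING: size estimation for at least one sketch may be "
            "inaccurate; ANI was not estimated for those comparisons."
        )
    if any(r.jaccard_error_too_high for r in results):
        messages.append(
            "WARNING: Jaccard estimation error was too high for at least "
            "one comparison; ANI was not estimated for those comparisons."
        )
    return messages


def report_warnings(results: Iterable[ComparisonResult], file=sys.stderr) -> None:
    """Print the collected warnings once, after all comparisons are done."""
    for message in summarize_ani_warnings(results):
        print(message, file=file)
```

With this shape, a run that produces no results (for example, a search with no matches) prints nothing, and a run where many comparisons share the same issue prints that warning only once.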

codecov · 2022-05-19T20:45:53Z

Codecov Report

Merging #2061 (f63d983) into latest (1ba1e7c) will increase coverage by 7.53%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #2061      +/-   ##
==========================================
+ Coverage   84.28%   91.81%   +7.53%     
==========================================
  Files         130       99      -31     
  Lines       15255    11047    -4208     
  Branches     2155     2178      +23     
==========================================
- Hits        12857    10143    -2714     
+ Misses       2099      606    -1493     
+ Partials      299      298       -1

| Flag | Coverage Δ | |
|---|---|---|
| python | `91.81% <100.00%> (+0.07%)` | ⬆️ |
| rust | `?` | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/sourmash/distance_utils.py | `99.36% <ø> (-0.03%)` | ⬇️ |
| src/sourmash/commands.py | `88.88% <100.00%> (+0.41%)` | ⬆️ |
| src/sourmash/compare.py | `100.00% <100.00%> (ø)` | |
| src/sourmash/minhash.py | `94.16% <100.00%> (ø)` | |
| src/sourmash/search.py | `97.91% <100.00%> (+0.01%)` | ⬆️ |
| src/sourmash/sketchcomparison.py | `97.12% <100.00%> (+1.88%)` | ⬆️ |
| src/core/src/ffi/mod.rs | | |
| src/core/src/ffi/hyperloglog.rs | | |
| src/core/src/ffi/index/mod.rs | | |
| src/core/tests/storage.rs | | |
| ... and 28 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ba1e7c...f63d983.

bluegenes · 2022-05-24T01:21:51Z

@ctb ready for review

ctb · 2022-05-24T13:36:51Z

p.s. this looks really great, thank you :)

ctb · 2022-05-24T15:48:14Z

question: will any warning show up even when no match is found? I'm seeing that behavior in prefetch on genbank with latest, curious if this PR fixes that.

bluegenes · 2022-05-24T16:29:52Z

> question: will any warning show up even when no match is found? I'm seeing that behavior in prefetch on genbank with latest, curious if this PR fixes that.

Which warnings are you seeing when no matches are found? (All?) The warning I would want to show up is the one about potential false negatives, but I don't know how to have that show up just once with the current design.

This PR should make it so that no warnings show up at all if you have no matches in search/prefetch/gather.

bluegenes · 2022-05-24T20:06:11Z

@ctb ready for re-review. I'm not sure what happened to screw up the wheels though :/. Any suggestions?

ctb · 2022-05-24T20:19:02Z

no worries, it's almost certainly not PR specific.

bluegenes added 2 commits May 19, 2022 13:19

only warn about size accuracy once during search,prefetch,gather

e706aba

dont warn during ANI estimation either

8972a08

bluegenes and others added 10 commits May 19, 2022 14:27

handle other dist warnings in search, prefetch, gather

8107476

handle warnings in compare

dd52126

test for warning outputs in compare

522c6ea

check during search

2ddf6b5

fix

b9f3c47

Merge branch 'latest' into fewer-warnings

8af58fb

add compare ANI test for jaccard err too high

e88bc3e

cant get fn during search bc no searchresult is ever generated

f03ae0d

Merge branch 'fewer-warnings' of github.com:sourmash-bio/sourmash into fewer-warnings

327ba8d

cant get fn during prefetch/gather/multigather bc no result is ever generated

431eca7

bluegenes changed the title ~~[WIP] emit fewer warnings about potential ANI estimation issues~~ [MRG] emit fewer warnings about potential ANI estimation issues May 24, 2022

ctb requested changes May 24, 2022

bluegenes added 4 commits May 24, 2022 11:13

rm commented warnings; update compare size warning

52ac49b

upd size ani warning for search/prefetch/gather

87c0987

upd comment

6098ed5

rm extra space

f63d983

ctb merged commit 99c3997 into latest May 24, 2022

ctb deleted the fewer-warnings branch May 24, 2022 20:18

ctb mentioned this pull request Jul 9, 2022

Draft release notes & release process updates for v4.4.1 #2112

Closed

6 tasks

ctb mentioned this pull request Jul 23, 2022

verbose output by sourmash gather with ANI #2058

Closed
