Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] emit fewer warnings about potential ANI estimation issues #2061

Merged
merged 16 commits into from
May 24, 2022

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented May 19, 2022

addresses #2058 by emitting fewer warnings.

  • remove size accuracy warning during containment estimation
  • for size accuracy and "jaccard error too high", warn at end in:
    • search
    • prefetch
    • gather
    • multigather
    • compare
    • add tests

Notes and Questions:

  • I've removed the warnings from the underlying functions, so no warnings will show up when using the python api functions. There are ways to check these for each comparison in the python API, so maybe we should just recommend those for any folks getting ANI from the python API.
  • For size accuracy and jaccard error, we now warn at the end of compare/search/prefetch/gather if there were any issues. For these, we do not currently have output columns for these properties, so if users get this warning, there will be no way to know which of the comparisons generated the issue, other than the fact that ANI will not be estimated for these comparisons (ANI gets zeroed for both size accuracy and jaccard error issues).
  • potential false negatives are a bit different. We now warn at the end of compare if there are any potential false negatives. But most of the time, this won't work for search because if there are no hashes in common, there will just be no match found during initial search, so a SearchResult/PrefetchResult etc will never be generated. I do currently store this as a property in *Result, but haven't figured out test data to get a True value out of it, so maybe this should not be included? Perhaps this is the situation where we want to warn immediately upon comparison, since it will mostly show up when the scaled value is too high/query sketch is too small?

@codecov
Copy link

codecov bot commented May 19, 2022

Codecov Report

Merging #2061 (f63d983) into latest (1ba1e7c) will increase coverage by 7.53%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #2061      +/-   ##
==========================================
+ Coverage   84.28%   91.81%   +7.53%     
==========================================
  Files         130       99      -31     
  Lines       15255    11047    -4208     
  Branches     2155     2178      +23     
==========================================
- Hits        12857    10143    -2714     
+ Misses       2099      606    -1493     
+ Partials      299      298       -1     
Flag Coverage Δ
python 91.81% <100.00%> (+0.07%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/distance_utils.py 99.36% <ø> (-0.03%) ⬇️
src/sourmash/commands.py 88.88% <100.00%> (+0.41%) ⬆️
src/sourmash/compare.py 100.00% <100.00%> (ø)
src/sourmash/minhash.py 94.16% <100.00%> (ø)
src/sourmash/search.py 97.91% <100.00%> (+0.01%) ⬆️
src/sourmash/sketchcomparison.py 97.12% <100.00%> (+1.88%) ⬆️
src/core/src/ffi/mod.rs
src/core/src/ffi/hyperloglog.rs
src/core/src/ffi/index/mod.rs
src/core/tests/storage.rs
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1ba1e7c...f63d983. Read the comment docs.

@bluegenes bluegenes changed the title [WIP] emit fewer warnings about potential ANI estimation issues [MRG] emit fewer warnings about potential ANI estimation issues May 24, 2022
@bluegenes
Copy link
Contributor Author

@ctb ready for review

src/sourmash/distance_utils.py Outdated Show resolved Hide resolved
src/sourmash/distance_utils.py Outdated Show resolved Hide resolved
src/sourmash/distance_utils.py Outdated Show resolved Hide resolved
src/sourmash/minhash.py Outdated Show resolved Hide resolved
src/sourmash/commands.py Outdated Show resolved Hide resolved
@ctb
Copy link
Contributor

ctb commented May 24, 2022

p.s. this looks really great, thank you :)

@ctb
Copy link
Contributor

ctb commented May 24, 2022

question: will any warning show up even when no match is found? I'm seeing that behavior in prefetch on genbank with latest, curious if this PR fixes that.

@bluegenes
Copy link
Contributor Author

bluegenes commented May 24, 2022

question: will any warning show up even when no match is found? I'm seeing that behavior in prefetch on genbank with latest, curious if this PR fixes that.

Which warnings are you seeing when no matches are found? (All?) The warning I would want to show up is the one about potential false negatives, but I don't know how to have that show up just once with the current design.

This PR should make it so no warnings show up at all if you have no matches in search/prefetch/gather

@bluegenes
Copy link
Contributor Author

@ctb ready for re-review. I'm not sure what happened to screw up the wheels though :/. Any suggestions?

@ctb ctb merged commit 99c3997 into latest May 24, 2022
@ctb ctb deleted the fewer-warnings branch May 24, 2022 20:18
@ctb
Copy link
Contributor

ctb commented May 24, 2022

no worries, it's almost certainly not PR specific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants