what to do about ANI estimate for two very small scaled sketches? #2003
Hot take: Warning +
Handled by #2032.
This is happening much more often than I expected, and for some applications (prefetch, gather), it can yield very verbose output (#2058). We can deal with verbosity by changing the warning strategy, but I'm not sure zeroing out is the right call if it's happening this often...
thresholds modified in #2074
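One possible change to the warning strategy for prefetch/gather, sketched here only as an illustration: instead of warning on every low-confidence comparison, collect the flagged matches and emit a single summary warning. The `(name, error_too_high)` input shape and the function name `report_low_confidence` are hypothetical. This is not taken from #2032 or #2074.

```python
import warnings
from typing import Iterable, List, Tuple


def report_low_confidence(results: Iterable[Tuple[str, bool]]) -> List[str]:
    """Collect names of matches whose ANI estimate is flagged, then warn once.

    Hypothetical helper: each result is a (match_name, error_too_high) pair.
    """
    flagged = [name for name, error_too_high in results if error_too_high]
    if flagged:
        # One summary warning instead of one warning per comparison.
        warnings.warn(
            f"{len(flagged)} ANI estimate(s) may be unreliable because the "
            f"sketches are small; e.g. {', '.join(flagged[:3])}"
        )
    return flagged
```

This keeps the per-result flag available for output files while keeping terminal output short.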
With #1967, we will now estimate ANI for all comparisons of scaled sketches, regardless of sketch size. These estimates may be inaccurate for viruses and other small genomes.
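To see why small genomes are a problem: a scaled sketch keeps roughly one hash per `scaled` distinct k-mers, so the number of hashes, and with it the precision of any estimate, depends on genome size. The back-of-the-envelope sketch below assumes genome length approximates the number of distinct k-mers and uses a 1/sqrt(n) rule of thumb for relative error. The genome sizes and `scaled=1000` are illustrative values only.

```python
import math


def expected_num_hashes(genome_size: int, scaled: int) -> float:
    """Approximate sketch size: each k-mer hash is kept with probability 1/scaled."""
    return genome_size / scaled


def relative_std_error(n_hashes: float) -> float:
    """Rule-of-thumb relative error of a count-based estimate, ~1/sqrt(n)."""
    return 1.0 / math.sqrt(n_hashes)


# Illustrative sizes: a typical bacterium vs. a small virus.
for name, size in [("bacterium", 5_000_000), ("virus", 30_000)]:
    n = expected_num_hashes(size, scaled=1000)
    print(f"{name}: ~{n:.0f} hashes, relative error ~{relative_std_error(n):.1%}")
```

A viral genome of a few tens of kb ends up with only a few dozen hashes, so its relative error is an order of magnitude larger than a bacterial genome's.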
context from #1967 (comment):
@ctb:
@bluegenes:
At the moment, we just report the ANI, except for extremely tiny test data where we can't actually estimate ANI.
For jaccard --> ANI, we estimate the error on the jaccard estimate itself, and raise a warning when the error may be too high (but still currently report ANI). I have an item in the `SearchResult` class that keeps track of whether the jaccard estimate error is too high -- I think we should at least consider doing that, but would also be open to zeroing out the ANI estimate.
From #1798 (comment):
I was hoping we might be able to use HLL to avoid issues with small sketches, but I suppose instead we could use this to estimate an error based on the sketch size, and raise a warning or zero out the ANI when the error is too high?
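A minimal sketch of the warn-or-zero-out idea discussed above. It converts Jaccard to ANI with the Mash-style distance D = -ln(2J/(1+J))/k, which is not necessarily the formula sourmash uses. It approximates the standard error of J as binomial in the number of hashes in the union of the two sketches, so the error grows as the sketches shrink. The binomial error model, the 10% relative-error cutoff, and the names `AniEstimate`, `estimate_ani`, `max_rel_error`, and `zero_out` are assumptions for illustration, not sourmash's API or its actual error model.

```python
import math
import warnings
from dataclasses import dataclass


@dataclass
class AniEstimate:
    # Hypothetical result record; the flag mirrors the idea of tracking
    # "jaccard error too high" on a search result, not sourmash's class.
    jaccard: float
    jaccard_stderr: float
    ani: float
    jaccard_error_too_high: bool


def jaccard_to_ani(j: float, ksize: int) -> float:
    """Mash-style conversion: D = -ln(2J / (1 + J)) / k, ANI = 1 - D."""
    if j <= 0:
        return 0.0
    dist = -math.log(2 * j / (1 + j)) / ksize
    return max(0.0, 1.0 - dist)


def estimate_ani(n_intersect: int, n_union: int, ksize: int,
                 max_rel_error: float = 0.1,
                 zero_out: bool = False) -> AniEstimate:
    """Estimate ANI from hash counts and flag estimates whose error is high."""
    if n_union == 0:
        raise ValueError("both sketches are empty; ANI cannot be estimated")
    j = n_intersect / n_union
    # Assumed binomial standard error of J: it shrinks as 1/sqrt(n_union),
    # so small sketches (few hashes) give noisy estimates.
    stderr = math.sqrt(j * (1 - j) / n_union)
    too_high = j == 0 or stderr / j > max_rel_error
    ani = jaccard_to_ani(j, ksize)
    if too_high:
        warnings.warn(
            f"Jaccard estimate J={j:.3f} (+/- {stderr:.3f}) comes from small "
            "sketches; the ANI estimate may be unreliable."
        )
        if zero_out:
            ani = 0.0  # the "zero out" option discussed in this issue
    return AniEstimate(j, stderr, ani, too_high)
```

For example, `estimate_ani(3, 10, ksize=31, zero_out=True)` warns and reports an ANI of 0.0, while `estimate_ani(900, 1000, ksize=31)` passes the check. Whether to only warn or also zero out is the open question in this thread, so the sketch keeps both behind a flag.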