
[MRG] adjust output of gather to indicate weighted/unweighted results #1819

Merged: @ctb merged 2 commits on Feb 2, 2022


@ctb (Contributor) commented Feb 2, 2022

This PR adjusts the output of gather so that it's clear whether the coverage is abundance-weighted or not.

The full output is below, but the only changed line is:

for unweighted / --ignore-abundance:
the recovered matches hit 34.8% of the query (unweighted)

for weighted:
the recovered matches hit 35.3% of the abundance-weighted query
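To make the distinction concrete, here is a minimal Python sketch of the two ways of measuring how much of a query the matches cover. It is not sourmash's implementation: it assumes the query sketch is a plain dict mapping each hash to its abundance and the matches are a set of hashes, and the function names are hypothetical.

```python
def unweighted_fraction(query_abunds: dict[int, int], matched: set[int]) -> float:
    """Fraction of distinct query hashes found in the matches (abundance ignored)."""
    if not query_abunds:
        return 0.0
    hit = sum(1 for h in query_abunds if h in matched)
    return hit / len(query_abunds)


def weighted_fraction(query_abunds: dict[int, int], matched: set[int]) -> float:
    """Fraction of total query abundance carried by hashes found in the matches."""
    total = sum(query_abunds.values())
    if total == 0:
        return 0.0
    hit = sum(a for h, a in query_abunds.items() if h in matched)
    return hit / total


# Toy query: one high-abundance hash and three singletons; only the abundant one matches.
query = {101: 10, 102: 1, 103: 1, 104: 1}
matched = {101}
print(f"{unweighted_fraction(query, matched):.1%} of the query (unweighted)")    # 25.0%
print(f"{weighted_fraction(query, matched):.1%} of the abundance-weighted query")  # 76.9%
```

The same set of matches covers 1 of 4 distinct hashes but 10 of 13 abundance units, which is why an unlabeled percentage is ambiguous.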

Full output below.

Some decisions I made in this PR:

Fixes #1805.

cc #1818

full output

with a -p abund sketch:

== This is sourmash version 4.2.4.dev0+g73aeb155.d20220116. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

select query k=31 automatically.
loaded query: gut.cluster.41.fa.gz.cdbg_ids.... (k=31, DNA)
loaded 1 databases.

Starting prefetch sweep across databases.
Found 4 signatures via prefetch; now doing gather.

overlap     p_query p_match avg_abund
---------   ------- ------- ---------
134.0 kbp      7.6%    4.3%       1.9    FR883346.1 Clostridium sp. CAG:221 ge...
114.0 kbp     27.7%    3.0%       9.0    MARQ01000035.1 Clostridium perfringen...
found less than 50.0 kbp in common. => exiting

found 2 matches total;
the recovered matches hit 35.3% of the abundance-weighted query

with --ignore-abundance:

== This is sourmash version 4.2.4.dev0+g73aeb155.d20220116. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

select query k=31 automatically.
loaded query: gut.cluster.41.fa.gz.cdbg_ids.... (k=31, DNA)
loaded 1 databases.

Starting prefetch sweep across databases.
Found 4 signatures via prefetch; now doing gather.

overlap     p_query p_match
---------   ------- -------
134.0 kbp     19.5%    4.3%    FR883346.1 Clostridium sp. CAG:221 ge...
114.0 kbp     15.3%    3.0%    MARQ01000035.1 Clostridium perfringen...
found less than 50.0 kbp in common. => exiting

found 2 matches total;
the recovered matches hit 34.8% of the query (unweighted)
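In both runs above, the summary percentage equals the sum of the p_query column (7.6% + 27.7% = 35.3% weighted; 19.5% + 15.3% = 34.8% unweighted), and only the wording of the final line depends on whether abundances were used. The Python sketch below assumes exactly that; `summary_line` is a hypothetical helper, not the code in src/sourmash/commands.py.

```python
def summary_line(p_query_percents: list[float], weighted: bool) -> str:
    """Build the final gather summary line from per-match p_query percentages.

    Hypothetical helper: assumes the total is the sum of the p_query column,
    as in the two example runs, and uses the wording introduced by this PR.
    """
    total = sum(p_query_percents)
    if weighted:
        return f"the recovered matches hit {total:.1f}% of the abundance-weighted query"
    return f"the recovered matches hit {total:.1f}% of the query (unweighted)"


print(summary_line([7.6, 27.7], weighted=True))
print(summary_line([19.5, 15.3], weighted=False))
```

Because it sums the percentages as displayed (already rounded to one decimal), this sketch can differ by about 0.1% from a total computed over unrounded fractions.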

@ctb (Contributor, Author) commented Feb 2, 2022

@drtamermansour @bluegenes @mr-eyes, thoughts on this change? It's small but very useful.

codecov bot commented Feb 2, 2022

Codecov Report

Merging #1819 (ec7e297) into latest (4f9d17d) will increase coverage by 6.67%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           latest    #1819      +/-   ##
==========================================
+ Coverage   83.48%   90.15%   +6.67%     
==========================================
  Files         113       87      -26     
  Lines       12181     8482    -3699     
  Branches     1626     1627       +1     
==========================================
- Hits        10169     7647    -2522     
+ Misses       1754      577    -1177     
  Partials      258      258              
Flag Coverage Δ
python 90.15% <100.00%> (+<0.01%) ⬆️
rust ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/sourmash/commands.py 88.24% <100.00%> (+0.02%) ⬆️
src/core/src/ffi/signature.rs
src/core/src/encodings.rs
src/core/src/ffi/mod.rs
src/core/src/sketch/hyperloglog/estimators.rs
src/core/src/ffi/hyperloglog.rs
src/core/src/cmd.rs
src/core/src/errors.rs
src/core/src/index/search.rs
src/core/src/ffi/cmd/compute.rs
... and 17 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f9d17d...ec7e297.

@bluegenes (Contributor) left a comment

I think this will be very helpful!

@ctb ctb merged commit 5bd2e35 into latest Feb 2, 2022
@ctb ctb deleted the add/abund_print branch February 2, 2022 21:41
@drtamermansour

This is VERY useful.
I suggest using the string "abundance ignored" instead of "unweighted", since it is more informative.

Development

Successfully merging this pull request may close these issues.

be clearer about when abundance weighting is used in gather output