When @bluegenes was digging into some classification results, she discovered that `gather` was not outputting all of the prefetch results (as evaluated by comparing to `sourmash prefetch -o ...`).
(IMO they should always be the same.)
This is a bug / behavior change introduced in #2116, merged into `latest` in #2111, and released in sourmash v4.4.2.
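In brief, this concerns `CounterGather` and related `Index` code. In #2116, I changed `CounterGather` over to using a dict with md5sum as key to store signatures. This means that when sketches with identical contents are encountered, only one is stored, and then when the list of all signatures is retrieved with `CounterGather.signatures()`, duplicates are omitted.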
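To make the failure mode concrete, here is a minimal sketch (not the actual `CounterGather` code) of what happens when signatures are stored in a dict keyed by md5sum. `FakeSig` and `store_by_md5` are hypothetical stand-ins. In this sketch the last signature seen for a given md5sum wins; the real implementation may keep a different one, but either way only one survives.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FakeSig:
    """Hypothetical stand-in for a sourmash signature: a name plus a content hash."""
    name: str
    md5: str

    def md5sum(self) -> str:
        return self.md5


def store_by_md5(sigs: List[FakeSig]) -> Dict[str, FakeSig]:
    # Keying on md5sum means a later sketch with identical contents
    # overwrites the earlier one, so only one of them is kept.
    by_md5: Dict[str, FakeSig] = {}
    for sig in sigs:
        by_md5[sig.md5sum()] = sig
    return by_md5


sigs = [
    FakeSig("genome_A", "abc123"),
    FakeSig("genome_A_copy", "abc123"),  # identical contents, different name
    FakeSig("genome_B", "def456"),
]

stored = store_by_md5(sigs)
print(len(sigs), len(stored))  # 3 2
```

Anything that later iterates over `stored.values()` sees two matches, even though three database entries matched.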
The relevant code in `commands.py::gather` is:

```python
for db in databases:
    counter = None
    try:
        counter = db.counter_gather(prefetch_query, args.threshold_bp)
    except ValueError:
        # catch "no signatures to search" ValueError if empty db.
        continue

    save_prefetch.add_many(counter.signatures())

    if prefetch_csvout_fp:
        for found_sig in counter.signatures():
            ...  # write info to CSV
```
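Here, `.signatures()` is not returning all signatures, so both `save_prefetch` and the prefetch CSV output are missing any matches whose contents duplicate another match.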
I'm not 100% sure how to resolve this. One option would be to adjust `CounterGather` to save prefetch results internally; possible, but maybe messy? Another would be to store all the signatures in a list, too.
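A minimal sketch of the second option, assuming a hypothetical `ListBackedCounter` class rather than the real `CounterGather` API: keep the md5sum-keyed dict for whatever deduplicated bookkeeping needs it, and also keep a plain list of every signature added, which `signatures()` returns. `FakeSig` is again a hypothetical stand-in for a signature.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class FakeSig:
    """Hypothetical stand-in for a sourmash signature."""
    name: str
    md5: str

    def md5sum(self) -> str:
        return self.md5


class ListBackedCounter:
    """Hypothetical CounterGather-like container, not the real sourmash class."""

    def __init__(self) -> None:
        # md5sum -> signature: the deduplicated view.
        self.by_md5: Dict[str, FakeSig] = {}
        # Every signature that was added, duplicates included, in insertion order.
        self.all_sigs: List[FakeSig] = []

    def add(self, sig: FakeSig) -> None:
        self.by_md5[sig.md5sum()] = sig
        self.all_sigs.append(sig)

    def signatures(self) -> Iterator[FakeSig]:
        # Return the full list so callers that save prefetch results see every match.
        return iter(self.all_sigs)


counter = ListBackedCounter()
for sig in [FakeSig("genome_A", "abc123"),
            FakeSig("genome_A_copy", "abc123"),
            FakeSig("genome_B", "def456")]:
    counter.add(sig)

print([s.name for s in counter.signatures()])  # ['genome_A', 'genome_A_copy', 'genome_B']
print(len(counter.by_md5))                     # 2
```

The list only holds references to the same signature objects, so the extra cost is one reference per match rather than a second copy of each sketch.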
Ideally any solution would result in the same code being used in `commands.py::gather` and `commands.py::prefetch` so that this doesn't happen again ;). And of course we'll want some tests.
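One way to express the "same code path" idea, sketched with hypothetical names (`write_prefetch_csv`, the `(name, md5)` tuples, and the CSV columns are illustrative, not sourmash's actual output format): route both commands through a single CSV writer fed the full match list, and add a regression test with two matches that share an md5sum. The last assertion shows that a deduplicated match list produces different output, which is the kind of drift such a test would catch.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical match record: (name, md5sum).
Match = Tuple[str, str]


def write_prefetch_csv(matches: Iterable[Match]) -> str:
    """Hypothetical single code path for prefetch CSV output.

    If both gather and prefetch call this with the full list of matches,
    their prefetch outputs cannot drift apart.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["match_name", "md5"])
    for name, md5 in matches:
        writer.writerow([name, md5])
    return out.getvalue()


def test_gather_and_prefetch_agree_with_duplicates() -> None:
    matches: List[Match] = [
        ("genome_A", "abc123"),
        ("genome_A_copy", "abc123"),  # same contents as genome_A
        ("genome_B", "def456"),
    ]
    prefetch_out = write_prefetch_csv(matches)  # stands in for prefetch's CSV
    gather_out = write_prefetch_csv(matches)    # stands in for gather's prefetch CSV
    assert prefetch_out == gather_out
    # Header plus one row per match, duplicates included.
    assert len(prefetch_out.strip().splitlines()) == 4

    # Deduplicating by md5sum first loses a row, so the outputs differ.
    deduped = list({md5: (name, md5) for name, md5 in matches}.values())
    assert write_prefetch_csv(deduped) != prefetch_out


test_gather_and_prefetch_agree_with_duplicates()
```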