Running the demo data in charcoal produces a ValueError for threshold_bp.
I added a print statement to the charcoal file charcoal/contigs_list_contaminants.py to print the length of the minhash.
checkpoint hitlist_make_contigs_matches: [15/1736]
input: demo/genomes/LoombaR_2017__SID1050_bax__bin.11.fa.gz, output.demo/stage1/LoombaR_2017__SID1050_bax__bin.11.fa.gz.sig, output.demo/stage1/LoombaR_2017__SID1050_bax__bin.11.fa.gz.matches.zip, demo/demo-lineages.csv, output.demo/stage1_hitlist.csv
output: output.demo/stage2/LoombaR_2017__SID1050_bax__bin.11.fa.gz.matches.json
jobid: 34
wildcards: g=LoombaR_2017__SID1050_bax__bin.11.fa.gz
Downstream jobs will be updated after completion.
Activating conda environment: /home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6
examining spreadsheet headers...
** assuming column 'accession' is identifiers in spreadsheet
loaded 64 tax assignments.
loaded 29 matches from 'output.demo/stage1/LoombaR_2017__SID1050_bax__bin.11.fa.gz.matches.zip'
loaded 29 signatures & created LCA Database
reading contigs from LoombaR_2017__SID1050_bax__bin.11.fa.gz
threshold_bp is 3000
mh length is 201
mh length is 165
mh length is 155
mh length is 164
mh length is 133
mh length is 114
mh length is 105
mh length is 106
mh length is 86
mh length is 89
mh length is 79
mh length is 89
mh length is 84
mh length is 71
mh length is 59
mh length is 72
mh length is 57
mh length is 60
mh length is 67
mh length is 51
mh length is 63
mh length is 43
mh length is 36
mh length is 42
mh length is 50
mh length is 34
mh length is 27
mh length is 39
mh length is 17
Traceback (most recent call last):
  File "/home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/tereiter/github/charcoal/charcoal/contigs_list_contaminants.py", line 208, in <module>
    returncode = cmdline(sys.argv[1:])
  File "/home/tereiter/github/charcoal/charcoal/contigs_list_contaminants.py", line 203, in cmdline
    return main(args)
  File "/home/tereiter/github/charcoal/charcoal/contigs_list_contaminants.py", line 145, in main
    for acc, match_lin, count in get_matches(mh, lca_db, lin_db, match_rank,
  File "/home/tereiter/github/charcoal/charcoal/contigs_list_contaminants.py", line 33, in get_matches
    results = lca_db.gather(query_sig, threshold_bp=threshold_bp)
  File "/home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6/lib/python3.9/site-packages/sourmash/index.py", line 214, in gather
    for result in self.prefetch(query, threshold_bp, **kwargs):
  File "/home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6/lib/python3.9/site-packages/sourmash/index.py", line 204, in prefetch
    search_fn = make_gather_query(query.minhash, threshold_bp,
  File "/home/tereiter/github/charcoal/.snakemake/conda/8763af208a097216f72fb7c92a9f00f6/lib/python3.9/site-packages/sourmash/search.py", line 70, in make_gather_query
    raise ValueError("requested threshold_bp is unattainable with this query")
ValueError: requested threshold_bp is unattainable with this query
So with a threshold_bp of 3000, a minhash of length 17 fails the check in make_gather_query (sourmash/search.py):
def make_gather_query(query_mh, threshold_bp, *, best_only=True):
    "Make a search object for gather."
    if not query_mh:
        raise ValueError("query is empty!?")
    scaled = query_mh.scaled
    if not scaled:
        raise TypeError("query signature must be calculated with scaled")
    # are we setting a threshold?
    threshold = 0
    if threshold_bp:
        if threshold_bp < 0:
            raise TypeError("threshold_bp must be non-negative")
        # if we have a threshold_bp of N, then that amounts to N/scaled
        # hashes:
        n_threshold_hashes = threshold_bp / scaled
        # that then requires the following containment:
        threshold = n_threshold_hashes / len(query_mh)
        # is it too high to ever match? if so, exit.
        if threshold > 1.0:
            raise ValueError("requested threshold_bp is unattainable with this query")
    if best_only:
        search_obj = JaccardSearchBestOnly(SearchType.CONTAINMENT,
                                           threshold=threshold)
    else:
        search_obj = JaccardSearch(SearchType.CONTAINMENT,
                                   threshold=threshold)
    return search_obj
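To make the arithmetic in that check concrete, here is a small standalone sketch that repeats the same two divisions outside of sourmash. The helper names containment_threshold and min_hashes_needed are made up for illustration, and the scaled value of 100 is an assumption: the log above does not show which scaled value the demo signatures were built with.

```python
import math


def containment_threshold(threshold_bp, scaled, n_hashes):
    """Containment the query would have to reach, as computed in make_gather_query."""
    n_threshold_hashes = threshold_bp / scaled
    return n_threshold_hashes / n_hashes


def min_hashes_needed(threshold_bp, scaled):
    """Smallest minhash length for which the threshold is still attainable."""
    return math.ceil(threshold_bp / scaled)


threshold_bp = 3000
scaled = 100  # hypothetical value, chosen only for illustration

for n_hashes in (201, 30, 17):
    t = containment_threshold(threshold_bp, scaled, n_hashes)
    status = "ok" if t <= 1.0 else "unattainable"
    print(f"mh length {n_hashes}: required containment {t:.3f} ({status})")

print("minimum mh length:", min_hashes_needed(threshold_bp, scaled))
```

The point is only that the error depends on the ratio of threshold_bp / scaled to the number of hashes in the query, so short contigs hit it first.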
I don't really have a comment here. I think I'll just make a workaround in charcoal; I just wanted to document somewhere that this happens.
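One possible shape for such a workaround is to check whether the threshold is attainable before calling gather, and to skip the contig if it is not. This is a rough sketch, not the actual charcoal change: threshold_is_attainable and FakeMinHash are hypothetical names, the stand-in class exists only so the example runs without sourmash, and whether skipping short contigs is the right behavior for charcoal is an assumption. It relies only on len(mh) and mh.scaled, the same two values make_gather_query uses.

```python
from dataclasses import dataclass


@dataclass
class FakeMinHash:
    """Hypothetical stand-in for a scaled minhash; only len() and .scaled are used."""
    n_hashes: int
    scaled: int

    def __len__(self):
        return self.n_hashes


def threshold_is_attainable(mh, threshold_bp):
    """Return True if make_gather_query's threshold check would pass for this minhash."""
    if not threshold_bp:
        # A threshold of 0 (or None) disables the check entirely.
        return True
    # Same comparison as in make_gather_query: (threshold_bp / scaled) / len(mh) > 1.0 fails.
    return threshold_bp / mh.scaled <= len(mh)


# Example loop: skip minhashes that could never satisfy the threshold.
threshold_bp = 3000
for mh in (FakeMinHash(201, 100), FakeMinHash(17, 100)):  # scaled=100 is an assumption
    if not threshold_is_attainable(mh, threshold_bp):
        print(f"skipping minhash of length {len(mh)}: threshold_bp unattainable")
        continue
    print(f"would run gather on minhash of length {len(mh)}")
```

Catching the ValueError around the gather call would also work, but a pre-check avoids swallowing unrelated ValueErrors such as the "query is empty" case.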
Hi @taylorreiter, this is intentional: it was added in #1392 and catches situations where the requested threshold_bp is nonsensical. I don't think this needs to be changed, but I'll take a look at the code in charcoal and figure out whether we should be doing something different there.