Uniquify csv output from multigather
Hello!
Hope you and yours are doing well. I am using multigather to query protein sequences against each other, and I primarily use the CSV output for downstream processing. As is, if the signatures in `--query query.sig` were all created from one FASTA file (e.g. with `--singleton`), they share the same filename, so each query's results overwrite the previous ones in the same CSV file. This proposed change appends the query's md5sum to the output CSV filename to ensure uniqueness. Other suggestions are welcome!

(not tested yet)

Warmest,
Olga
olgabot authored May 28, 2022
1 parent 99c3997 commit db08838
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions src/sourmash/commands.py
@@ -998,9 +998,10 @@ def multigather(args):
 query_filename = query.filename
 if not query_filename:
     # use md5sum if query.filename not properly set
-    query_filename = query.md5sum()
-
-output_base = os.path.basename(query_filename)
+    output_base = query.md5sum()
+else:
+    # Uniquify the output file if all signatures were made from the same file (e.g. with --singleton)
+    output_base = os.path.basename(query_filename) + "." + query.md5sum()
 output_csv = output_base + '.csv'
 
 w = None
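
To make the effect of the change concrete, here is a minimal standalone sketch of the old and new naming rules. The helper names `old_output_csv_name` and `new_output_csv_name` are hypothetical (they do not exist in sourmash), and the md5 strings are made-up placeholders rather than real signature digests. It only mirrors the filename logic in the hunk above.

```python
import os


def old_output_csv_name(query_filename, md5):
    """Naming rule before this commit (hypothetical helper, not part of sourmash)."""
    if not query_filename:
        # fall back to the md5sum if the filename is not set
        query_filename = md5
    return os.path.basename(query_filename) + ".csv"


def new_output_csv_name(query_filename, md5):
    """Naming rule after this commit (hypothetical helper, not part of sourmash)."""
    if not query_filename:
        output_base = md5
    else:
        # append the md5sum so signatures from the same file get distinct names
        output_base = os.path.basename(query_filename) + "." + md5
    return output_base + ".csv"


# (filename, md5) pairs; the md5 values are placeholders for illustration only
queries = [
    ("data/proteins.fasta", "aaaa1111"),
    ("data/proteins.fasta", "bbbb2222"),
    ("", "cccc3333"),
]

old_names = [old_output_csv_name(f, m) for f, m in queries]
new_names = [new_output_csv_name(f, m) for f, m in queries]

print(old_names)
print(new_names)
```

Under the old rule the first two queries map to the same CSV name, which is the overwrite described in the commit message. Under the new rule every query gets its own file, at the cost of longer filenames that downstream scripts would need to expect.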