`multigather` CSV output uses signature `filename` as basename. #2328
Comments
…2322)

This PR fixes #2321 so that more than one output line is placed in the CSV. Oops! It also adds a notification of what the CSV output file name is. Last but not least, it supports `--output-dir` as a way to set the base path for all output files.

Fixes #2321.

TODO:
- [x] add tests
- [x] make sure filename output behavior is documented
- [x] consider adding an option to have multigather save CSV results in some other way, like by md5 or ...something. - punted to #2328
should support |
This is tackled over in #2065 by @olgabot. A few observations and opinions -
Provisional resolution per #2722 would be -
|
Yes to these!
|
A few more thoughts on #2722 -
|
Taking a step back - what do we want to be able to do with multigather?
Things to confirm:
Things to resolve:
|
Just adding a vote here for allowing
This would likely be especially useful when dealing with extremely large numbers of queries and/or for contig-level gather. |
note also connection with contig gather #2564 - sketch genome with |
…t-add-query-md5sum` (#2722)

This PR:
- adds documentation for `multigather` to sourmash docs!
- builds on #2065 / #2721 so that tests pass.
- adds an option `-U/--output-add-query-md5sum` to `sourmash multigather`
- adds an option `--force-allow-overwrite-output` to `sourmash multigather`
- **CHANGES BEHAVIOR** of multigather by treating `query.filename == '-'` as if `query.filename` is empty, thus replacing it with md5sum
- **CHANGES BEHAVIOR** of multigather by failing loudly and clearly if output files are going to be overwritten
- adds `-E/--extension` to allow output to files other than `.sig`

See discussion over in [#2328: `multigather` CSV output uses signature `filename` as basename](#2328).

To add:
- [x] tests for `-U`;
- [x] implement and test `-E/--extension`
- [x] implement and test `--force-allow-overwrite-output`
- [x] fix for `query_filename` being None/empty in `-U` branch
- [x] documentation update for changed output behavior for multigather: '-' => using md5sum
- [x] documentation update for changed output behavior for multigather: fails if overwrite happens
- [x] fix multigather link in docs

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Olga Botvinnik <olga.botvinnik@gmail.com>
Co-authored-by: Keya Barve <53328492+keyabarve@users.noreply.github.com>
Co-authored-by: ccbaumler <63077899+ccbaumler@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Taylor Reiter <taylorreiter@gmail.com>
Co-authored-by: Erik Young <eeyoung@ucdavis.edu>
Co-authored-by: David Koslicki <dmkoslicki@gmail.com>
Co-authored-by: Luiz Irber <luizirber@users.noreply.github.com>
Co-authored-by: Colton Baumler <baumlerc@farm.ucdavis.edu>
Co-authored-by: Luiz Irber <contact+github@luizirber.org>
Co-authored-by: N. Tessa Pierce-Ward <ntpierce@gmail.com>
Co-authored-by: Peter Cock <p.j.a.cock@googlemail.com>
Co-authored-by: Francesco Beghini <francesco.beghini@yale.edu>
Co-authored-by: Jason Stajich <jason.stajich@ucr.edu>
Co-authored-by: Katrin Leinweber <9948149+katrinleinweber@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
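For readers following the behavior changes listed in that commit message, here is a rough Python sketch of the naming logic as described: fall back to the md5sum when the query filename is empty or `'-'`, optionally append the md5sum, and refuse to overwrite an output file unless told otherwise. This is not sourmash's actual code. The helper names `output_basename` and `plan_outputs`, the `basename.md5` layout, and the `.csv` suffix are all assumptions made for illustration.

```python
import os


def output_basename(query_filename, query_md5, add_md5=False):
    """Hypothetical helper: pick a basename for one query's output files."""
    # A missing filename or '-' (stdin) is treated as empty; use the md5sum instead.
    if not query_filename or query_filename == "-":
        return query_md5
    base = os.path.basename(query_filename)
    # Assumed layout when the md5sum is added: "<basename>.<md5>".
    if add_md5:
        return f"{base}.{query_md5}"
    return base


def plan_outputs(queries, output_dir="", add_md5=False, allow_overwrite=False):
    """Map (filename, md5) pairs to CSV paths, failing loudly on collisions."""
    seen = set()
    paths = []
    for filename, md5 in queries:
        name = output_basename(filename, md5, add_md5) + ".csv"
        path = os.path.join(output_dir, name)
        if path in seen and not allow_overwrite:
            raise ValueError(f"output file '{path}' would be overwritten")
        seen.add(path)
        paths.append(path)
    return paths
```

Two queries that share a filename collide under the plain scheme and raise an error, while passing `add_md5=True` (standing in for `-U`) gives each query a distinct path.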
#2722 has been merged! I will look through this issue and extract undone things and useful ruminations into a new issue. |
In #2321 and #2322 we delve back into multigather... and I remembered how annoying the CSV output is, in that it is output to the signature `filename` for each query.

At the very least it would be good to have an option to put it somewhere else, like an md5sum or something. For 4.x this would be an option and we could make it default for v5.
An alternative is to deprecate multigather per #1614.