fetch-from-ncbi-virus: Pass options directly to ncbi-virus-url
Simplify the bash script by directly passing these options to the script
that uses them.

This requires changing the order so that required arguments are
specified before options.
victorlin committed Aug 31, 2023
1 parent 7b862f3 commit 34089c4
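The commit message's point about argument order can be illustrated with a minimal, hypothetical sketch. `show_args` stands in for ncbi-virus-url and only echoes what it receives. `wrapper` stands in for fetch-from-ncbi-virus. Neither is part of the repository. The wrapper reads its two required arguments from the fixed slots `$1` and `$2` and forwards everything after them unchanged with `"${@:3}"`. An option like `--filters` can be followed by several values, as the old `--filters "${filters[@]}"` call suggests. A wrapper that does no parsing of its own therefore cannot tell where leading options end, so the required arguments have to come first.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "required arguments first, options passed through" pattern.
set -euo pipefail

# Stand-in for ncbi-virus-url: print each received argument in brackets,
# one per line, so word boundaries (including embedded spaces) are visible.
show_args() {
    printf '[%s]\n' "$@"
}

wrapper() {
    # Required positional arguments live in the fixed slots $1 and $2.
    local taxon_id="${1:?taxon id is required}"
    local repo="${2:?repository is required}"
    printf 'repo=%s\n' "$repo"
    # Everything from the third argument on is forwarded unchanged;
    # the double quotes keep each original argument as a single word.
    show_args --ncbi-taxon-id "$taxon_id" "${@:3}"
}

wrapper 12637 nextstrain/ingest \
    --filters 'CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])' \
    --fields 'viruslineage_ids:VirusLineageId_ss'
```

If the sketch is called in the old order, for example `wrapper --filters a 12637 owner/repo`, it does not fail. It treats `--filters` as the taxon id and `a` as the repository. This is why existing callers have to move the positional arguments to the front.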
Showing 2 changed files with 10 additions and 32 deletions.
fetch-from-ncbi-virus: 30 changes (5 additions, 25 deletions)
@@ -1,16 +1,10 @@
#!/usr/bin/env bash
-# usage: fetch-from-ncbi-virus [options] <ncbi_taxon_id> <github_repo>
+# usage: fetch-from-ncbi-virus <ncbi_taxon_id> <github_repo> [options]
#
# Fetch metadata and nucleotide sequences from [NCBI Virus](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/)
# and output NDJSON records to stdout.
#
-# options:
-#
-# --filter=<filter_query> Filter criteria to add as `fq` param values for the NCBI Virus URL
-# May be specified multiple times.
-#
-# --field=<output_column_name>:<ncbi_virus_field_name> Metadata fields to add as `fl` param values for the NCBI Virus URL
-# May be specified multiple times.
+# [options] are passed directly to ncbi-virus-url. See that script for usage details.
#
# Originally copied from "bin/fetch-from-genbank" in nextstrain/ncov-ingest:
# https://github.com/nextstrain/ncov-ingest/blob/2a5f255329ee5bdf0cabc8b8827a700c92becbe4/bin/fetch-from-genbank
@@ -21,27 +15,13 @@ bin="$(dirname "$0")"


main() {
-declare -a filters
-declare -a fields
-
-for arg; do
-case "$arg" in
---filter=*)
-filters+=("${arg#*=}")
-shift;;
---field=*)
-fields+=("${arg#*=}")
-shift;;
-*)
-break;;
-esac
-done
-
local ncbi_taxon_id="${1:?NCBI taxon id is required.}"
local github_repo="${2:?A GitHub repository with owner and repository name is required as the second argument}"
+declare -a ncbi_virus_url_options
+ncbi_virus_url_options=("${@:3}")

local ncbi_virus_url
-ncbi_virus_url="$("$bin"/ncbi-virus-url --ncbi-taxon-id "$ncbi_taxon_id" --filters "${filters[@]}" --fields "${fields[@]}")"
+ncbi_virus_url="$("$bin"/ncbi-virus-url --ncbi-taxon-id "$ncbi_taxon_id" "${ncbi_virus_url_options[@]}")"

fetch "$ncbi_virus_url" "$github_repo" | "$bin"/csv-to-ndjson
}
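For comparison, the removed loop collected repeated `--filter=` and `--field=` options placed before the positional arguments. The hypothetical `old_to_new_args` helper below is not part of the repository. It reuses that loop to rewrite an old-style argument list into the new order, which may help when updating existing callers.

Two details of the loop matter here. `${arg#*=}` strips the shortest prefix ending in `=`, so a value that itself contains `=` survives intact. Also, `for arg` expands its word list once before iterating, so the `shift` in each branch leaves `$1` and `$2` pointing at the taxon id and repository when the loop stops at the first non-option argument.

```bash
#!/usr/bin/env bash
# Hypothetical migration helper: convert
#   --filter=A [--filter=...] --field=B [--field=...] <taxon_id> <repo>
# into
#   <taxon_id> <repo> --filters A ... --fields B ...
# printing one argument per line.
set -euo pipefail

old_to_new_args() {
    local -a filters=() fields=()
    local arg
    # Same parsing as the removed loop: the list iterated by `for` is fixed
    # when the loop starts, while `shift` drops each consumed option from "$@".
    for arg; do
        case "$arg" in
            --filter=*) filters+=("${arg#*=}"); shift;;
            --field=*)  fields+=("${arg#*=}");  shift;;
            *) break;;
        esac
    done

    local -a out=("${1:?taxon id is required}" "${2:?repository is required}")
    # Only emit a flag if at least one value was collected for it.
    if (( ${#filters[@]} > 0 )); then out+=(--filters "${filters[@]}"); fi
    if (( ${#fields[@]} > 0 )); then out+=(--fields "${fields[@]}"); fi
    printf '%s\n' "${out[@]}"
}

old_to_new_args \
    --filter='CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])' \
    --field='viruslineage_ids:VirusLineageId_ss' \
    12637 nextstrain/ingest
```

The printed list follows the same order as the updated test invocations: positional arguments first, then `--filters` and `--fields` with their values.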
tests/fetch-from-ncbi-virus/filter-and-fields.t: 12 changes (5 additions, 7 deletions)
@@ -1,19 +1,17 @@
Get the virus lineage IDs for 4 early Dengue sequences, testing the options --filter and --field.

-$ $TESTDIR/../../fetch-from-ncbi-virus \
-> --filter='CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])' \
-> --field='viruslineage_ids:VirusLineageId_ss' \
-> 12637 nextstrain/ingest
+$ $TESTDIR/../../fetch-from-ncbi-virus 12637 nextstrain/ingest \
+> --filters 'CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])' \
+> --fields 'viruslineage_ids:VirusLineageId_ss'
{"genbank_accession":"X05375","genbank_accession_rev":"X05375.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for envelope protein E N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"GTAACTTATGGGACGTGTACCACCACAGGAGAACACAGAAGAGAAAAAAGATCAGTGGCACTCGTTCCACATGTGGGAATGGGACTGGAGACACGAACTGAAACATGGATGTCATCAGAAGGGGCCTGGAAACATGCCCAGAGAATTGAAACTTGGATCTTGAGACATCCAGGCTTTACCATAATGGCAGCAATCCTGGCATACACCATAGGAACGACACATTTCCAAAGAGCCCTGATTTTCATCTTACTGACAGCTGTCGCTCCTTCAATGACAATGCGTTGCATAGGAATATCAAATAGAGACTTTGTAGAAGGGGTTTCAGGAGGAAGCTGGGTTGACATAGTCTTAGAACATGGA","viruslineage_ids":"10239,2559587,2732396,2732406,2732462,2732545,11050,11051,12637,11060"}
{"genbank_accession":"X05376","genbank_accession_rev":"X05376.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for NS1 protein N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"ACAACAATGAGGGGAGCGAAGAGAATGGCCATTTTAGGTGACACAGCTTGGGATTTTGGATCCCTGGGAGGAGTGTTTACATCTATAGGAAAGGCTCTCCACCAAGTTTTCGGAGCAATCTATGGGGCTGCCTTCAGTGGGGTCTCATGGACTATGAAAATCCTCATAGGAGTCATTATCACATGGATAGGAATGAATTCACGCAGCACCTCACTTTCTGTGTCACTAGTATTGGTGGGAGTCGTGACGCTGTATTTGGGAGTTATGGTGCAGGCCGATAGTGGTTGCGTTGTGAGCTGGAAAAACAAAGAACTGAAGTGTGGCAGTGGGATTTTCATCACAGACAACGTGCACACATGG","viruslineage_ids":"10239,2559587,2732396,2732406,2732462,2732545,11050,11051,12637,11060"}
{"genbank_accession":"X05377","genbank_accession_rev":"X05377.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for NS3 protein N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"CTCACTGTGTGCTACGTGCTCACTGGACGATCGGCCGATTTGGAACTGGAGAGAGCCGCCGATGTCAAATGGGAAGATCAGGCAGAGATATCAGGAAGCAGTCCAATCCTGTCAATAACAATATCAGAAGATGGTAGCATGTCGATAAAAAACGAAGAGGAAGAACAAACACTGACCATACTCATTAGAACAGGATTGCTGGTGATCTCAGGACTTTTTCCTGTATCAATACCAATCACGGCAGCAGCATGGTACCTGTGGGAAGTGAAGAAACAACGGGCTGGAGTATTGTGGGATGTCCCTTCACCCCCACCCGTGGGAAAGGCTGAACTGGAAGATGGAGCCTATAGAATCAAGCAA","viruslineage_ids":"10239,2559587,2732396,2732406,2732462,2732545,11050,11051,12637,11060"}
{"genbank_accession":"X05378","genbank_accession_rev":"X05378.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for NS5 protein N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"GATCCAATACCCTATGATCCAAAGTTTGAAAAGCAGTTGGGACAAGTAATGCTCCTAGTCCTCTGCGGGACTCAAGTGTTGATGATGAGGACTACATGGGCTCTGTGTGAGGCTTTAACCTTAGCGACCGGGCCTATCTCCACATTGTGGGAAGGAAATCCAGGGAGGTTTTGGAACACTACCATTGCAGTGTCAATGGCTAACATTTTTAGAGGGAGTTACTTGGCCGGAGCTGGACTTCTCTTTTCCATCATGAAGAACACAACCAACACGAGAAGGGGAACTGGCAACATAGGAGAGACGCTTGGAGAGAAATGGAAAAGCCGATTGAACGCATTGGGGAAAAGTGAATTCCAGATC","viruslineage_ids":"10239,2559587,2732396,2732406,2732462,2732545,11050,11051,12637,11060"}

Do the same but without --field.

-$ $TESTDIR/../../fetch-from-ncbi-virus \
-> --filter='CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])' \
-> 12637 nextstrain/ingest
+$ $TESTDIR/../../fetch-from-ncbi-virus 12637 nextstrain/ingest \
+> --filters 'CreateDate_dt:([1987-11-29T00:00:00Z TO 1987-11-29T00:00:01Z])'
{"genbank_accession":"X05375","genbank_accession_rev":"X05375.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for envelope protein E N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"GTAACTTATGGGACGTGTACCACCACAGGAGAACACAGAAGAGAAAAAAGATCAGTGGCACTCGTTCCACATGTGGGAATGGGACTGGAGACACGAACTGAAACATGGATGTCATCAGAAGGGGCCTGGAAACATGCCCAGAGAATTGAAACTTGGATCTTGAGACATCCAGGCTTTACCATAATGGCAGCAATCCTGGCATACACCATAGGAACGACACATTTCCAAAGAGCCCTGATTTTCATCTTACTGACAGCTGTCGCTCCTTCAATGACAATGCGTTGCATAGGAATATCAAATAGAGACTTTGTAGAAGGGGTTTCAGGAGGAAGCTGGGTTGACATAGTCTTAGAACATGGA"}
{"genbank_accession":"X05376","genbank_accession_rev":"X05376.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for NS1 protein N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"ACAACAATGAGGGGAGCGAAGAGAATGGCCATTTTAGGTGACACAGCTTGGGATTTTGGATCCCTGGGAGGAGTGTTTACATCTATAGGAAAGGCTCTCCACCAAGTTTTCGGAGCAATCTATGGGGCTGCCTTCAGTGGGGTCTCATGGACTATGAAAATCCTCATAGGAGTCATTATCACATGGATAGGAATGAATTCACGCAGCACCTCACTTTCTGTGTCACTAGTATTGGTGGGAGTCGTGACGCTGTATTTGGGAGTTATGGTGCAGGCCGATAGTGGTTGCGTTGTGAGCTGGAAAAACAAAGAACTGAAGTGTGGCAGTGGGATTTTCATCACAGACAACGTGCACACATGG"}
{"genbank_accession":"X05377","genbank_accession_rev":"X05377.1","database":"GenBank","strain":"","region":"","location":"","collected":"","submitted":"1987-11-29T00:00:00Z","updated":"2016-07-26T00:00:00Z","length":"360","host":"","isolation_source":"","bioproject_accession":"","biosample_accession":"","sra_accession":"","title":"Dengue virus type 2 genomic RNA for NS3 protein N-term","authors":"Biedrzycka,A., Cauchi,M.R., Bartholomeusz,A., Gorman,J.J., Wright,P.J.","submitting_organization":"","publications":"2952760","sequence":"CTCACTGTGTGCTACGTGCTCACTGGACGATCGGCCGATTTGGAACTGGAGAGAGCCGCCGATGTCAAATGGGAAGATCAGGCAGAGATATCAGGAAGCAGTCCAATCCTGTCAATAACAATATCAGAAGATGGTAGCATGTCGATAAAAAACGAAGAGGAAGAACAAACACTGACCATACTCATTAGAACAGGATTGCTGGTGATCTCAGGACTTTTTCCTGTATCAATACCAATCACGGCAGCAGCATGGTACCTGTGGGAAGTGAAGAAACAACGGGCTGGAGTATTGTGGGATGTCCCTTCACCCCCACCCGTGGGAAAGGCTGAACTGGAAGATGGAGCCTATAGAATCAAGCAA"}
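The tests rely on the `<output_column_name>:<ncbi_virus_field_name>` form that the removed usage text documented for `--field`. The spec `viruslineage_ids:VirusLineageId_ss` asks for the NCBI Virus field `VirusLineageId_ss`, and the records expose it under the `viruslineage_ids` key, which is absent from the second test's output.

The sketch below only shows how such a spec splits at its first colon using parameter expansion. `split_field_spec` is hypothetical and is not meant to mirror how ncbi-virus-url itself parses `--fields`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "<output_column_name>:<ncbi_virus_field_name>"
# at the first colon and print the two halves.
set -euo pipefail

split_field_spec() {
    local spec="$1"
    if [[ "$spec" != *:* ]]; then
        echo "expected <output_column_name>:<ncbi_virus_field_name>, got: $spec" >&2
        return 1
    fi
    # %%:* drops the longest suffix starting at a colon -> text before the first colon.
    local column="${spec%%:*}"
    # #*: drops the shortest prefix ending at a colon -> text after the first colon.
    local field="${spec#*:}"
    printf 'column=%s field=%s\n' "$column" "$field"
}

split_field_spec 'viruslineage_ids:VirusLineageId_ss'
```

Run as shown, it prints `column=viruslineage_ids field=VirusLineageId_ss`. A spec such as `a:b:c` would split into `a` and `b:c`.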