The VCF Annotator tool provides a Python class for annotating VCFs with VRS Allele IDs. A command-line interface is available for accessing these functions from a shell or shell script.
Note:
The examples assume that they are run from the root of the vrs-python directory and that input.vcf.gz is in the current directory.
To see the help page:
vrs-annotate vcf --help
Like other VRS-Python tools, the VCF annotator requires access to sequence and identifier data services, as implemented in libraries like SeqRepo. By default, the CLI will attempt to connect to a SeqRepo REST instance at http://localhost:5000/seqrepo, but a URI can be passed with the --dataproxy_uri option or set with the GA4GH_VRS_DATAPROXY_URI environment variable (the former takes priority over the latter).
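The precedence described above (command-line option first, then environment variable, then the default) can be sketched in a few lines of Python. This is only an illustration of the lookup order, not the tool's actual implementation: the function name resolve_dataproxy_uri is hypothetical, and the "seqrepo+" prefix on the default URI is an assumption based on the REST example in this section.

```python
import os

# Assumption: the default REST endpoint written in the same "seqrepo+" URI
# form as the REST example in this section.
DEFAULT_DATAPROXY_URI = "seqrepo+http://localhost:5000/seqrepo"
ENV_VAR = "GA4GH_VRS_DATAPROXY_URI"


def resolve_dataproxy_uri(cli_value=None, environ=None):
    """Hypothetical helper: pick the data proxy URI in priority order.

    1. The value passed with --dataproxy_uri, if any.
    2. The GA4GH_VRS_DATAPROXY_URI environment variable, if set.
    3. The default local SeqRepo REST instance.
    """
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return environ.get(ENV_VAR) or DEFAULT_DATAPROXY_URI
```

Passing a plain dict as environ makes the lookup easy to check without touching the real environment.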
For example, to use a local set of SeqRepo data, you can use an absolute file path:
vrs-annotate vcf --dataproxy_uri="seqrepo+file:///usr/local/share/seqrepo/2024-02-20/" --vcf_out=out.vcf.gz input.vcf.gz
Alternatively, a relative file path:
vrs-annotate vcf --dataproxy_uri="seqrepo+../seqrepo/2024-02-20/" --vcf_out=out.vcf.gz input.vcf.gz
Or an alternate REST path:
vrs-annotate vcf --dataproxy_uri="seqrepo+http://mylabwebsite.org/seqrepo" --vcf_out=out.vcf.gz input.vcf.gz
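To make the three URI forms above easier to tell apart, here is a minimal sketch that classifies a URI string by what follows the "seqrepo+" prefix. The function describe_dataproxy_uri and its labels are hypothetical and cover only the three example forms shown here; the parser that VRS-Python actually uses may accept other forms or behave differently.

```python
def describe_dataproxy_uri(uri):
    """Hypothetical helper: classify a URI shaped like the examples above.

    Returns a (kind, target) tuple where kind is "rest", "local-absolute",
    or "local-relative".
    """
    prefix = "seqrepo+"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}: {uri!r}")
    target = uri[len(prefix):]
    if target.startswith(("http://", "https://")):
        # REST service, e.g. seqrepo+http://mylabwebsite.org/seqrepo
        return ("rest", target)
    if target.startswith("file://"):
        # Absolute path on the local filesystem
        return ("local-absolute", target[len("file://"):])
    # Anything else is treated here as a path relative to the working directory
    return ("local-relative", target)
```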
--vrs_attributes
Include the VRS_Start, VRS_End, and VRS_State fields in the INFO field.
--assembly [TEXT]
The assembly that the vcf_in data uses. [default: GRCh38]
--skip_ref
Skip VRS computation for REF alleles.
--require_validation
Require validation checks to pass in order to return a VRS object.
--help
Show the available options.
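When the annotator is driven from a Python script (for example, to process many files), the options above can be assembled into an argument list. The following is a minimal sketch, assuming only the flags documented in this section; build_annotate_command is a hypothetical helper, not part of VRS-Python, and it only builds the command rather than running it.

```python
def build_annotate_command(
    vcf_in,
    vcf_out,
    dataproxy_uri=None,
    assembly=None,
    vrs_attributes=False,
    skip_ref=False,
    require_validation=False,
):
    """Hypothetical helper: build an argv list for `vrs-annotate vcf`."""
    cmd = ["vrs-annotate", "vcf"]
    if dataproxy_uri:
        cmd.append(f"--dataproxy_uri={dataproxy_uri}")
    cmd.append(f"--vcf_out={vcf_out}")
    if assembly:
        # Omitted by default so the CLI falls back to its own default (GRCh38)
        cmd.append(f"--assembly={assembly}")
    if vrs_attributes:
        cmd.append("--vrs_attributes")
    if skip_ref:
        cmd.append("--skip_ref")
    if require_validation:
        cmd.append("--require_validation")
    # The input VCF is the positional argument, placed last as in the examples
    cmd.append(vcf_in)
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True) once vrs-annotate is installed and a data proxy is reachable. Using a list rather than a single shell string avoids quoting problems with paths that contain spaces.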