public boolean test(final VariantContext vc) {
    return includeIDs.contains(vc.getID());
}
This seems like it should do the right thing, except that vc.getID() is a String, not a List<String>. @vdauwera can we assume that an individual ID within a VariantContext ID field will not contain a semicolon, and therefore we can parse by splitting on ';'?
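A minimal sketch of the split-on-';' idea raised above, written against a plain String so it runs without htsjdk. The class name MultiIdMatcher and its field are hypothetical and are not the actual GATK implementation; the sketch assumes the VCF convention that the ID column is a semicolon-separated list and that a single identifier never contains a semicolon.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of GATK: checks whether any of the
// semicolon-separated IDs in a VCF ID field is in a set of wanted IDs.
final class MultiIdMatcher {
    private final Set<String> wantedIds;

    MultiIdMatcher(final Set<String> wantedIds) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.wantedIds = new HashSet<>(wantedIds);
    }

    // idField is the raw ID column as a single String, which is what
    // vc.getID() returns, e.g. "rs554836707;rs56090907".
    boolean test(final String idField) {
        if (idField == null) {
            return false;
        }
        // Assumption: ';' only ever appears as a separator between IDs.
        return Arrays.stream(idField.split(";")).anyMatch(wantedIds::contains);
    }
}
```

With this approach, a filter built from only rs554836707 would match the site annotated as 'rs554836707;rs56090907', while a site whose ID field holds other IDs (or the missing value '.') would not match.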
* Changed SelectVariants so that it can handle multiple rsIDs separated by ';' in a VCF file.
* The tests testKeepSelectionIDLiteral and testKeepSelectionIDFromFile broke due to the change in the test file complexExample1.vcf. I modified the file testSelectVariants_KeepSelectionID.vcf appropriately so that the tests pass as they should.
* Changes due to review by David Benjamin and Phil Shapiro.
* Made the code in VariantIDsVariantFilter's test function more concise.
Feature request
Tool(s) or class(es) involved
SelectVariants (all versions -- tested most recently with 4.0.9.0)
Description
SelectVariants has an option for excluding specific sites by rsID, --exclude-ids. Currently the matching is done by exact string matching. The limitation is that if I have a site with two rsIDs (as appears in the 1000 Genomes dataset), I can't filter against just one of them. For example, if I want to exclude this site from 1000G participant HG00096 (because it makes my synthetic data generator choke, if you must know):
I have to specify --exclude-ids 'rs554836707;rs56090907' in my SelectVariants command. I would like to be able to specify just --exclude-ids rs554836707 or --exclude-ids rs56090907 with the same result (the site is excluded). Right now if I do use just one like that, the site is not excluded because the pattern fails to match.
The reason I want to be able to do that is because there are several other similar sites that I need to exclude, so I make an array of rsIDs to exclude, e.g. provide
as an input to my WDL, which handles it in the command block as
Right now I have to use
which is stylistically ugly and would not cover cases where only one rsID is annotated for whatever reason.