This repository has been archived by the owner on Feb 7, 2018. It is now read-only.

GQs being dropped for some sites and not others #10

Closed
winni-genp opened this issue Jun 7, 2016 · 6 comments

@winni-genp
Contributor

Hey,
I am seeing GQs missing for some SNPs in agg output, and I can't figure out why some SNPs have GQs and others do not. In the excerpt below, the SNP at position 752505 has GQs, while the SNP at position 752566 does not. I have checked that all samples have a GQ at position 752566 in the input per-sample gVCFs.

Anyone have any idea what might be causing this?

1   752502  .   A   <NON_REF>   95  .   AN=260;AC=0;PF=0;AD=18710,0;DP=18710    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752503  .   A   <NON_REF>   72  .   AN=260;AC=0;PF=0;AD=19112,0;DP=19112    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752505  .   A   <NON_REF>   62  .   AN=260;AC=0;PF=0;AD=18692,0;DP=18692    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752505  .   A   G   390 .   AN=260;AC=1;PF=1;AD=18751,23;DP=18774   GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.:.    0/0:40:213:.:.:.    0/0:30:285:.:.:.    0/0:50:93:.:.:. 0/0:20:232:.:.:.    0/0:30:189:.:.:.
1   752506  .   A   <NON_REF>   77  .   AN=260;AC=0;PF=0;AD=18731,0;DP=18731    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752509  .   CA  C   501 .   AN=260;AC=4;PF=1;AD=18606,68;DP=18674   GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.:.    0/0:40:213:.:.:.    0/0:30:285:.:.:.    0/0:50:93:.:.:. 0/0:20:232:.:.:.    0/0:30:189:.:.:.
1   752511  .   A   <NON_REF>   105 .   AN=260;AC=0;PF=0;AD=18211,0;DP=18211    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752512  .   A   <NON_REF>   67  .   AN=260;AC=0;PF=0;AD=18287,0;DP=18287    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752514  .   T   <NON_REF>   97  .   AN=260;AC=0;PF=0;AD=18161,0;DP=18161    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752535  .   A   <NON_REF>   59  .   AN=260;AC=0;PF=0;AD=18236,0;DP=18236    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752540  .   A   <NON_REF>   83  .   AN=260;AC=0;PF=0;AD=17787,0;DP=17787    GT:GQ:DP:DPF:AD:PF  0/0:30:168:.:.,.:.  0/0:40:213:.:.,.:.  0/0:30:285:.:.,.:.  0/0:50:93:.:.,.:.   0/0:20:232:.:.,.:.  0/0:30:189:.:.,.:.
1   752566  .   G   A   2077    .   AN=260;AC=198;PF=1;AD=3102,4039;DP=7141 GT:GQ:DP:DPF:AD:PF  1/1:.:38:.:0,38:1   1/1:.:43:.:0,43:1   1/1:.:43:.:1,42:1   0/1:.:32:.:15,17:1  1/1:.:44:.:1,43:1   1/1:.:27:.:0,27:1
1   752567  .   T   <NON_REF>   111 .   AN=260;AC=0;PF=0;AD=2183,0;DP=2183  GT:GQ:DP:DPF:AD:PF  0/0:.:0:.:.,.:1 0/0:.:0:.:.,.:1 0/0:.:0:.:.,.:1 0/0:.:0:.:.,.:1 0/0:.:0:.:.,.:1 0/0:.:0:.:.,.:1
1   752593  .   T   G   442 .   AN=260;AC=12;PF=0.833333;AD=13070,251;DP=13321  GT:GQ:DP:DPF:AD:PF  0/0:60:105:.:.:.    0/0:70:114:.:.:.    0/0:70:123:.:.:.    0/0:40:81:.:.:. 0/0:80:134:.:.:.    0/0:40:82:.:.:.
1   752594  .   G   <NON_REF>   109 .   AN=260;AC=0;PF=0;AD=12617,0;DP=12617    GT:GQ:DP:DPF:AD:PF  0/0:60:105:.:.,.:.  0/0:70:114:.:.,.:.  0/0:70:123:.:.,.:.  0/0:40:81:.:.,.:.   0/0:80:134:.:.,.:.  0/0:40:82:.:.,.:.
@winni-genp changed the title from "GQs being dropped for some sites" to "GQs being dropped for some sites and not others" on Jun 7, 2016
@jaredo
Contributor

jaredo commented Jun 7, 2016

Are these Illumina GVCFs? This looks like GATK output.

@jaredo
Contributor

jaredo commented Jun 7, 2016

To elaborate, we use . for the ALT in homozygous reference blocks, whilst GATK uses <NON_REF>. Neither of these is actually concordant with the VCF4.3 specification!
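
As an illustration of the ALT conventions mentioned above, the following is a minimal sketch (not part of agg) that classifies the ALT column of each record. The function names `alt_convention` and `conventions_in` are hypothetical, and the `<*>` case reflects the symbolic allele that the VCF4.3 specification appears to use for reference blocks; treat that as an assumption to verify against the spec.

```python
def alt_convention(alt: str) -> str:
    """Classify the ALT column of a single VCF record."""
    alleles = alt.split(",")
    if alt == ".":
        return "dot"        # ALT convention described above for Illumina GVCFs
    if alleles == ["<NON_REF>"]:
        return "non_ref"    # GATK-style reference block
    if alleles == ["<*>"]:
        return "star"       # assumed VCF4.3-style symbolic allele
    return "variant"        # anything else is treated as a variant record


def conventions_in(lines):
    """Return the set of ALT conventions seen across VCF record lines."""
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue        # skip blank lines and header lines
        # Whitespace splitting tolerates pasted excerpts; real VCFs are tab-separated.
        fields = line.split()
        seen.add(alt_convention(fields[4]))
    return seen
```

Running `conventions_in` over the first few hundred records of a file gives a quick hint about which caller's conventions the file follows before it is handed to a merger.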

@winni-genp
Contributor Author

This output is from another variant caller.

I'll try and come up with a minimal example of the problem.

@jaredo
Contributor

jaredo commented Jun 7, 2016

Ah. This tool will only work with Illumina GVCFs. It makes various assumptions about the input format of the files, so it is unlikely to work with GVCFs from other callers. This is not an intentional exclusion; the formats are simply different.

More generally, there is definitely a need for:

  1. variant callers (including Illumina) to converge on a GVCF standard; the VCF4.3 spec outlines one
  2. a high-quality GVCF merger that can then work with this format (agg is a prototype I hacked together)

There are really a lot of challenges in doing 2. properly. I am pretty concerned about indels, and smarter people than me have commented on this.

See for example:

freebayes/freebayes#76
samtools/hts-specs#77
ga4gh/ga4gh-schemas#169

@winni-genp
Contributor Author

OK, I debugged this and the problem turned out to be an error on my side.

The VCF I was working with only had the GQX and not the GQ format tag specified. I think the GQs I was getting after genotyping were just random garbage.
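
A missing FORMAT declaration like this can be spotted from the header alone. The sketch below is one possible way to do it in plain Python; it is not code from agg, and the names `declared_format_tags` and `missing_format_tags` as well as the default list of required tags are assumptions made for illustration.

```python
import re

# Matches header lines such as: ##FORMAT=<ID=GQ,Number=1,Type=Integer,...>
FORMAT_ID = re.compile(r"^##FORMAT=<ID=([^,>]+)")


def declared_format_tags(header_lines):
    """Collect the FORMAT tag IDs declared in the VCF header."""
    tags = set()
    for line in header_lines:
        match = FORMAT_ID.match(line)
        if match:
            tags.add(match.group(1))
    return tags


def missing_format_tags(header_lines, required=("GT", "GQ")):
    """Return the required FORMAT tags that the header does not declare."""
    declared = declared_format_tags(header_lines)
    return [tag for tag in required if tag not in declared]
```

For a header that declares GT and GQX but not GQ, `missing_format_tags` returns `["GQ"]`, which matches the situation described in this comment.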

@jaredo
Contributor

jaredo commented Jun 7, 2016

I have updated the readme to emphasise this tool won't work with all GVCFs.

I will also add some sanity checks to ingest1 to try to catch these problems earlier, so that people do not waste their time.
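
The checks that were eventually added to ingest1 are not shown in this thread. As a rough idea of what a record-level check could look like, here is a hedged sketch that flags sites where every sample's GQ is missing, as at position 752566 in the excerpt above. The function names and the `threshold` parameter are hypothetical, and the input is assumed to be tab-separated VCF text with at least one sample column.

```python
def gq_missing_fraction(record_line):
    """Fraction of samples whose GQ is absent or '.' in one VCF record."""
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")          # FORMAT column, e.g. GT:GQ:DP
    samples = fields[9:]
    if not samples:
        return 0.0
    if "GQ" not in keys:
        return 1.0                       # GQ is not listed for this record at all
    idx = keys.index("GQ")
    missing = 0
    for sample in samples:
        values = sample.split(":")
        # Trailing FORMAT fields may be dropped, so guard the index.
        if idx >= len(values) or values[idx] == ".":
            missing += 1
    return missing / len(samples)


def sites_without_gq(lines, threshold=1.0):
    """Return (chrom, pos) for records whose missing-GQ fraction reaches threshold."""
    flagged = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                     # skip header and blank lines
        if gq_missing_fraction(line) >= threshold:
            fields = line.split("\t")
            flagged.append((fields[0], int(fields[1])))
    return flagged
```

Lowering `threshold` below 1.0 would also flag sites where only some samples lack a GQ, which may be too noisy for low-coverage regions.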
