No variants called in any haplotype when SNV is not in linkage with other SNVs? #18
Upon closer reading of your paper I see that you are aware of this already:
Is this still planned for an upcoming version?
Hello, @AdmiralenOla. Sorry for the late reply. Yes, we are aware of this behavior. It is not really clear what to do with such cases. Some samples may have plenty of such isolated SNVs. For example, if you look at coronavirus data, the genome is pretty long and mutations are spread over longer ranges than one read can cover. And we may have sites with 5-50% variant frequency. Should we try to attach such "orphan" mutations? To which haplotype, then? Unfortunately, it is not clear where to get the information to make this decision. Even if we have just two pairs of linked SNVs far from each other, it is not clear whether they come from the same haplotype or from different ones. So we report two haplotypes. Those are shortcomings of the technology.
Thanks for your reply, @vtsyvina. I agree that this is a limitation of the technology. However, you noted in your paper that you had a plan to address this limitation, and that got me curious. For example, some type of probabilistic framework that assumes each haplotype has a similar SNV frequency at all of its sites could, in some cases, be used to assign full-length haplotypes.
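To make the frequency idea concrete, here is a minimal sketch of a nearest-frequency heuristic. It is not CliqueSNV's method and not a full probabilistic framework; the function name `assign_orphan_snv`, the tolerance `tol`, and the input format (a dict of haplotype name to frequency) are all hypothetical. It looks for the set of reported haplotypes whose combined frequency best matches the variant frequency of the orphan SNV, and gives up when nothing matches or when two candidates match about equally well.

```python
from itertools import combinations

def assign_orphan_snv(hap_freqs, snv_freq, tol=0.05):
    """Hypothetical heuristic: pick the subset of haplotypes whose summed
    frequency is closest to the orphan SNV's variant frequency.

    Returns a set of haplotype names, or None when no subset is within
    `tol` or when the best match is ambiguous.
    """
    names = sorted(hap_freqs)
    candidates = []
    # Exhaustive search over subsets: only practical for a handful of haplotypes.
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            err = abs(sum(hap_freqs[n] for n in subset) - snv_freq)
            if err <= tol:
                candidates.append((err, subset))
    if not candidates:
        return None
    candidates.sort()
    # Refuse to decide when the runner-up fits almost as well as the best match.
    if len(candidates) > 1 and candidates[1][0] - candidates[0][0] < tol / 2:
        return None
    return set(candidates[0][1])
```

With two haplotypes at 45% and 55%, an orphan SNV at 45% would be attached to the first one. With two haplotypes at 50% each, the function returns `None`, which is exactly the case where frequency alone carries no information.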
Dear CliqueSNV team,
I've been experimenting with your tool and think perhaps I have found a bug. If there is a single, isolated SNV with no other SNVs in linkage within the mapping reads, i.e. distance to other SNVs is greater than read length, that SNV is never called in any of the haplotypes.
I'm trying to understand the algorithm described in your paper, and I guess this makes sense, because these SNVs are not in cliques with any other SNVs(?) But for some types of data it will mean that common haplotypes will not be present in the results. In one of my examples, there is a clear 45/55% distribution between C and T at a particular site, and the total read depth is around 30,000.
Graphic presentation of my problem.

I can provide bam files for testing if you'd like.
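For reference, the isolation condition I describe above can be sketched as follows. This is only an illustration under simplifying assumptions (single-end reads of one fixed length, no indels, no paired-end linkage); `isolated_snvs` is a made-up helper, not part of CliqueSNV. A read of length L can cover two positions only if they are less than L apart, so an SNV is flagged when both of its neighbouring SNVs are at least L away.

```python
def isolated_snvs(positions, read_length):
    """Return the SNV positions that no single read of `read_length`
    can cover together with any other SNV (hypothetical helper)."""
    pos = sorted(positions)
    isolated = []
    for i, p in enumerate(pos):
        gaps = []
        if i > 0:
            gaps.append(p - pos[i - 1])      # distance to the previous SNV
        if i < len(pos) - 1:
            gaps.append(pos[i + 1] - p)      # distance to the next SNV
        # A lone SNV has no neighbours and is isolated by definition.
        if all(g >= read_length for g in gaps):
            isolated.append(p)
    return isolated
```

For example, with SNVs at positions 100, 150, 900, 2000 and 2050 and 250 bp reads, only position 900 would be flagged.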