Develop a VEP for non-human organisms #28

Closed
biojiangke opened this issue May 24, 2024 · 1 comment

Comments

@biojiangke

Thanks for developing the application! GPN-MSA seems to be a great approach for predicting variant effects, especially for variants outside protein-coding regions. I have a quick question about applying it to non-human organisms: as I understand the process, to build a VEP for a non-human target organism I would need to take an MSA data set for that organism, re-train a model on it, and then run the "VEP" process with that model. Is this the correct way to do it?

@gonzalobenegas
Collaborator

Hello, thanks for your interest! That would be the right approach. The challenge is getting the MSA. The current code starts from an alignment in MAF format referenced to the target organism. This is available for many organisms in UCSC Genome Browser downloads (see Multiple alignments).
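
To make the MAF input concrete, here is a minimal sketch of reading alignment blocks and projecting them onto reference coordinates. It is not the GPN-MSA processing code: the function names `iter_maf_blocks` and `reference_columns`, the assembly names `target` and `other`, and the example coordinates are all made up for illustration. It assumes the first `s` row of each block is the reference on the `+` strand, which is the usual convention in reference-anchored MAF files.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple

# Made-up two-species block; "target" and "other" are hypothetical assembly names.
EXAMPLE_MAF = """##maf version=1
a score=10.0
s target.chr1 100 8 + 5000 ACGT-ACGT
s other.chr7 2000 9 - 8000 ACGTTACGT

"""


def iter_maf_blocks(handle: TextIO) -> Iterator[List[Dict]]:
    """Yield each alignment block as a list of its 's' rows, in file order."""
    block: List[Dict] = []
    for raw in handle:
        fields = raw.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the previous block.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and len(fields) == 7:
            _, src, start, size, strand, src_size, text = fields
            species, _, chrom = src.partition(".")
            block.append({
                "species": species, "chrom": chrom,
                "start": int(start), "size": int(size),
                "strand": strand, "src_size": int(src_size),
                "text": text,
            })
        # Other line types (comments, "i", "e", "q") are ignored in this sketch.
    if block:
        yield block


def reference_columns(block: List[Dict]) -> List[Tuple[int, str]]:
    """Return (reference position, column) pairs, one per reference base.

    Assumes block[0] is the reference row on the "+" strand (0-based start).
    Columns where the reference has a gap are dropped.
    """
    ref = block[0]
    pos = ref["start"]
    columns = []
    for i, base in enumerate(ref["text"]):
        if base == "-":
            continue  # insertion relative to the reference
        columns.append((pos, "".join(row["text"][i] for row in block)))
        pos += 1
    return columns


blocks = list(iter_maf_blocks(io.StringIO(EXAMPLE_MAF)))
cols = reference_columns(blocks[0])
```

Dropping the columns where the reference is gapped is what ties the result to a single species: every remaining column has exactly one reference coordinate.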

One of the annoying aspects of MAF format is that it is referenced to a single species, so you can't repurpose the same file for other species. HAL format does not have this limitation: from a HAL file you can generate a MAF referenced to any of its species, although this can be slow.
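
As a rough sketch of the HAL route, the snippet below only builds a `hal2maf` command line and does not run it. It assumes the `hal2maf` tool from the HAL toolkit is installed and that the `--refGenome` and `--refSequence` options behave as named; check `hal2maf --help` for your version. The helper `hal2maf_command`, the file names, and the genome name `target` are hypothetical.

```python
import shlex
from typing import List, Optional


def hal2maf_command(hal_path: str, maf_path: str, ref_genome: str,
                    ref_sequence: Optional[str] = None) -> List[str]:
    """Build (but do not run) a hal2maf call that re-references the alignment.

    Assumption: option names follow the HAL toolkit's hal2maf; verify locally.
    """
    cmd = ["hal2maf", hal_path, maf_path, "--refGenome", ref_genome]
    if ref_sequence is not None:
        # Restricting to one reference sequence keeps each run smaller.
        cmd += ["--refSequence", ref_sequence]
    return cmd


# Hypothetical file and genome names, for illustration only.
cmd = hal2maf_command("alignment.hal", "target.chr1.maf", "target", "chr1")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```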

Our current code for processing alignments (MAF -> Zarr) is not optimal. If I were starting again, I'd take a look at https://github.com/ComparativeGenomicsToolkit/taffy/ as an alternative. In the end, what you need for training is random access to windows of the genome, ideally from multiple threads at the same time.
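
The access pattern described above can be sketched with a toy in-memory store. This is not the repository's Zarr code: `AlignmentStore` and `sample_windows` are hypothetical names, and a real setup would back the store with a chunked or memory-mapped on-disk array rather than a `bytes` object. The point is only that a position-major layout turns a window lookup into one contiguous slice that several threads can read concurrently.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


class AlignmentStore:
    """Toy stand-in for an on-disk array of shape (positions, species)."""

    def __init__(self, columns: List[str]):
        # columns[i] holds one character per species at reference position i.
        self.n_species = len(columns[0])
        self.length = len(columns)
        self.data = "".join(columns).encode("ascii")  # position-major layout

    def window(self, start: int, size: int) -> List[str]:
        """Return one aligned string per species for [start, start + size)."""
        if start < 0 or start + size > self.length:
            raise IndexError("window out of range")
        flat = self.data[start * self.n_species:(start + size) * self.n_species]
        # Every n_species-th byte belongs to the same species.
        return [flat[s::self.n_species].decode("ascii")
                for s in range(self.n_species)]


def sample_windows(store: AlignmentStore, size: int, n: int,
                   seed: int = 0, workers: int = 4) -> List[List[str]]:
    """Fetch n random windows concurrently; reads are independent slices."""
    rng = random.Random(seed)
    starts = [rng.randrange(store.length - size + 1) for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: store.window(s, size), starts))


# Made-up alignment: 5 reference positions, 3 species.
store = AlignmentStore(["AA-", "CCC", "GGG", "TTA", "ACA"])
one = store.window(1, 3)
batch = sample_windows(store, size=2, n=8)
```

Because each read only slices immutable data, no locking is needed; with an on-disk backend the same interface lets the threads overlap their I/O.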
