-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: allows nuc annotation to be pulled in through .gff
#950
Conversation
Currently, `augur translate` cannot output `nuc` annotation, if `.gff` is used as input. This is due to the features being loaded being overly restrictive to genes. Allowing `source` to be read in as features allows the `.gff` to contain nuc annotations in the following format: ``` MT903344.1 Genbank source 1 197233 . + . locus_tag=nuc ``` That way `augur export doesn't crash if a `.gff` is used as annotation input for translate.
Codecov Report
@@ Coverage Diff @@
## master #950 +/- ##
=======================================
Coverage 59.34% 59.34%
=======================================
Files 42 42
Lines 6011 6011
Branches 1539 1539
=======================================
Hits 3567 3567
Misses 2185 2185
Partials 259 259
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed during triage meeting today that it would be good to know which parts of Augur call this function, to get a sense of the side-effects this change might have. Along those same lines, it would be awesome to see a functional test for the use case you describe here to confirm that this simple change has the effect you want and that future changes to this code won't break that functionality.
It's been sooooo long since I looked at this, and I would have to go dig to go find old TB files (and I'm travelling without external HDD at the moment), but I'd be a little cautious here, as it flickers some vague memory inside of me from when I used to TB stuff. Unfortunately I really don't remember much of this so it would take some digging for me to remember.... so mostly just here to wave a caution flag to ensure that this works how we expect across VCF and various GFF files for, ex, bacteria. |
Thanks @emmahodcroft, the idea would be to bring Could you maybe add a functional integration test for TB so we can check and announce breaking changes or see how we can ensure compatibility in case things change? I'll close this as the scope I'd like this to take will be bigger than this simple one line fix. |
I'm afraid it's highly unlikely I go back to anything related to TB right now; I simply don't have time. But if any source files are needed for testing I can try to dig them up, dig up old TB repos or other VCF repos etc. I probably can't make source data publicly available though - none of it came from public repos. |
Currently,
augur translate
cannot outputnuc
annotation, if.gff
is used as input. This is due to the features being loaded being overly restrictive to genes.Allowing
source
to be read in as features allows the.gff
to contain nuc annotations in the following format:That way
augur export
doesn't crash if a.gff
is used as annotation input for translate.Related issue(s)
Related to #881
This fix allows me to use the same
genemap.gff
for both Nextclade and the workflow that creates the Nextclade ref tree. This is very good for consistency and reduced maintenance effort when Monkeypox genemap is changing a lot.Testing
I don't know if this touches any other functionality. It's a util function so there's a risk. But then at the same time, it shouldn't be wrong to output a genemap with nuc annotated.
CI passes so looks good. One may want to systematically check every function where this gff load utility is used. But first would be good to see if this PR is usable or it has some systematic flaw.