Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allows nuc annotation to be pulled in through .gff #950

Closed
wants to merge 1 commit into from

Conversation

corneliusroemer
Copy link
Member

@corneliusroemer corneliusroemer commented May 25, 2022

Currently, augur translate cannot output nuc annotation, if .gff is used as input. This is due to the features being loaded being overly restrictive to genes.

Allowing source to be read in as features allows the .gff to contain nuc annotations in the following format:

MT903344.1	Genbank	source	1	197233	.	+	.	locus_tag=nuc

That way augur export doesn't crash if a .gff is used as annotation input for translate.

Related issue(s)

Related to #881

This fix allows me to use the same genemap.gff for both Nextclade and the workflow that creates the Nextclade ref tree. This is very good for consistency and reduced maintenance effort when Monkeypox genemap is changing a lot.

Testing

I don't know if this touches any other functionality. It's a util function so there's a risk. But then at the same time, it shouldn't be wrong to output a genemap with nuc annotated.

CI passes so looks good. One may want to systematically check every function where this gff load utility is used. But first would be good to see if this PR is usable or it has some systematic flaw.

Currently, `augur translate` cannot output `nuc` annotation, if `.gff` is used as input. This is due to the features being loaded being overly restrictive to genes.

Allowing `source` to be read in as features allows the `.gff` to contain nuc annotations in the following format:
```
MT903344.1	Genbank	source	1	197233	.	+	.	locus_tag=nuc
```

That way `augur export doesn't crash if a `.gff` is used as annotation input for translate.
@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #950 (5df5d5a) into master (5916362) will not change coverage.
The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master     #950   +/-   ##
=======================================
  Coverage   59.34%   59.34%           
=======================================
  Files          42       42           
  Lines        6011     6011           
  Branches     1539     1539           
=======================================
  Hits         3567     3567           
  Misses       2185     2185           
  Partials      259      259           
Impacted Files Coverage Δ
augur/utils.py 64.65% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5916362...5df5d5a. Read the comment docs.

Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed during triage meeting today that it would be good to know which parts of Augur call this function, to get a sense of the side-effects this change might have. Along those same lines, it would be awesome to see a functional test for the use case you describe here to confirm that this simple change has the effect you want and that future changes to this code won't break that functionality.

@emmahodcroft
Copy link
Member

It's been sooooo long since I looked at this, and I would have to go dig to go find old TB files (and I'm travelling without external HDD at the moment), but I'd be a little cautious here, as it flickers some vague memory inside of me from when I used to TB stuff.

Unfortunately I really don't remember much of this so it would take some digging for me to remember.... so mostly just here to wave a caution flag to ensure that this works how we expect across VCF and various GFF files for, ex, bacteria.

@corneliusroemer
Copy link
Member Author

Thanks @emmahodcroft, the idea would be to bring augur translate in line with GFF3 official standard. The way it's done right now is haphazard and peculiar. It's not even standardised across .gb and .gff files created from the same source.

Could you maybe add a functional integration test for TB so we can check and announce breaking changes or see how we can ensure compatibility in case things change?

I'll close this as the scope I'd like this to take will be bigger than this simple one line fix.

@emmahodcroft
Copy link
Member

I'm afraid it's highly unlikely I go back to anything related to TB right now; I simply don't have time. But if any source files are needed for testing I can try to dig them up, dig up old TB repos or other VCF repos etc. I probably can't make source data publicly available though - none of it came from public repos.

@victorlin victorlin deleted the allow-nuc-in-gff branch June 30, 2023 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants