Non-Ensembl GTF testing #110

Merged
merged 23 commits into from
Jan 12, 2021

hoelzer
Contributor

@hoelzer hoelzer commented Dec 8, 2020

I'm testing the workflow with a genome and annotation that are not derived from Ensembl. The GTF still follows the Ensembl structure (gene, transcript, exon), though.

The test worked surprisingly well :) I just ran into a general issue with a plotting function, which failed because fewer genes remained after p-value filtering than defined by ntop.
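A minimal sketch of that failure mode, not the actual code in bin/deseq2.R: indexing a vector with `[1:ntop]` pads the result with NA when fewer rows survive filtering, while `head()` returns at most what is available. The helper name `top_fc_ids` and the toy data frame are hypothetical.

```r
# Hypothetical helper: row names of the (at most) ntop rows with the
# largest log2FoldChange.
top_fc_ids <- function(res, ntop) {
  ordered <- res[order(res$log2FoldChange, decreasing = TRUE), , drop = FALSE]
  # head() never returns more elements than exist, so there is no NA padding
  head(row.names(ordered), ntop)
}

# Toy result table with only three genes left after filtering
res <- data.frame(log2FoldChange = c(0.5, 2.1, -1.3),
                  row.names = c("IDG1", "IDG2", "IDG3"))

naive <- row.names(res)[1:5]         # entries beyond the third are NA
safe  <- top_fc_ids(res, ntop = 5)   # only the three existing IDs

print(naive)
print(safe)
```

Whether clamping via `head()` or an explicit `if` on ntop reads better is a matter of taste; both avoid passing NA row names to the plotting code.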

Please leave this PR open: I will check the output and improve some of the scripts so that they still plot meaningful output even if we cannot extract information such as gene_biotype from the GTF. Also, we don't want to link to Ensembl for a GTF that does not have matching IDs.
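One way such guards could look, as a rough sketch rather than what this PR implements: only build an Ensembl link when an ID looks like an Ensembl stable gene ID, and fall back to NA when gene_biotype is absent. The helper names, the ID pattern ("ENS", optional species prefix, "G", 11 digits, optional version), and the link prefix are assumptions made for illustration.

```r
# Assumed pattern for Ensembl stable gene IDs, e.g. ENSG00000139618
is_ensembl_gene_id <- function(ids) {
  grepl("^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$", ids)
}

# Hypothetical helper: a link for Ensembl-like IDs, NA for everything else
ensembl_link <- function(ids) {
  ifelse(is_ensembl_gene_id(ids),
         paste0("https://www.ensembl.org/id/", ids),  # assumed URL scheme
         NA_character_)
}

# Hypothetical helper: use gene_biotype if the annotation has it, else NA
biotype_or_na <- function(anno) {
  if ("gene_biotype" %in% colnames(anno)) {
    anno$gene_biotype
  } else {
    rep(NA_character_, nrow(anno))
  }
}

ids <- c("ENSG00000139618", "ENSMUSG00000017167", "IDG1")
print(is_ensembl_gene_id(ids))
print(ensembl_link(ids))
```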

@hoelzer
Contributor Author

hoelzer commented Dec 16, 2020

@MarieLataretu actually I think that's all for now. So this branch works with my custom GTF-styled annotation file. Sure, things could be generalized further, but I think it's fine for now. If you agree, we can merge (how? should we first merge the current master into that branch? o_O)

bin/deseq2.R Outdated
Comment on lines 171 to 178
plot.heatmap.top_fc <- function(out.dir, resFold, trsf_data, trsf_type, ntop, pcutoff='', samples.info=df.samples.info, genes.info=df.gene.anno) {
  selected.ensembl.ids <- row.names(resFold[order(resFold$log2FoldChange, decreasing=TRUE), ])[1:ntop]
  # check how many elements are in the dataframe
  # if less elements are in the dataframe than selected by ntop, reduce ntop
  if (length(resFold$log2FoldChange) < ntop) {
    ntop = length(resFold$log2FoldChange)
  }
  if (ntop > 1) {
    selected.ensembl.ids <- row.names(resFold[order(resFold$log2FoldChange, decreasing=TRUE), ])[1:ntop]
Collaborator

I fixed that also in #123 ☺️

Contributor Author

I fixed that also in #123

:D

@MarieLataretu
Collaborator

MarieLataretu commented Jan 6, 2021

I just ran the test profile (so normal Ensembl annotation):

  • fixed a small bug in the reporting tools ruby script
  • renamed the refactored reporting tools table to ..._extended.html, because it's more informative
  • deseq2 tables look good 👍

non-Ensembl test is scheduled for today.


(how? should we first merge the current master into that branch? o_O)

My first guess would be to merge this branch into master (there were no changes in the master in this module).
We should be careful, though, if we merge #123 into master first. Maybe we should even merge #123 into this branch first and test, because there are some changes in the annotation preparation.

@hoelzer
Contributor Author

hoelzer commented Jan 6, 2021

(how? should we first merge the current master into that branch? o_O)

My first guess would be to merge this branch into master (there were no changes in the master in this module).
We should be careful, though, if we merge #123 into master first. Maybe we should even merge #123 into this branch first and test, because there are some changes in the annotation preparation.

Ah I see. Just checked #123 and agree, we could merge the changes from #123 into this branch #110, check if everything still works (I can also do a test again with my customized GTF file) and then merge into master?

@hoelzer hoelzer mentioned this pull request Jan 6, 2021
@MarieLataretu
Collaborator

So this branch works with my custom GTF-styled annotation file.

What is the difference to a normal Ensembl annotation? Right now we still expect gene and exon features with gene_ids in the description, don't we?

@hoelzer
Contributor Author

hoelzer commented Jan 7, 2021

So this branch works with my custom GTF-styled annotation file.

What is the difference to a normal Ensembl annotation? Right now we still expect gene and exon features with gene_ids in the description, don't we?

Yep, the main difference I checked was that there is no ENSxxxxxxx ID. So the structure of my GTF is still a valid hierarchical GTF and looks like this:

contig_19_segment0_pilon_pilon	ID	gene	17	11637	.	-	.	gene_id "IDG1"; transcript_id "IDT1.1"; annotation_tag "hybrid_flye_pilon_0"; hmm_matches "NA"; homologies "NA"; gene_name "IDG1";
contig_19_segment0_pilon_pilon	ID	transcript	17	11637	.	-	.	gene_id "IDG1"; transcript_id "IDT1.1"; annotation_tag "hybrid_flye_pilon_0"; hmm_matches "NA"; homologies "NA"; 
contig_19_segment0_pilon_pilon	ID	exon	17	6319	.	-	.	gene_id "IDG1"; transcript_id "IDT1.1"; exon_id "IDE1.1.1"; 
contig_19_segment0_pilon_pilon	ID	exon	6474	11637	.	-	.	gene_id "IDG1"; transcript_id "IDT1.1"; exon_id "IDE1.1.2"; 

So I think that at least any validly formatted GTF file with gene and exon features and a gene_id in the description column should work.
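As a rough illustration of that requirement, and not the workflow's own validation, a base-R check could read the GTF, pull gene_id out of the ninth column, and confirm that gene and exon rows all carry one. The helper `gtf_attribute` is hypothetical, and the two sample lines are abbreviated from the records above.

```r
# Hypothetical helper: extract one attribute value (e.g. gene_id) from the
# GTF attribute column; returns NA where the key is missing.
gtf_attribute <- function(attr_col, key) {
  pattern <- paste0('(^|; *)', key, ' +"([^"]*)"')
  hits <- regmatches(attr_col, regexec(pattern, attr_col))
  vapply(hits,
         function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

# Two records abbreviated from the example annotation in this thread
gtf_lines <- c(
  'contig_19_segment0_pilon_pilon\tID\tgene\t17\t11637\t.\t-\t.\tgene_id "IDG1"; transcript_id "IDT1.1"; gene_name "IDG1";',
  'contig_19_segment0_pilon_pilon\tID\texon\t17\t6319\t.\t-\t.\tgene_id "IDG1"; transcript_id "IDT1.1"; exon_id "IDE1.1.1";'
)

# quote = "" keeps the double quotes inside the attribute column intact
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  comment.char = "#", stringsAsFactors = FALSE)

gtf$gene_id <- gtf_attribute(gtf$V9, "gene_id")

# Both feature types must be present, and each such row needs a gene_id
needed <- gtf$V3 %in% c("gene", "exon")
ok <- all(c("gene", "exon") %in% gtf$V3) && !any(is.na(gtf$gene_id[needed]))
print(ok)
```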

@MarieLataretu
Collaborator

Ah I see. Just checked #123 and agree, we could merge the changes from #123 into this branch #110, check if everything still works (I can also do a test again with my customized GTF file) and then merge into master?

We now can test on #123, because git is confusing sometimes

@MarieLataretu MarieLataretu self-assigned this Jan 12, 2021
@MarieLataretu
Collaborator

These three commands run without errors, the ReportingTools tables look fine, and random checks of the output look good too 👍

nextflow run main.nf -w work -profile test,local,conda -resume --max_cores 3 --softlink_results --featurecounts_additional_params '-t exon -g transcript_id' --feature_id_type 'ensembl_transcript_id'
nextflow run main.nf -w work -profile local,conda,test -resume --max_cores 3 --softlink_results --featurecounts_additional_params '-t exon -g exon_id' --feature_id_type 'ensembl_exon_id' --output results/exon
nextflow run main.nf -w work -profile local,conda,test -resume --max_cores 3 --softlink_results

From my side we can merge into master!

@hoelzer
Contributor Author

hoelzer commented Jan 12, 2021

Yeah! I scrolled through the changes again. From my side, please merge! Should we then also do a new release?

And should we reply to Ahmed (#116) again? I think this issue thread started the whole transcript/exon input change

@MarieLataretu
Collaborator

Yeah! I scrolled through the changes again. From my side, please merge! Should we then also do a new release?

Okay! Looks like a minor release to me.

And should we reply to Ahmed (#116) again? I think this issue thread started the whole transcript/exon input change

Yes, at least the non-Ensembl part should work!

@MarieLataretu MarieLataretu merged commit c3ee7b3 into master Jan 12, 2021
2 participants