Standard annotation and reference files #241
Comments
Will add to the next release.
Are there any other reference files that you think should be added as well?
The only other thing I can think of offhand is a mapping file from BioMart for Ensembl to HUGO to Entrez IDs. This was useful for me with the PDX paper to harmonize gene symbols (updating old symbols to current ones where algorithms had used old mappings). That was an issue in the MAF and with the 4 different fusion algorithms, which used different reference genomes. E.g., the former MLL is now KMT2A. I'm not sure it will be the case for us, though, since we use the more recent hg38, the same reference for all DNA algorithms, and the same reference goes into STAR for both fusion algorithms.
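To make the harmonization step concrete, here is a minimal sketch of updating outdated symbols from an alias table. The function name `harmonize_symbols` and the `ALIAS_TO_CURRENT` table are hypothetical; only the MLL to KMT2A rename comes from this thread, and a real table would presumably be built from a BioMart or HGNC export.

```python
def harmonize_symbols(symbols, alias_to_current):
    """Replace outdated gene symbols with current ones; leave all others unchanged."""
    return [alias_to_current.get(symbol, symbol) for symbol in symbols]


# Hypothetical alias table with a single entry taken from the example above.
ALIAS_TO_CURRENT = {"MLL": "KMT2A"}

print(harmonize_symbols(["MLL", "TP53"], ALIAS_TO_CURRENT))
# → ['KMT2A', 'TP53']
```

Symbols missing from the table pass through untouched, so the step is safe to run on calls that already use current symbols.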
RNA reference used: https://www.gencodegenes.org/human/release_27.html
And this GTF: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.primary_assembly.annotation.gtf.gz
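As a rough illustration of how a downstream analysis might use this GTF, the sketch below builds a gene ID to gene name map from the `gene` records. It assumes the standard 9-column, tab-separated GTF layout with `key "value";` attributes. The helper names and the sample record are made up for illustration and are not taken from the actual file.

```python
import gzip
import re

# Matches quoted attributes such as: gene_id "X"; gene_name "Y";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_id_to_name(lines):
    """Map gene_id -> gene_name using only the 'gene' feature rows of a GTF."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):  # skip header comments
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping


def read_gtf(path):
    """Read a gzipped GTF from a local path (download it separately first)."""
    with gzip.open(path, "rt") as handle:
        return gene_id_to_name(handle)


# Made-up record in GTF layout; the ID, name, and coordinates are placeholders.
SAMPLE = (
    'chrN\tSOURCE\tgene\t1\t100\t.\t+\t.\t'
    'gene_id "GENE_ID_EXAMPLE.1"; gene_type "protein_coding"; '
    'gene_name "GENE_A"; level 2;\n'
)
print(gene_id_to_name(["##description: example\n", SAMPLE]))
# → {'GENE_ID_EXAMPLE.1': 'GENE_A'}
```

Taking the mapping from the same GTF that was used upstream keeps the downstream gene names aligned with the quantification, which is the reproducibility concern raised in this issue.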
Added with #273
File(s)
*.gtf
Release
v9
Link to OpenPBTA-manuscript
Put a link to the relevant section of the OpenPBTA manuscript here.
Question/issue
As far as I am aware, we do not currently provide GTF or FASTA files for downstream analysis in a standardized way, either with the repository or with the data download. Providing such files would ease analysis by new contributors (see #198 (comment) for example) and improve reproducibility. We do currently install a TxDb version, but that may not be completely aligned with the upstream analysis.
Since much of the upstream analysis relies on Gencode v27 (e.g. [Gene expression abundance](https://alexslemonade.github.io/OpenPBTA-manuscript/#gene-expression-abundance-estimation)), this would seem a logical file to include.
Including the relevant reference fasta file could also be useful, though perhaps at a lower priority.