adding changes from AlexsLemonade#1009
kgaonkar6 committed May 11, 2021
2 parents 36d38cc + 486b9a2 commit 8d2d498
Showing 342 changed files with 116,958 additions and 106,026 deletions.
8 changes: 8 additions & 0 deletions .circleci/config.yml
@@ -16,6 +16,10 @@ jobs:
name: List Data Directory Contents
command: ./scripts/run_in_ci.sh ls data/testing

- run:
name: Check python packages
command: ./scripts/run_in_ci.sh bash scripts/check-python.sh

- run:
name: High level histology grouping for plot labels
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)"
@@ -279,6 +283,10 @@ jobs:
name: Run survival plots
command: ./scripts/run_in_ci.sh bash analyses/survival-analysis/run_survival.sh

- run:
name: Scavenge back hotspots
command: ./scripts/run_in_ci.sh bash analyses/hotspots-detection/run_overlaps_hotspot.sh

deploy:
machine:
docker_layer_caching: true
99 changes: 77 additions & 22 deletions Dockerfile
@@ -27,10 +27,14 @@ RUN apt-get -y --no-install-recommends install \
RUN apt-get -y --no-install-recommends install \
libpoppler-cpp-dev

# Install pip3 and instalation tools
# Install pip3 and low-level python installation reqs
RUN apt-get -y --no-install-recommends install \
python3-pip python3-dev
RUN pip3 install "setuptools==46.3.0" "six==1.14.0" "wheel==0.34.2"
RUN pip3 install \
"Cython==0.29.15" \
"setuptools==46.3.0" \
"six==1.14.0" \
"wheel==0.34.2"

# Install java
RUN apt-get -y --no-install-recommends install \
@@ -237,40 +241,94 @@ RUN R -e "remotes::install_github('wilkox/treemapify', ref = 'e70adf727f4d13223d
# Need this specific version of circlize so it has hg38
RUN R -e "remotes::install_github('jokergoo/circlize', ref = 'b7d86409d7f893e881980b705ba1dbc758df847d', dependencies = TRUE)"

# Install python libraries
# Install python packages
##########################

# Install python3 data science tools
# Install python3 tools and ALL dependencies
RUN pip3 install \
"cycler==0.10.0" "kiwisolver==1.1.0" "pyparsing==2.4.5" "python-dateutil==2.8.1" "pytz==2019.3" \
"cython==0.29.15" \
"appdirs==1.4.4" \
"attrs==20.3.0" \
"backcall==0.2.0" \
"bleach==3.3.0" \
"bx-python==0.8.8" \
"certifi==2020.12.5" \
"chardet==4.0.0" \
"ConfigArgParse==1.4" \
"CrossMap==0.3.9" \
"cycler==0.10.0" \
"datrie==0.8.2" \
"decorator==4.4.2" \
"defusedxml==0.7.1" \
"docutils==0.16" \
"entrypoints==0.3" \
"gitdb==4.0.7" \
"GitPython==3.1.14" \
"idna==2.10" \
"importlib-metadata==2.1.1" \
"ipykernel==4.8.1" \
"ipython==7.9.0" \
"ipython-genutils==0.2.0" \
"jedi==0.17.2" \
"Jinja2==2.11.3" \
"jsonschema==3.2.0" \
"jupyter-client==6.1.12" \
"jupyter-core==4.6.3" \
"kiwisolver==1.1.0" \
"MarkupSafe==1.1.1" \
"matplotlib==3.0.3" \
"mistune==0.8.4" \
"mizani==0.5.4" \
"nbconvert==5.6.1" \
"nbformat==5.1.2" \
"notebook==6.0.0" \
"numpy==1.17.3" \
"packaging==20.9" \
"palettable==3.3.0" \
"pandas==0.25.3" \
"pandocfilters==1.4.3" \
"parso==0.7.1" \
"patsy==0.5.1" \
"pexpect==4.8.0" \
"pickleshare==0.7.5" \
"plotnine==0.3.0" \
"prometheus-client==0.9.0" \
"prompt-toolkit==2.0.10" \
"psutil==5.8.0" \
"ptyprocess==0.7.0" \
"pyarrow==0.16.0" \
"pybedtools==0.8.1" \
"pyBigWig==0.3.17" \
"Pygments==2.8.1" \
"pyparsing==2.4.5" \
"pyreadr==0.2.1" \
"pyrsistent==0.17.3" \
"pysam==0.15.4" \
"python-dateutil==2.8.1" \
"pytz==2019.3" \
"PyYAML==5.3.1" \
"pyzmq==20.0.0" \
"ratelimiter==1.2.0.post0" \
"requests==2.25.1" \
"rpy2==2.9.3" \
"scikit-learn==0.19.1" \
"scipy==1.3.2" \
"seaborn==0.8.1" \
"Send2Trash==1.5.0" \
"six==1.14.0" \
"smmap==4.0.0" \
"snakemake==5.8.1" \
"statsmodels==0.10.2" \
"tzlocal==2.0" \
"terminado==0.8.3" \
"testpath==0.4.4" \
"tornado==6.1" \
"traitlets==4.3.3" \
"tzlocal==2.0.0" \
"urllib3==1.26.4" \
"wcwidth==0.2.5" \
"webencodings==0.5.1" \
"widgetsnbextension==2.0.0" \
&& rm -rf /root/.cache/pip/wheels

# Install Rpy2
RUN pip3 install "rpy2==2.9.3" \
&& rm -rf /root/.cache/pip/wheels

# Install CrossMap for liftover
RUN pip3 install \
"bx-python==0.8.8" \
"pybigwig==0.3.17" \
"pysam==0.15.4" \
"CrossMap==0.3.9" \
"wrapt==1.12.1" \
"zipp==1.2.0" \
&& rm -rf /root/.cache/pip/wheels


@@ -312,9 +370,6 @@ RUN ./install_bioc.r \
multipanelfigure \
gplots

# pybedtools for D3B TMB analysis
RUN pip3 install "pybedtools==0.8.1"

# Molecular subtyping MB
RUN R -e "remotes::install_github('d3b-center/medullo-classifier-package', ref = 'e3d12f64e2e4e00f5ea884f3353eb8c4b612abe8', dependencies = TRUE, upgrade = FALSE)" \
&& ./install_bioc.r MM2S
4 changes: 3 additions & 1 deletion README.md
@@ -192,7 +192,7 @@ Artifacts include both vector or high-resolution figures sufficient for inclusio

#### Software Dependencies

Analyses should be performed within the project's [Docker container](https://github.com/AlexsLemonade/OpenPBTA-analysis#docker-container).
Analyses should be performed within the project's [Docker container](https://github.com/AlexsLemonade/OpenPBTA-analysis#docker-image).
We use a single monolithic container in these analyses for ease of use.
If you need software that is not included, please edit the Dockerfile to install the relevant software or file a [new issue on this repository](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/new) requesting assistance.

@@ -292,6 +292,8 @@ To add dependencies that are required for your analysis to the project Docker im
* Installing most packages, from CRAN or Bioconductor, should be done with our `install_bioc.R` script, which will ensure that the proper MRAN snapshot is used. `BiocManager::install()` should *not* be used, as it will not install from MRAN.
* R packages that are not available in the MRAN snapshot can be installed via github with the `remotes::install_github()` function, with the commit specified by the `ref` argument.
* Python packages should be installed with `pip3 install` with version numbers for all packages and dependencies specified.
* As a secondary check, we maintain a `requirements.txt` file to check versions of all python packages and dependencies.
* When adding a new package, make sure that all dependencies are also added; every package should appear with a specified version **both** in the `Dockerfile` and `requirements.txt`.
* Other software can be installed with `apt-get`, but this should *never* be used for R packages.

If you need assistance adding a dependency to the Dockerfile, [file a new issue on this repository](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/new) to request help.
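
The rule above, that every Python package appears with the same pinned version in both the `Dockerfile` and `requirements.txt`, can be checked mechanically. The following is a minimal sketch of such a comparison, not the project's actual `scripts/check-python.sh` (whose contents are not shown in this commit view). The function names `pins_from_dockerfile`, `pins_from_requirements`, and `compare_pins` are hypothetical, and the sketch assumes that Dockerfile pins are written as double-quoted `"name==version"` strings, as in the `pip3 install` blocks above.

```python
import re

# Quoted pins such as "numpy==1.17.3" inside a Dockerfile `pip3 install` block
# (assumed format, based on the Dockerfile lines in this commit).
DOCKER_PIN = re.compile(r'"([A-Za-z0-9._-]+)==([^"\s]+)"')
# Bare pins such as numpy==1.17.3 at the start of a requirements.txt line.
REQ_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)==(\S+)")


def normalize(name):
    """Compare names case-insensitively, treating '-', '_' and '.' alike."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pins_from_dockerfile(text):
    """Return {normalized name: version}; a later pin overrides an earlier one."""
    return {normalize(name): version for name, version in DOCKER_PIN.findall(text)}


def pins_from_requirements(text):
    """Return {normalized name: version} for every `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore trailing comments
        match = REQ_PIN.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def compare_pins(docker_text, req_text):
    """List every package that is missing from one file or pinned differently."""
    docker = pins_from_dockerfile(docker_text)
    reqs = pins_from_requirements(req_text)
    problems = []
    for name in sorted(set(docker) | set(reqs)):
        if name not in reqs:
            problems.append(f"{name}: missing from requirements.txt")
        elif name not in docker:
            problems.append(f"{name}: missing from Dockerfile")
        elif docker[name] != reqs[name]:
            problems.append(
                f"{name}: Dockerfile pins {docker[name]}, "
                f"requirements.txt pins {reqs[name]}"
            )
    return problems


if __name__ == "__main__":
    with open("Dockerfile") as handle:
        docker_text = handle.read()
    with open("requirements.txt") as handle:
        req_text = handle.read()
    for problem in compare_pins(docker_text, req_text):
        print(problem)
```

Versions are compared as plain strings, so `2.0` and `2.0.0` are reported as a mismatch. That strictness seems appropriate when the goal is for the two files to agree exactly, but it is a design choice of this sketch rather than something the README specifies.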
3 changes: 2 additions & 1 deletion analyses/README.md
@@ -17,12 +17,13 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`collapse-rnaseq`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/collapse-rnaseq) | `pbta-gene-expression-rsem-fpkm.polya.rds` <br> `pbta-gene-expression-rsem-fpkm.stranded.rds` <br> `gencode.v27.primary_assembly.annotation.gtf.gz` | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | `results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` (included in data download; too large for tracking via GitHub) <br> `results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` (included in data download; too large for tracking via GitHub)
| [`comparative-RNASeq-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/comparative-RNASeq-analysis) | `pbta-gene-expression-rsem-tpm.polya.rds` <br> `pbta-gene-expression-rsem-tpm.stranded.rds` <br> `pbta-histologies.tsv` <br> `pbta-mend-qc-manifest.tsv` <br> `pbta-mend-qc-results.tar.gz` | *In progress*; will produce expression outlier profiles per [#229](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/229) | N/A |
| [`compare-gistic`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/compare-gistic) | `analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip` | Comparison of the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely, LGAT, HGAT and medulloblastoma ([#547](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/547)) | N/A
| [`copy_number_consensus_call`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/copy_number_consensus_call) | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-sv-manta.tsv.gz` | Produces consensus copy number calls per [#128](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/128) and a set of excluded regions where CNV calls are not made | `results/cnv_consensus.tsv` <br> `results/pbta-cnv-consensus.seg.gz` (included in data download) <br> `ref/cnv_excluded_regions.bed` <br> `ref/cnv_callable.bed`
| [`copy_number_consensus_call`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/copy_number_consensus_call) | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-sv-manta.tsv.gz` | Produces consensus copy number calls per [#128](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/128) and a set of excluded regions where CNV calls are not made as per [#1010](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1010) | `results/cnv_consensus.tsv` <br> `results/pbta-cnv-consensus.seg.gz` (included in data download) <br> `ref/cnv_excluded_regions.bed` <br> `ref/cnv_callable.bed`
| [`create-subset-files`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/create-subset-files) | All files | This module contains the code to create the subset files used in continuous integration | All subset files for continuous integration
| [`focal-cn-file-preparation`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/focal-cn-file-preparation) | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz` | Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands, chromosome arms ([#186](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/186))| `results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `results/cnvkit_annotated_cn_x_and_y.tsv.gz` <br> `results/controlfreec_annotated_cn_autosomes.tsv.gz` <br> `results/controlfreec_annotated_cn_x_and_y.tsv.gz` <br> `results/consensus_seg_annotated_cn_autosomes.tsv.gz` (included in data download) <br> `results/consensus_seg_annotated_cn_x_and_y.tsv.gz` (included in data download)
| [`fusion_filtering`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering) | `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Standardizes, filters, and prioritizes fusion calls | `results/pbta-fusion-putative-oncogenic.tsv`(included in data download) <br> `results/pbta-fusion-recurrent-fusion-byhistology.tsv` (included in data download) <br> `results/pbta-fusion-recurrent-fusion-bysample.tsv` (included in data download)
| [`fusion-summary`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion-summary)| `pbta-histologies.tsv` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Generate summary tables from fusion files ([#398](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/398); [#623](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/623)) | `results/fusion_summary_embryonal_foi.tsv` (included in data download) <br> `results/fusion_summary_ependymoma_foi.tsv` (included in data download) <br> `results/fusion_summary_ewings_foi.tsv`
| [`gene-set-enrichment-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/gene-set-enrichment-analysis) | `analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | *In progress*. Updated gene set enrichment analysis with appropriate RNA-seq expression data | `results/gsva_scores_stranded.tsv` <br> `results/gsva_scores_polya.tsv` <br> for stranded, polya expression data respectively
| [`hotspots-detection`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/hotspots-detection) | `pbta-snv-strelka2.vep.maf.gz` <br> `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-vardict.vep.maf.gz` <br> `pbta-snv-lancet.vep.maf.gz` | Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-caller workflow. | `pbta-snv-hotspots-mutation.maf.tsv.gz`
| [`immune-deconv`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/immune-deconv) | `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Immune/Stroma characterization across PBTA (part of [#15](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/15)) | `results/deconv-output.RData`
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` (included in data download) <br> `results/independent-specimens.wgs.primary-plus.tsv` (included in data download) <br> `results/independent-specimens.wgswxs.primary.tsv` (included in data download) <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A
2,764 changes: 14 additions & 2,750 deletions analyses/cnv-chrom-plot/cn_status_heatmap.nb.html

Large diffs are not rendered by default.
