This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Fixing v14 breaking changes #523

Merged 11 commits on Feb 7, 2020
58 changes: 32 additions & 26 deletions .circleci/config.yml
@@ -20,20 +20,47 @@ jobs:
name: Sample Distribution Analyses
command: ./scripts/run_in_ci.sh bash "analyses/sample-distribution-analysis/run-sample-distribution.sh"

# TODO: The data files for CI need to be fixed https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/527
[Collaborator] Didn't redoing the subset fix this?

[Member Author] No, it did not. More details are in #527.

# - run:
# name: TP53 NF1 classifier run
# command: OPENPBTA_POLYAPLOT=0 ./scripts/run_in_ci.sh bash "analyses/tp53_nf1_score/run_classifier.sh"

# The analysis no longer needs to be tested as it has been retired and is better covered by 'SNV Caller Analysis' below.
#- run:
# name: Mutect2 vs Strelka2
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/mutect2-vs-strelka2/01-set-up.Rmd', clean = TRUE);
# rmarkdown::render('analyses/mutect2-vs-strelka2/02-analyze-concordance.Rmd', clean = TRUE)"

- run:
name: Collapse RSEM
command: ./scripts/run_in_ci.sh bash analyses/collapse-rnaseq/run-collapse-rnaseq.sh

### MOLECULAR SUBTYPING ###

- run:
name: Molecular Subtyping - HGG
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh

- run:
name: Molecular subtyping - Non-MB/Non-ATRT Embryonal tumors
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-embryonal/run-embryonal-subtyping.sh

- run:
name: Molecular Subtyping and Plotting - ATRT
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh

- run:
name: Molecular subtyping Chordoma
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd', clean = TRUE)"


# Deprecated - these results do not include germline calls and therefore are insufficient by subtyping
# - run:
# name: SHH TP53 Molecular Subtyping
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-SHH-tp53/SHH-tp53-molecular-subtyping-data-prep.Rmd', clean = TRUE)"

### END MOLECULAR SUBTYPING ###

- run:
name: Collapse RSEM
command: ./scripts/run_in_ci.sh bash analyses/collapse-rnaseq/run-collapse-rnaseq.sh

- run:
name: Immune deconvolution using xCell and MCP-Counter
command: OPENPBTA_DECONV_METHOD="mcp_counter" ./scripts/run_in_ci.sh bash analyses/immune-deconv/run-immune-deconv.sh
@@ -85,10 +112,6 @@ jobs:
name: Tumor mutation burden with TCGA
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/tmb-compare-tcga/compare-tmb.Rmd', clean = TRUE)"

- run:
name: Molecular subtyping - Non-MB/Non-ATRT Embryonal tumors
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-embryonal/run-embryonal-subtyping.sh

- run:
name: Copy number consensus
command: ./scripts/run_in_ci.sh bash "analyses/copy_number_consensus_call/run_consensus_call.sh"
@@ -104,32 +127,19 @@ jobs:
- run:
name: Comparative RNASeq - generate correlation matrix - rsem-tpm.stranded
command: ./scripts/run_in_ci.sh python3 analyses/comparative-RNASeq-analysis/01-correlation-matrix.py ../../data/pbta-gene-expression-rsem-tpm.stranded.rds --output-prefix rsem-tpm-stranded- --verbose

- run:
name: Molecular Subtyping and Plotting - ATRT
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh


- run:
name: Process SV file
command: ./scripts/run_in_ci.sh Rscript analyses/sv-analysis/01-process-sv-file.R

- run:
name: Oncoprint plotting
command: ./scripts/run_in_ci.sh bash "analyses/oncoprint-landscape/run-oncoprint.sh"

- run:
name: TP53 NF1 classifier run
command: OPENPBTA_POLYAPLOT=0 ./scripts/run_in_ci.sh bash "analyses/tp53_nf1_score/run_classifier.sh"

- run:
name: GISTIC Plots
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/cnv-chrom-plot/gistic_plot.Rmd', clean = TRUE)"

# Deprecated - these results do not include germline calls and therefore are insufficient by subtyping
# - run:
# name: SHH TP53 Molecular Subtyping
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-SHH-tp53/SHH-tp53-molecular-subtyping-data-prep.Rmd', clean = TRUE)"

- run:
name: Gene set enrichment analysis to generate GSVA scores
command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash "analyses/gene-set-enrichment-analysis/run-gsea.sh"
@@ -142,10 +152,6 @@ jobs:
name: Fusion Summary
command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash "analyses/fusion-summary/run-new-analysis.sh"

- run:
name: Molecular subtyping Chordoma
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd', clean = TRUE)"

- run:
name: Telomerase activity
command: ./scripts/run_in_ci.sh bash analyses/telomerase-activity-prediction/RUN-telomerase-activity-prediction.sh
9 changes: 3 additions & 6 deletions analyses/chromosomal-instability/01b-visualization-cnv-sv.Rmd
@@ -88,11 +88,8 @@ metadata <- readr::read_tsv(file.path(data_dir, "pbta-histologies.tsv"))
Read in the CNV data.

```{r}
# TODO: update file path when consensus is added to data release
cnv_df <- data.table::fread(
file.path("..",
"copy_number_consensus_call",
"results",
file.path(data_dir,
"pbta-cnv-consensus.seg.gz"),
data.table = FALSE
)
@@ -307,7 +304,7 @@ circos_map_plot(
```{r}
circos_map_transloc(transloc_df,
add_track = FALSE, # We change this to true to add on to our already existing plot
sample_names = samples_for_examples[1],
sample_names = sample(transloc_df$biospecimen_id1, 1),
[Collaborator] I had to do this because not all samples have BNDs, so picking a sample without any would throw an error.

samples_col = "biospecimen_id1",
chr_col_1 = "chrom1", # Need to specify which column is the first and second location for each
chr_col_2 = "chrom2",
@@ -326,7 +323,7 @@ png(file.path(plots_dir, "transloc_circos_plot.png"), width = 800, height = 800)
# Run function per usual
circos_map_transloc(transloc_df,
add_track = FALSE,
sample_names = samples_for_examples[1],
sample_names = sample(transloc_df$biospecimen_id1, 1),
samples_col = "biospecimen_id1",
chr_col_1 = "chrom1",
chr_col_2 = "chrom2",
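The change above swaps a fixed example sample for one drawn from the translocation table, so that the chosen sample always has at least one BND. A minimal base-R sketch of that idea follows. The helper name `pick_sample_with_bnd`, the toy data frame, and the sample IDs are hypothetical and not part of the PR; the sketch also prefers a known example sample when one is usable, which is a variation on the PR's approach rather than what it does.

```r
# Hypothetical helper: choose a sample ID that has at least one translocation.
# `preferred` is an optional vector of example IDs to try first (an assumption,
# standing in for something like `samples_for_examples` in the notebook).
pick_sample_with_bnd <- function(transloc_df,
                                 preferred = character(0),
                                 id_col = "biospecimen_id1") {
  # IDs that appear in the translocation table at least once
  ids_with_bnd <- unique(transloc_df[[id_col]])
  if (length(ids_with_bnd) == 0) {
    stop("No translocations found in `transloc_df`")
  }

  # Use the first preferred sample that actually has translocations
  usable <- intersect(preferred, ids_with_bnd)
  if (length(usable) > 0) {
    return(usable[1])
  }

  # Otherwise fall back to a random sample that has translocations
  ids_with_bnd[sample.int(length(ids_with_bnd), 1)]
}

# Toy data (made up for illustration): "BS_B" has no translocations
transloc_df <- data.frame(
  biospecimen_id1 = c("BS_A", "BS_A", "BS_C"),
  chrom1 = c("chr1", "chr2", "chr7"),
  chrom2 = c("chr5", "chr9", "chr12"),
  stringsAsFactors = FALSE
)

chosen <- pick_sample_with_bnd(transloc_df, preferred = c("BS_B", "BS_A"))
```

Note that `sample(transloc_df$biospecimen_id1, 1)`, as used in the PR, draws from rows, so samples with more translocations are more likely to be picked; the sketch draws from unique IDs instead.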
164 changes: 92 additions & 72 deletions analyses/chromosomal-instability/01b-visualization-cnv-sv.nb.html

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions analyses/chromosomal-instability/util/circos-plots.R
@@ -135,6 +135,13 @@ circos_map_plot <- function(df,
y_min <- min(bed_df$y_val, na.rm = TRUE)
y_max <- max(bed_df$y_val, na.rm = TRUE)

# Can't have identical y_min and y_max, this is just so CircleCI runs even if
# the subset data is wonky
if (y_min == y_max) {
y_max <- y_max + 0.001
[Collaborator] y_min and y_max are calculated from the data, but with some of the subsets they can be identical, and circlize doesn't accept that. This is my fix.

warning("ymax and ymin are identical")
}

# Tell them only one color is allowed
if (length(single_color) > 1) {
warning("Only a single color is allowed for the `single_color` argument,
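The guard added in this hunk can be read in isolation as a small range helper. The sketch below is a standalone illustration in base R, assuming a numeric vector of y values; the function name `pad_identical_range` and the `pad` argument are hypothetical, and only the `0.001` offset and the identical-range check come from the diff.

```r
# Hypothetical standalone version of the guard: compute the y range and
# nudge the upper bound when the data collapse to a single value.
pad_identical_range <- function(y_val, pad = 0.001) {
  y_min <- min(y_val, na.rm = TRUE)
  y_max <- max(y_val, na.rm = TRUE)

  # A zero-width range cannot be used as plotting limits, so widen it slightly
  if (y_min == y_max) {
    warning("y_min and y_max are identical; padding y_max by ", pad)
    y_max <- y_max + pad
  }

  c(y_min = y_min, y_max = y_max)
}

# Example: a subset where every value is the same (NA values are ignored)
flat_range <- suppressWarnings(pad_identical_range(c(2, 2, NA)))

# Example: a normal range is returned unchanged
normal_range <- pad_identical_range(c(1, 3))
```

A fixed offset keeps the CI plot from failing, but the resulting axis is arbitrary, which is why the warning is kept.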
44 changes: 18 additions & 26 deletions analyses/fusion-summary/01-fusion-summary.Rmd
@@ -151,36 +151,28 @@ specimensUnion<- union(arribaDF$tumor_id, starfusionDF$tumor_id)
#### Write non-MB, non-ATRT embryonal fusions to file

```{r}
allFuseEmbry <- allFuseEmbry %>%
prepareOutput(specimensUnion)
if (!running_in_ci) {
allFuseEmbry <- allFuseEmbry %>%
[Collaborator] I don't really know what this means, but if it works 👍

[Member Author] Previously, we skipped generating the fusions-of-interest file for EPN samples because they were not represented in the subset. Now the EPN samples are represented, so I'm running the EPN step and instead skipping the non-ATRT, non-MB embryonal fusions and genes of interest, which are not represented.

prepareOutput(specimensUnion)
allFuseEmbry %>%
mutate(
`CIC--NUTM1` = 0,
`MN1--BEND2` = 0
) %>%
write_tsv(embryFile)
}
```

```{r}
# Are there any missing fusions?
setdiff(embryFuses, colnames(allFuseEmbry))
```
#### Write ependymoma fusions to file

```{r}
allFuseEmbry %>%
allFuseEpend %>%
prepareOutput(specimensUnion) %>%
mutate(
`CIC--NUTM1` = 0,
`MN1--BEND2` = 0
`C11orf95--YAP1` = 0,
`LTBP3--RELA` = 0,
`PTEN--TAS2R1` = 0,
`YAP1--MAMLD2` = 0
) %>%
write_tsv(embryFile)
```

#### Write ependymoma fusions to file

```{r}
if (!running_in_ci) {
allFuseEpend %>%
prepareOutput(specimensUnion) %>%
mutate(
`C11orf95--YAP1` = 0,
`LTBP3--RELA` = 0,
`PTEN--TAS2R1` = 0,
`YAP1--MAMLD2` = 0
) %>%
write_tsv(ependFile)
}
write_tsv(ependFile)
```
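The pattern in this file, skipping an output step in CI when its tumor group is missing from the subset data, can be sketched as below. How `running_in_ci` is actually set in the notebook is not shown in this diff; reading it from the `OPENPBTA_TESTING` environment variable (which the CircleCI step for this analysis sets to `1`) is an assumption, and both helper names are hypothetical. The sketch uses base R's `write.table` in place of `write_tsv` to stay dependency-free.

```r
# Assumption: CI is signalled by OPENPBTA_TESTING=1; the real notebook may
# derive `running_in_ci` differently.
is_running_in_ci <- function(var = "OPENPBTA_TESTING") {
  identical(Sys.getenv(var, unset = "0"), "1")
}

# Hypothetical helper: write a TSV unless we are in CI, where the group
# may not be represented in the subset files.
write_if_not_ci <- function(df, path, running_in_ci = is_running_in_ci()) {
  if (running_in_ci) {
    message("Skipping ", path, " (group not represented in the CI subset)")
    return(invisible(FALSE))
  }
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(TRUE)
}

# Example with a made-up fusion table and a temporary output path
fusion_df <- data.frame(Kids_First_Biospecimen_ID = "BS_A", `CIC--NUTM1` = 0,
                        check.names = FALSE)
out_path <- tempfile(fileext = ".tsv")
skipped <- write_if_not_ci(fusion_df, out_path, running_in_ci = TRUE)
written <- write_if_not_ci(fusion_df, out_path, running_in_ci = FALSE)
```

Passing `running_in_ci` as an argument keeps the skip decision explicit at each call site, which matches how the diff moves the guard from the ependymoma chunk to the embryonal chunk.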