Fix v15 breaking changes #574

jaclyn-taroni · 2020-02-28T22:56:51Z

To quote the release notes being added in #569, we're changing the names of well-enough-used columns in the clinical file:

Change disease_type_old to pathology_diagnosis and disease_type_new to integrated_diagnosis per request in comment.

I know this change to pbta-histologies.tsv will break a number of things. The purpose of this issue is to track what will need to be changed as a result. Not only will the column names need to be updated, but we will also need to rerun any notebooks, change documentation, etc.
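For reference while updating modules, the rename itself can be sketched in base R as below. The helper name `rename_v15_columns` and the toy data frame are assumptions made for illustration; neither is part of the repository, and the values are made up.

```r
# Hypothetical helper (not in OpenPBTA-analysis): map the pre-v15 column
# names in a histologies data frame to their v15 names.
rename_v15_columns <- function(histologies_df) {
  # Old name -> new name, as described in the v15 release notes
  v15_names <- c(
    disease_type_old = "pathology_diagnosis",
    disease_type_new = "integrated_diagnosis"
  )
  # Only touch the old columns that are actually present
  old_present <- intersect(names(v15_names), colnames(histologies_df))
  idx <- match(old_present, colnames(histologies_df))
  colnames(histologies_df)[idx] <- unname(v15_names[old_present])
  histologies_df
}

# Toy data standing in for pbta-histologies.tsv (IDs and values are made up)
toy_histologies <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B"),
  disease_type_old = c("Ependymoma", "Medulloblastoma"),
  disease_type_new = c("Ependymoma", "Medulloblastoma"),
  stringsAsFactors = FALSE
)

renamed <- rename_v15_columns(toy_histologies)
print(colnames(renamed))
```

A helper like this would only be a stopgap; the modules listed below still need their column references updated directly.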

Anticipated issues

Here I'll list what I know needs to change in modules that are not deprecated.

Some of the modeling steps of gene-set-enrichment-analysis use disease_type_new:

OpenPBTA-analysis/analyses/gene-set-enrichment-analysis/02-model-gsea.Rmd

Line 123 in 286ff25

    ```{r, aov-perform}

Luckily the gsva_anova_tukey function is already flexible!

OpenPBTA-analysis/analyses/gene-set-enrichment-analysis/util/hallmark_models.R

Line 35 in 286ff25

    gsva_anova_tukey <- function(df, predictor_variable, library_type, significance_threshold)

The first step of interaction-plots uses the disease_type_new column to generate lists of samples:

OpenPBTA-analysis/analyses/interaction-plots/scripts/01-disease-specimen-lists.R

Line 97 in 286ff25

tolower(disease_type_new) == tolower(opts$disease)) %>%

Documentation associated with that option will also need to change.

We filter out ATRT and MB samples in molecular-subtyping-embryonal using disease_type_old:

OpenPBTA-analysis/analyses/molecular-subtyping-embryonal/01-samples-to-subset.Rmd

Line 128 in 286ff25

filter(!(disease_type_old %in% c("Medulloblastoma",

and check disease_type_new as well:

OpenPBTA-analysis/analyses/molecular-subtyping-embryonal/01-samples-to-subset.Rmd

Line 136 in 286ff25

group_by(disease_type_new) %>%

Also in this module, we use both disease_type columns quite a bit in subtyping and in generating the final tables, starting around:

OpenPBTA-analysis/analyses/molecular-subtyping-embryonal/04-table-prep.Rmd

Line 325 in 286ff25

    ### Add clinical data

The README for this module needs to change as well, along with this documentation:

OpenPBTA-analysis/analyses/molecular-subtyping-embryonal/02-generate-subset-files.R

Line 6 in 286ff25

    # (broad_histology) but NOT an MB or ATRT tumor (disease_type_old) OR 2)

The subset files step of molecular-subtyping-EPN uses disease_type_new:

OpenPBTA-analysis/analyses/molecular-subtyping-EPN/00-subset-for-EPN.R

Line 55 in 286ff25

    disease_type_new == "Ependymoma") %>%

In molecular-subtyping-HGG, we use disease_type_new quite a bit for classification based on defining lesions; here's just one example from the first notebook:

OpenPBTA-analysis/analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd

Line 59 in 286ff25

    disease_type_new)

disease_type_new is one of the "layers" associated with all of the plotting in sample-distribution-analysis (https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sample-distribution-analysis/01-filter-across-types.R and https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sample-distribution-analysis/02-multilayer-plots.R) and is used in the tables generated in 03-tumor-descriptor-and-assay-count:

OpenPBTA-analysis/analyses/sample-distribution-analysis/03-tumor-descriptor-and-assay-count.Rmd

Line 206 in 286ff25

    ## By histology

selection-strategy-comparison includes consideration of disease_type_new:

OpenPBTA-analysis/analyses/selection-strategy-comparison/01-selection-strategies.rmd

Line 169 in 286ff25

    ```{r sample types}

We may want to just deprecate this analysis at this point rather than try to maintain it?
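To help catch uses of the old column names that the list above may have missed, a scan along the following lines could be run over the `analyses` directory. This is a minimal base R sketch; the function name `find_old_column_uses`, the file extensions searched, and the temporary demo files are all assumptions rather than anything in the repository.

```r
# Hypothetical helper: report which lines of which files under a directory
# still mention the pre-v15 column names.
find_old_column_uses <- function(dir,
                                 old_names = c("disease_type_old",
                                               "disease_type_new")) {
  files <- list.files(dir, pattern = "\\.(R|Rmd|rmd|md)$",
                      recursive = TRUE, full.names = TRUE)
  pattern <- paste(old_names, collapse = "|")
  hits <- lapply(files, function(path) {
    lines <- readLines(path, warn = FALSE)
    line_numbers <- grep(pattern, lines)
    if (length(line_numbers) == 0) {
      return(NULL)
    }
    data.frame(file = path,
               line = line_numbers,
               text = trimws(lines[line_numbers]),
               stringsAsFactors = FALSE)
  })
  # Drop files with no matches, then stack the per-file results
  hits <- Filter(Negate(is.null), hits)
  do.call(rbind, hits)
}

# Self-contained demonstration in a temporary directory (made-up files)
demo_dir <- file.path(tempdir(), "v15-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("library(dplyr)",
             "filter(disease_type_new == \"Ependymoma\")"),
           file.path(demo_dir, "00-subset.R"))
writeLines("group_by(broad_histology)", file.path(demo_dir, "01-other.R"))

old_uses <- find_old_column_uses(demo_dir)
print(old_uses)
```

The function returns `NULL` when nothing matches, which would be the desired end state once every module has been updated.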

Issues that have arisen as part of #576

molecular-subtyping-chordoma fails with the following:

Quitting from lines 159-168 (01-Subtype-chordoma.Rmd) 
    Error in .f(.x[[i]], ...) : object 'SMARCB1' not found
    Calls: <Anonymous> ... <Anonymous> -> vars_rename_eval -> map_if -> map -> .f

That's from this chunk:

OpenPBTA-analysis/analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

Line 166 in 286ff25

rename(SMARCB1_expression = SMARCB1)

I suspect what is actually happening is that there are no chordoma samples in the expression data used in CI, so this step selects no columns:

OpenPBTA-analysis/analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

Line 147 in 286ff25

    
    smarcb1_expression <- smarcb1_expression[, which(colnames(expression_data) %in% chordoma_samples) ]

We may want to take an approach that is similar to other subtyping modules and have the first step be a script that generates files that consist only of chordoma samples that are committed to the repository.
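Independently of how the subset files are generated, a guard on the subsetting step would make this kind of failure easier to diagnose. The sketch below assumes an expression matrix with genes as rows and biospecimen IDs as column names; the function name `subset_to_samples` and the toy matrix are hypothetical and do not come from the module.

```r
# Hypothetical guard: subset an expression matrix to a set of samples and
# stop with an informative message if none of them are present.
subset_to_samples <- function(expression_mat, sample_ids) {
  keep <- colnames(expression_mat) %in% sample_ids
  if (!any(keep)) {
    stop("None of the requested samples are in the expression data; ",
         "is this a subset file (e.g., in CI) without these samples?")
  }
  # drop = FALSE keeps the matrix shape even when one sample matches
  expression_mat[, keep, drop = FALSE]
}

# Toy expression matrix; sample IDs and values are made up
toy_expression <- matrix(
  c(5.1, 4.8, 6.0,
    2.2, 2.5, 2.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("SMARCB1", "GENE_X"), c("BS_1", "BS_2", "BS_3"))
)

# Samples present: returns the two matching columns
present <- subset_to_samples(toy_expression, c("BS_1", "BS_3"))

# Samples absent: the guard raises an error, captured here as a message
absent_message <- tryCatch(
  subset_to_samples(toy_expression, c("BS_9")),
  error = function(e) conditionMessage(e)
)

print(dim(present))
print(absent_message)
```

Failing early with an explicit message points at the missing samples rather than at a later `rename()` call.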

The Add Shatterseek step of sv-analysis, which is Rscript analyses/sv-analysis/02-shatterseek.R, fails with:

    Error in file(file, "rt") : cannot open the connection
    Calls: read.table -> file
    In addition: Warning message:
    In file(file, "rt") :
      cannot open file 'scratch/sv-vcf/BS_K07KNTFY_withoutYandM.tsv': No such file or directory
    Execution halted

analyses/sv-analysis/02-shatterseek.R uses an independent specimen file, which is included in its entirety in CI, to read in files:

OpenPBTA-analysis/analyses/sv-analysis/02-shatterseek.R

Line 48 in 286ff25

bioid <- unique(independent_specimen_list$Kids_First_Biospecimen_ID)

The step that would have generated scratch/sv-vcf/BS_K07KNTFY_withoutYandM.tsv comes prior to this one in CI:

OpenPBTA-analysis/.circleci/config.yml

Line 142 in 286ff25

    - run:

So it will only have access to the subsetted Manta file. See #449 (comment) and #449 (comment) for more context. The sv-analysis module should be more robust to "missing" samples.
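One way to make a script more tolerant of "missing" samples is to check which per-sample files exist before reading them. The following is a base R sketch under stated assumptions: the helper name `existing_sample_files` and the demo IDs are hypothetical, and only the `_withoutYandM.tsv` suffix is taken from the error message above.

```r
# Hypothetical helper: keep only biospecimen IDs whose input file exists,
# warning about the rest rather than failing on the first missing file.
existing_sample_files <- function(bioids, input_dir,
                                  suffix = "_withoutYandM.tsv") {
  paths <- file.path(input_dir, paste0(bioids, suffix))
  names(paths) <- bioids
  found <- file.exists(paths)
  if (any(!found)) {
    warning("Skipping ", sum(!found), " sample(s) with no input file: ",
            paste(bioids[!found], collapse = ", "))
  }
  paths[found]
}

# Demonstration with one present and one missing file (IDs are made up)
demo_dir <- file.path(tempdir(), "sv-vcf-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("chrom\tstart\tend",
           file.path(demo_dir, "BS_PRESENT_withoutYandM.tsv"))

available <- suppressWarnings(
  existing_sample_files(c("BS_PRESENT", "BS_MISSING"), demo_dir)
)
print(names(available))
```

The returned named vector of paths could then be looped over in place of the full list of IDs from the independent specimen file.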

Next steps

@cansavvy @cbethell @jashapiro I'd recommend splitting this up such that modifications to each module are in separate pull requests so you can go through and make sure you catch any documentation stuff I may not have come across.

jaclyn-taroni · 2020-02-29T17:01:36Z

Because of the extent of these breaking changes, I believe it's prudent to handle each module in a separate pull request as stated above. Unfortunately, that means that CI will fail for a bunch of these fixes until the last one goes in. So here is the general procedure I think we should follow:

1. First and foremost, state in this issue which module you'll be working on so we don't duplicate effort!
2. When you file a pull request for a particular module, comment out all the modules that come before it in CI to quickly demonstrate that your fix works. This should be done in a single commit.
3. Once your changes have been reviewed and approved, and you've made any changes resulting from review, revert the single commit that commented out all the modules before the step your PR pertains to.
4. We should merge the fix without the check passing.

This procedure has a significant weakness in that there may be changes introduced in any one fix that will cause CI to fail unexpectedly once the final fix goes in and this issue is closed. Once this issue gets closed, #569 should be updated such that it is in sync with master and that will test all steps in CI (except fusion-summary but that is tracked in #578). Any additional CI fixes can go into the AlexsLemonade:update-release-docs-v15 branch provided that they are small in scope.

cansavvy · 2020-03-02T14:19:30Z

So we can keep track of progress on these, I took @jaclyn-taroni's list above and made it into a checklist. We can claim items and then check things off as we fix them. I'll start by claiming the first item. I'll put the PR number next to it too when I get it filed.

v15 breaking changes TODO list

gene-set-enrichment-analysis - @cansavvy v15 fix - gene-set-enrichment-analysis #585
interaction-plots - @jashapiro Update Interaction plots for v15 #582
molecular-subtyping-embryonal - @cbethell Update molecular-subtyping-embryonal module for v15 release #591
- 01-samples-to-subset.Rmd
- 04-table-prep.Rmd
- docs in 02-generate-subset-files.R
- README
molecular-subtyping-EPN - @jashapiro v15 update for EPN subtyping #592
molecular-subtyping-HGG - @cbethell Update molecular-subtyping-HGG module for v15 release #586
sample-distribution-analysis - @jashapiro update Sample distribution analysis for v15 #584
- 01-filter-across-types.R
- 02-multilayer-plots.R
- 03-tumor-descriptor-and-assay-count
selection-strategy-comparison - deprecated in Deprecate Selection Strategy Comparison #589 @jashapiro
molecular-subtyping-chordoma error - @cansavvy V15 Fix molecular-subtype-chordoma analysis #590
sv-analysis 02-shatterseek.R error - @cansavvy Does Shatterseek work with v15 changes? #587 (no actual changes were made; this was a false alarm. See comments below.)

jaclyn-taroni · 2020-03-02T17:23:34Z

I just realized that the sv-analysis failure may simply be due to the fact that I had commented out the first script in that module. So the scope of that fix may be to group those near each other in .circleci/config.yml or to add a shell script to that module.

cansavvy · 2020-03-02T17:25:07Z

Okay. Well I just started working on it now, I'll see if that's it.

cansavvy · 2020-03-02T17:42:16Z

@jaclyn-taroni you were right. It is fine if the first script is run. #587

jaclyn-taroni · 2020-03-02T17:45:15Z

Okay 👍 - would love to see those organized such that the step that was failing was immediately after the step that it depends on (perhaps after v15 is out). I think that would have increased the chances I noticed that immediately.

cansavvy · 2020-03-02T17:47:52Z

Okay 👍 - would love to see those organized such that the step that was failing was immediately after the step that it depends on (perhaps after v15 is out). I think that would have increased the chances I noticed that immediately.

I was about to just make this change while I had the branch open, but I didn't know if some of the other tests needed to run in a particular order and didn't want to throw another possible wrench into our testing here. But yeah, we may even want to have a bash script that calls both and make it one CircleCI test.

cansavvy · 2020-03-02T17:50:07Z

@jaclyn-taroni In regards to selection-strategy-comparison and your comment:

We may want to just deprecate this analysis at this point rather than try to maintain it?

I don't know enough about this analysis module to make an informed decision on this. Do we want to retire it though?

jaclyn-taroni · 2020-03-02T17:50:58Z

@jashapiro - what do you think, time to retire selection-strategy-comparison?

jashapiro · 2020-03-02T18:31:14Z

I think it can be deprecated. Will do that now.

jashapiro · 2020-03-02T22:09:02Z

We did it, everyone! Changes incorporated to master in #569

jaclyn-taroni mentioned this issue Feb 29, 2020: Find any unanticipated v15 breaking changes #576 (Closed)

jaclyn-taroni changed the title from "Anticipated v15 breaking changes" to "Fix v15 breaking changes" Feb 29, 2020

jashapiro mentioned this issue Mar 2, 2020: Update Interaction plots for v15 #582 (Merged)

cansavvy mentioned this issue Mar 2, 2020: V15 fix - gene-set-enrichment-analysis #583 (Closed)

jashapiro mentioned this issue Mar 2, 2020: update Sample distribution analysis for v15 #584 (Merged)

cansavvy mentioned this issue Mar 2, 2020: v15 fix - gene-set-enrichment-analysis #585 (Merged)

cbethell mentioned this issue Mar 2, 2020: Update molecular-subtyping-HGG module for v15 release #586 (Merged)

cansavvy mentioned this issue Mar 2, 2020: Does Shatterseek work with v15 changes? #587 (Closed)

jashapiro mentioned this issue Mar 2, 2020: Deprecate Selection Strategy Comparison #589 (Merged)

cansavvy mentioned this issue Mar 2, 2020: V15 Fix molecular-subtype-chordoma analysis #590 (Merged)

cbethell mentioned this issue Mar 2, 2020: Update molecular-subtyping-embryonal module for v15 release #591 (Merged)

This was referenced Mar 2, 2020: v15 update for EPN subtyping #592 (Merged); Test v15 changes so far #593 (Closed); update docs for v15 release #569 (Merged)

jashapiro closed this as completed Mar 2, 2020
