This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Plan: Road to v22 #1426

Closed
11 of 12 tasks
jaclyn-taroni opened this issue May 19, 2022 · 8 comments

Comments

@jaclyn-taroni
Member

jaclyn-taroni commented May 19, 2022

The purpose of this issue is to outline the order in which things need to be reviewed and run prior to the release of v22.

1. Reviews before release

Pull requests that are marked as blocking release and review before release should be reviewed & ideally merged in the following order. cc: @sjspielman

- Analysis files generation
- Subtyping modules need to use data/
- Subtype script & docs

2. Running subtyping

Once #1422 is in, we can use the PBTA data and analysis files from v22 to run scripts/run-for-subtyping.sh. We’ll need to create a branch to commit back any changes to files in the repository (akin to what’s happening with #1412).

Then we’ll upload pbta-histologies.tsv to S3 (related: #1424).

3. Creating CI files

Once all the v22 files are on S3, we’re ready to generate the CI files (#1369). Here’s what we need to do:

  1. Create a branch to update the create-subset-files module as required to create the CI files. I’m going to call this v22-ci for illustrative purposes.

Specifically, we need to update this line to release-v22-20220505 to reflect the new release:

RELEASE=${RELEASE:-release-v21-20210820}
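One way to make that bump without opening an editor is a one-line sed edit (assuming GNU sed's `-i`). The sketch below demos it on a scratch copy of the line; in the real repo, point `FILE` at `analyses/create-subset-files/create_subset_files.sh` instead.

```shell
# Demo of the one-line release bump on a scratch file; in the repo,
# set FILE to analyses/create-subset-files/create_subset_files.sh.
FILE=$(mktemp)
echo 'RELEASE=${RELEASE:-release-v21-20210820}' > "$FILE"
sed -i 's/release-v21-20210820/release-v22-20220505/' "$FILE"
grep 'RELEASE=' "$FILE"   # RELEASE=${RELEASE:-release-v22-20220505}
```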

Then we should check to make sure it runs locally without issue with:

RUN_LOCAL=1 ./analyses/create-subset-files/create_subset_files.sh
  2. Start an AWS instance with at least 128 GB RAM
  3. Clone AlexsLemonade/OpenPBTA-analysis and check out v22-release (#1424).
  4. Run bash download-data.sh to obtain the v22 data while on that branch.
  5. Add the relevant remote (whichever fork v22-ci was created on) and check out v22-ci, which contains the required changes to generate the v22 files for CI.
  6. Run ./analyses/create-subset-files/create_subset_files.sh.
  7. Snag the files in data/testing/release-v22-20220505; they are what we need for CI.
  8. Commit the changes to analyses/create-subset-files/biospecimen_ids_for_subset.RDS to v22-ci and push.
  9. Upload the CI files to https://s3.amazonaws.com/d3b-openaccess-us-east-1-prd-pbta/data/testing
  10. File a PR with the changes in v22-ci
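The sequence above can be sketched as a dry run; every command is echoed rather than executed, the branch names and S3 bucket are taken from this issue, and the fork remote is a placeholder. Drop the echo prefixes to run for real (on the 128 GB instance).

```shell
# Dry-run sketch of the CI-file generation steps; nothing is executed.
RELEASE=release-v22-20220505
echo "git clone https://github.com/AlexsLemonade/OpenPBTA-analysis.git"
echo "cd OpenPBTA-analysis && git checkout v22-release"
echo "bash download-data.sh"
echo "git remote add fork <fork-with-v22-ci> && git fetch fork"   # placeholder remote
echo "git checkout v22-ci"
echo "./analyses/create-subset-files/create_subset_files.sh"
echo "aws s3 cp --recursive data/testing/$RELEASE s3://d3b-openaccess-us-east-1-prd-pbta/data/testing/"
```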

We’ll use this PR to make sure there are no issues with the CI files. If there are issues, we may need to create a series of branches to fix them. If there are no issues, we can merge this.

4. Cutting a release

#1424 gets updated to include the required documentation & merged

@jharenza
Collaborator

jharenza commented May 19, 2022

@zhangb1 will work on #3 above when it's time. Cc @yuankunzhu

@jaclyn-taroni
Member Author

Wanted to note that we had some issues with medulloPackage not being installed on some Docker containers if the layer was cached (example linked over on #1425). When I went to look into that, it took almost 3 hours to build the image alone. That means we would definitely run into timeouts in CI, which could be a problem as we try to get a bunch of stuff in! So what I did in 43fa160 was consolidate all of the R package installs from Bioconductor and CRAN. We can see if that speeds up building the image. If it does, we might want that to be the first thing to go in. #1425 is a draft for now.

@jaclyn-taroni
Member Author

jaclyn-taroni commented May 21, 2022

The changes related to the analysis files generation and subtyping can go in independently. Since some of the PRs ensuring that subtyping modules use the data download are already approved & ready to merge, I'm going to start merging them.

@jaclyn-taroni
Member Author

Updated the initial comment, but to summarize what comes next:

  1. Group R package installation to try to speed up build time #1425 should be reviewed and go in to fix the deploy step in CI
  2. Update snv-callers to be able to be run from places besides the root of repo #1420 needs to be re-reviewed, approved, and merged
  3. Add shell script to generate analysis files that are included in data releases #1421 needs to be merged (currently approved & comments addressed!)
  4. We should probably rerun things with the code changes (Rerun changes added #1412) tbh
  5. Make sure HGG subtyping uses data download #1414 needs to be re-reviewed, approved, and merged
  6. Revise shell script run for subtyping such that analysis file generation is removed #1422 needs to be merged (currently approved and comments addressed)
  7. We can start on everything else (e.g., running subtyping)

@migbro
Contributor

migbro commented Jun 1, 2022

In progress - just a few notes as I work on this

  • It seems step one is really a summary, not an actual step; doing that before the rest is probably not gonna pan out
  • With a fresh EC2 instance as a starting point, def gotta download the data first before running that test
  • This script create_subset_files.sh seems to look for a structure relative to itself, so gotta cd to its location first before running
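That last note is a common pattern. A self-contained sketch of making a script immune to the caller's working directory (the temp file here is only for the demo; the real script would just carry the `cd` line near its top):

```shell
# Demo: a script that cd's into its own directory before doing anything,
# so relative paths like ../../data/ resolve against the script's
# location, not wherever the caller happens to be.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
cd "$(dirname "$0")" || exit 1
pwd   # prints the script's own directory, regardless of caller's cwd
EOF
chmod +x "$script"
"$script"
```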

After making those adjustments, I got the error:

Reading in ../../data/release-v22-20220505/pbta-sv-manta.tsv.gz ...
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    619873   33.2   11776407   629.0    9048557   483.3
Vcells 269327097 2054.9 2323008920 17723.2 2753752554 21009.5
Error in sample.int(length(x), size, replace, prob) :
  invalid first argument
Calls: sample -> sample.int
Execution halted

@jharenza
Collaborator

jharenza commented Jun 1, 2022

Ok, this may be coming from not having consensus_seg_with_status.tsv in the subset, but odd that this is a manta error. Updating the code to capture the new file and testing locally now.

@jharenza
Collaborator

jharenza commented Jun 1, 2022

After updating the code to add the additional CNV file, I ran using

RUN_LOCAL=1 ./analyses/create-subset-files/create_subset_files.sh

but it gets killed:

create_subset_files.sh: line 51:  7389 Killed                  Rscript --vanilla 01-get_biospecimen_identifiers.R --data_directory $FULL_DIRECTORY --output_file $BIOSPECIMEN_FILE --num_matched $NUM_MATCHED --local $RUN_LOCAL

If I try to just copy and not subset, I also get an error about directory creation, which @migbro also mentioned he didn't have in his run.

root@334c2bfa1312:/home/rstudio/kitematic/analyses/create-subset-files# bash create_subset_files.sh 
cp: target '../../data/testing/release-v22-20220505' is not a directory

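The "target ... is not a directory" error from cp usually means the destination directory doesn't exist yet, so one possible workaround (an assumption here, since the script may be expected to create it itself) is a `mkdir -p` before the copy-only run. Demo with a scratch tree standing in for the repo layout:

```shell
# cp fails with "target ... is not a directory" when the destination
# directory is missing; creating it first fixes the copy-only run.
base=$(mktemp -d)                              # stands in for the repo root
dest="$base/data/testing/release-v22-20220505"
mkdir -p "$dest"                               # the missing step
touch "$base/pbta-histologies.tsv"             # stand-in data file
cp "$base/pbta-histologies.tsv" "$dest"/       # now succeeds
ls "$dest"
```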

@migbro
Contributor

migbro commented Jun 2, 2022

I think I am ready to push, however, I don't have that kind of access to this project. Can I be granted such access please? Thanks!
