re-run tp53 classifier with v18 #922

kgaonkar6 · 2021-01-22T21:15:16Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Just a re-run of the tp53 classifier to add 5 new stranded samples added in v18

What was your approach?

Just a re-run:
docker exec -ti frosty_tesla bash -c "cd /home/rstudio/OpenPBTA && bash analyses/tp53_nf1_score/run_classifier.sh"

What GitHub issue does your pull request address?

Related to #837 since we need the updated tp53_score in some of the annotation scripts.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

There seem to be a lot of changes in analyses/tp53_nf1_score/results/TP53_NF1_snv_alteration.tsv from the sample order changing I believe since I don't see any actual values changing.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

Manual review of the results of classifier with the 5 stranded samples seems to have changed the values a little. Will plot values in #886 to see if drastic changes occured.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza

Would this benefit from setting a seed or is the classifier set up such that it will always give us new results? cc @gwaygenomics

jaclyn-taroni · 2021-01-25T14:13:10Z

Would this benefit from setting a seed or is the classifier set up such that it will always give us new results?

One of the steps in the classifier is to scale by gene:

OpenPBTA-analysis/analyses/tp53_nf1_score/01-apply-classifier.py

Line 21 in 0ed49b6

* Take the z-score by gene

So it makes sense to me that adding a few samples would have a (minor) impact on the values. If you're referring to the scores for the shuffled data, a seed gets set here:

OpenPBTA-analysis/analyses/tp53_nf1_score/01-apply-classifier.py

Line 57 in 0ed49b6

np.random.seed(123)

I think we only apply a classifier to one shuffled version of the matrix - it might be worth doing that a few times to get an idea about variability in those scores, but I have not thought deeply about that.

jaclyn-taroni

👍 I think analyses/tp53_nf1_score/__pycache__/utils.cpython-35.pyc was probably an unintentional add - can you remove that and stop tracking (probably via a analyses/tp53_nf1_score/.gitignore) before this gets merged please?

jashapiro · 2021-01-25T14:18:55Z

👍 I think analyses/tp53_nf1_score/__pycache__/utils.cpython-35.pyc was probably an unintentional add - can you remove that and stop tracking (probably via a analyses/tp53_nf1_score/.gitignore) before this gets merged please?

We can safely add *.pyc to the root .gitignore (but be sure to remove the files FIRST, or annoying trouble will come).

jaclyn-taroni · 2021-01-25T14:52:34Z

I can make those changes and get this merged.

jaclyn-taroni · 2021-01-25T14:56:19Z

Ah, I see that file was already in master which is why it is showing up in the diff still. Will merge once CI finishes!

jharenza · 2021-01-25T15:04:07Z

I think we only apply a classifier to one shuffled version of the matrix - it might be worth doing that a few times to get an idea about variability in those scores, but I have not thought deeply about that.

I was referring to the classifier scores themselves, which change every time (wasn't referring to shuffle scores).

jaclyn-taroni · 2021-01-25T15:22:14Z

Taking a look at the history of the analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv, the last couple commits look like changes in precision that affect only a few samples:

d4b5b7a#diff-28518f4529127b6de6e6406ffccca67bfcc2b1b4f9e261058f8e88bdc350228c

3f8baa5#diff-28518f4529127b6de6e6406ffccca67bfcc2b1b4f9e261058f8e88bdc350228c

That leads me to believe that the more appreciable changes here are new samples + scaling related, which suggests no changes to the code are necessary.

re-run with v18

c5f2653

kgaonkar6 changed the title ~~re-run with v18~~ re-run tp53 classifier with v18 Jan 22, 2021

jharenza approved these changes Jan 22, 2021

View reviewed changes

jharenza requested a review from jaclyn-taroni January 23, 2021 15:07

Merge branch 'master' into rerun_v18_tp53_classifier

ee7aa35

jaclyn-taroni approved these changes Jan 25, 2021

View reviewed changes

jaclyn-taroni added 2 commits January 25, 2021 09:53

Remove .pyc file

5a4a7e9

Add *.pyc to root gitignore

f374300

jaclyn-taroni merged commit f85831f into AlexsLemonade:master Jan 25, 2021

kgaonkar6 mentioned this pull request Feb 15, 2021

Tp53 loss status update: bug fix #942

Merged

5 tasks

jharenza mentioned this pull request Apr 1, 2021

Updated analysis: investigate putative functionality of TP53 SNV and CNV calls #837

Closed

kgaonkar6 mentioned this pull request Apr 28, 2021

PBTA Histologies v19 Part 7 of N: Update tp53 score #1045

Merged

5 tasks

kgaonkar6 deleted the rerun_v18_tp53_classifier branch May 11, 2021 13:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

re-run tp53 classifier with v18 #922

re-run tp53 classifier with v18 #922

kgaonkar6 commented Jan 22, 2021 •

edited

Loading

jharenza left a comment

jaclyn-taroni commented Jan 25, 2021

jaclyn-taroni left a comment

jashapiro commented Jan 25, 2021 •

edited

Loading

jaclyn-taroni commented Jan 25, 2021

jaclyn-taroni commented Jan 25, 2021

jharenza commented Jan 25, 2021

jaclyn-taroni commented Jan 25, 2021

re-run tp53 classifier with v18 #922

re-run tp53 classifier with v18 #922

Conversation

kgaonkar6 commented Jan 22, 2021 • edited Loading

Purpose/implementation Section

What scientific question is your analysis addressing?

What was your approach?

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

jharenza left a comment

Choose a reason for hiding this comment

jaclyn-taroni commented Jan 25, 2021

jaclyn-taroni left a comment

Choose a reason for hiding this comment

jashapiro commented Jan 25, 2021 • edited Loading

jaclyn-taroni commented Jan 25, 2021

jaclyn-taroni commented Jan 25, 2021

jharenza commented Jan 25, 2021

jaclyn-taroni commented Jan 25, 2021

kgaonkar6 commented Jan 22, 2021 •

edited

Loading

jashapiro commented Jan 25, 2021 •

edited

Loading