Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

re-run tp53 classifier with v18 #922

Merged

Conversation

kgaonkar6
Copy link
Collaborator

@kgaonkar6 kgaonkar6 commented Jan 22, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

Just a re-run of the tp53 classifier to add 5 new stranded samples added in v18

What was your approach?

Just a re-run:
docker exec -ti frosty_tesla bash -c "cd /home/rstudio/OpenPBTA && bash analyses/tp53_nf1_score/run_classifier.sh"

What GitHub issue does your pull request address?

Related to #837 since we need the updated tp53_score in some of the annotation scripts.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

There seem to be a lot of changes in analyses/tp53_nf1_score/results/TP53_NF1_snv_alteration.tsv from the sample order changing I believe since I don't see any actual values changing.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

Manual review of the results of classifier with the 5 stranded samples seems to have changed the values a little. Will plot values in #886 to see if drastic changes occured.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 kgaonkar6 changed the title re-run with v18 re-run tp53 classifier with v18 Jan 22, 2021
Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this benefit from setting a seed or is the classifier set up such that it will always give us new results? cc @gwaygenomics

@jaclyn-taroni
Copy link
Member

Would this benefit from setting a seed or is the classifier set up such that it will always give us new results?

One of the steps in the classifier is to scale by gene:

So it makes sense to me that adding a few samples would have a (minor) impact on the values. If you're referring to the scores for the shuffled data, a seed gets set here:

I think we only apply a classifier to one shuffled version of the matrix - it might be worth doing that a few times to get an idea about variability in those scores, but I have not thought deeply about that.

Copy link
Member

@jaclyn-taroni jaclyn-taroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think analyses/tp53_nf1_score/__pycache__/utils.cpython-35.pyc was probably an unintentional add - can you remove that and stop tracking (probably via a analyses/tp53_nf1_score/.gitignore) before this gets merged please?

@jashapiro
Copy link
Member

jashapiro commented Jan 25, 2021

👍 I think analyses/tp53_nf1_score/__pycache__/utils.cpython-35.pyc was probably an unintentional add - can you remove that and stop tracking (probably via a analyses/tp53_nf1_score/.gitignore) before this gets merged please?

We can safely add *.pyc to the root .gitignore (but be sure to remove the files FIRST, or annoying trouble will come).

@jaclyn-taroni
Copy link
Member

I can make those changes and get this merged.

@jaclyn-taroni
Copy link
Member

Ah, I see that file was already in master which is why it is showing up in the diff still. Will merge once CI finishes!

@jharenza
Copy link
Collaborator

I think we only apply a classifier to one shuffled version of the matrix - it might be worth doing that a few times to get an idea about variability in those scores, but I have not thought deeply about that.

I was referring to the classifier scores themselves, which change every time (wasn't referring to shuffle scores).

@jaclyn-taroni
Copy link
Member

Taking a look at the history of the analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv, the last couple commits look like changes in precision that affect only a few samples:

d4b5b7a#diff-28518f4529127b6de6e6406ffccca67bfcc2b1b4f9e261058f8e88bdc350228c

3f8baa5#diff-28518f4529127b6de6e6406ffccca67bfcc2b1b4f9e261058f8e88bdc350228c

That leads me to believe that the more appreciable changes here are new samples + scaling related, which suggests no changes to the code are necessary.

@jaclyn-taroni jaclyn-taroni merged commit f85831f into AlexsLemonade:master Jan 25, 2021
@kgaonkar6 kgaonkar6 mentioned this pull request Feb 15, 2021
5 tasks
@kgaonkar6 kgaonkar6 deleted the rerun_v18_tp53_classifier branch May 11, 2021 13:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants