This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

update molecular subtyping pathology #854

Merged
merged 18 commits into from
Dec 1, 2020

Conversation

Collaborator

@jharenza jharenza commented Nov 24, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Update molecular subtyping pathology module. NOTE: #851, #853, and #852 should be merged and run first.

What was your approach?

What GitHub issue does your pull request address?

#719, #784, #667

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • I left integrated_diagnosis as NA if there is no subtype
  • PT_AQWDQW27 seems to be missing from the EPN subtyping module. It was noted as a rare meningioma that was nonetheless subtyped as EPN. I will create a ticket to assess the EPN module.

Is there anything that you want to discuss further?

Note: PT_KTRJ8TFY still has two different subtypes for its Progressive Disease Post-Mortem samples. @kgaonkar6, will you check that sample to be sure it should end up this way based on clinical feedback? This was one sample which did not have a clinical sequencing report, so this may be expected.

Note: PT_6Q0NPVP3 also has duplicate subtypes, but this will be taken care of once #851 is merged.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

1497 rows in the final table, but this will change once the modules listed above are merged.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza jharenza requested a review from kgaonkar6 November 24, 2020 07:05
@kgaonkar6
Collaborator

Oh yes, BS_EE73VE7V (one of the Progressive Disease Post-Mortem samples for PT_KTRJ8TFY) only had the histone variant called by VarDict, so the variant was lost in consensus calling and the sample was subtyped as H3 wildtype.

Ticket https://github.com/d3b-center/bixu-tracker/issues/463 shows the variant in IGV. Should we add that sample to the clinical/pathology module as H3 mutant along with the other updates?

Collaborator

@kgaonkar6 kgaonkar6 left a comment

I think we can use case_when() calls here for assigning subtypes since we have multiple options. I added some suggestions for changing ifelse calls to case_when()
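
For readers following the suggestion, here is a minimal sketch of the pattern, not the module's actual code. The toy data frame and the "H3 wildtype" diagnosis string are assumptions made for illustration; only the "H3 G35-mutant" label appears in this thread. With several possible subtypes, a single `case_when()` replaces a chain of nested `ifelse()` calls, and anything unmatched (including a missing subtype) falls through to NA.

```r
library(dplyr)

# Toy data; IDs and values are illustrative, not the module's real inputs.
toy_df <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4"),
  molecular_subtype = c("HGG, H3 G35", "HGG, H3 wildtype", "MB, To be classified", NA),
  stringsAsFactors = FALSE
)

toy_df <- toy_df %>%
  mutate(
    integrated_diagnosis = case_when(
      molecular_subtype == "HGG, H3 G35" ~ "High-grade glioma/astrocytoma, H3 G35-mutant",
      # Hypothetical label, assumed for this sketch only
      molecular_subtype == "HGG, H3 wildtype" ~ "High-grade glioma/astrocytoma, H3 wildtype",
      # Unmatched or missing subtypes stay NA
      TRUE ~ NA_character_
    )
  )
```

Conditions are evaluated in order and the first match wins, so the catch-all `TRUE` line goes last.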

Jo Lynne Rokita and others added 9 commits November 24, 2020 21:02
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…sults.Rmd


change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
update Notes to be in sync with new subtypes
update comment about adding int dx/broad/short hist here
@jharenza
Collaborator Author

Oh yes, BS_EE73VE7V (one of the Progressive Disease Post-Mortem samples for PT_KTRJ8TFY) only had the histone variant called by VarDict, so the variant was lost in consensus calling and the sample was subtyped as H3 wildtype.

Ticket d3b-center/bixu-tracker#463 shows the variant in IGV. Should we add that sample to the clinical/pathology module as H3 mutant along with the other updates?

For now, since we do not have clinical sequencing to confirm this, let's leave this out of the clinical module. I expect this will be captured when #819 from @migbro goes in, in a few weeks.

update and remove duplicates from final file
@jharenza jharenza requested a review from kgaonkar6 November 25, 2020 03:53
@jharenza
Collaborator Author

@kgaonkar6 I updated the code based on your comments, updated Notes, and added the changes from pathology feedback to the 03 script. I then get 1487 rows in the final file, but 1481 biospecimens. It seems like 6 more have duplicates, but when I check for duplicates of the final file using

compiled_df[duplicated(compiled_df$Kids_First_Biospecimen_ID),]

I get the table below, which does not show me any duplicate BS_IDs. Maybe I am overlooking something in the code.

| Kids_First_Participant_ID | sample_id | Kids_First_Biospecimen_ID | molecular_subtype | integrated_diagnosis |
|---|---|---|---|---|
| PT_00G007DM | 7316-272 | BS_QWNBZ9RJ | MB, To be classified | NA |
| PT_00G007DM | 7316-272 | BS_K07KNTFY | MB, To be classified | NA |
| PT_9BZETM0M | 7316-158 | BS_5P1TN10Z | HGG, H3 G35 | High-grade glioma/astrocytoma, H3 G35-mutant |
| PT_9BZETM0M | 7316-158 | BS_TV5B86ZD | HGG, H3 G35 | High-grade glioma/astrocytoma, H3 G35-mutant |
| PT_9BZETM0M | 7316-158 | BS_STNH7YSX | HGG, H3 G35 | High-grade glioma/astrocytoma, H3 G35-mutant |
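
One possible reason the output lists each BS_ID only once: base R's `duplicated()` flags the second and later occurrences of a value, not the first, so a repeated ID shows up once per extra copy. The sketch below uses made-up IDs (an assumption, not data from this PR) to show that behavior and a way to pull every copy of a repeated ID. It does not by itself explain the row counts reported above.

```r
# Toy data; IDs are made up for illustration.
toy_df <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_A", "BS_C", "BS_B"),
  molecular_subtype = c("x", "y", "x", "z", "y"),
  stringsAsFactors = FALSE
)

ids <- toy_df$Kids_First_Biospecimen_ID

# duplicated() marks only the second and later occurrence of each ID,
# so each ID that appears twice is returned once here.
later_copies <- toy_df[duplicated(ids), ]

# Combining a forward and a backward scan returns every copy of a repeated ID.
all_copies <- toy_df[duplicated(ids) | duplicated(ids, fromLast = TRUE), ]
```

Sorting `all_copies` by ID then puts the conflicting rows next to each other for comparison.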

Collaborator

@kgaonkar6 kgaonkar6 left a comment

Thanks @jharenza! From your comment above I also looked into 03 now and it seems the following updates will remove the duplicated rows.

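library(dplyr)
library(stringr)

# Annotation columns to check for repeated rows per biospecimen
annotation_cols <- unlist(str_split("integrated_diagnosis,broad_histology,short_histology,molecular_subtype", ","))

# Count rows per (biospecimen ID, annotation value) pair and keep pairs seen more than once
get_count <- function(x) {
  compiled_df %>%
    dplyr::group_by(Kids_First_Biospecimen_ID, !!as.name(x)) %>%
    dplyr::tally() %>%
    dplyr::filter(n > 1)
}

# One result per annotation column; an empty result means no repeated rows
lapply(annotation_cols, get_count)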

The above check returned no rows, so we now have only one value per BS_ID in compiled_df, which is expected.

Additionally, the EPN molecular_subtype values sometimes deviate from the format used for values like "HGG, wildtype" / "HGG, K28M". Do we want to update that (in the EPN update PR)?

# A tibble: 5 x 1
  molecular_subtype    
  <chr>                
1 EPN, To be classified
2 ST_EPN_RELA          
3 ST_EPN_YAP1          
4 EPN, H3 K28          
5 PT_EPN_A 
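
As a rough sketch of what such an update could look like, the values listed above can be recoded with `case_when()` so that every EPN label starts with "EPN, ". The target labels below ("EPN, ST RELA", "EPN, ST YAP1", "EPN, PT A") are hypothetical and chosen only to mirror the "HGG, ..." pattern. The actual names would be decided in the EPN PR.

```r
library(dplyr)

# The five values from the tibble above, in a toy data frame.
epn_df <- data.frame(
  molecular_subtype = c("EPN, To be classified", "ST_EPN_RELA", "ST_EPN_YAP1",
                        "EPN, H3 K28", "PT_EPN_A"),
  stringsAsFactors = FALSE
)

epn_df <- epn_df %>%
  mutate(
    molecular_subtype = case_when(
      # Hypothetical target labels, assumed for illustration only
      molecular_subtype == "ST_EPN_RELA" ~ "EPN, ST RELA",
      molecular_subtype == "ST_EPN_YAP1" ~ "EPN, ST YAP1",
      molecular_subtype == "PT_EPN_A" ~ "EPN, PT A",
      # Values already in the "EPN, ..." format are left unchanged
      TRUE ~ molecular_subtype
    )
  )
```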

@jharenza
Collaborator Author

jharenza commented Dec 1, 2020

Additionally, the EPN molecular_subtype values sometimes deviate from the format used for values like "HGG, wildtype" / "HGG, K28M". Do we want to update that (in the EPN update PR)?

Sure, sounds like we can make that update.

Jo Lynne Rokita and others added 3 commits December 1, 2020 11:36
…y-feedback.Rmd


add updates from code review

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…y-feedback.Rmd


add updates from code review

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
…y-feedback.Rmd


add updates from code review

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
Jo Lynne Rokita added 2 commits December 1, 2020 11:39
 close chunk and rerun subtyping
@jharenza jharenza requested a review from kgaonkar6 December 1, 2020 17:09
@jharenza
Collaborator Author

jharenza commented Dec 1, 2020

@kgaonkar6 I applied the changes, fixed a typo and one missed chunk closure, and reran, so I have requested your review once more. It looks like all of the duplicates are now fixed.

Collaborator

@kgaonkar6 kgaonkar6 left a comment

Looks good! Does all the required tasks in #854
