This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Ependymoma subtyping #490

Merged

jaclyn-taroni merged 34 commits into AlexsLemonade:master from tkoganti:ependymoma_subtyping

Feb 18, 2020

Collaborator

tkoganti commented Jan 30, 2020

Purpose/implementation

ependymoma subtyping

What scientific question is your analysis addressing?

Create a file with fusion results, gene expression, GISTIC, breaks density, and GSVA scores for ependymoma samples, according to the ticket here (#245).

What was your approach?

Take only ependymoma samples from pbta-histologies.tsv.
Based on primary site, classify them as supratentorial/infratentorial (some are "NA" because their primary site is listed as Ventricles).
Match corresponding RNA and DNA sample BSIDs based on the common sample_id column in pbta-histologies.tsv.
Using the files listed above in the scientific question, fill in a table for ependymoma samples with specific fusions, gene expression, broad CNAs, etc.
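A minimal pandas sketch of the subsetting and RNA/DNA matching steps above. The column names are the pbta-histologies.tsv columns used in this PR; treating DNA samples as experimental_strategy == "WGS" and the function name match_rna_dna are assumptions for illustration, not the PR's actual code.

```python
import pandas as pd


def match_rna_dna(histologies: pd.DataFrame) -> pd.DataFrame:
    """Pair RNA-Seq and DNA biospecimen IDs for ependymoma samples by sample_id.

    Assumption: DNA samples are the rows with experimental_strategy == "WGS".
    """
    epn = histologies[histologies["disease_type_new"] == "Ependymoma"]
    id_cols = ["Kids_First_Participant_ID", "sample_id", "Kids_First_Biospecimen_ID"]
    rna = epn.loc[epn["experimental_strategy"] == "RNA-Seq", id_cols]
    dna = epn.loc[epn["experimental_strategy"] == "WGS", id_cols]
    # An outer merge keeps samples that have only RNA or only DNA;
    # the missing biospecimen ID is filled with NaN.
    return rna.merge(
        dna,
        on=["Kids_First_Participant_ID", "sample_id"],
        how="outer",
        suffixes=("_RNA", "_DNA"),
    )
```

Because only Kids_First_Biospecimen_ID overlaps outside the merge keys, the suffixes produce the Kids_First_Biospecimen_ID_RNA and Kids_First_Biospecimen_ID_DNA columns directly.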

Which areas should receive a particularly close look?

All scripts should run from the OpenPBTA folder. Any feedback on that?
Are the column headers and file formats for the final files okay?

Results

OpenPBTA/OpenPBTA-analysis/analyses/molecular-subtyping-EPN/results/EPN_all_data.tsv

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

The information in the final EPN_all_data.tsv will help subgroup samples into different subtypes.

Teja Koganti added 3 commits

January 29, 2020 15:43


          Initial files added to ependymoma subtyping folder

34cd0a8


          Added bash script and changed the paths for all files to run from OpenPBTA directory

25228f2


          Added bash script and changed the paths for all files to run from OpenPBTA directory

9313bda

tkoganti requested a review from jaclyn-taroni

January 30, 2020 17:27

jaclyn-taroni requested review from jashapiro and jaclyn-taroni and removed request for jaclyn-taroni

January 30, 2020 18:16

jaclyn-taroni added the molecular subtyping label

jashapiro reviewed


Member

jashapiro left a comment

Hi @tkoganti , thank you for this contribution!

I think you have added all the data in the ticket, so the output file looks like it is in pretty good shape (pending updated data for some things).

My suggestions are mostly about style and efficiency, with an eye toward making it easier for somebody coming in later to find and change any sections that might need updating, with minimal effort.

My first big suggestion is that you move the file names out of the individual scripts, and into the master shell script that runs them: run-molecular-subtyping-EPN.sh, specifying the files needed for each script and the output files via command line options. The reason for this is that some of these files are likely to move or be renamed as new releases come in, so having all of the file names in one place can be very helpful when debugging.

If you are familiar with argparse, that is generally the most standard way to add command line options to a python script. One example of its use is here: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/copy_number_consensus_call/scripts/merged_to_individual_files.py
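A minimal sketch of what that could look like. The option names --histology, --expression, and --outfile are illustrative assumptions here; the real scripts may use different names and more options.

```python
import argparse


def parse_args(argv=None):
    """Parse file paths passed in from the calling shell script (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Compile ependymoma subtyping data.")
    # Every input and output path is a required option, so the shell script
    # is the single place where file names are defined.
    parser.add_argument("--histology", required=True, help="path to pbta-histologies.tsv")
    parser.add_argument("--expression", required=True, help="path to the EPN expression file")
    parser.add_argument("--outfile", required=True, help="path for the output tsv file")
    return parser.parse_args(argv)
```

Passing argv=None makes argparse read sys.argv, while passing a list (as in a quick test) lets the parser be exercised without a shell.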

My other suggestions are mostly about reducing complexity by taking advantage of some built-in functions to reduce the number of lines of code. Some of these are pretty simple, like using string join methods, and others are more complex, like the pandas "merge" method.

I am happy to help out with any changes you want to implement but might not have experience with. Just let me know!

analyses/molecular-subtyping-EPN/00-subsetting-files-for-EPN.py Outdated

+              EP = pbta_histologies[pbta_histologies["disease_type_new"]=="Ependymoma"]
+              EP_rnaseq_samples = EP[EP["experimental_strategy"] == "RNA-Seq"][["Kids_First_Biospecimen_ID", "primary_site"]]
+              EP_rnaseq_samples["disease_group"] = ["infratentorial" if "Posterior Fossa" in primary else "infratentorial" if "Optic" in primary else "supratentorial" if "Frontal Lobe" in primary else "supratentorial" if "Parietal Lobe" in primary else "infratentorial" if "Spinal Cord" in primary else "supratentorial" if "Occipital Lobe" in primary else "infratentorial" if "Tectum" in primary else "infratentorial" if "Spine" in primary else "supratentorial" if "Temporal Lobe" in primary else "infratentorial" if "Spinal" in primary else  "None" for primary in EP_rnaseq_samples["primary_site"]]

Member

jashapiro Jan 30, 2020

This line is very long, making it hard to parse what is going on.

I would suggest that you break the logic of this line out into a separate function and then call that in the list comprehension. If you were to name the function group_disease(), you could reduce this line to:

EP_rnaseq_samples["disease_group"] = [group_disease(primary) for primary in EP_rnaseq_samples["primary_site"]]

Which would make it much more clear what this step is doing.

You can also make use of or statements to make things a bit more discoverable within the function. Something like:

if("Posterior Fossa" in primary or "Optic" in primary):
  return "infratentorial"
elif("Frontal Lobe" in primary ...
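Filling in the rest of that idea, here is one possible group_disease() covering every keyword in the original one-liner. It is a sketch, not the code that was merged (this step later moved to R). The keyword-to-group pairs are copied from the list comprehension above and checked in the same order, so a site string that matches several keywords gets the same answer as before.

```python
# (keyword, group) pairs in the same order as the original chained
# conditional expression, so the first matching keyword wins.
SITE_GROUPS = [
    ("Posterior Fossa", "infratentorial"),
    ("Optic", "infratentorial"),
    ("Frontal Lobe", "supratentorial"),
    ("Parietal Lobe", "supratentorial"),
    ("Spinal Cord", "infratentorial"),
    ("Occipital Lobe", "supratentorial"),
    ("Tectum", "infratentorial"),
    ("Spine", "infratentorial"),
    ("Temporal Lobe", "supratentorial"),
    ("Spinal", "infratentorial"),
]


def group_disease(primary):
    """Map a primary_site string to a disease group, or "None" if nothing matches."""
    for keyword, group in SITE_GROUPS:
        if keyword in primary:
            return group
    return "None"
```

An ordered list, rather than two sets of infratentorial and supratentorial keywords, keeps the first-match behaviour identical to the original for multi-site strings such as "Frontal Lobe;Spinal Cord".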

Collaborator Author

tkoganti Feb 6, 2020

There is an R script replacing this now

analyses/molecular-subtyping-EPN/00-subsetting-files-for-EPN.py Outdated

Comment on lines 23 to 25

+              import rpy2.robjects as robjects
+              from rpy2.robjects import pandas2ri
+              pandas2ri.activate()

Member

jashapiro Jan 30, 2020

You already imported rpy2.robjects, so you don't need to do this again. I would also suggest that you don't really need to import pandas2ri separately, as you only use it once, so

robjects.pandas2ri.activate()

should be sufficient.

analyses/molecular-subtyping-EPN/00-subsetting-files-for-EPN.py Outdated

+              from rpy2.robjects import pandas2ri
+              pandas2ri.activate()
+              # Reading in pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds file to subset (All ependymoma samples are  stranded, so ignoring polyA gene expression file in subsetting )

Member

jashapiro Jan 30, 2020

I would appreciate it if you could split this comment across multiple lines. (Strive for lines no longer than ~90 characters, with 100 as an upper bound; it makes code revisions much easier to track and improves readability.)

analyses/molecular-subtyping-EPN/00-subsetting-files-for-EPN.py Outdated



		#Reading in pbta histologies file to subset just the ependymoma samples
		pbta_histologies = pd.read_csv("data/pbta-histologies.tsv", sep="\t")

Member

jashapiro Jan 30, 2020

In case these filenames change or the files move, it might be a good idea to have file names be passed into the scripts here as arguments, set in the shell.

Collaborator Author

tkoganti Feb 6, 2020

There is an R script for this now

Member

jashapiro Feb 7, 2020

We should probably remove this version of the script to avoid confusion.

analyses/molecular-subtyping-EPN/00-subsetting-files-for-EPN.py Outdated

Comment on lines 37 to 46





		# In[158]:

Member

jashapiro Jan 30, 2020

You may want to clear the blank lines?

analyses/molecular-subtyping-EPN/02_ependymoma_generate_all_data.py Outdated



		# Looping through EPN sample file with both RNA and DNA BSID's
		with open("analyses/molecular-subtyping-EPN/results/EPN_molecular_subtype.tsv", "r") as notebook:

Member

jashapiro Jan 31, 2020

Here too, I think it may be more efficient to read the whole file in with pandas rather than going through it line by line; that can also save you a lot of the manual casting of strings to ints that you are doing now.
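For example, with a small made-up table standing in for EPN_molecular_subtype.tsv (the breaks_density column and all values are illustrative only), a single read_csv call replaces the loop and the manual casts:

```python
import io

import pandas as pd

# Stand-in for the real results file; in the script you would pass the path instead.
tsv_text = (
    "Kids_First_Biospecimen_ID_DNA\tKids_First_Biospecimen_ID_RNA\tbreaks_density\n"
    "BS_D1\tBS_R1\t12\n"
    "BS_D2\tBS_R2\t7\n"
)

# One call reads the whole table; numeric columns are parsed as numbers,
# so there is no need to cast strings to int by hand.
samples = pd.read_csv(io.StringIO(tsv_text), sep="\t")
total = samples["breaks_density"].sum()
```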

analyses/molecular-subtyping-EPN/02_ependymoma_generate_all_data.py Outdated

+              CNA=pd.read_csv(zip.open('2019-12-10-gistic-results-cnvkit/broad_values_by_arm.txt'), sep="\t")
+              CNA = CNA.set_index("Chromosome Arm")
+              count=0

Member

jashapiro Jan 31, 2020

Is this count used anywhere?
I see you adding to it later, but nowhere that the final value is used.

Collaborator Author

tkoganti Feb 6, 2020

Removed this now

analyses/molecular-subtyping-EPN/01-make_notebook_RNAandDNA.py Outdated

+              WGS_dnaseqsamples = WGSPT[["Kids_First_Biospecimen_ID", "Kids_First_Participant_ID", "sample_id"]]
+              outnotebook.write("Kids_First_Participant_ID\tsample_id\tKids_First_Biospecimen_ID_DNA\tKids_First_Biospecimen_ID_RNA\tsubtype\n")
+              count =0

Member

jashapiro Jan 31, 2020

Is this count used anywhere?

analyses/molecular-subtyping-EPN/02_ependymoma_generate_all_data.py Outdated

+                              else:
+                                  line.append("0")
+                              # Adding all the data collected so far for every sample with RNA and DNA BSID to "line"
+                              line.extend((nfkb_gsva_score, C11orf95_RELA, LTBP3_RELA, PTEN_TAS2R1, C11orf95_YAP1, YAP1_MAMLD1, YAP1_FAM118B, C11orf95_MAML2, breaks_density, RELA_zscore, L1CAM_zscore,ARL4D_zscore, CLDN1_zscore, CXorf67_zscore, TKTL1_zscore, GPBP1_zscore, IFT46_zscore))

Member

jashapiro Jan 31, 2020

You may want to break up the contents of the extend here into logical groups that you can separate onto their own lines.
Something like the below, just to keep the line lengths in check and improve readability. If you do the same with the headers it will be easier to figure out if you accidentally change the order and the header and contents get out of sync.

Suggested change

-                            line.extend((nfkb_gsva_score, C11orf95_RELA, LTBP3_RELA, PTEN_TAS2R1, C11orf95_YAP1, YAP1_MAMLD1, YAP1_FAM118B, C11orf95_MAML2, breaks_density, RELA_zscore, L1CAM_zscore,ARL4D_zscore, CLDN1_zscore, CXorf67_zscore, TKTL1_zscore, GPBP1_zscore, IFT46_zscore))
+                            line.extend((nfkb_gsva_score, C11orf95_RELA, LTBP3_RELA, PTEN_TAS2R1,
+                                         C11orf95_YAP1, YAP1_MAMLD1, YAP1_FAM118B, C11orf95_MAML2, breaks_density,
+                                         RELA_zscore, L1CAM_zscore, ARL4D_zscore, CLDN1_zscore,
+                                         CXorf67_zscore, TKTL1_zscore, GPBP1_zscore, IFT46_zscore))
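One way to keep the header and contents from drifting apart is to build both from the same ordered list of (column, value) pairs. This is a hypothetical helper with made-up values, not part of the PR:

```python
def build_header_and_row(fields):
    """Build a tab-separated header and row from ordered (column_name, value) pairs."""
    header = "\t".join(name for name, _ in fields)
    row = "\t".join(str(value) for _, value in fields)
    return header, row


# Illustrative values only; each name sits right next to the value it labels.
fields = [
    ("NFKB_pathway_GSEAscore", 0.42),
    ("C11orf95_RELA", 1),
    ("LTBP3_RELA", 0),
]
header, row = build_header_and_row(fields)
```

Reordering or adding a column then only means editing one list entry, and the header cannot fall out of step with the data.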

analyses/molecular-subtyping-EPN/02_ependymoma_generate_all_data.py Outdated

Comment on lines 162 to 164

+                              for i in line:
+                                  outfile.write(str(i)+"\t")
+                              outfile.write("\n")

Member

jashapiro Jan 31, 2020

For these lines, you can do the same as I suggested before, but if you have non-string elements in line (which I assume you do, given the call to str()), you will want to apply the str function to each element as well, which you can do with Python's map function.

"\t".join(map(str, line))

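Applied to the write loop above, a sketch could look like this (write_row is a hypothetical name). Note that the original loop also wrote a trailing tab before each newline, which the join version drops:

```python
import io


def write_row(outfile, line):
    # map(str, ...) converts numbers to text before joining with tabs.
    outfile.write("\t".join(map(str, line)) + "\n")


buffer = io.StringIO()
write_row(buffer, ["BS_D1", "BS_R1", 1, 0.5])
```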
Collaborator

jharenza commented Feb 2, 2020

Hi @tkoganti! This looks like a great start! Can I suggest creating a notebook with some additional comments/checks as you go through each step/series of steps, similar to those in the embryonal subtyping here? I think this will help a lot with review :). Thanks!

jashapiro added 2 commits

February 3, 2020 11:41


          Add Ependymoma subtyping to CI

24d8a6f


          Merge branch 'master' into ependymoma_subtyping

fb09448

Member

jashapiro commented Feb 3, 2020

Hi @tkoganti ! I just added the script for this file to CI, so make sure you pull down the latest changes before pushing anything back to the server. That will help us catch errors that might come up with edits or updates to the data files.

jashapiro added 7 commits

February 3, 2020 14:34


          Add to analyses/README.md

8f80aec


          Test removing rpy2 and using pyreadr exclusively

5c5119c


          Change to use pyreadr properly

5d4c32c


          Revert pyreadr changes

e55ea34


          Add subset flag to CI

1491b2d


          Use R to generate subset file & shell to specify filenames

4d83601


          Add results file

5f119b1

Member

jashapiro commented Feb 4, 2020

Hi again @tkoganti! Your script was failing CI, and after some investigation, it seems that the version of rpy2 we are using doesn't support some of the features you were using. I tried to fix it by manually converting to a pandas dataframe, and by using pyreadr, but neither of those worked. So I decided to break out the big guns and just do the subset step in R, creating a gzipped tsv that pandas can read in directly.

While I was changing file names, I also made a few changes that should make things a bit more flexible as files change, moving all the file paths to the shell script. I also added a line to the shell script so that all of the paths can be specified relative to the location of the script, no matter where the shell script is invoked from.

That means that I implemented the argparse changes that I had suggested, but I left the rest of your code as similar to the original as I could. If you have questions about the changes that I did make, or about any of my previous review questions, let me know!


          Move Ependymoma subtyping up in CI

408c02c

Collaborator Author

tkoganti commented Feb 5, 2020 (edited)

Hi @jashapiro I am trying to run this script with the changes you made (02_ependymoma_generate_all_data.py) and I see that when you changed to argparse, you are reading expression file (.rds file) with pd.read_csv (fpkm_df = pd.read_csv(args.expression, sep = "\t")). I was getting an error at that line since that file should be read using pyreadr?

Member

jaclyn-taroni commented Feb 5, 2020 (edited)

Hi @tkoganti - can I ask if you are using the project Docker image for development? I am wondering if it is a software version issue.

The subset step is not run in CI because we use files in CI that contain a limited number of samples to save on download time and RAM, and those samples may not overlap with this histology very much or at all. The subsetting step will eventually need to be run in the project Docker container. The idea is to run all steps for the project on AWS once we have a final freeze of the data. For that reason, we will want to ensure it runs with the version of the software that is available on that Docker image.

Edit: Sorry, I was responding to the earlier version of the comment I read in my email!

Member

jashapiro commented Feb 5, 2020

Hi @jashapiro I am trying to run this script with the changes you made (02_ependymoma_generate_all_data.py) and I see that when you changed to argparse, you are reading expression file (.rds file) with pd.read_csv (fpkm_df = pd.read_csv(args.expression, sep = "\t")). I was getting an error at that line since that file should be read using pyreadr?

In the changes I made, I also changed the EPN expression rds file that is generated by the 00 script (now in R) to a tsv file. This was to get around limitations in pyreadr and rpy2 that couldn’t be easily solved with the versions that we have in the Docker container. So the shell script now specifies that the expression file is a .tsv.gz file, and that is what is read in by the 02 script. See line 21 of the shell script.
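For reference, pandas can read the gzipped tsv directly, because read_csv infers gzip compression from the .gz extension. This sketch writes a tiny made-up matrix first so it runs on its own; treating the first column as gene identifiers is an assumption about the subset file's layout:

```python
import gzip
import os
import tempfile

import pandas as pd


def read_expression(path):
    # The default compression="infer" detects gzip from the .gz suffix,
    # so no pyreadr or rpy2 call is needed.
    return pd.read_csv(path, sep="\t", index_col=0)


# Made-up matrix: genes as rows, biospecimen IDs as columns.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "toy-expression.tsv.gz")
with gzip.open(path, "wt") as handle:
    handle.write("gene\tBS_R1\tBS_R2\nRELA\t10.5\t3.0\nL1CAM\t0.0\t8.25\n")
expr = read_expression(path)
```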

Really we should delete the .rds subset file from the repo, as it isn’t used at all anymore.

Does that all make sense? Let me know if I need to explain anything in more detail.

Teja Koganti added 3 commits

February 6, 2020 15:36


          Responding to pull request reviews

d02c4d0


          Adding jupyter notebook

ba0e00a


          Merge branch 'ependymoma_subtyping' of https://github.com/tkoganti/OpenPBTA-analysis into ependymoma_subtyping

b5d1a77

Collaborator Author

tkoganti commented Feb 6, 2020 (edited)

@jashapiro I missed that there was a .tsv.gz file too in the subsetted data. I removed the .rds file now, and the script will use the .tsv.gz file. Please look at the new commit (b5d1a77) and let me know your feedback.

Collaborator Author

tkoganti commented Feb 6, 2020

Hi @jaclyn-taroni I was not using docker for this development. I saw that all the modules I was using are already in the docker file here (https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/Dockerfile) Please let me know if I need to update any versions.

cansavvy approved these changes


Collaborator

cansavvy left a comment (edited)

@jashapiro I have only reviewed the R script. The code itself looks fine; there are some documentation things that need updating. Otherwise 👍

analyses/molecular-subtyping-EPN/00-subset-for-EPN.R Outdated

+                  c("-i", "--histology"),
+                  type = "character",
+                  default = NULL,
+                  help = "hisology file tsv",

Collaborator

cansavvy Feb 10, 2020

Suggested change

-                help = "hisology file tsv",
+                help = "histology file tsv",

analyses/molecular-subtyping-EPN/00-subset-for-EPN.R Outdated

+                       disease_type_new == "Ependymoma") %>%
+                pull(Kids_First_Biospecimen_ID)
+              # Subsetting expression columns with column names/BSIDs that are  in the  list  of ependymoma samples

Collaborator

cansavvy Feb 10, 2020

Suggested change

-            # Subsetting expression columns with column names/BSIDs that are  in the  list  of ependymoma samples
+            # Subsetting expression columns with column names/BSIDs that are  in the list of ependymoma samples

analyses/molecular-subtyping-EPN/00-subset-for-EPN.R Outdated

+                  c("-o", "--outfile"),
+                  type = "character",
+                  default = NULL,
+                  help = "output file"

Collaborator

cansavvy Feb 10, 2020

Suggested change

-                help = "output file"
+                help = "File path and name for output tsv file"

analyses/molecular-subtyping-EPN/00-subset-for-EPN.R Outdated

+              # -o, --output_file : path for output file
+              #
+              # example invocation:
+              # Rscript scripts/bed_to_segfile.R \

Collaborator

cansavvy Feb 10, 2020

This example doesn't seem to be updated for this script and options.

jashapiro and others added 6 commits

February 10, 2020 10:57


          Update 00-subset-for-EPN.R with changes from @cansavvy code review

aca978b


          Zscore column names changed

d9c44d6


          Changed how merge is done between RNA and DNA tables

f68966a


          Removed comment lines

415b5e7


          remove duplicate commented code.


          Merge branch 'master' into ependymoma_subtyping

33bacc2

Collaborator

jharenza commented Feb 12, 2020

Hi @tkoganti and @jashapiro. Great job on this so far! Sorry it took me a while to get to this, but I just have a few comments:

In EPN_all_data.tsv, for disease group, if None, can we call it undetermined or ambiguous to be a bit more accurate?
You could possibly add CDKN2A focal status based on ST-EPN-RELA harboring:

Common CN changes: CDKN2A deletions. Chr9 or Chr9p loss
This gene is on 9p21.3, and if chr9p loss is 0, it is still possible that just the gene is deleted, so having the focal information would be good here.

I see 1q loss, but not 1q gain, which is mentioned for PF-EPN-A, so I would add that:

1q gain most frequent CNA
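A possible way to derive a binary 1q gain (or loss) column from broad_values_by_arm.txt, assuming, as in the script's CNA.set_index("Chromosome Arm"), that rows are chromosome arms and columns are biospecimen IDs. Calling any positive value a gain and any negative value a loss is an assumption, not a threshold agreed in this thread, and arm_status is a hypothetical name:

```python
import pandas as pd


def arm_status(broad, arm, sample, direction):
    """Return 1 if `sample` has the requested change on `arm`, else 0.

    direction is "gain" (value > 0) or "loss" (value < 0); the 0 cut-off is assumed.
    """
    value = broad.loc[arm, sample]
    if direction == "gain":
        return int(value > 0)
    if direction == "loss":
        return int(value < 0)
    raise ValueError("direction must be 'gain' or 'loss'")


# Made-up broad values for two samples.
broad = pd.DataFrame(
    {"BS_D1": [0.3, -0.2], "BS_D2": [-0.4, 0.0]},
    index=pd.Index(["1q", "9p"], name="Chromosome Arm"),
)
```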

Is the EPN_molecular_subtype.tsv table a placeholder for the molecular subtypes? It seems that it is a selection of columns from EPN_all_data.tsv, but does not include subtypes. I think this table as named would consist of the patient ID, biospecimen IDs, and the molecular subtypes: ST-EPN-RELA, ST-EPN-YAP1, PF-EPN-A, and PF-EPN-B. Was this the next step/ another PR, or did you need some guidance here?

Collaborator Author

tkoganti commented Feb 12, 2020 (edited)

Thanks for the feedback @jharenza!

I will add 1q gain column. I meant to add it earlier but missed it accidentally.
I will use focal_data_by_genes.txt file from GISTIC to add the CDKN2A focal changes
So I wrote another script to categorize these samples into different groups. There were a total of 93 samples, and after using fusions, chromosomal broad values, and expression z-scores, I was able to categorize some samples, but there were still 42 samples that did not belong to any category after I set these rules. Two columns I have not used are NFKB_pathway_GSEAscore and breaks_density-chromosomal_instability. Do you have any suggestions for cut-off values for these columns?

Member

jaclyn-taroni commented Feb 12, 2020

I will use focal_data_by_genes.txt file from GISTIC to add the CDKN2A focal changes

I wanted to note that we have not used the GISTIC gene data in other subtyping analyses, but have instead used the results of analyses/focal-cn-file-preparation which don't take into account the width of a copy number alteration. GISTIC, based on my cursory understanding, "rewards" recurrence and we've run it on the entire cohort. It seems that EPN samples make up ~10% of the available WGS samples. Now that I've added GISTIC to the container (#531; related to #529), I am curious about comparing results when we limit the cohort to a specific histology. I don't disagree with using focal_data_by_genes.txt here, but wanted to document how I'm thinking about potential limitations.

jashapiro reviewed

View reviewed changes

analyses/molecular-subtyping-EPN/02_ependymoma_generate_all_data.py Outdated

Comment on lines 106 to 108

+              # Adding breakpoints density for chromosomal instability  to the dataframe
+              EPN_notebook["breaks_density-chromosomal_instability"] = EPN_notebook.apply(lambda x: breakpoint_density.loc[x["Kids_First_Biospecimen_ID_DNA"], "breaks_density"]
+                      if x["Kids_First_Biospecimen_ID_DNA"] is not np.nan  else "NA", axis=1)

Member

jashapiro Feb 12, 2020 (edited)

The file we were using here ../chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv has been removed from that analysis (which is why CI is currently failing). There are still files for CNV and SV separately at ../chromosomal-instability/breakpoint-data/cnv-breaks-densities.tsv and ../chromosomal-instability/breakpoint-data/sv-breaks-densities.tsv, respectively. Having two data columns for chromosomal instability seems reasonable at this stage, and they can perhaps be useful for determining reliability of this measure.

Note: The file name is set in run-molecular-subtyping-EPN.sh

Collaborator

jharenza commented Feb 12, 2020

@jaclyn-taroni good point - @tkoganti can you explore both focal files?

Collaborator Author

tkoganti commented Feb 13, 2020

Hi @jashapiro and @jaclyn-taroni

I tried to implement CNV and SV file with breaks density score and I found that these samples were missing from the file -

'BS_0W8AWY10',
'BS_2E81A3FT',
'BS_4PAE19ZC',
'BS_5ZRZC3ZM',
'BS_62XRDTM6',
'BS_6NPSZJ4C',
'BS_6Z4WHYDG',
'BS_99PPRCW4',
'BS_9GJHMA3J',
'BS_9JVGXA2W',
'BS_AGD2ATY1',
'BS_B4DY7ET3',
'BS_BBHFKBE7',
'BS_QSMFVHSB',
'BS_R82MYPKB',
'BS_R8ZKDFWH',
'BS_W5P7SPDH',
'BS_XMNBSBGN'

Member

jashapiro commented Feb 13, 2020

I tried to implement CNV and SV file with breaks density score and I found that these samples were missing from the file -

'BS_0W8AWY10',
'BS_2E81A3FT',
'BS_4PAE19ZC',

I think those are likely samples with 0 (or NA) breaks by CNV consensus, which may have been accidentally excluded in the current versions of those files. If so, I think this is fixed by #532, which should be merged into master very soon. Tagging @cansavvy who worked on that analysis.

In the meantime, and in case that is not the sole cause of the missing samples, it probably makes sense to use NA for such missing data.
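A sketch of that approach with a pandas left merge, which keeps every DNA sample and fills NaN where a density file has no row for it. The "samples" and "breaks_density" input column names, the output column names, and the tiny tables are all assumptions for illustration:

```python
import pandas as pd


def add_breaks_density(samples, cnv, sv):
    """Left-join CNV and SV breaks densities onto a per-sample table.

    Assumes `samples` has a Kids_First_Biospecimen_ID_DNA column and that the
    density tables have "samples" and "breaks_density" columns.
    """
    out = samples
    for label, table in (("CNV", cnv), ("SV", sv)):
        column = f"breaks_density-chromosomal_instability_{label}"
        renamed = table.rename(
            columns={"samples": "Kids_First_Biospecimen_ID_DNA", "breaks_density": column}
        )
        # how="left" keeps every row of `out`; IDs missing from `table` get NaN.
        out = out.merge(
            renamed[["Kids_First_Biospecimen_ID_DNA", column]],
            on="Kids_First_Biospecimen_ID_DNA",
            how="left",
        )
    return out


samples = pd.DataFrame({"Kids_First_Biospecimen_ID_DNA": ["BS_D1", "BS_D2"]})
cnv = pd.DataFrame({"samples": ["BS_D1"], "breaks_density": [0.5]})
sv = pd.DataFrame({"samples": ["BS_D1", "BS_D2"], "breaks_density": [1.0, 2.0]})
combined = add_breaks_density(samples, cnv, sv)
```

Writing the result with to_csv(path, sep="\t", index=False, na_rep="NA") would turn the NaN entries into literal NA strings in the output file.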

jashapiro mentioned this pull request

Chr instability: PR 3 of 3: Histology plots #532

Merged

5 tasks


          Added some columns as per comments from 02-12-2020

141183c

Collaborator Author

tkoganti commented Feb 13, 2020

Hi @jashapiro, this file also had missing BSIDs from DNA samples - analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
I used "NA" for now and pushed all the changes mentioned above.


          Merge remote-tracking branch 'upstream/master' into ependymoma_subtyping

6bfb636

Member

jashapiro commented Feb 13, 2020

I just merged the changes in master into your branch. For some reason doing it on GitHub wasn't working, so I had to do it manually. I don't think the changes will fix the missing data problem, as I don't think the update has hit the master branch yet, but let's see if it gets us past the CI failure.


          update invocation of 02_ependymoma_generate_all_data.py

e81a424

Add line continuation characters to bash script
Remove no longer used --breakpoints option

Member

jashapiro commented Feb 14, 2020

Hi @jashapiro, this file also had missing BSIDs from DNA samples - analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
I used "NA" for now and pushed all the changes mentioned above.

I would not have expected the focal-cn file to have NA in it, but I could be wrong about its contents. I am going to tag @jaclyn-taroni, who was more involved in its creation, to see if she has thoughts.

Member

jaclyn-taroni commented Feb 14, 2020

Hi @tkoganti - there will be no neutral calls in analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz, so if a BSID is present in the consensus SEG file but not in this file, its status would not be missing but neutral.

jaclyn-taroni mentioned this pull request

Improve focal-cn-file-preparation documentation, filter copy neutral out of sex chromosome files #537

Merged

5 tasks

jashapiro added 2 commits

February 14, 2020 14:43


          Handle missing data, and some refactoring

20cda5f

I made some substantial changes here in structure, but the results should be largely unchanged.

I did some transposing when constructing data frames so we can use the same function (fill_df) to extract data more often, and moved the ID column specification out of the function so that RNA and DNA-derived data are handled the same way.

The function then allows a set of samples to be specified, and if the request is for a sample that does not fall in there, it is set as NA for that column in the output data; otherwise it is filled in with a default value.


          Delete unused full table zscore

57884c5

Member

jashapiro commented Feb 14, 2020

Hi @tkoganti-

I made some more updates/revisions to the code here to deal with the missing data questions, and while I was doing it I ended up doing a bit of extra refactoring.

I did some transposing when constructing data frames so we can use the same function (fill_df()) to extract data more often: in most cases the samples are now the row indexes of the data frame. I also moved the ID column specification out of the function so that RNA and DNA-derived data are handled the same way by the function.

fill_df() then allows a set of included_samples to be specified, and if the request is for a sample that does not fall in that set, the returned value is 'NA'; otherwise it is filled in with a default value. This covers the case that came up in consensus_seg_annotated_cn_autosomes.tsv.gz, where some missing samples should be missing, but others just had no change.
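To illustrate the three cases described above, here is a hypothetical helper (fill_value, not the actual fill_df() signature) that assumes samples are the row index of the data frame, and that an included sample with no row should get the default (for example "neutral" for copy number):

```python
import pandas as pd


def fill_value(df, sample, column, included_samples, default):
    """Look up one value for one sample (hypothetical sketch).

    - sample not in included_samples: "NA" (no data exists for it)
    - sample included but absent from df (e.g. copy neutral): default
    - otherwise: the value stored in df
    """
    if sample not in included_samples:
        return "NA"
    if sample in df.index:
        return df.loc[sample, column]
    return default


# Made-up focal calls: only changed samples appear as rows.
calls = pd.DataFrame({"CDKN2A": ["loss"]}, index=["BS_D1"])
included = {"BS_D1", "BS_D2"}
```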

I think all of it is correct, but please do check carefully that it all makes sense to you and that the results are as expected.

kgaonkar6 mentioned this pull request

Planned release: v15 #543

Closed

5 tasks

jashapiro added 2 commits

February 18, 2020 10:29


          Merge remote-tracking branch 'upstream/master' into ependymoma_subtyping

d65e360


          Rerun with updated data

5d05922

AlexsLemonade#532 changed results.

jaclyn-taroni merged commit 3b2ae77 into AlexsLemonade:master


Labels

molecular subtyping