Updated analysis: EPN subtyping to use pathology diagnosis #755

jharenza · 2020-08-29T00:22:53Z

What analysis module should be updated and why?

https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EPN

What changes need to be made? Please provide enough detail for another participant to make the update.

Currently, this module uses integrated_diagnosis to search for the subset of samples to be subtyped. However, with the rework of the rules that determine integrated_diagnosis after molecular subtyping (#748), we should rework this module to search for samples using pathology_diagnosis and the new free-text pathology diagnosis field, which will be present in `pbta-histologies.tsv` in #732.
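A minimal sketch of what that selection could look like, not the module's actual code. It assumes the free-text column is named `pathology_free_text_diagnosis`, that a case-insensitive match on the term "ependymoma" is the intended rule, and it uses a made-up toy table in place of `pbta-histologies.tsv`. The helper name `select_epn_samples` and the biospecimen IDs are hypothetical.

```python
import io

import pandas as pd

# Toy stand-in for pbta-histologies.tsv; every row here is invented for illustration.
toy_tsv = (
    "Kids_First_Biospecimen_ID\tpathology_diagnosis\tpathology_free_text_diagnosis\n"
    "BS_0001\tEpendymoma\tanaplastic ependymoma\n"
    "BS_0002\tOther\tsupratentorial ependymoma\n"
    "BS_0003\tMedulloblastoma\t\n"
    "BS_0004\tLow-grade glioma\tpilocytic astrocytoma\n"
)
histologies = pd.read_csv(io.StringIO(toy_tsv), sep="\t")


def select_epn_samples(df, term="ependymoma"):
    """Keep rows where either diagnosis column mentions `term` (hypothetical rule)."""
    # na=False treats missing diagnoses as non-matches instead of propagating NaN.
    in_dx = df["pathology_diagnosis"].str.contains(term, case=False, na=False)
    in_free_text = df["pathology_free_text_diagnosis"].str.contains(
        term, case=False, na=False
    )
    return df[in_dx | in_free_text]


epn_samples = select_epn_samples(histologies)
epn_ids = epn_samples["Kids_First_Biospecimen_ID"].tolist()
```

Searching both columns means a sample with a generic `pathology_diagnosis` can still be picked up from its free-text entry; whether that is the desired behavior is a decision for the module's maintainers.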

What input data should be used? Which data were used in the version being updated?

pbta-histologies.tsv from V17 instead of from V16

When do you expect the revised analysis will be completed?

unsure

Who will complete the updated analysis?

unsure


jaclyn-taroni · 2020-09-15T12:27:05Z

CI failed on #764 with:

Generating analyses/molecular-subtyping-EPN/results/EPN_molecular_subtype.tsv that maps DNA and RNA ID's
Generating analyses/molecular-subtyping-EPN/results/EPN_all_data.tsv  that has all the relevant data needed for subtyping
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'BS_J8VX4D17'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "02_ependymoma_generate_all_data.py", line 180, in <module>
    EPN_notebook = fill_df_with_fpkm_zscores(EPN_notebook, fpkm_df, gene)
  File "02_ependymoma_generate_all_data.py", line 115, in fill_df_with_fpkm_zscores
    zscore_list = stats.zscore(np.array(df.apply(lambda x: fpkmdf.loc[gene_name, x["Kids_First_Biospecimen_ID_RNA"]], axis=1)))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 6928, in apply
    return op.get_result()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 292, in apply_standard
    self.apply_series_generator()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 321, in apply_series_generator
    results[i] = self.f(v)
  File "02_ependymoma_generate_all_data.py", line 115, in <lambda>
    zscore_list = stats.zscore(np.array(df.apply(lambda x: fpkmdf.loc[gene_name, x["Kids_First_Biospecimen_ID_RNA"]], axis=1)))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1418, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 805, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 961, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1424, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1850, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 156, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3737, in xs
    loc = self.index.get_loc(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('BS_J8VX4D17', 'occurred at index 23')

Exited with code exit status 1

CircleCI received exit code 1

I am going to comment that step out in an upcoming commit so we can discover what else may need to be fixed as a result of the v17 release. When this is addressed, we will need to revert that change so that this step runs in CI again.
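For context, the traceback shows `fpkmdf.loc[gene_name, <RNA ID>]` raising `KeyError` because `BS_J8VX4D17` is not among the expression matrix columns. The following is a minimal sketch of one way to make that lookup tolerant of missing IDs. It is not the fix that was merged. The function name `fpkm_zscores_tolerant` and the toy data are hypothetical, and returning NaN for missing samples is an assumed design choice.

```python
import numpy as np
import pandas as pd


def fpkm_zscores_tolerant(df, fpkm_df, gene_name,
                          id_col="Kids_First_Biospecimen_ID_RNA"):
    """Z-score one gene's FPKM across the samples in df, giving NaN for
    IDs that are absent from fpkm_df's columns (hypothetical helper)."""
    # reindex fills labels that are not in the index with NaN,
    # whereas .loc[gene, missing_id] raises KeyError.
    values = fpkm_df.loc[gene_name].reindex(df[id_col]).to_numpy(dtype=float)
    # NaN-aware population z-score (ddof=0).
    return (values - np.nanmean(values)) / np.nanstd(values)


# Toy expression matrix: genes as rows, biospecimen IDs as columns (made-up values).
toy_fpkm = pd.DataFrame(
    {"BS_A": [1.0, 5.0], "BS_B": [2.0, 5.5], "BS_C": [3.0, 7.0]},
    index=["GENE1", "GENE2"],
)
# Toy sample table with one ID that has no expression data.
toy_samples = pd.DataFrame(
    {"Kids_First_Biospecimen_ID_RNA": ["BS_A", "BS_B", "BS_MISSING", "BS_C"]}
)

zscores = fpkm_zscores_tolerant(toy_samples, toy_fpkm, "GENE1")
```

An alternative is to fail early with a message listing the missing IDs, which may be preferable if a missing sample indicates a problem in the input files rather than an expected gap.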

* v17 files
  1. updated files description
  2. updated release notes
  3. Updated download script to point to v16/v17
* Update release-notes.md Added clinical columns that changed
* update release-notes.md added notes for integrated dx
* Update doc/release-notes.md add PBTA TMB file `pbta-snv-consensus-mutation-tmb-coding.tsv` to release notes
* Comment out HGG step that's broken
* Comment out EPN subtyping; see #755
* Use a different histology (LGAT) outside of the logic for running all
* Add EWS to the histology_color_palette.tsv so cnv_heatmap.Rmd runs
* Ignore collapsed counts from telomerase activity
* Add logic to make more robust to CI subset data
* Add messages to make debugging easier
* Temporarily comment out steps before telomerase activity
* Revert "Temporarily comment out steps before telomerase activity" This reverts commit 65b6fb1.

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>

jashapiro · 2020-09-18T14:17:30Z

I will be starting on this today. Will need to look at apparent errors as well as the pathology_free_text_diagnosis field.

jharenza added the updated analysis label Aug 29, 2020

jaclyn-taroni added the molecular subtyping label (Related to molecular subtyping of tumors) Sep 12, 2020

jaclyn-taroni added a commit to baileyckelly/OpenPBTA-analysis that referenced this issue Sep 15, 2020

Comment out EPN subtyping; see AlexsLemonade#755

a1d3630

jaclyn-taroni mentioned this issue Sep 18, 2020

Updated analysis: Overhaul molecular subtyping results compilation step in molecular-subtyping-pathology #784

Closed

jashapiro self-assigned this Sep 18, 2020

jashapiro mentioned this issue Sep 18, 2020

molecular subtyping EPN update #785

Merged

5 tasks

jashapiro closed this as completed in #785 Sep 18, 2020
