This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: EPN subtyping to use pathology diagnosis #755

Closed
jharenza opened this issue Aug 29, 2020 · 2 comments · Fixed by #785
Labels: molecular subtyping, updated analysis

Comments

@jharenza
Collaborator

What analysis module should be updated and why?

https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EPN

What changes need to be made? Please provide enough detail for another participant to make the update.

Currently, this module uses integrated_diagnosis to select the subset of samples to be subtyped. However, with the rework of the rules that determine integrated_diagnosis after molecular subtyping (#748), this module should instead search for samples using pathology_diagnosis and the new free-text pathology diagnosis field, which will be present in `pbta-histologies.tsv` in #732.
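As a rough illustration of the requested change, here is a minimal pandas sketch that selects samples whose pathology fields mention ependymoma. It is not the module's actual code: the free-text column name `pathology_free_text_diagnosis` is taken from a later comment in this thread, while the helper name `select_epn_samples`, the search term, and the toy rows are assumptions made for the example.

```python
import pandas as pd


def select_epn_samples(histologies, dx_term="ependymoma"):
    """Return rows whose pathology fields mention dx_term (case-insensitive).

    Hypothetical helper: the search term is an assumption, not the
    module's confirmed selection rule.
    """
    in_dx = histologies["pathology_diagnosis"].str.contains(
        dx_term, case=False, na=False, regex=False
    )
    in_free_text = histologies["pathology_free_text_diagnosis"].str.contains(
        dx_term, case=False, na=False, regex=False
    )
    # Keep a sample if either field matches; missing values count as no match.
    return histologies[in_dx | in_free_text]


# Toy stand-in for pbta-histologies.tsv (IDs and diagnoses are made up).
toy = pd.DataFrame(
    {
        "Kids_First_Biospecimen_ID": ["BS_A", "BS_B", "BS_C", "BS_D"],
        "pathology_diagnosis": [
            "Ependymoma",
            "Low-grade glioma/astrocytoma",
            "Other",
            None,
        ],
        "pathology_free_text_diagnosis": [
            None,
            None,
            "anaplastic ependymoma",
            "medulloblastoma",
        ],
    }
)

selected = select_epn_samples(toy)
selected_ids = selected["Kids_First_Biospecimen_ID"].tolist()
```

Matching on either field means a sample recorded as "Other" in pathology_diagnosis can still be picked up through its free-text entry.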

What input data should be used? Which data were used in the version being updated?

pbta-histologies.tsv from V17 instead of from V16

When do you expect the revised analysis will be completed?

unsure

Who will complete the updated analysis?

unsure

@jaclyn-taroni added the molecular subtyping label Sep 12, 2020
@jaclyn-taroni
Member

CI failed on #764 with:

Generating analyses/molecular-subtyping-EPN/results/EPN_molecular_subtype.tsv that maps DNA and RNA ID's
Generating analyses/molecular-subtyping-EPN/results/EPN_all_data.tsv  that has all the relevant data needed for subtyping
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'BS_J8VX4D17'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "02_ependymoma_generate_all_data.py", line 180, in <module>
    EPN_notebook = fill_df_with_fpkm_zscores(EPN_notebook, fpkm_df, gene)
  File "02_ependymoma_generate_all_data.py", line 115, in fill_df_with_fpkm_zscores
    zscore_list = stats.zscore(np.array(df.apply(lambda x: fpkmdf.loc[gene_name, x["Kids_First_Biospecimen_ID_RNA"]], axis=1)))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 6928, in apply
    return op.get_result()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 292, in apply_standard
    self.apply_series_generator()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 321, in apply_series_generator
    results[i] = self.f(v)
  File "02_ependymoma_generate_all_data.py", line 115, in <lambda>
    zscore_list = stats.zscore(np.array(df.apply(lambda x: fpkmdf.loc[gene_name, x["Kids_First_Biospecimen_ID_RNA"]], axis=1)))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1418, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 805, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 961, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1424, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 1850, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexing.py", line 156, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3737, in xs
    loc = self.index.get_loc(key)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('BS_J8VX4D17', 'occurred at index 23')

Exited with code exit status 1

CircleCI received exit code 1

I am going to comment out the EPN subtyping step in an upcoming commit so that we can discover what else may need to be fixed as a result of the v17 release. When this issue is addressed, we will need to revert that change so that this step runs in CI again.
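The traceback shows `fpkmdf.loc[gene_name, ...]` raising a KeyError because the RNA biospecimen ID `BS_J8VX4D17` is not a column label in the FPKM matrix. One possible guard, sketched below under assumptions, is to z-score only the samples that are present and report the rest. The function name `zscores_for_present_samples` and the toy data are hypothetical, and the z-score is computed directly with NumPy (population standard deviation, the same default as `scipy.stats.zscore`) rather than with the script's own call.

```python
import numpy as np
import pandas as pd


def zscores_for_present_samples(
    sample_df, fpkm_df, gene_name, id_col="Kids_First_Biospecimen_ID_RNA"
):
    """Z-score one gene's FPKM across the samples found in fpkm_df.

    Samples whose RNA ID is not a column of fpkm_df get NaN instead of
    raising a KeyError. Returns (z-score Series, list of missing IDs).
    """
    ids = sample_df[id_col]
    present = ids.isin(fpkm_df.columns)
    values = fpkm_df.loc[gene_name, ids[present]].to_numpy(dtype=float)
    # Population standard deviation (ddof=0).
    z = (values - values.mean()) / values.std()
    out = pd.Series(np.nan, index=sample_df.index)
    out[present] = z
    missing = ids[~present].tolist()
    return out, missing


# Toy FPKM matrix: genes as rows, RNA biospecimen IDs as columns (made up).
fpkm_df = pd.DataFrame(
    {"BS_1": [1.0, 5.0], "BS_2": [2.0, 6.0], "BS_3": [3.0, 9.0]},
    index=["GENE_A", "GENE_B"],
)
sample_df = pd.DataFrame(
    {"Kids_First_Biospecimen_ID_RNA": ["BS_1", "BS_MISSING", "BS_2", "BS_3"]}
)

zscores, missing_ids = zscores_for_present_samples(sample_df, fpkm_df, "GENE_A")
```

Returning the missing IDs makes it easy to log them, which may help distinguish a CI subset gap from a genuine mismatch between the histologies file and the expression matrix.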

jaclyn-taroni added a commit to baileyckelly/OpenPBTA-analysis that referenced this issue Sep 15, 2020
jaclyn-taroni added a commit that referenced this issue Sep 15, 2020
* v17 files

1. updated files description
2. updated release notes
3. Updated download script to point to v16/v17

* Update release-notes.md

Added clinical columns that changed

* update release-notes.md

added notes for integrated dx

* Update doc/release-notes.md

add PBTA TMB file `pbta-snv-consensus-mutation-tmb-coding.tsv` to release notes

* Comment out HGG step that's broken

* Comment out EPN subtyping; see #755

* Use a different histology (LGAT) outside of the logic for running all

* Add EWS to the histology_color_palette.tsv so cnv_heatmap.Rmd runs

* Ignore collapsed counts from telomerase activity

* Add logic to make more robust to CI subset data

* Add messages to make debugging easier

* Temporarily comment out steps before telomerase activity

* Revert "Temporarily comment out steps before telomerase activity"

This reverts commit 65b6fb1.

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>
@jashapiro self-assigned this Sep 18, 2020
@jashapiro
Member

I will be starting on this today. Will need to look at apparent errors as well as the pathology_free_text_diagnosis field.
