Consider interoperation/ingest with Ontologymaps #222

Open
mellybelly opened this issue Aug 23, 2022 · 14 comments

Comments

@mellybelly

Especially this entry, which we really need, focusing on ICD-O: https://progenetix.org/service-collection/ontologymaps/
@mbaudis

@mcourtot

leaving a comment to watch this issue

@mbaudis

mbaudis commented Aug 23, 2022

See older discussions monarch-initiative/mondo#1148 (comment)

Briefly: For progenetix.org we have done "a bit" of mapping between NCIt and ICD-O Morphology+Topography combinations. This mostly has been done to utilize the NCIt hierarchies in contrast to the unwieldy dual-arm ICD-O (which otherwise is rather well suited to capture cancer diagnoses ...) for the Progenetix cancer genomics resource (which contains >100k individual samples from literature etc.).

However, our work partially was based on older NCIt cancer core codes & more refinement has been done there. Also, this covers only some hundred codes & needs a systematic extension.

@matentzn
Collaborator

@mbaudis thanks for your input! I am looking at the ontology maps API, and wondering if we should pull and republish your mappings in a suitable mapping commons in SSSOM format? I am wondering right now about mapping precision. The API returns a tuple with three ids, and I wonder how they are related. Are they all supposed to be mutually exact? Does the order in this tuple matter?

[
    {
        "id": "NCIT:C9383",
        "label": "Rectal Adenocarcinoma"
    },
    {
        "id": "pgx:icdom-81403",
        "label": "Adenocarcinoma, NOS"
    },
    {
        "id": "pgx:icdot-C20.9",
        "label": "Rectum, NOS"
    }
],

@mbaudis

mbaudis commented Aug 23, 2022

@matentzn Feel free - as a first step ... I'm really from a different area & haven't found time to work on mapping formalities etc. But there has been a veeeerrrryyy long need for these mappings & this now seems like a good opportunity to pick it up again.

Order: doesn't matter. It is basically (icdom+icdot) <=> NCIT
Exclusivity: No, since we miss many mappings. Some entities do not exist in NCIT or not at the best corresponding level; for others, we just haven't done the best assessment w/ the newest NCIT. Relevant reviews had started just before COVID & stalled there (though the mapping service came then later).
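A minimal sketch of reading one term group as (icdom+icdot) <=> NCIT regardless of member order. The function name `split_term_group` is hypothetical; only the id prefixes (`NCIT:`, `pgx:icdom-`, `pgx:icdot-`) are taken from the API output quoted in this thread.

```python
from typing import Dict, List, Optional


def split_term_group(group: List[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Sort the members of one term group into named slots, ignoring order."""
    slots: Dict[str, Optional[str]] = {"ncit": None, "icdom": None, "icdot": None}
    for term in group:
        term_id = term["id"]
        if term_id.startswith("NCIT:"):
            slots["ncit"] = term_id
        elif term_id.startswith("pgx:icdom-"):
            slots["icdom"] = term_id
        elif term_id.startswith("pgx:icdot-"):
            slots["icdot"] = term_id
    return slots


# The rectal adenocarcinoma group from above, deliberately shuffled.
group = [
    {"id": "pgx:icdot-C20.9", "label": "Rectum, NOS"},
    {"id": "NCIT:C9383", "label": "Rectal Adenocarcinoma"},
    {"id": "pgx:icdom-81403", "label": "Adenocarcinoma, NOS"},
]
print(split_term_group(group))
```

A slot stays `None` when a group lacks that axis, which fits the point that the mappings are not exhaustive.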

IMO it would be worth a real project to do this systematically - happy to help! And to learn how best to express such mappings in a formally correct way ¯\_(ツ)_/¯.

@mbaudis

mbaudis commented Aug 23, 2022

Also seems like a great opportunity to do this in conjunction with ICGC ARGO metadata work @mcourtot ?!

@mbaudis

mbaudis commented Aug 24, 2022

@matentzn ... and FYI all term groups for NCIT / ICD-O we have are in response.results.termGroups through https://progenetix.org/services/ontologymaps/?filters=NCIT,pgx:icdo&filterPrecision=start
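A rough sketch of pulling the term groups from that endpoint with the Python standard library. The exact response layout is an assumption: the code only relies on the `response.results.termGroups` path mentioned above and tolerates `results` being either a list or a single object. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

URL = ("https://progenetix.org/services/ontologymaps/"
       "?filters=NCIT,pgx:icdo&filterPrecision=start")


def extract_term_groups(payload):
    """Collect every termGroups entry found under response.results."""
    results = payload.get("response", {}).get("results", [])
    if isinstance(results, dict):  # tolerate a single result object
        results = [results]
    groups = []
    for result in results:
        groups.extend(result.get("termGroups", []))
    return groups


def fetch_term_groups(url=URL):
    """Download the service response and return its term groups."""
    with urlopen(url, timeout=30) as handle:
        return extract_term_groups(json.load(handle))


# Offline check with a hand-built payload shaped like the assumed response.
sample = {"response": {"results": [{"termGroups": [[
    {"id": "NCIT:C9383", "label": "Rectal Adenocarcinoma"},
    {"id": "pgx:icdom-81403", "label": "Adenocarcinoma, NOS"},
    {"id": "pgx:icdot-C20.9", "label": "Rectum, NOS"},
]]}]}}
print(len(extract_term_groups(sample)))
```

Calling `fetch_term_groups()` performs the actual HTTP request; the sample payload only exercises the parsing step.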

(2022-08-26: fixed wrong icdo partial)

@matentzn
Collaborator

@mbaudis sorry to be daft, could you elaborate what this endpoint provides? What is a term group?

@mbaudis

mbaudis commented Aug 26, 2022

@matentzn Not daft at all - this is just an ad hoc way to express equivalency of terms from different classification systems, w/o assuming a 1:1. I.e. for NCIt <=> ICD-O you will have two terms from the different ICD-O arms corresponding to a single NCIt term:

            {
              "id": "NCIT:C4017",
              "label": "Ductal Breast Carcinoma"
            },
            {
              "id": "pgx:icdom-85003",
              "label": "Infiltrating duct carcinoma, NOS"
            },
            {
              "id": "pgx:icdot-C50.4",
              "label": "Upper-outer quadrant of breast"
            }

Alternative mappings are expressed as separate groups, e.g. here (with a not-so-granular topography):

            {
              "id": "NCIT:C4017",
              "label": "Ductal Breast Carcinoma"
            },
            {
              "id": "pgx:icdom-85003",
              "label": "Infiltrating duct carcinoma, NOS"
            },
            {
              "id": "pgx:icdot-C50.9",
              "label": "Breast, NOS"
            }

For ICD-O T <=> UBERON there would just be 1:1 groups.
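To illustrate how alternative mappings show up as separate groups, here is a small sketch that indexes term groups by their NCIt id and collects the ICD-O morphology/topography pairs seen with it. The function name `alternatives_by_ncit` is hypothetical, and the input is just the two NCIT:C4017 groups quoted above.

```python
from collections import defaultdict


def alternatives_by_ncit(term_groups):
    """Map each NCIt id to the set of (icdom, icdot) pairs grouped with it."""
    index = defaultdict(set)
    for group in term_groups:
        ids = [term["id"] for term in group]
        ncit = next((i for i in ids if i.startswith("NCIT:")), None)
        icdom = next((i for i in ids if i.startswith("pgx:icdom-")), None)
        icdot = next((i for i in ids if i.startswith("pgx:icdot-")), None)
        if ncit is not None:
            index[ncit].add((icdom, icdot))
    return dict(index)


term_groups = [
    [
        {"id": "NCIT:C4017", "label": "Ductal Breast Carcinoma"},
        {"id": "pgx:icdom-85003", "label": "Infiltrating duct carcinoma, NOS"},
        {"id": "pgx:icdot-C50.4", "label": "Upper-outer quadrant of breast"},
    ],
    [
        {"id": "NCIT:C4017", "label": "Ductal Breast Carcinoma"},
        {"id": "pgx:icdom-85003", "label": "Infiltrating duct carcinoma, NOS"},
        {"id": "pgx:icdot-C50.9", "label": "Breast, NOS"},
    ],
]
for ncit, pairs in alternatives_by_ncit(term_groups).items():
    print(ncit, sorted(pairs))
```

An NCIt id with more than one pair is exactly the non-1:1 case; groups without an ICD-O axis end up with `None` in that position.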

This is obviously an "internal format" and could be expressed much more systematically ...

CAVE: There is a lot of noise here. Some earlier systematic work on cleaning up the mappings has been blurred by (A) new samples w/ diagnoses that were sometimes not properly adjusted, & (B) great advances in the NCIt cancer codes since our last bit of systematic work in early 2020 ... Therefore this is mostly for prototyping - e.g. how to ingest this conceptually - and needs cleanup & extension.

Another point: There are many 1:1 mappings between single NCIt and ICD-O M(orphology) terms where then the ICD-O M+T doublets would have to list all topography options (e.g. "Adenocarcinoma").

@matentzn
Collaborator

Awesome thanks, got it. One problem I see with simply converting your mappings is that the term groups do not capture semantic precision, without which we cannot guess the appropriate semantic mapping relation (a prerequisite for SSSOM). For example, NCIT:C4017 ("Ductal Breast Carcinoma") seems to be a broad match of "Infiltrating duct carcinoma, NOS". Maybe I am mistaken. Would you be confident assigning all mappings the skos:exactMatch mapping relation, which means that both concepts mean the exact same thing?

Secondly, I think that while icdot->Uberon is definitely SSSOM material, icdom->icdot is not really. We are getting into the realm of knowledge graphs there. But just to think this issue through to the end: what is the relationship between icdom and icdot terms that co-occur in the same term group?

@cmungall
Contributor

The formal way to represent the ICDO tuples is OWL expressions of the form pgx:icdom-8500 and has-location some pgx:icdot-C50.9

We have a general ticket on post-composition of concepts in #108.

One approach we could take here is to create an OWL file that materializes these expressions. They could have IDs that are essentially concatenations. We would publish a simple 3 column DOSDP TSV. Users would need to join to get the relationship between NCIT and each ICDO axis. We could also materialize the join as SSSOM using predicates such as anatomic_aspect_has_exact_match, morphological_aspect_has_exact_match
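A sketch of what that could look like in practice, under stated assumptions: the `~` separator for concatenated ids, the TSV column names, and the subject/object direction of the two predicates are illustrative choices, not an agreed DOSDP pattern or SSSOM convention. Only the two predicate names come from the comment above.

```python
import csv
import io

# Hypothetical input: (icdom, icdot, ncit) triples taken from term groups.
TRIPLES = [
    ("pgx:icdom-85003", "pgx:icdot-C50.4", "NCIT:C4017"),
    ("pgx:icdom-85003", "pgx:icdot-C50.9", "NCIT:C4017"),
]


def composed_id(icdom, icdot):
    """Concatenate both axes into one id; the '~' separator is an assumption."""
    return f"{icdom}~{icdot}"


def pattern_tsv(triples):
    """Render a three-column TSV: composed class, morphology, location."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["defined_class", "morphology", "location"])
    for icdom, icdot in sorted({(m, t) for m, t, _ in triples}):
        writer.writerow([composed_id(icdom, icdot), icdom, icdot])
    return buffer.getvalue()


def axis_mappings(triples):
    """Materialize the join: one row per NCIt term and ICD-O axis."""
    rows = set()
    for icdom, icdot, ncit in triples:
        rows.add((ncit, "morphological_aspect_has_exact_match", icdom))
        rows.add((ncit, "anatomic_aspect_has_exact_match", icdot))
    return sorted(rows)


print(pattern_tsv(TRIPLES))
for row in axis_mappings(TRIPLES):
    print("\t".join(row))
```

The TSV carries one row per unique morphology/topography pair, while the per-axis rows are deduplicated so that a morphology shared by several pairs is listed only once per NCIt term.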

@mbaudis

mbaudis commented Aug 27, 2022

That's what I thought but wanted some confirmation ...

One approach we could take here is to create an OWL file that materializes these expressions. They could have IDs that are essentially concatenations:

  • icdom-85003~icdot-C50.9 <-> NCIT:C4194

... but a question for me would be whether something like an "adenocarcinoma" w/o addtl. topographic information (NCIT:C2852 - Adenocarcinoma), which corresponds to ICD-O 8140/3 (pgx:icdom-81403), should be represented just as icdom-81403 or as icdom-81403~icdot-C80.9 (combining adenocarcinoma w/ the code for unknown primary site) [1]?

I, preferably, would do a "complete" representation of ICD-O 3 that would both include all unique M & T codes as well as all sane pairs. I.e. all primary codes and all post-compositions.
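The two candidate representations can be made concrete with a small sketch. The helper names and the `explicit_unknown_site` switch are hypothetical; the `~` separator and the C80.9 code for unknown primary site are the ones used in this comment.

```python
UNKNOWN_PRIMARY_SITE = "icdot-C80.9"


def compose_icdo_id(icdom, icdot=None, explicit_unknown_site=False):
    """Build an id for a morphology, optionally post-composed with a topography.

    Without a topography the result is either the bare morphology code or,
    if explicit_unknown_site is set, the pair with the unknown-site code.
    """
    if icdot is None:
        if not explicit_unknown_site:
            return icdom
        icdot = UNKNOWN_PRIMARY_SITE
    return f"{icdom}~{icdot}"


def parse_icdo_id(composed):
    """Split an id back into (icdom, icdot); icdot is None for bare codes."""
    icdom, _, icdot = composed.partition("~")
    return icdom, (icdot or None)


print(compose_icdo_id("icdom-85003", "icdot-C50.9"))
print(compose_icdo_id("icdom-81403"))
print(compose_icdo_id("icdom-81403", explicit_unknown_site=True))
```

Keeping the bare and the unknown-site forms distinct preserves the difference noted in the footnote: the second form asserts that the site is not known, the first says nothing about the site.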

But this is more of a question about how this should be done (from a non-ontologist). Is there a precedent?

Also: Similarly expressed here...

Footnotes

  1. IMO that is different since it has information that the site isn't known...

@mbaudis

mbaudis commented Aug 29, 2022

@matentzn Regarding UBERON <-> ICD topographies: This had been done by @qingyao and is documented at https://github.com/progenetix/icdot2uberon. OBO file & score etc. available - so this should be usable...

@mbaudis

mbaudis commented Aug 29, 2022

I have created a map with concatenated codes which uses:

  • the simple pairs for all ICD-O morphologies from the NCIT-ICD-O M mappings provided by NCI here https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Mappings/ICD-O-3_Mappings/About.html
  • the different morphology~topography pairs we have in Progenetix and their assigned (by us) NCIt codes, together with a random example sample link & description for each
    • unique combinations here are keyed by icdom::icdot::ncit - therefore, several instances of the same ICD pairs may exist, with different NCIT codes; this will need work...
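A possible first pass at the "this will need work" part: given keys of the form icdom::icdot::ncit, list the ICD-O pairs that carry more than one NCIt code. The function name is hypothetical and the sample keys are illustrative only, built from codes that appear in this thread rather than read from the actual table.

```python
from collections import defaultdict


def find_ambiguous_pairs(keys):
    """Return ICD-O pairs that are keyed with more than one NCIt code."""
    by_pair = defaultdict(set)
    for key in keys:
        icdom, icdot, ncit = key.split("::")
        by_pair[(icdom, icdot)].add(ncit)
    return {
        pair: sorted(codes)
        for pair, codes in by_pair.items()
        if len(codes) > 1
    }


# Illustrative keys, not taken from the hosted file.
keys = [
    "icdom-85003::icdot-C50.9::NCIT:C4017",
    "icdom-85003::icdot-C50.9::NCIT:C4194",
    "icdom-81403::icdot-C20.9::NCIT:C9383",
]
for pair, codes in find_ambiguous_pairs(keys).items():
    print("~".join(pair), "->", ", ".join(codes))
```

Each reported pair is a candidate for manual review, since a single morphology/topography combination should eventually resolve to one preferred NCIt term or to explicitly ranked alternatives.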

The file is hosted in our working byconeer repo, which is a bit of a "procedurally maintained" place; please consider the table as a test for further procedural discussions, not as a final product.

@matentzn
Collaborator

@mbaudis As a representation like this is currently beyond the scope of SSSOM, we will need to circle back to this after #108 and #36 are addressed in some way.

There are quite a few things to consider when folding composed expressions into any mapping vocabulary. Technically it's not hard (as evidenced by your use of ~ in your mappings), but socio-technically it is not at all straightforward, because we need to ensure the SSSOM extension is general enough to cover all future cases of complex mappings. This is tough, because no one can foresee all possible variations, but see #108 for an idea using template expressions.


5 participants