Consider interoperation/ingest with Ontologymaps #222
leaving a comment to watch this issue |
See older discussions monarch-initiative/mondo#1148 (comment) Briefly: For progenetix.org we have done "a bit" of mapping between NCIt and ICD-O Morphology+Topography combinations. This mostly has been done to utilize the NCIt hierarchies in contrast to the unwieldy dual-arm ICD-O (which otherwise is rather well suited to capture cancer diagnoses ...) for the Progenetix cancer genomics resource (which contains >100k individual samples from literature etc.). However, our work partially was based on older NCIt cancer core codes & more refinement has been done there. Also, this covers only some hundred codes & needs a systematic extension. |
@mbaudis thanks for your input! I am looking at the ontology maps API, and wondering if we should pull and republish your mappings in a suitable mapping commons in SSSOM format? I am wondering right now about mapping precision. The API returns a tuple with three ids, and I wonder how they are related. Are they all supposed to be mutually exact? Does the order in this tuple matter?
@matentzn Feel free - as a first step ... I'm really from a different area & haven't found time to work on mapping formalities etc. But there has been a veeeerrrryyy long need for these mappings & this now seems like a good opportunity to pick it up again. Order: doesn't matter. It is basically (icdom+icdot) <=> NCIT IMO it would be worth a real project to do this systematically - happy to help! And to learn, how to best express such mappings formally correct |
Also seems like a great opportunity to do this in conjunction with ICGC ARGO metadata work @mcourtot ?! |
@matentzn ... and FYI all term groups for NCIT / ICD-O we have are in (2022-08-26: fixed wrong icdo partial) |
@mbaudis sorry to be daft, could you elaborate what this endpoint provides? What is a term group? |
@matentzn Not daft at all - this is just an ad hoc way to express equivalency of terms from different classification systems, w/o assuming a 1:1. I.e. for NCIt <=> ICD-O you will have two terms from the different ICD-O arms corresponding to a single NCIt term:
Alternative mappings are expressed as separate groups, e.g. here (with a not-so-granular topography):
For ICD-O T <=> UBERON there would just be 1:1 groups. This is obviously an "internal format" and could be expressed much more systematically ... Caveat: There is a lot of noise here - some earlier systematic work on cleaning up mappings has been blurred by (A) new samples w/ diagnoses sometimes not properly adjusted, & (B) great advances in the NCIt cancer codes since we last did a bit of systematic work, in early 2020 ... Therefore this is mostly for prototyping - e.g. how to ingest this conceptually - and needs cleanup & extension. Another point: There are many 1:1 mappings between single NCIt and ICD-O M(orphology) terms where the ICD-O M+T doublets would then have to list all topography options (e.g. "Adenocarcinoma").
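To make the term-group idea above concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory shape for a group (one NCIt id plus one ICD-O morphology and one ICD-O topography id); the actual API response format is not reproduced here, and all ids below are placeholders rather than real codes. It only illustrates that order within a group does not matter and that alternative mappings for one NCIt term appear as separate groups.

```python
from collections import defaultdict
from typing import NamedTuple


class TermGroup(NamedTuple):
    """One ad hoc equivalence group: (ICD-O M + ICD-O T) <=> NCIt.

    Hypothetical structure for illustration, not the API's schema.
    """
    ncit: str
    icdom: str
    icdot: str


def index_by_ncit(groups):
    """Collect all alternative ICD-O (M, T) pairs listed for each NCIt term."""
    index = defaultdict(set)
    for g in groups:
        index[g.ncit].add((g.icdom, g.icdot))
    return dict(index)


def same_group(a, b):
    """Order within a group does not matter, so compare ids as sets."""
    return set(a) == set(b)


# Placeholder ids, not real codes.
groups = [
    TermGroup("NCIT:X1", "icdom:M1", "icdot:T1"),
    TermGroup("NCIT:X1", "icdom:M1", "icdot:T2"),  # alternative group, other topography
    TermGroup("NCIT:X2", "icdom:M2", "icdot:T1"),
]
alternatives = index_by_ncit(groups)
```

Here `alternatives["NCIT:X1"]` holds two (M, T) pairs, which is the "separate groups" case; nothing in this structure says how precise each correspondence is.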
Awesome thanks, got it. One problem I see with simply converting your mappings is that the term groups do not capture semantic precision, without which we cannot guess the appropriate semantic mapping relation (a prerequisite for SSSOM). For example, Secondly, I think while icdot->Uberon is definitely sssom material, icdom->icdot is not really. We were getting into the realms of knowledge graphs there. But just to think this issue through to the end: what is the relationship between icdom and icdot terms that co-occur in the same term group? |
The formal way to represent the ICDO tuples is OWL expressions of the form We have a general ticket on post-composition of concepts in #108. One approach we could take here is to create an OWL file that materializes these expressions. They could have IDs that are essentially concatenations. We would publish a simple 3 column DOSDP TSV. Users would need to join to get the relationship between NCIT and each ICDO axis. We could also materialize the join as SSSOM using predicates such as anatomic_aspect_has_exact_match, morphological_aspect_has_exact_match
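A rough Python sketch of the join described above, under stated assumptions: the concatenated ID scheme (`compose_id`), the TSV column names, and the shape of the NCIt-to-composed mapping are all hypothetical, and the ids used are placeholders. The two predicate names are the ones suggested in the comment; they are not established SSSOM predicates.

```python
import csv
import io


def compose_id(icdom: str, icdot: str) -> str:
    # Hypothetical ID scheme: plain concatenation of the two axis codes.
    return f"{icdom}_{icdot}"


def composed_table(pairs):
    """Render (morphology, topography) pairs as a 3-column TSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["composed_id", "morphology", "topography"])
    for icdom, icdot in pairs:
        writer.writerow([compose_id(icdom, icdot), icdom, icdot])
    return buf.getvalue()


def join_to_axes(ncit_to_composed, tsv_text):
    """Join NCIt -> composed-ID mappings against the TSV, one row per axis."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    by_id = {row["composed_id"]: row for row in rows}
    out = []
    for ncit, composed in ncit_to_composed:
        row = by_id[composed]
        out.append((ncit, "morphological_aspect_has_exact_match", row["morphology"]))
        out.append((ncit, "anatomic_aspect_has_exact_match", row["topography"]))
    return out


tsv = composed_table([("M1", "T1")])            # placeholder codes
triples = join_to_axes([("N1", "M1_T1")], tsv)  # placeholder NCIt id
```

Each NCIt term ends up with one row per ICD-O axis, which is the materialized join; whether a flat mapping table like this is an adequate carrier for composed expressions is the open question tracked in #108.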
That's what I thought but wanted some confirmation ...
... but a question for me would be if something like an "Adenocarcinoma" w/o addtl. topographic information ( I, preferably, would do a "complete" representation of ICD-O 3 that would both include all unique M & T codes as well as all sane pairs. I.e. all primary codes and all post-compositions. But this is more of a question towards how this should be done (from a non-ontologist). Precedence? Also: Similarly expressed here...
@matentzn Regarding UBERON <-> ICD topographies: This had been done by @qingyao and is documented at https://github.com/progenetix/icdot2uberon. OBO file & score etc. available - so this should be usable... |
I have created a map with concatenated codes which uses:
The file is hosted in our working |
@mbaudis As a representation like this is currently beyond the scope of SSSOM, we will need to circle back to this after #108 and #36 are addressed in some way. There are quite a few things to consider when folding composed expressions into any mapping vocabulary. Technically it's not hard (as evidenced by your use of
In particular, we really need this entry, focusing on ICD-O: https://progenetix.org/service-collection/ontologymaps/
@mbaudis