Incorporate ICDO mappings from progenetix #1148

Open
cmungall opened this issue Feb 9, 2020 · 22 comments

Comments

@cmungall
Member

cmungall commented Feb 9, 2020

Currently we have ICDO mappings sourced from NCIT, of the form `ICDO:nnnn/n`

progenetix has more complete mappings
https://github.com/progenetix/ICDOntologies/tree/master/current

TODO: determine consistency of these two

We will likely want to map each cancer term in mondo to a pair icdot/icdom as in the above.

E.g.

https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-84303%2Cicdot-C34.9.yaml

equivalents:
- label: Lung mucoepidermoid carcinoma
  id: ncit:C45544
examples:
- labels: Lung mucoepidermoid carcinoma [cell line EPLC-272H]
input:
- label: Mucoepidermoid carcinoma
  id: icdom-84303
- label: Lung and bronchus
  id: icdot-C34.9

This could be formalized by an equivalence axiom between precomposed mondo class and class expression icdom AND disease-has-location some icdot
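A minimal sketch of how such an axiom could be written out, assuming a Manchester-style OWL rendering. The function name `equivalence_axiom`, the property label `disease_has_location`, and the Mondo ID used below are placeholders for illustration, not existing Mondo tooling or real identifiers:

```python
def equivalence_axiom(mondo_id, icdom_id, icdot_id,
                      location_property="disease_has_location"):
    """Render 'mondo class == icdom AND location some icdot' as a string.

    Identifiers are passed through unchanged; location_property is a
    placeholder for whichever relation Mondo would actually use.
    """
    return (
        f"Class: {mondo_id}\n"
        f"    EquivalentTo: {icdom_id} and ({location_property} some {icdot_id})"
    )


# MONDO:0000000 is a made-up placeholder, not a real Mondo class.
print(equivalence_axiom("MONDO:0000000", "icdom-84303", "icdot-C34.9"))
```

The ICD-O identifiers are taken from the YAML example above; how they would be turned into resolvable IRIs is left open here.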

However, for convenience we may want to make simple xrefs to a conjoined string and have this resolve to URLs like https://progenetix.org/api/ncitcodes/icdom-85003,icdot-C50/
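As a rough illustration of the conjoined-string idea, the sketch below joins the two codes with a comma and appends them to the URL prefix quoted above. The helper names `conjoined_xref` and `xref_url` are hypothetical, and the URL pattern is copied from this comment only; it is an assumption that the service still resolves at that address:

```python
# URL prefix copied from the example in this comment (assumed, may have changed).
BASE_URL = "https://progenetix.org/api/ncitcodes/"


def conjoined_xref(icdom_id, icdot_id):
    """Join a morphology and a topography code into one xref string."""
    return f"{icdom_id},{icdot_id}"


def xref_url(xref, base_url=BASE_URL):
    """Build the lookup URL for a conjoined xref string."""
    return f"{base_url}{xref}/"


print(xref_url(conjoined_xref("icdom-85003", "icdot-C50")))
```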

todo: determine license of progenetix mappings

cc @mbaudis

Map each mondo class with an ncit equivalent to a icdot/icdom combo, see https://github.com/progenetix/ICDOntologies/tree/master/current

What is the license of these mappings?

@mbaudis

mbaudis commented Feb 11, 2020

Nice summary! Regarding the licensing: The primary icdo[mt]-.... codes are derived/correspond to ICD-O 3 terms. Current link (IACR). Additional information (disease interpretation / ICD-O code assignments) come from the WHO Classification of Tumours (WHO Blue Books).

Our mappings can get any open terms.

@mbaudis

mbaudis commented Feb 11, 2020

Pinging @paulacarrio who did most of the mappings.

@cmungall
Member Author

We have identified cases where the mapping is too general e.g
https://github.com/progenetix/ICDOntologies/blob/master/current/icdom-80003%2Cicdot-C44.9.yaml

@nicolevasilevsky will examine all 512 mappings. We will first need to turn this into a google sheet. @paulacarrio has scripts in the tools folder to turn into ods, but we could also do a quick yaml2tsv and load that into a sheet

@nicolevasilevsky
Member

Note mappings as:

  • ok
  • too general
  • wrong

@cmungall
Member Author

I made a spreadsheet:

https://docs.google.com/spreadsheets/d/1_6ZX715m3A0Iy9pmPjD_7OtmAAIuxAutFZEuA5XXNdY/edit#gid=356020133

can you move to mondo drive nicole?

this was the script I used:

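import click
import yaml


@click.command()
@click.argument('paths', nargs=-1)
def create_tsv(paths):
    """Print one TSV row per mapping file: ICD-O morphology, topography, NCIt."""
    for path in paths:
        with open(path, 'r') as fh:
            obj = yaml.load(fh, Loader=yaml.SafeLoader)
        # Each file is expected to hold exactly one NCIt equivalent
        if len(obj['equivalents']) > 1:
            raise ValueError(f"too many equivalents in {path}")
        x = obj['equivalents'][0]
        # Placeholders in case a file lacks a morphology or topography entry
        icdom = ('_', '_')
        icdot = ('_', '_')
        for entry in obj['input']:
            label = entry['label']
            code = entry['id']
            if code.startswith('icdom'):
                icdom = (code, label)
            elif code.startswith('icdot'):
                icdot = (code, label)
            else:
                raise ValueError(f"unknown ID type: {code}")
        # Build the row once per file, after all input entries are read
        vals = [icdom[0], icdom[1], icdot[0], icdot[1], x['id'], x['label']]
        print("\t".join(vals))


if __name__ == "__main__":
    create_tsv()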

@nicolevasilevsky
Member

@mbaudis I am half way done with reviewing the mappings. I'll keep going but do you want to review what I have done so far?

You could highlight cells with any issues or create a new column with notes for issues

@mbaudis

mbaudis commented Feb 26, 2020

@nicolevasilevsky Great - I had done some "random" annotations but will switch to systematic notes.

@mbaudis

mbaudis commented Feb 28, 2020

@nicolevasilevsky line 140 ... break for today; pls. have a look.

@nicolevasilevsky
Member

@cmungall said this should be lower priority for me to review these mappings. @mbaudis do you have enough feedback from me, at the moment?

Do you know how to make new term requests to NCIt?

@mbaudis

mbaudis commented Dec 10, 2020

@nicolevasilevsky AFAIK @paulacarrio & @qingyao have submitted a list of term requests. But we now have an icdom + icdot <-> NCIt service online, w/ GH repo etc.:

There is also some UBERON <-> ICD-O topography.

We actually would much appreciate if:

  • this is being re-used
  • becomes interactive w/ contributions for those mapping types - right now we have only subsets which are driven mostly by our own data

  • updated 2022-08-29 to fix changed urls

@nicolevasilevsky
Member

is there any action still needed for this?

@nicolevasilevsky
Member

I am going to close this as there hasn't been any response in a couple months. Please reopen if still needed, thanks!

@mbaudis

mbaudis commented Dec 22, 2021

@nicolevasilevsky @cmungall Stale item, but the issue still remains that there is no good representation of ICDO T+M pairs w/ the corresponding NCIT terms.

NCIT now covers most of our combinations (+1), but a direct mapping does not exist anywhere beyond our resource (?). So no idea how to go about this; as indicated, happy to provide/extend the mappings if someone has a way to integrate them in a lookup (?) service or annotation for term equivalence.

@nicolevasilevsky
Member

I'll bring this up with Chris at one of our Mondo calls in the new year.

@mbaudis

mbaudis commented Dec 22, 2021

Great - thanks! Happy to get looped in if needed.

@nicolevasilevsky
Member

Hi @mbaudis
I talked to Chris about this and he said we could probably make an OWL version of ICD-O with Koza or LinkML (similar to the way we did this with monochrome) and host it on OLS. We have a lot of other competing priorities though, so I'll come back to this in a couple of months and see if our development team can work on this. Thanks!

@mbaudis

mbaudis commented Jan 17, 2022

@nicolevasilevsky Great - please keep me posted; I'd like to help... And preferably LinkML :-)

@mellybelly
Collaborator

see also mapping-commons/sssom#222 - can we bump the priority on this?

@nicolevasilevsky
Member

@mellybelly I'll bring this up with Nico on a future call.

@nicolevasilevsky
Member

@mellybelly should we bump the priority on this ahead of other work, like ICD10 mappings, NCIT mappings, MedGen mappings, etc.?

If @mbaudis can provide an SSSOM mapping file instead of the spreadsheet, we can easily add these into Mondo. However, if we need to review all the mappings and create a file ourselves, it will be a big lift and we'll need to deprioritize other work.

@mbaudis

mbaudis commented Aug 26, 2022

@nicolevasilevsky I don't have the resources to provide a sssom'd version; but more than that I wouldn't know how to express the ICD-O pairs correctly. Internally we just concatenate them to get unique keys (icdom-85032::icdot-C50.9 ...) but is there a way to do this in the sssom schema? (really not my area...)
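For readers unfamiliar with the concatenated keys, here is a small sketch of building and splitting them, assuming the `::` separator and the `icdom-` / `icdot-` prefixes shown in the example key. The function names are hypothetical and this is not the progenetix implementation:

```python
SEPARATOR = "::"  # separator as it appears in the example key above


def make_pair_key(icdom_id, icdot_id):
    """Concatenate a morphology and a topography code into one key."""
    if not icdom_id.startswith("icdom-") or not icdot_id.startswith("icdot-"):
        raise ValueError(f"unexpected ID prefixes: {icdom_id!r}, {icdot_id!r}")
    return f"{icdom_id}{SEPARATOR}{icdot_id}"


def split_pair_key(key):
    """Recover the (morphology, topography) pair from a concatenated key."""
    icdom_id, sep, icdot_id = key.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"not a pair key: {key!r}")
    return icdom_id, icdot_id


key = make_pair_key("icdom-85032", "icdot-C50.9")
print(key, split_pair_key(key))
```

Whether such a composite key can serve as a subject or object identifier in an SSSOM file is the open question raised here; the sketch only shows the round trip.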

Also: IMO many of the mappings could be done better w/ the current version of NCIt.

So:

  • I see this as a basis for some collaborative project
  • an obvious target (besides our data) would be to do this for TCGA (data existing) and ICGC ARGO (data emerging ... @mcourtot ...)
  • we can easily provide a subset of sane mappings for prototyping/setting this up
  • I'm not sure how you'd handle the high level mappings where you have the 1:1 correspondence between single ICD-O topography or morphology codes (e.g. "Adenocarcinoma, NOS - 8140/3" == "Adenocarcinoma - NCIT:C2852"); in our concept which requires ICD-O pairs we'd just map all the topography options we encounter - but that is just one of many conceptual points I'm not knowledgeable about ¯\_(ツ)_/¯

@nicolevasilevsky
Member

Thanks @mbaudis. Let me discuss with Nico and we can come up with a plan to move forward. :)
