Format of ids for GAOntologyTerm #165
Please be aware that we'll be working on this as a coordinated effort between DWG-MTT and CWG. From some discussions with Melissa Haendel, my understanding is that ontology + id + name + version + URI/CURIE seems to cover most concepts, but we want to do some model implementations. For human disease descriptions, there are also a number of classification systems that will have to be accommodated.
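For illustration only, the field split described above (ontology + id + name + version + URI/CURIE) could be sketched as a Python dataclass. The class name `OntologyTerm`, the field names, and the `same_class_as` helper are hypothetical, not the GA4GH schema; the sketch just shows identity resting on the id or URI, with the name and version treated as descriptive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyTerm:
    """Hypothetical term reference: ontology + id + name + version + URI."""

    ontology: str                   # ontology name or prefix, e.g. "HP"
    id: str                         # CURIE-style identifier, e.g. "HP:0200117"
    name: str                       # human-readable label; not used for identity
    version: Optional[str] = None   # ontology release the term was taken from
    uri: Optional[str] = None       # full IRI, if known

    def same_class_as(self, other: "OntologyTerm") -> bool:
        # Compare URIs when both sides have one; otherwise fall back to the id.
        # Labels and versions are deliberately ignored.
        if self.uri and other.uri:
            return self.uri == other.uri
        return self.id == other.id
```

Two records that differ only in label or version are treated as the same class under this rule.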
OK, I will discuss this more with Melissa (@mellybelly) later today.
The MME is hoping to converge on a compatible representation. Curious if there are any updates?
Not sure if there are updates from other WGs, but I think MME should continue to use CURIEs of the form HP:nnnnnnn. These are at least compatible with what major databases are using, and with the semweb stack (assuming default prefix declarations).
Okay, thanks. Will do.
IMHO specific implementations may define a more restrictive use of specific formats; e.g. it is fine for MME to restrict to CURIEs. In the general context, we cannot restrict usage to specific ontologies.
I agree, we should not restrict to specific ontologies, though we can certainly recommend and test using a given set. Ideally we can stick to CURIEs and standardize prefixes, as we see lots of messes where this has not been done.
CURIEs with standardized prefixes (as @mellybelly suggested) appear to be a viable solution for MME groups right now. Using JSON-LD sounds interesting, though. Maybe this could be an optional part of the GA4GH schemas.
This has been dormant since January. I'm closing this in 2 days unless there are objections.
It's still not resolved. https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/ontologies.avdl still calls the OBI URI an id. Unless the terminology is resolved and a standard way of identifying ontology classes is specified, everyone will choose a different convention, and complex ID-processing code will be required to interoperate.
@cmungall Agreed. There was a presentation before the Easter break from the team behind the "FAIR data" principles. I was really impressed with their work, and I wonder if they have come up with a way to resolve this issue. I will direct him towards this.
For discussion: split into id and url in https://github.com/ga4gh/metadata-team/blob/master/avro-playground/metadata_redo.avdl
I think there is still the possibility of great confusion here. It mixes the concept of a URL fragment with an ID; these are sometimes, but not always, the same thing. Also, "ID defined by external ontology source" doesn't really mean anything in a lot of cases. OBI does not define IDs anywhere; its currency is URIs. There is nothing here to prevent the scenario where people refer to the same OBI class as
Where is the authoritative list of CURIEs?
Define "authoritative"? Several large and equally reputable groups use different formats.
Precisely. Is there one which will work well? If we can't refer to a set of these, is it practical to use CURIEs? I am not against this; it just seems like the next obvious question.
It's difficult for sure. I know EBI are currently working on the next iteration of their Ontology Lookup Service (OLS). As part of this, the data is searchable. For HPO, the term IDs are, for example, "hpo:http://purl.obolibrary.org/obo/HP_0200117". They also store an "id_annotation" of "HP:0200117", and a list of "short_form" terms, which includes "HP_0200117". As sources of genuinely useful data go, EBI comes pretty near the top. They've been looking at this, and the result appears to be that there's no consensus. Currently, you need to use the format HP:0200117 to find a term by ID, but the current system is old, and a large update is coming.
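A minimal sketch, assuming the three OLS forms quoted above (plus a bare PURL) are the only variants to handle: it reduces each to the OBO-style CURIE `HP:0200117`. It relies on the OBO PURL pattern `http://purl.obolibrary.org/obo/IDSPACE_LOCALID`; the function name `to_curie` and the rule of discarding any qualifier (such as `hpo:`) in front of a full IRI are assumptions made for this example.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"
# IDSPACE, then "_" (short_form) or ":" (CURIE), then a local id.
_SHORT_OR_CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9]*)[_:]([A-Za-z0-9]+)$")
# A full http(s) IRI at the end of the string, possibly after a qualifier.
_TRAILING_IRI = re.compile(r"https?://\S+$")


def to_curie(raw: str) -> str:
    """Reduce an OLS-style term id variant to an OBO-style CURIE."""
    value = raw.strip()
    iri_match = _TRAILING_IRI.search(value)
    if iri_match:
        iri = iri_match.group(0)
        if not iri.startswith(OBO_PURL_BASE):
            raise ValueError(f"not an OBO Library PURL: {iri!r}")
        # Keep only the IDSPACE_LOCALID tail of the PURL.
        value = iri[len(OBO_PURL_BASE):]
    m = _SHORT_OR_CURIE.match(value)
    if m is None:
        raise ValueError(f"unrecognised term id: {raw!r}")
    return f"{m.group(1)}:{m.group(2)}"
```

For example, `to_curie("HP_0200117")` and `to_curie("hpo:http://purl.obolibrary.org/obo/HP_0200117")` both return `"HP:0200117"`.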
I'm increasingly thinking we need an Ontology task team / working group (possibly as a sub-group of the DWG). /cc @ga4gh/global-alliance-contributors
This could be the perfect connector to ELIXIR. Barend Mons
Hi Ben, You have echoed a couple of recent conversations from other groups. This is a subject that covers so many different areas of work and is a real restriction on progress. There is a dedicated meeting at Leiden (on 9th June) to discuss it and I wouldn’t be surprised to see a new task team come out of that. Cheers
Ben, happy that you find my group's resources useful, and yes, we are rewriting OLS. I am not convinced that splitting the ontology effort further is desirable; metadata has a large group of ontologists included already. More ontologists is not always a good thing.
@helenp Sure! I'm not necessarily suggesting that more people be involved, but rather that it's made clear to everyone that the issues around ontologies are being looked at. Something like defining a list of CURIEs, or a method of ontology term identification, that's then pushed out to all the working groups would be hugely beneficial. I felt that formalising the work on ontologies would also let the other working groups know where to go to get answers or spur discussion about topics, and would produce clear documents as a result. @D-lloyd I saw this on the agenda. I don't know what time has been allocated within that section, but I think it's key to discuss / hear what's happening with the OLS rewrite. I'm currently using one of the OLS tools in development to extract an ontology file to Solr for searching. A standardised way of importing data would not only be very beneficial to others by making ontologies easier to use, but would also help inform agreement on which of the multiple term ID representations (CURIEs) the Global Alliance should suggest / push for.
I have pointed @simonjupp (OLS dev) at this thread; we should be able to do something that helps. Ben, come and see us if that would help.
Well, we already have an ontologies task team in the CWG, but I agree that there are technical specifications that need to be worked up that are likely out of scope for the CWG, which is focused more on use cases. We should discuss this in Leiden. I prefer the use of CURIEs, and it is likely that the GA4GH would want to have a registry of whatever is in use throughout all the schemas. It doesn't have to be about authority (as there are many overlapping "authoritative" sources) but rather about what is required by any users/contributors of the GA4GH schemas to share their data. Perhaps we should consider some process by which anyone sharing data via GA4GH can register their CURIEs in a shared repo. We'd need some inclusion/exclusion criteria and guidelines for contributors. There is also work to be done to specify where and when certain ontology sources should or could be used. This is the much harder part ;-). @nlwashington @kshefchek @cmungall perhaps we can do an example in G2P of how this might work with a diversity of disease and phenotype ontology sources; we are on the way towards that already.
@helenp Already in contact with Simon! Very helpful! I'm using the Java code in the new OLS project to load ontology data into Solr. Waiting for his return to ask further questions on how I can integrate this data! =] @mellybelly Another repository would most probably add to the confusion. Versioning is a major issue with ontologies! It's already a very messy problem. It looks like the OLS will allow you to look up terms based on multiple formats. Directing people to the new OLS should hopefully be really helpful. I understand it will be updated nightly, but of course one can always pin a version and run one's own Solr install.
@Relequestual - I'll answer in more detail later, but answering your original question and the HP example. In a semantic web toolchain the URI is canonical. For any OBO library ontology, there is an authoritative deterministic way to map this to an identifier in OBO format, which is what is used by all bioinformatics dbs not based on a semweb tool chain, and the id would be
More later...
Not much time to write, but one of the distinctions we discussed needing to make concerns the semantics when multiple terms are chosen. For example, if two disease terms are indicated, it would likely mean that the patient has two diagnoses (or two family members with them, or whatever the context). This is distinct from assigning two terms from different vocabularies as alternatives, as @mbaudis indicates above. Then there are the semantics that might already exist between two terms indicated in this way. We also agreed that some uses of ontologyTerm would specify a single entity (e.g. you can only have one geneticSex), whereas others would expect an array (a set of phenotypes). Everyone largely seemed to agree on the use of CURIEs and a CURIE map. Also, @mbaudis, @diekhans, @nlwashington and I discussed compliance testing that would leverage OWL reasoning as part of the reference implementation to ensure best use. Use of non-registered CURIEs would go through a registration request to check appropriate usage (more than to constrain people), or could be a local extension. I think we'd largely want to discourage local extensions, but some good documentation about how best to include and document them could go a long way. The compliance suite would also check for consistent ID formats and unregistered CURIEs, pointing people to the registration page or making alternative suggestions based on existing OWL file equivalencies/xrefs.
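The format and registration part of that compliance check could look roughly like the following sketch. It does not attempt the OWL reasoning or xref-based suggestions mentioned above; the prefix map `REGISTERED_PREFIXES`, its two entries, and the function `check_term_ids` are hypothetical placeholders for whatever shared CURIE map is eventually agreed.

```python
import re
from typing import Dict, List

# Hypothetical shared CURIE map (prefix -> IRI base); entries are illustrative.
REGISTERED_PREFIXES: Dict[str, str] = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "OBI": "http://purl.obolibrary.org/obo/OBI_",
}

# PREFIX:LOCAL_ID with no whitespace.
_CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_.-]*):(\S+)$")


def check_term_ids(term_ids: List[str]) -> List[str]:
    """Return one message per problem; an empty list means every id passed."""
    problems: List[str] = []
    for term_id in term_ids:
        if "://" in term_id:
            problems.append(f"{term_id!r}: full URI given; use the registered CURIE form")
            continue
        m = _CURIE.match(term_id)
        if m is None:
            problems.append(f"{term_id!r}: not a CURIE of the form PREFIX:LOCAL_ID")
        elif m.group(1) not in REGISTERED_PREFIXES:
            problems.append(
                f"{term_id!r}: prefix {m.group(1)!r} is not registered; "
                "submit a registration request or use a registered equivalent"
            )
    return problems
```

Returning messages rather than raising lets a compliance run report every inconsistent or unregistered id in one pass.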
@helenp OK. I'd be interested to see the minutes from this meeting, as I'm not part of the Metadata TT.
Hi Chris, Since we are creating a data exchange API, we need to be able As you suggest, creating validation programs is a great solution. Cheers,
Hi Helen, Thank you for the minutes, which are very helpful in getting me caught up. I was unaware of the MTT minutes, which I think many would find very Thank you, On Tue, Jun 16, 2015 at 5:20 PM, Helen Parkinson notifications@github.com
@pgrosu - there's a lot of process documentation in the minutes. The MTT is now ticketing all relevant items and documenting them better so that they are standalone. My preference is to use tickets, as they are cleaner.
@helenp Ah, makes sense. Would these tickets be tracked through https://github.com/ga4gh/schemas/labels/MetadataTaskTeam? Knowing the method would let people get a quick glance at the status and not fall behind on progress. Thanks,
@pgrosu Labels. We have done some cleanup.
@helenp Super, thank you :)
On Tue, Jun 16, 2015 at 12:35 PM, Chris Mungall notifications@github.com
Yes, I'll be on the call. But I have a lot of homework to do to catch up
@mbaudis you mentioned a few days ago: "Still, there hasn't been implementation work on the exact format of the ontologyTerm object; everybody is welcome, regarding the notes above ...". In creating the FuGE standard a few years back, we thought long and hard about this. The UML model we came up with was this:
Aside: not sure what the GA4GH protocol is here, but it feels like we should be spinning off new issues here? @mdmiller53 - thanks for sharing the doc. I'm not sure it precisely aligns with GA4GH requirements (though we may all have different ideas about what these are). The typical usage would be to represent an ontology class (rather than a property or individual, if by individual you mean something like an OWL individual). There are situations where we may want to denote a property (aka relation) too (for example, in a generic functional annotation model). There may be situations where we want to model composition of ontology terms (see this UML), but this is probably best discussed as a separate issue from the format of the class references.
Hi, I have been traveling like crazy (I could not even attend the meeting in my home town) and apologize for not being on many calls lately, but I suppose that once Beacons go 'ontology' we adhere to FAIR and ELIXIR interoperability developments? Barend Mons
|
Hi All
mdmiller53 wrote:
@antbro Can we keep to the topic of the issue, please? =] By all means, create a new issue!
Was it really so off topic? (any more than other posts in the thread,
@antbro What you refer to is not part of the ontologyTerm object itself, but could be defined through some kind of "evidence" objects. This is under development in G2P, I think, but should be moved "mainline". Can we maybe start this over, through a PR against https://github.com/ga4gh/schemas/blob/master/src/main/resources/avro/ontologies.avdl ? I have moved the metadata implementations to this branch.
Agree with @mbaudis and @Relequestual. @antbro, please review the G2P schema that was recently accepted and see if it addresses your questions sufficiently. Please make tickets there for gaps/issues; much appreciated.
After having worked with the existing Ontology model for a while, we've proposed some small changes that should close this issue. #694 |
Current docs state:
This is fairly open ended and we can imagine confusion and inconsistent usage here.
For the ontologies currently referenced in the metadata schema, e.g.
Terms are typically referenced in two ways.
URIs/IRIs
For many biological ontologies these are typically OBO Library PURLs, which follow:
See: http://www.obofoundry.org/id-policy.shtml
OBO-style identifiers
Typically follow the form
Options
Option 1 is probably the conceptually simplest. Option 2 is not very future-proof, as it doesn't allow open-ended expansion to any ontology out there on the semantic web. Option 3 is probably overkill.
I would advocate option 4. To elaborate, we allow the field to contain either a URI or a CURIE (https://en.wikipedia.org/wiki/CURIE see also http://www.w3.org/TR/curie/), without the brackets. We then assume the existence of a number of implicit qname prefixes. E.g.
This could potentially live in a separate JSON-LD context file.
This is also consistent with the translation in the OBO-Format spec: http://oboformat.googlecode.com/svn/trunk/doc/obo-syntax.html#5.9.1
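Under option 4, a consumer needs a rule for telling the two forms apart and for expanding CURIEs. A minimal sketch, assuming the implicit prefixes are kept in a JSON-LD-style `@context` and that declared prefixes take precedence over treating the value as an absolute `http`/`https` URI. The prefix list and the name `resolve_term_id` are illustrative assumptions; the `IDSPACE:LOCALID` to `http://purl.obolibrary.org/obo/IDSPACE_LOCALID` expansion mirrors the OBO PURL pattern.

```python
# Implicit prefix declarations in a JSON-LD-style "@context".
# The prefixes listed are illustrative, not an agreed GA4GH set.
CONTEXT = {
    "@context": {
        "HP": "http://purl.obolibrary.org/obo/HP_",
        "OBI": "http://purl.obolibrary.org/obo/OBI_",
        "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    }
}


def resolve_term_id(value: str) -> str:
    """Return a full IRI for a field holding either a bare CURIE or a URI."""
    prefixes = CONTEXT["@context"]
    prefix, sep, local = value.partition(":")
    if not sep:
        raise ValueError(f"neither a CURIE nor a URI: {value!r}")
    if prefix in prefixes:
        # CURIE: expand by concatenating the prefix IRI and the local part.
        return prefixes[prefix] + local
    if prefix.lower() in ("http", "https") and local.startswith("//"):
        # Already an absolute URI; pass it through unchanged.
        return value
    raise ValueError(f"undeclared prefix {prefix!r} in {value!r}")
```

Checking declared prefixes first is one way to handle the ambiguity that bracketed "safe CURIEs" address in the W3C CURIE syntax; under this rule a prefix named `http` or `https` must never be declared.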
I would be happy to branch and make a pull request, but I thought it worthwhile polling for opinions. We need this to be future-proof and consistent, but also not over-engineered.