-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OBO parser: Treat value of replaced_by
tag as an IRI
#1072
Conversation
The value of the `replaced_by` tag in a OBO file should be an ID according to the OBO Flat File Format specification, so we treat it as such.
When the header frame of a OBO file contains `idspace` tags, use them to translate Prefixed-IDs (aka CURIEs) into full IRIs.
@gouttegd this is great! I suggest handling |
Process `consider` tags in a OBO file the same way as `replaced_by` tags.
@gouttegd - does your PR fix idspace expansion for ALL CURIEs, or only CURIEs that are used in replaced_by? |
@cmungall For now, only CURIEs that are used in I have another branch (not pushed anywhere yet) in which I test replacing CURIEs also in |
I did some tests and found that this change also deals with term IDs using non-OBO namespaces. PRO (
And terms:
With the old code this is parsed into:
With this PR it becomes:
That's really nice. How much do we want to try to handle roundtrip with these changes? If you read in that OBO file, and then write it out as OBO, you get a huge
and the term stanzas look like:
|
Do we care about round-tripping (OBO -> OWL -> OBO results in difference) |
To clarify:
So in this case, “round-tripping” a OBO file (converting from OBO to any other format and then back to OBO again) can and most likely will produce a different OBO file, where IRIs are either condensed into different CURIEs than the ones in the original file, or not condensed at all.
This is the “opt-in” bit mentioned by @shawntanzk above: Non-OBO CURIEs are only expanded to non-OBO IRIs if the editors explicitly asked for it by using a
So I guess the options are: A) Statu quo. Don’t expand CURIEs in B) Expand CURIEs whenever possible, but without using C) Expand CURIEs whenever possible, using prefix maps from D) Expand CURIEs whenever possible, using prefix maps from |
I also want to add that if we go for option D, this should allow to also expand CURIEs in If we have a way to always let the OBO writer know how to condense arbitrary IRIs into CURIEs, then this would not be a problem anymore. |
Let's talk a bit about option D before diving into xref, which is furiously discussed here.
I have a slide bias to 3 right now, but I am totally open to arguments in all directions. |
I am strongly in favor of @matentzn's option 2. If the shacl namespace vocabulary had existed when we wrote the spec we would have used it. I tried |
Proof-of-concept (mostly to check that I understand option 2 correctly, the code is ugly): The following OBO file:
gives the following OFN file, where the
When converting back to OBO, the SHACL annotations are used to (a) condense back the expanded IRI into its original CURIE form, and (b) restore the original
Is that what you both had in mind? |
perfecto!!!! |
Great. In that case I don’t see any obstacle against option 2. The first part ( The second part (SHACL to Of course, if it can be done with SHACL annotations, then logically it could also be done with any other annotation intended to store prefix mappings, so option 3 remains a possibility. But I don’t see the point of creating another annotation, when we already have the SHACL ones. |
Out of curiosity, how does:
work with owlapi? what is the type of |
It’s a It would be possible to use an explicit IRI instead (the SHACL specification explicitly allows to use either a IRI or a blank node), but since its only purpose is to “link” the |
Would be interesting to experiment how this would affect module extraction for example. Most likely it would not - since they are ontology annotations, you will have to copy them forward if you want to extract -> convert. Also, a default behaviour for namespace clashes is necessary when multiple ontologies with prefix maps are merged. |
About conflicting prefix maps: ontologies are merged after they have been parsed, so by the time two (or more) OBO-format ontologies with conflicting maps are merged, the CURIEs they contain will have already been expanded into IRIs — so the resulting merged ontologie will at least contain correct IRIs. The problem will be when the merged ontology is saved in OBO format. Then:
The case of conflicting namespaces is obviously the most problematic one, since it will lead to incorrect CURIEs. Not sure what the best course of action would be to deal with that case:
|
Independently of the risk of conflicts, I note that something interesting is happening when ontologies are merged with Let’s say we have the following SHACL annotations in ontology A:
And the following annotations in ontology B:
The merged ontology will contain:
Notice how only the ontology annotation from ontology A (the one linking to the Because they are not “linked” to an ontology annotation, they would not be found when we are looking for SHACL prefix maps, so they cannot cause a conflict (so the meaning of the |
@gouttegd - are you sure that is the behavior on merging with blank nodes? That would indicate a problem with the owlapi AFAIK I think it's good to think through behavior with these annotations. We need to define behavior in the case of conflicting prefixes. I suggest
However it should be left as a problem for the user to make sure the shacl namespaces are well behaved. Merging is one way to introduce problem states, but I don't think this should be treated as an obo format issue. We can have some lightweight checks and operations at the robot level. |
This is what I observe when merging the following file: test1.ofn
with this one (almost identical, but different prefix map): test2.ofn
with merged.ofn
Tested with ROBOT 1.8.4 and 1.9.0 (identical results). |
@gouttegd - apologies, I didn't see the two bnodes differed by one digit, I thought that the two bnodes from the different graphs were merged into one, which would have been a mistake, had it actually happened. I don't see any major problems here one thing that might be annoying is that if I merge N ontologies that all have: [ sh:prefix "ex"; sh:namespace "https://example.org/"] then I will get [ sh:prefix "ex"; sh:namespace "https://example.org/"]
[ sh:prefix "ex"; sh:namespace "https://example.org/"]
...
[ sh:prefix "ex"; sh:namespace "https://example.org/"] Things would get silly if these were included in extracts as we would end up with exponential growth of parasitic duplicative declarations but I don't think these would be included with extract by default this will be fixed by a laundry cycle through an id-space aware obo serializer. We can imagine having some lightweight robot commands for removing redundant bnodes, detecting bijectivity conflicts etc |
Very interesting and long discussion. Re merging of blank nodes, parsing two ontologies where the blank nodes have the same ids doesn't cause the in memory blank nodes to clash - blank node ids are remapped during parsing. The node on file will match the triples of the node in memory but the id will be recomputed. This is precisely to avoid clashes (think for example during import resolution; same concept during merges). The behaviour is optional and can be disabled - removing remapping can be useful in some situations, but it brings with it the risk of clashes and is not compliant with the specs, so by default remapping is on. |
@ignazio1977 It depends on how we answer @balhoff ’s question above: “How much do we want to try to handle roundtrip with these changes?” For now, we do not try to “handle roundtrip”. As a result, if an ontology in OBO format has a prefix declaration in its header:
and uses IDs with this declared
then this is handled correctly when parsing from OBO to OWL, but not in the other way around, because for now the OBO writer doesn’t know how to “CURIfy” the expanded IRI back to its original prefixed form. We already agreed that the solution to that is to save the prefix mappings in the ontology as SHACL declarations, and I have some code to do that in another branch, but it’s not ready to merge yet. |
@gouttegd what's the status of your SHACL prefix mappings code? Want to merge it into here? |
@balhoff Sorry, I haven’t found any time to work on it since my last message. My code is still nowhere near ready to be merged. |
FYI, I've rebased this PR onto the |
This PR is superseded by #1102. |
This PR is a proof-of-concept fix for the issue mentioned in INCATools/ontology-development-kit#642, i.e. the fact that the value of a
replaced_by
tag in a OBO file is translated into a literal string, instead of an IRI as expected.The fix has two parts:
replaced_by
clauses are translated by converting the value into an IRI, instead of leaving it as a string. This covers both the case where the value is already a full URL and the case where the value is a OBO CURIE, thanks to theloadIdToIRI
method. This does not, however, cover the case of non-OBO CURIEs.idspace
tags optionally found in the header frame, by feeding the specified prefix and corresponding base URL into theidSpaceMap
hash map, which is already used by theloadIdToIRI
method when expanding CURIEs. This covers the case of non-OBO CURIEs, by making sure they are expanded to the correct IRI instead of the defaulthttp://purl.obolibrary.org/obo/
.