-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for prefix declarations in OBO format #1102
Conversation
The value of the `replaced_by` tag in a OBO file should be an ID according to the OBO Flat File Format specification, so we treat it as such.
When the header frame of a OBO file contains `idspace` tags, use them to translate Prefixed-IDs (aka CURIEs) into full IRIs.
Process `consider` tags in a OBO file the same way as `replaced_by` tags.
…ely never useful.
oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
Outdated
Show resolved
Hide resolved
oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIOwl2Obo.java
Outdated
Show resolved
Hide resolved
@ignazio1977 @cmungall @gouttegd I think this is ready. |
I built a ROBOT jar which can be used to test the prefixes support: https://github.com/balhoff/owlapi/releases/download/prefixes-test/robot.jar |
results in a very large
and also has ID CURIEs translated to obo:...
I don't think there is anything unusual about this ontology - cl.owl exhibits the same behavior. |
If the ontology is converted from .obo, there is no problem I though that somehow the oboInOwl:id annotations were the issue, removing them gets rid of the giant |
In fact this seems to happen only if the source ontology is in RDF/XML. I don’t see anything wrong if the source ontology is in Manchester or functional syntax. |
The ID issue seems more precisely caused by the DefaultPrefixManager pm = new DefaultPrefixManager();
pm.setPrefix("obo:", "http://purl.obolibrary.org/obo/");
pm.setPrefix("CL:", "http://purl.obolibrary.org/obo/CL_");
IRI iri = IRI.create("http://purl.obolibrary.org/obo/CL_0001");
System.err.printf("IRI to CURIE: %s -> %s\n", iri, pm.getPrefixIRI(i)); This gives: IRI to curie: http://purl.obolibrary.org/obo/CL_0001 -> obo:CL_0001 Not sure why the prefix manager does not recognise the longer, more specific CL URL prefix. |
If I understand correctly, this is because the OWL API prefix manager does not simply search for the longest URL prefix in the prefix map. Before doing that, it first searches for a URL prefix that corresponds to the namespace of the IRI. If it finds one, it always uses that prefix without searching any further. The IRI namespace is determined completely independently of any prefix map, solely on the basis of some XML rules about names (it’s the longest prefix such that whatever remains after the prefix is a XML NCName). With Now because the RDF/XML file contains a |
@gouttegd thanks for looking into that. I actually had originally implemented my own search for the longest URL prefix, and then swapped in the prefix manager because it seemed proper to reuse code. But I much prefer always using the longest match! I wonder if I should put in special case handling of namespaces that are |
The ‘curies’ package for Java is available on maven and has an implementation of uri compression that correctly handles getting the longest match |
To be clear, the OWL API prefix manager does correctly handle getting the longest match. It’s just that it does so only if it does not find a prefix that matches the namespace of the IRI (where the namespace is defined as explained above) – if a match for the namespace is found, this takes precedence over everything else. Overall, it seems like the OWL API DefaultPrefixManager has been designed with the particular constraints of XML tags in mind. Nothing wrong with that, but I think that makes it unsuitable for our particular use case here. I’d suggest either reverting to the original custom longest prefix search (which was working just fine last time I tested this PR, if I recall correctly) or adopting |
I think the main problem @cmungall is seeing is the result of automatic insertion of
An annotation is injected into the OWL:
If there is an idspace defined like
And then the OBO writer doesn't know what to do with the
I'm not sure, but I don't think the |
@gouttegd @cmungall do you understand the point of this check for underscore? Trying to wrap up this PR and this is a sticking point: owlapi/oboformat/src/main/java/org/obolibrary/obo2owl/OWLAPIObo2Owl.java Lines 1722 to 1723 in 13a2d43
I don't want to append the |
I think this is for subset "IDs". |
I believe this is a (poor) way of checking whether the ID is a “non-canonical prefixed ID” (canonical prefixed ID don’t contain an underscore in the local ID part). Such IDs MUST be expanded with a '#' character between the URL prefix and the local ID, according to the OBO Flat File Format specification (§5.9.2 Translation of identifiers). I say a “poor way” because a “non-canonical prefixed ID” is defined (§2.5 Identifiers) basically as any prefixed ID that contains characters that are not allowed in a “canonical prefix ID” – so, any prefixed ID where the prefix part contains something else than alphanumeric characters and underscores and/or where the local part contains something else than digits. In other words, that check fails to identify many ”non-canonical prefixed IDs” (all those that do not contain underscores in their local part, but may contain any other character not allowed in a canonical ID), which are then translated as if they were canonical. Whether it would be a good idea to fix the parser (implement a proper check for “non-canonical prefixed IDs”) to make it really compliant with that section of the OBO Flat File Format specification, I am not sure. |
I have tested this jar and it looks like it injects labels for oio properties. While this is not so harmful and even mildly useful in some circumstances, it's generally undesirable and not supported by the spec. E.g. format-version: 1.4
ontology: comment
[Term]
id: X:1
comment: "This is a comment about term X:1." generates <?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/comment.owl#"
xml:base="http://purl.obolibrary.org/obo/comment.owl"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
<owl:Ontology rdf:about="http://purl.obolibrary.org/obo/comment.owl">
<oboInOwl:hasOBOFormatVersion>1.4</oboInOwl:hasOBOFormatVersion>
</owl:Ontology>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Annotation properties
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://www.geneontology.org/formats/oboInOwl#hasOBOFormatVersion -->
<owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#hasOBOFormatVersion">
<rdfs:label>has_obo_format_version</rdfs:label>
</owl:AnnotationProperty>
<!-- http://www.geneontology.org/formats/oboInOwl#id -->
<owl:AnnotationProperty rdf:about="http://www.geneontology.org/formats/oboInOwl#id">
<rdfs:label>id</rdfs:label>
</owl:AnnotationProperty>
<!-- http://www.w3.org/2000/01/rdf-schema#comment -->
<owl:AnnotationProperty rdf:about="http://www.w3.org/2000/01/rdf-schema#comment"/>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Classes
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://purl.obolibrary.org/obo/X_1 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/X_1">
<oboInOwl:id>X:1</oboInOwl:id>
<rdfs:comment>"This is a comment about term X:1."</rdfs:comment>
</owl:Class>
</rdf:RDF>
<!-- Generated by the OWL API (version 4.5.26) https://github.com/owlcs/owlapi --> |
@cmungall This does not seem to be caused by this PR. I observe exactly the same behaviour with the current released version of ROBOT (1.9.5). |
Using robot prerelease from owlcs/owlapi#1102
Trying to merge the PR with the tip of the version4 branch results in the RoundTripTestCase#shouldRoundTripVersionInfo() test failing. The test loads a test ontology in functional syntax, which contains the default prefix declarations (OWL, RDFS, etc.), and converts it to OBO by calling the What I absolutely do not understand is why this test only fails upon merging this PR with the version4 branch. From what I understand it should also fail right here, before we even try to merge… |
Not sure what the correct behaviour of the For now, it is set to “preserve prefix mappings loaded from a previous serialization”, which automatically results in the typical default prefixes (OWL, RDF, etc.) being converted into IDSPACE tags. Should we actively prevent those prefixes from being converted? |
So it seems this has to do with the version of the SureFire plugin used to run the test suite. Jim’s Changing just the version of SureFire in Jim’s branch to use the same 3.2.5 as in the current |
Good to know, cheers.
Now, it's a good question whether the bug is real or in the test. I'll run
the same code without the junit machinery and check the results.
I.
…On Mon, 6 May 2024, 18:07 Damien Goutte-Gattat, ***@***.***> wrote:
What I absolutely do not understand is why this test only fails upon
merging this PR with the version4 branch. From what I understand it should
also fail right here, before we even try to merge…
So it seems this has to do with the version of the SureFire plugin used to
run the test suite.
Jim’s replaced-by-value-as-iri-v4 branch still uses version 2.20 (same
version as the one used in OWLAPI 4.5.26). The current version4 branch
(upcoming 4.5.27) uses SureFire 3.2.5.
Changing *just* the version of SureFire in Jim’s branch to use the same
3.2.5 as in the current version4 branch leads to the aforementioned test
failure.
—
Reply to this email directly, view it on GitHub
<#1102 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAT2AJKC4D64Z53XH2QYUGTZA6TDNAVCNFSM6AAAAAAYABVVQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJWGQYDIMRQGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
My view:
needs to exclude the standard prefixes, as you do not want them as idspace declarations (please confirm if I've misunderstood this). Implementing a simple list of namespaces to exclude from idispace declarations makes the test pass. Another failing test is to do with an idspace declaration not being includes ( I'm experimenting with these two fixes to see if that will allow the tests to pass. Please let me know if you think I'm going the wrong way about this. |
I’ve come to the same conclusion.
No, I think you’re right. @balhoff what do you think? |
Regarding the failure/non failure due to surefire, I'm thinking the problem might be this file is named Test rather than TestCase and it's Junit 5 (as all other test cases now are), perhaps the old surefire was missing those altogether. |
I got a clean build and have pushed up the commit, any feedback welcome. I'll check other existing PRs and bug reports, if there are any easy wins I'll add them, then I can release later today or tomorrow |
I built the latest ROBOT with the current tip of the (FWIW I ran into some Log4J-related issues when building ROBOT against the new OWLAPI, but this has nothing to do with this PR and is almost certainly a problem on the ROBOT side.) |
Thanks @ignazio1977! Where can I see your changes? My intent with the standard prefixes was that I didn't want them automatically injected into a fresh namespace manager, but if they were actually purposely declared then I would include them. Maybe it's an impossible distinction to work out. |
@balhoff I think it would be quite hard to tell apart automatic from standard namespaces, especially on existing ontologies. On freshly created ones the prefix map would be empty, so the presence of namespaces would mean manual insertion, but at renderer level there would be no way of knowing that. Latest patch set is a squashed commit: 1f4764d |
I'll check the current libraries for known vulnerabilities and proceed with the release. |
No known vulnerabilities, but quite abit of pain updating GPG keys for the new release. Now I'm stuck with credentials for sonatype, to release to central. My credentials don't seem to work. I'll try again tomorrow, if no joy I'll raise a ticket with sonatype. |
Release claims to have completed, after two or three unsuccessful attempts. Maven Central will take a bit to synchronise, I'll check tomorrow. |
Using robot prerelease from owlcs/owlapi#1102
Whats the significance of all these "conflicting files" in this PR? Is this PR merged / taken into account for the release? |
Not significant. Part of the PR was already merged, I've rebased and squashed locally before merging the fixed branch, that left GitHub with two conflicting branches. The PR was merged. The release doesn't seem to have made it to maven central though. I'll go check sonatype. |
Looks like yesterday my attempts managed to publish only one artifact, and today that artifact is stopping the release form continuing. I've incremented the version to 4.5.28 and released again. This seems to have been successful, I can see the release in sonatype. Not visible yet on maven central, that takes usually a couple of hours. |
I can see 4.5.28 on Maven Central now: https://central.sonatype.com/artifact/net.sourceforge.owlapi/owlapi-api/overview. Thanks! |
THAAAANK you @ignazio1977!! It works: ontodev/robot#1200 |
Can this be closed? |
This adds support for reading and writing prefix declarations in OBO format (
idspace
tags), and using them for expanding and compacting identifiers. It builds on initial work done by @gouttegd in #1072, moved to theversion4
branch. These changes also treat values ofreplaced_by
andconsider
tags as IRIs rather than strings.OBODocumentFormat
now extendsPrefixDocumentFormatImpl
.This should stay as a draft PR until some tests are added.