RDF/XML behaves weird if RDF is the default namespace #468

Closed
joernhees opened this issue Mar 4, 2015 · 5 comments · Fixed by #470
Labels: bug, critical, parsing

Comments

@joernhees
Member

Noticed this while trying to parse Mozilla add-on install.rdf files. They declare xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" so they can write <Description> and about instead of <rdf:Description> and rdf:about.

When parsing such a file, the current rdflib no longer understands the RDF basics and introduces BNodes to capture the XML.

Examples:

Here we define the rdf namespace as commonly done:

In [64]: my_data = '''
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
  <rdf:Description rdf:about="urn:mozilla:install-manifest">
    <rdfs:label>Example</rdfs:label>
    <rdfs:comment>This is really just an example.</rdfs:comment>
  </rdf:Description>
</rdf:RDF>
'''

In [65]: g = rdflib.Graph()

In [66]: g.parse(data=my_data)
Out[66]: <Graph identifier=N84e0685a0c7c4dd2be6a59ad9cbadac9 (<class 'rdflib.graph.Graph'>)>

In [67]: list(g)
Out[67]:
[(rdflib.term.URIRef(u'urn:mozilla:install-manifest'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal(u'Example')),
 (rdflib.term.URIRef(u'urn:mozilla:install-manifest'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#comment'),
  rdflib.term.Literal(u'This is really just an example.'))]

this is ok.

let's use a prefix other than rdf, e.g. fdr:

In [76]: my_data = '''
<fdr:RDF
  xmlns:fdr='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
  <fdr:Description fdr:about="urn:mozilla:install-manifest">
    <rdfs:label>Example</rdfs:label>
    <rdfs:comment>This is really just an example.</rdfs:comment>
  </fdr:Description>
</fdr:RDF>
'''

In [77]: g = rdflib.Graph()

In [78]: g.parse(data=my_data)
Out[78]: <Graph identifier=N877052e9ac414a52841a8157ab590fb2 (<class 'rdflib.graph.Graph'>)>

In [79]: list(g)
Out[79]:
[(rdflib.term.URIRef(u'urn:mozilla:install-manifest'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal(u'Example')),
 (rdflib.term.URIRef(u'urn:mozilla:install-manifest'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#comment'),
  rdflib.term.Literal(u'This is really just an example.'))]

this looks ok as well.

now let's make rdf the default namespace so we don't need to prefix everything with rdf (2nd line):

In [68]: my_data = '''
<RDF
  xmlns='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
  <Description about="urn:mozilla:install-manifest">
    <rdfs:label>Example</rdfs:label>
    <rdfs:comment>This is really just an example.</rdfs:comment>
  </Description>
</RDF>
'''

In [69]: g = rdflib.Graph()

In [70]: g.parse(data=my_data)
Out[70]: <Graph identifier=Nc29cd64640354624874a897aea509b12 (<class 'rdflib.graph.Graph'>)>

In [71]: list(g)
Out[71]:
[(rdflib.term.BNode('N6da6e3fa36bc4a898e3ef99e67e29cc7'),
  rdflib.term.URIRef(u'about'),
  rdflib.term.Literal(u'urn:mozilla:install-manifest')),
 (rdflib.term.BNode('N6da6e3fa36bc4a898e3ef99e67e29cc7'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal(u'Example')),
 (rdflib.term.BNode('N6da6e3fa36bc4a898e3ef99e67e29cc7'),
  rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#comment'),
  rdflib.term.Literal(u'This is really just an example.'))]

this looks as if rdflib doesn't understand rdf anymore: it introduces a BNode as the subject of every triple and turns about into a plain URIRef(u'about') predicate :(

@joernhees joernhees added the bug and parsing labels Mar 4, 2015
@joernhees joernhees added this to the rdflib 4.2.1 milestone Mar 4, 2015
@uholzer
Contributor

uholzer commented Mar 5, 2015

I think you always have to prefix the rdf:about attribute. This is because the parent element of an unprefixed attribute defines its meaning. In fact, an unprefixed attribute is always in the empty namespace.
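The XML-level part of this can be checked with Python's standard xml.etree.ElementTree, used here only for illustration. The default namespace applies to unprefixed element names, but an unprefixed attribute ends up with no namespace at all, whereas rdf:about is expanded to the full RDF namespace URI:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc = """<RDF xmlns='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
     xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <Description about="urn:mozilla:install-manifest"/>
  <Description rdf:about="urn:mozilla:install-manifest"/>
</RDF>"""

root = ET.fromstring(doc)
unprefixed, prefixed = list(root)

# The default namespace applies to element names...
print(root.tag)           # {http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF
print(unprefixed.tag)     # {http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description
# ...but not to attributes: the unprefixed 'about' stays un-namespaced,
# while rdf:about is expanded to the RDF namespace URI.
print(unprefixed.attrib)  # {'about': 'urn:mozilla:install-manifest'}
print(prefixed.attrib)    # {'{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about': 'urn:mozilla:install-manifest'}
```

So at the XML level the two spellings really are different attributes; whether RDF/XML gives them the same meaning is a separate question.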

@kmaglione
Contributor

The RDF spec disagrees:

http://www.w3.org/TR/REC-rdf-syntax/#eventterm-attribute-URI

For attributes without namespaces, "if local-name is ID, about, resource, parseType or type, set to a string value of the concatenation of the RDF namespace URI reference and the value of the local-name accessor".
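The quoted rule is small enough to sketch directly. attribute_uri below is a hypothetical helper (not rdflib API). It takes an attribute's namespace (None when unprefixed) and local name, the (namespace, localname) shape that SAX reports with namespace processing on, and returns the URI the attribute stands for. Any other unqualified attribute is simply treated as having no RDF meaning in this sketch:

```python
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Local names that may appear without a namespace and still mean rdf:<name>
UNQUALIFIED_RDF_NAMES = frozenset({"ID", "about", "resource", "parseType", "type"})


def attribute_uri(namespace: Optional[str], local_name: str) -> Optional[str]:
    """Hypothetical helper: map an RDF/XML attribute name to its URI."""
    if namespace is not None:
        # Qualified attribute: namespace URI followed by the local name
        return namespace + local_name
    if local_name in UNQUALIFIED_RDF_NAMES:
        # Unqualified about/ID/...: treated as if written rdf:about etc.
        return RDF_NS + local_name
    # Other unqualified attributes get no RDF meaning in this sketch
    return None


print(attribute_uri(None, "about"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#about
print(attribute_uri(None, "label"))  # None
```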

The parsers/rdfxml.py plugin has code to handle this case in its convert method. It's broken, however, because it converts the attribute name to a URIRef too early, which breaks the lookups that rely on it being a plain string. The following change produces the correct behavior:

--- rdfxml.py   2015-03-04 12:02:02.555737327 -0800
+++ -   2015-03-05 14:16:19.582665301 -0800
@@ -235,9 +235,9 @@
         atts = {}
         for (n, v) in attrs.items(): #attrs._attrs.iteritems(): #
             if n[0] is None:
-                att = URIRef(n[1])
+                att = n[1]
             else:
-                att = URIRef("".join(n))
+                att = "".join(n)
             if att.startswith(XMLNS) or att[0:3].lower()=="xml":
                 pass
             elif att in UNQUALIFIED:
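Why the early conversion breaks the lookup can be reproduced without rdflib. This sketch assumes, as described in this thread, that a URIRef never matches a plain str dict key. TypedURI is a hypothetical stand-in that gets this behaviour by requiring the same type in __eq__ and mixing the type into __hash__. convert_attrs is a simplified, hypothetical version of the attribute loop in the diff (the xmlns/xml skipping is left out):

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TypedURI(str):
    """Hypothetical stand-in for URIRef: only equal to objects of its own type."""

    def __eq__(self, other):
        return type(self) is type(other) and str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Mixing in the type name keeps TypedURI('about') apart from 'about'
        return hash((type(self).__name__, str(self)))


# Plain-string keys, in the spirit of the UNQUALIFIED table in rdfxml.py
UNQUALIFIED = {name: TypedURI(RDF_NS + name)
               for name in ("about", "ID", "type", "resource", "parseType")}


def convert_attrs(attrs, convert_early):
    """Map SAX-style {(namespace, localname): value} attrs to {URI: value}.

    convert_early=True mimics the old code (URI object built before lookup);
    convert_early=False mimics the patched code (plain string lookup).
    """
    result = {}
    for (ns, local), value in attrs.items():
        att = local if ns is None else ns + local
        if convert_early:
            att = TypedURI(att)
        if att in UNQUALIFIED:
            result[UNQUALIFIED[att]] = value
        else:
            result[TypedURI(att)] = value
    return result


attrs = {(None, "about"): "urn:mozilla:install-manifest"}
print(list(convert_attrs(attrs, convert_early=True)))   # ['about']
print(list(convert_attrs(attrs, convert_early=False)))  # ['http://www.w3.org/1999/02/22-rdf-syntax-ns#about']
```

With the early conversion, about misses the table and becomes a property literally named about, which matches the URIRef(u'about') predicate in the broken output earlier in this issue; with the patched order it resolves to rdf:about.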

@joernhees
Member Author

hmm, several thoughts on this:

here's an example install.rdf:
https://github.com/protz/GMail-Conversation-View/blob/master/install.rdf

let's put it through the W3C RDF validator:
http://www.w3.org/RDF/Validator/rdfval?URI=https%3A%2F%2Fraw.githubusercontent.com%2Fprotz%2FGMail-Conversation-View%2Fmaster%2Finstall.rdf&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_TRIPLES&FORMAT=PNG_EMBED

Error: {W102} unqualified use of rdf:about is deprecated.[Line = 7, Column = 53]

Also, looking at its output, it becomes quite apparent that the validator does not interpret the install.rdf correctly either (like rdflib, it inserts a blank node at the top and doesn't seem to understand rdf:Description, rdf:about, etc.). @kmaglione, @magopian: So i'd really recommend to change the default install.rdf
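Where a manifest is under one's own control, qualifying the attributes should avoid the W102 warning. A minimal sketch using Python's xml.etree.ElementTree; qualify_rdf_attributes is a hypothetical helper, and it assumes that losing comments and the original prefix layout is acceptable, since ElementTree discards comments by default and re-serializes namespaces with its own prefixes:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
UNQUALIFIED_RDF_NAMES = ("ID", "about", "resource", "parseType", "type")


def qualify_rdf_attributes(xml_text: str) -> str:
    """Hypothetical helper: rewrite unqualified about/ID/... as rdf:* attributes."""
    ET.register_namespace("rdf", RDF_NS)  # serialize the RDF namespace as rdf:
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for name in UNQUALIFIED_RDF_NAMES:
            # ElementTree keys unqualified attributes by their bare local name
            if name in elem.attrib:
                elem.set("{%s}%s" % (RDF_NS, name), elem.attrib.pop(name))
    return ET.tostring(root, encoding="unicode")


manifest = """<RDF xmlns='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
     xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <Description about="urn:mozilla:install-manifest">
    <rdfs:label>Example</rdfs:label>
  </Description>
</RDF>"""

print(qualify_rdf_attributes(manifest))
```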

There are several reasons we should still deal with this:

  1. it seems that Jena's riot can handle it (though with a deprecation warning):

    $ riot install.rdf
    14:33:56 WARN  riot                 :: [line: 7, col: 53] {W102} unqualified use of rdf:about is deprecated.
    <urn:mozilla:install-manifest> <http://www.mozilla.org/2004/em-rdf#id> "gconversation@xulforum.org" .
    <urn:mozilla:install-manifest> <http://www.mozilla.org/2004/em-rdf#version> "2.9pre" .
    <urn:mozilla:install-manifest> <http://www.mozilla.org/2004/em-rdf#targetApplication> _:BX2D4e60610dX3A14bef4addc9X3AX2D7fff .
    
  2. @kmaglione's link above can also be found in a section of our own rdfxml parser:
    https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/rdfxml.py#L22
    To me this suggests we intended to handle this case, but, as he explains, the lookup in convert fails because it tries to find URIRef('about') in a dict that only contains 'about'.

  3. Shockingly, @kmaglione's fix (see #470, "fix RDF/XML problem with unqualified use of rdf:about") breaks our prominent doctest example in https://github.com/RDFLib/rdflib/blob/master/rdflib/__init__.py#L21. Interestingly, his fix is right and the test is just wrong: there are only 4 statements in http://www.w3.org/2000/10/swap/test/meet/blue.rdf, not 9. It seems that for a very long time rdflib wasn't actually able to parse http://www.w3.org/2000/10/swap/test/meet/blue.rdf correctly :(

@kmaglione
Contributor

So i'd really recommend to change the default install.rdf

Unfortunately, there isn't such a thing. People copy install.rdf templates from all over the place. They mostly come from other add-ons, or are generated by frameworks.

@uholzer
Contributor

uholzer commented Mar 7, 2015

@kmaglione: Thanks for clearing this up and fixing the bug.

The W3C validator is rather strict in rejecting the install.rdf; however, validators are supposed to be strict. Also, I found the same problem in the install.rdf of a Firefox extension I wrote myself. So yes, people copy templates from all over the place, for example (sarcasm) from the Mozilla Developer Network.
