added support for nt in utf-8 encoding #447

Kinmarui · 2015-01-03T15:22:06Z

By default it behaves as before, but you can now pass an encoding parameter when parsing a file.

usage:

import rdflib

# g is a standard Graph
g = rdflib.Graph()

# old syntax (still works)
result = g.parse('c:\\sample_ascii_file.nt', format="nt")

# new syntax
result = g.parse('c:\\sample_unicode_file.nt', format="nt", encoding="utf-8")

joernhees · 2015-01-04T13:31:52Z

thanks for the attempt, but the build fails on py2.6 & py2.7

aside from this, i think an encoding flag like this is a bit confusing (also see #400):

The NTriples parser should be able to automatically handle old style NTriples 1.0 ASCII "\u..." encoding as well as NTriples 1.1 UTF-8 encoding. As NTriples 1.1 is backwards compatible (#400 as well), the default should be accepting UTF-8 encoding and also resolving the "\u..." escapes, making the encoding flag useless.
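To make the point above concrete, here is a minimal sketch of the "accept UTF-8 and also resolve the escapes" behaviour. It is not rdflib's implementation: the helper name `decode_nt_line` is hypothetical, it is written for Python 3, and it only resolves the numeric `\uXXXX` / `\UXXXXXXXX` escapes (other escapes such as `\n` or `\"` are left untouched).

```python
import re

# Matches an escaped backslash (left as-is) or a numeric escape:
# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r'\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def decode_nt_line(raw):
    """Decode one N-Triples line given as bytes (hypothetical helper).

    The bytes are decoded as UTF-8, which also covers pure-ASCII input,
    and numeric escapes are then replaced by the characters they name.
    """
    text = raw.decode("utf-8")

    def _replace(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # An escaped backslash: keep it, so "\\u0041" is not
            # mistaken for the escape "\u0041".
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(_replace, text)
```

With this approach the same code path handles an old-style ASCII line and a UTF-8 line, so the caller never has to say which one it is passing in.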

You could maybe argue that one might want to check whether a file is NTriples 1.0, but that case is the opposite of what you tried to implement here. Also, in that case I'd say it's part of the format rather than an encoding, so the format string in the parse method should be something like format="nt1.0" instead of specifying an encoding.
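As a rough illustration of what such a strict NTriples 1.0 check could look at on the encoding side, the sketch below only tests whether the raw bytes are pure ASCII. The function name `is_ascii_only` is hypothetical, and this says nothing about whether the content is otherwise valid NTriples.

```python
def is_ascii_only(raw):
    """Return True if the given bytes contain only 7-bit ASCII.

    Hypothetical helper: a file with any non-ASCII byte cannot be a
    NTriples 1.0 file, although passing this check does not prove the
    file is syntactically valid.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True
```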

As mentioned in #400 the story looks different for serialization: there one should have a possibility to specify that one needs a NTriples 1.0 file, default should be NTriples 1.1.
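For the serialization side, the difference between the two targets could be sketched as follows. This is an assumption-laden illustration, not rdflib's serializer: `escape_non_ascii` is a hypothetical helper for Python 3 that only rewrites non-ASCII characters as numeric escapes and does not deal with quotes, backslashes or control characters.

```python
def escape_non_ascii(text):
    """Rewrite non-ASCII characters as \\uXXXX or \\UXXXXXXXX escapes.

    Hypothetical helper: the result is pure ASCII, as an NTriples 1.0
    consumer would expect. For NTriples 1.1 output the text could be
    written out as UTF-8 without this step.
    """
    out = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            out.append(ch)                      # plain ASCII, keep as-is
        elif code_point <= 0xFFFF:
            out.append("\\u%04X" % code_point)  # 4-digit form
        else:
            out.append("\\U%08X" % code_point)  # 8-digit form
    return "".join(out)
```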

Kinmarui · 2015-01-04T14:30:49Z

Thanks for the response.

The encoding flag was the only option that came to mind, because I couldn't find a way to determine the file encoding inside the parser.
If you know how to determine the encoding when all I have in the parser is a file stream (BufferedReader), please share even a partial solution. Or can I safely use Arthur-VaisseLesteven's suggestion posted in #400, since ASCII is a subset of UTF-8 and I don't need any flag to handle ASCII files in a special way?

I also fixed the issue with Python 2.x.
Anyway, this is a duplicate of #400, so you can close this thread and we can move there if you prefer.
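The "ASCII is a subset of UTF-8" argument can be checked with a small sketch like the one below, which wraps a binary stream in a UTF-8 text reader without detecting anything. It uses only the standard library (Python 3); `read_lines` and the sample data are made up for illustration and are not part of rdflib.

```python
import io

# Sample data (made up): one pure-ASCII line and one UTF-8 line.
ascii_bytes = b'<http://example.org/s> <http://example.org/p> "plain" .\n'
utf8_bytes = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'.encode("utf-8")


def read_lines(stream):
    """Read all lines from a binary stream, always decoding as UTF-8.

    Hypothetical helper: no encoding detection is needed, because any
    valid ASCII byte sequence is also valid UTF-8 with the same meaning.
    """
    reader = io.TextIOWrapper(stream, encoding="utf-8")
    return list(reader)


ascii_lines = read_lines(io.BufferedReader(io.BytesIO(ascii_bytes)))
utf8_lines = read_lines(io.BufferedReader(io.BytesIO(utf8_bytes)))
```

Both streams go through the same reader; the ASCII one comes out exactly as it would under an ASCII decoder.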

joernhees · 2015-01-05T10:33:22Z

yupp, the suggestion in #400 should work... it's in master as per #449

added support for nt in utf-8 encoding

03a08e1

fixed python 2 support

7bf6fd1

joernhees closed this Jan 5, 2015

joernhees added this to the rdflib 4.2.0 milestone Feb 19, 2015

ocefpaf mentioned this pull request Apr 3, 2015

Updated rdflib. ioos/conda-recipes#177

Merged

pyup-bot mentioned this pull request Nov 8, 2016

Update rdflib to 4.2.1 mytardis/mytardis#733

Closed

This was referenced Jan 16, 2017

Initial Update mozilla/addons-server#4303

Closed

Update rdflib to 4.2.1 mozilla/addons-server#4390

Closed

pyup-bot mentioned this pull request Jan 29, 2017

Update rdflib to 4.2.2 mytardis/mytardis#815

Merged

This was referenced Mar 16, 2017

Initial Update mozilla/amo-validator#510

Closed

Update rdflib to 4.2.2 mozilla/amo-validator#515

Closed
