added support for nt in utf-8 encoding #447

Closed
wants to merge 2 commits into from

@Kinmarui commented Jan 3, 2015

By default, parsing works as before, but you can now pass an encoding parameter when parsing a file.

usage

g is a standard Graph
g = rdflib.Graph()

old syntax
result = g.parse( 'c:\\sample_ascii_file.nt', format="nt") # still works

new syntax
result = g.parse( 'c:\\sample_unicode_file.nt', format="nt", encoding="utf-8")

@joernhees
Member

Thanks for the attempt, but the build fails on py2.6 & py2.7.

Aside from this, I think an encoding flag like this is a bit confusing (also see #400):

The NTriples parser should be able to automatically handle old-style NTriples 1.0 ASCII "\u..." encoding as well as NTriples 1.1 UTF-8 encoding. As NTriples 1.1 is backwards compatible (see #400 as well), the default should be to accept UTF-8 encoding and also resolve the "\u..." escapes, which makes the encoding flag unnecessary.
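To make the point above concrete, here is a minimal sketch of a decoder that accepts UTF-8 input and also resolves the old-style numeric escapes. It is not rdflib's actual implementation: the helper name `decode_ntriples_text` and the regular expression are assumptions made for illustration, and the code assumes Python 3.

```python
import re

# Matches any backslash escape. Groups 1 and 2 capture the hex digits of
# \uXXXX and \UXXXXXXXX; group 3 catches every other escape (\n, \", \\ ...).
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))")


def _resolve(match):
    hex_digits = match.group(1) or match.group(2)
    if hex_digits is None:
        # Not a numeric escape: leave it untouched for a later parsing stage.
        return match.group(0)
    return chr(int(hex_digits, 16))


def decode_ntriples_text(raw):
    # Hypothetical helper: decode bytes as UTF-8, then resolve numeric escapes.
    # Pure ASCII (NTriples 1.0) input is valid UTF-8, so one code path covers both.
    return _ESCAPE.sub(_resolve, raw.decode("utf-8"))


old_style = decode_ntriples_text(b'"caf\\u00E9"')            # NTriples 1.0 escape
new_style = decode_ntriples_text('"caf\u00e9"'.encode("utf-8"))  # NTriples 1.1 UTF-8
```

Both calls yield the same text, so no encoding argument is needed on the caller's side. Matching every backslash escape in one pass keeps an escaped backslash followed by `u00E9` from being misread as a numeric escape.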

You could maybe argue that one might want to check whether a file is NTriples 1.0, but that case is the opposite of what you tried to implement here. Also, in that case I'd say it's part of the format rather than an encoding, so the format string in the parse method should be something like format="nt1.0" instead of specifying an encoding.

As mentioned in #400, the story looks different for serialization: there one should be able to request an NTriples 1.0 file, while the default should be NTriples 1.1.
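For the serialization side, the difference between the two outputs comes down to whether non-ASCII characters are written as-is or escaped. The following is only a rough sketch under that assumption: `escape_for_ntriples_10` is a hypothetical helper, not an rdflib function, and it deliberately ignores the other escapes a real serializer needs (quotes, backslashes, control characters). It assumes Python 3.

```python
def escape_for_ntriples_10(text):
    # Hypothetical helper: rewrite non-ASCII characters as numeric escapes
    # so the result can be written out as a pure ASCII NTriples 1.0 file.
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)                      # ASCII stays unchanged
        elif code_point <= 0xFFFF:
            pieces.append("\\u%04X" % code_point)  # 4 hex digits
        else:
            pieces.append("\\U%08X" % code_point)  # 8 hex digits
    return "".join(pieces)


escaped = escape_for_ntriples_10("caf\u00e9")
escaped_bytes = escaped.encode("ascii")  # encodes cleanly: only ASCII remains
```

An NTriples 1.1 serializer would skip this step and simply encode the text as UTF-8.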

@Kinmarui
Author

Kinmarui commented Jan 4, 2015

Thanks for the response.

The encoding flag was the only option that came to mind, because I couldn't find a way to determine the file encoding inside the parser.
If you know how to determine the encoding when I have a file stream (BufferedReader) in the parser, please share even a partial solution. Or can I safely use Arthur-VaisseLesteven's suggestion posted in #400, since ASCII is a subset of UTF-8 and I don't need any flag to handle ASCII files in a special way?
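The subset property asked about here can be checked directly. The sketch below wraps a binary stream in a UTF-8 text reader and feeds it both an ASCII file and a UTF-8 file. The helper name `open_as_utf8` and the in-memory sample data are assumptions for illustration, not code from this pull request or from #400, and the code assumes Python 3.

```python
import io


def open_as_utf8(stream):
    # Hypothetical helper: wrap a binary stream so it yields UTF-8 decoded text.
    return io.TextIOWrapper(stream, encoding="utf-8")


ascii_data = b'<http://example.org/s> <http://example.org/p> "plain" .\n'
utf8_data = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'.encode("utf-8")

# Pure ASCII bytes decode to the same text under either codec.
same_text = ascii_data.decode("utf-8") == ascii_data.decode("ascii")

# The same wrapper handles both inputs without any encoding flag.
ascii_lines = list(open_as_utf8(io.BufferedReader(io.BytesIO(ascii_data))))
utf8_lines = list(open_as_utf8(io.BufferedReader(io.BytesIO(utf8_data))))
```

Since every valid ASCII file is also a valid UTF-8 file, always decoding as UTF-8 avoids having to detect the encoding at all. A file in some other encoding (for example Latin-1) would raise `UnicodeDecodeError` instead of being silently misread.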

I also fixed the issue with Python 2.x.
Anyway, this is a duplicate of #400, so you can close this thread and we can move there if you prefer.

@joernhees
Member

yupp, the suggestion in #400 should work... it's in master as per #449
