This repository has been archived by the owner on Jan 10, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Due to the end-of-life for Python 2, we are moving the SLING project to Python 3. Since Python 2.7 will no longer be maintained after January 1, 2020, including no security updates, we will not support Python 2.7 going forward.
I have migrated the
pyapi
library to use Python 3 and updated the Python modules in the SLING wheel accordingly.There are still some parts that need to be migrated, notably the parser trainer (sling/nlp/parser/trainer), the wikibot (python/wikibot) and the category browser (sling/nlp/wikicat).
One major change form Python 2 to 3 is that strings are now Unicode and the new API supports Unicode in all appropriate places. However, this introduces some subtle ambiguities where we used strings for both binary and text.
Strings in the frames stores are returned as Unicode strings in the Python API, but this can lead to
UnicodeEncodeError
exceptions for invalid UTF8 strings. You can now use theget(role, binary=True)
method to get the value of a string in a frame store asbytes
.The key and value returned from a record file are returned as
bytes
, so you have to manually convert these to strings, e.g.key.decode()
, if you need these as strings. You can use bothstring
andbytes
when writing to record files.Another subtle issues is the
Document.text
property. The indices into the document text, e.g.Token.begin
andToken.end
, are offsets into the UTF-8 encoded document text. To prevent any UTF8/Unicode round trip conversion issues, I have keptDocument.text
asbytes
.We use Python 3.5 as the default Python version for now, but the code seems to work for Python 3.6 and 3.7 as well. You can set
PYVER
insling/pyapi/BUILD
to 35, 36 or 37 to change the Python runtime version used for building.I have made a
setup.sh
script to make it easier to set up the SLING development environment. Please run this to get your SLING environment updated to the new Python3-based setup.