This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Python 3 support for SLING API #366

Merged
merged 4 commits into from
May 9, 2019

Conversation

ringgaard
Contributor

Due to the end-of-life for Python 2, we are moving the SLING project to Python 3. Since Python 2.7 will no longer be maintained after January 1, 2020, including no security updates, we will not support Python 2.7 going forward.

I have migrated the pyapi library to use Python 3 and updated the Python modules in the SLING wheel accordingly.

There are still some parts that need to be migrated, notably the parser trainer (sling/nlp/parser/trainer), the wikibot (python/wikibot) and the category browser (sling/nlp/wikicat).

One major change from Python 2 to 3 is that strings are now Unicode, and the new API supports Unicode in all appropriate places. However, this introduces some subtle ambiguities where we used strings for both binary data and text.

Strings in frame stores are returned as Unicode strings in the Python API, but this can lead to UnicodeEncodeError exceptions for invalid UTF-8 strings. You can now use the get(role, binary=True) method to get the value of a string in a frame store as bytes.
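
To illustrate why a bytes accessor helps, here is a minimal sketch in plain Python that does not call SLING. The `raw` value is a hypothetical stand-in for a stored string that is not valid UTF-8, and `safe_text` is an illustrative helper, not part of the SLING API.

```python
# Hypothetical stored value: Latin-1 encoded text, which is not valid UTF-8.
raw = "café".encode("latin-1")  # b'caf\xe9'


def safe_text(value: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable bytes."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to a lossy decode instead of raising.
        return value.decode("utf-8", errors="replace")


text = safe_text(raw)
```

With SLING, the bytes would presumably come from a call like `frame.get(role, binary=True)` as described above, so the caller can decide how to handle strings that do not decode cleanly.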

The key and value read from a record file are returned as bytes, so you have to convert them manually, e.g. key.decode(), if you need them as strings. You can use either strings or bytes when writing to record files.
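
A minimal sketch of the conversion, assuming the reader yields (key, value) pairs of bytes. The `records` list is a hypothetical stand-in for what a record file reader would return, and the sample keys and values are made up for illustration.

```python
# Stand-in for the (key, value) byte pairs read from a record file.
records = [
    (b"Q1", b"first record"),
    (b"Q2", b"second record"),
]

# Decode keys to str for use as dictionary keys; bytes.decode() defaults to UTF-8.
# Values are left as bytes here because a record value may hold binary data.
by_key = {key.decode(): value for key, value in records}
```

Only decode the value as well if you know it holds text.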

Another subtle issue is the Document.text property. The indices into the document text, e.g. Token.begin and Token.end, are offsets into the UTF-8 encoded document text. To prevent any UTF-8/Unicode round-trip conversion issues, I have kept Document.text as bytes.
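
The following plain-Python sketch shows why byte offsets and Unicode strings do not mix. It does not use SLING: `text` stands in for Document.text, and the `tokens` list of (begin, end) pairs is a hypothetical stand-in for token offsets, assumed here to be half-open byte ranges.

```python
# Stand-in for Document.text: UTF-8 encoded bytes. "ë" takes two bytes.
text = "Zoë runs".encode("utf-8")

# Hypothetical token spans as (begin, end) byte offsets, end exclusive.
tokens = [(0, 4), (5, 9)]

# Slice the bytes first, then decode each token.
words = [text[begin:end].decode("utf-8") for begin, end in tokens]

# Applying the same offsets to the decoded string gives the wrong spans,
# because string indices count code points rather than bytes.
decoded = text.decode("utf-8")
misaligned = [decoded[begin:end] for begin, end in tokens]
```

Slicing the bytes yields the tokens "Zoë" and "runs", while slicing the decoded string with the same offsets yields "Zoë " and "uns".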

We use Python 3.5 as the default Python version for now, but the code seems to work for Python 3.6 and 3.7 as well. You can set PYVER in sling/pyapi/BUILD to 35, 36 or 37 to change the Python runtime version used for building.

I have made a setup.sh script to make it easier to set up the SLING development environment. Please run this to get your SLING environment updated to the new Python3-based setup.

@ringgaard ringgaard requested a review from anders-sandholm May 9, 2019 09:40
@ringgaard ringgaard self-assigned this May 9, 2019
Contributor

@anders-sandholm anders-sandholm left a comment

Thanks for upgrading the code to Python3. LGTM. Looking fwd to getting all of SLING upgraded to Python3.

* Remove the Python 2.7 SLING pip package if it is installed.
* Set up link to the SLING development enviroment for SLING Python 3 API.

The parser trainer uses PyTorch for training, so they it to be installed.
Contributor

they it -> it needs

@ringgaard ringgaard merged commit 2591d18 into google:master May 9, 2019
@ringgaard ringgaard deleted the pyapi3 branch May 9, 2019 16:09
@ringgaard ringgaard mentioned this pull request May 10, 2019