-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Localizing Notebooks - continued #16
Changes from 8 commits
572242e
7b1a756
23130a4
8402ba4
bc4ec54
cab3960
52b7393
99450c8
6c00c53
bff04d5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
# Jupyter Notebook Translation and Localization | ||
|
||
## Problem | ||
|
||
There is currently no standard approach for translating the GUI of Jupyter notebook. | ||
This has driven some people to do a | ||
[single language translation for Jupyter 4.1](https://twitter.com/Mbussonn/status/685870031247400960). | ||
|
||
For information: previous attempts and related issues: | ||
|
||
- https://github.com/ipython/ipython/issues/6718 | ||
- https://github.com/ipython/ipython/pull/5922 | ||
- https://github.com/jupyter/notebook/issues/870 | ||
|
||
## Proposed Enhancement | ||
|
||
Use [Babel](http://babel.pocoo.org/en/latest/) | ||
to extract translatable strings from the Jupyter code, creating `.pot` files that can be updated | ||
whenever the code base changes. The `.pot` file can be thought of as the source from which all translations are derived. | ||
|
||
Translators and/or interested contributors can then use utilities such as [Poedit](https://poedit.net/) to create | ||
translated `.po` files from the master `.pot` file for the desired languages. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We did experiment with https://www.transifex.com/ipython/ipython-notebook/ which is online, and we hoped at some point to integrate this into the workflow. |
||
|
||
At install time, convert the translated `.po` into two runtime formats: | ||
* Convert to `.mo` which can be used by Python code using the gettext() APIs in Python, and can also be used by | ||
the i18n extensions in [Jinja2](http://jinja.pocoo.org/docs/dev/extensions/#i18n-extension) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what do you mean by "install", does this require code execution ? If so we should find an alternative way, as wheels cannot execute code at install time. |
||
|
||
* Convert to JSON using [po2json](https://github.com/mikeedwards/po2json) for consumption by the Javascript | ||
code within Jupyter. | ||
|
||
## Detailed Explanation | ||
|
||
The Jupyter notebook code presents a significant challenge in terms of enablement for translation, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A link to https://github.com/jupyter/notebook over the text "Jupyter notebook code" might help clarify that the document is explicitly talking about the classic notebook UI and not Jupyter Lab. Given the differences between the two implementations, I think it wise to focus on the former (if that is indeed your goal) and, based on lessons learned from that, tackle Lab localization as a separate task. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea, Peter. I added these links and a scope section that makes it clear that this proposal is only for classic notebook. See 6c00c53 |
||
mostly because there are multiple different types of source code from which translatable UI strings | ||
are derived. | ||
|
||
In Jupyter, translatable strings can come from one of three places: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "In Jupyter Notebook" |
||
|
||
1. Directly from Python code | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nitpicking, and not really important to address here, but we might need to clarify that this will not/cannot address translations of message coming from the kernel. IE |
||
|
||
2. As part of a Jinja2 HTML template, which is consumed by Python code | ||
|
||
3. From Javascript code | ||
|
||
For each of these three types, it is necessary to follow a few simple steps in order to allow the code | ||
to work properly in a translated environment. These steps are: | ||
|
||
1. Use an established API to identify those strings in the source code that are presented as UI | ||
and thus should be translatable. | ||
2. Provide the hooks in the source code that will allow the code to access translated strings | ||
at run time. | ||
|
||
Once these have been done, the [Babel](http://babel.pocoo.org/en/latest/) utilities provide an easy to | ||
use mechanism to identify the files in the Jupyter code base that contain translatable strings, and | ||
extract all of them into a single file ( a `.pot` file ) that is used as the basis for translation. | ||
|
||
Let's look at how this would look for each of the three types mentioned above: | ||
|
||
### Python files | ||
|
||
Some UI strings in the Jupyter notebook come directly from Python code. For these strings, the most | ||
widely accepted way to make them translatable is to use Python's gettext() API. When using gettext(), | ||
the developer simply needs to enclose the translatable Python string within _(), and add the appropriate | ||
calls in the Python to retrieve that string from the message catalog at run time. So for example: | ||
|
||
```python | ||
return info + "The Jupyter Notebook is running at: %s" % self.display_url | ||
``` | ||
|
||
becomes | ||
|
||
```python | ||
return info + _("The Jupyter Notebook is running at: %s") % self.display_url | ||
``` | ||
|
||
After this step is complete, then hooks must be put in place in order to tell Python to use gettext() | ||
to retrieve the string from the message catalog at runtime. This is simply a matter of adding | ||
```python | ||
import gettext | ||
``` | ||
at the top of the python code and then adding | ||
```python | ||
# Set up message catalog access | ||
trans = gettext.translation('notebook', localedir=os.path.join(base_dir, 'locale'), fallback=True) | ||
trans.install() | ||
``` | ||
|
||
Once this is complete, any calls to `_()` in the code will retrieve a translated string from | ||
${base_dir}/locale/**xx**/LC_MESSAGES/notebook.mo, where **xx** is the language code in use | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm assuming the locales will ship with notebook, not be external packages that users install. If so, it would be good to clarify this by defining There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. See bff04d5 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think allowing to ship translation as external package would be good. Otherwise releases might get hold on by missing translations. I think trying to pull that from |
||
when the notebook is launched. | ||
For example "de" for German, "fr" for French, etc. If no message catalog is available, or if | ||
the string doesn't exist in the catalog, then the string passed as the argument to _() is | ||
returned. | ||
|
||
|
||
### HTML Templates | ||
The majority of the language of the GUI is contained in [html template files](https://github.com/jupyter/notebook/tree/master/notebook/templates) | ||
and accessed via [Jinja2](http://jinja.pocoo.org). | ||
|
||
For the HTML templates, I recommend that we use the [Jinja2 i18n extension] | ||
(http://jinja.pocoo.org/docs/dev/extensions/#i18n-extension) that allows us to specify which portions of each template contain translatable | ||
strings, and is quite compatible with gettext() as described above. | ||
The extension contains some features that allow for things like variable substitution and simple plural handling, | ||
but in it's simplest form it uses tags `{% trans %}` and `{% endtrans %}` to delimit those strings that are translatable. | ||
Thus, the message at the top of the first screen you see when starting Jupyter looks like this in the template: | ||
|
||
```html | ||
<div class="dynamic-instructions"> | ||
{% trans %}Select items to perform actions on them.{% endtrans %} | ||
</div> | ||
``` | ||
|
||
After properly externalizing all the strings, hooking it all up to work with [Jinja2](http://jinja.pocoo.org) is a matter of | ||
loading the translations from the **SAME** message catalog as was defined in the Python example above, into Jinja2. | ||
Here's an example of how that would be done within Jupyter: | ||
|
||
```python | ||
env = Environment(loader=FileSystemLoader(template_path), extensions=['jinja2.ext.i18n'], **jenv_opt) | ||
env.install_gettext_translations(trans, newstyle=False) | ||
``` | ||
|
||
Note here that **trans** is the same variable initialized by `gettext.translation()` in the Python example earlier. | ||
|
||
### Javascript files | ||
|
||
For Javascript, there are no established APIs that can consume a compiled `.mo` file directly in the same way as gettext(). | ||
However, there is a library called [Jed](https://slexaxton.github.io/Jed/) that provides an API set similar to gettext(). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is Jed compatible with webpack and/or requirejs (both of which are used in Jupyter Notebook)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm no javascript guru, but I think I would be, since the example on its main page shows using it as an AMD module. |
||
[Jed](https://slexaxton.github.io/Jed/) uses JSON as it's input file instead of `.mo` files, but | ||
the good news here is that there are plenty of 3rd party utilities that can convert from `.po` into a form of JSON that | ||
can be consumed by Jed. The author of [Jed](https://slexaxton.github.io/Jed/) recommends | ||
[po2json](https://www.npmjs.com/package/po2json), which can be used | ||
either as a command line utility, or directly from Javascript. I suspect that the conversion from `.po` to either | ||
`.mo` for Python or Jinja2, or conversion to JSON for Javascript would be something we would want to do at install time. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably at build time to be precise - i.e. at least the wheel packages will contain pre-built |
||
|
||
Identification of strings for translation using this method is done similarly to the way we would do it for Python: | ||
by creating a function named `_()`, enclosing all translatable strings as an argument to this function, and then | ||
binding it to the `gettext()` API. The binding in Javascript would look like this: | ||
|
||
```javascript | ||
var i18n = new Jed(nbjson); | ||
var _ = function (text) { | ||
return i18n.gettext(text); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this have to be done before any JS code uses translatable strings? If all UI construction has to wait for an asynchronous request to get the JSON file, I can see that causing problems. Maybe we can stick the JSON data into the page when we fill the template to avoid that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think with caching it should be ok, as we can query the file by its hash and make it expire in 99years. It depends on the size though. |
||
|
||
``` | ||
|
||
Then to externalize a string, just enclose it in `_()`, for example: | ||
|
||
```javascript | ||
if (selectable !== undefined) { | ||
checkbox = $('<input/>') | ||
.attr('type', 'checkbox') | ||
.attr('title', _('Click here to rename, delete, etc.')) | ||
.appendTo(item); | ||
} | ||
``` | ||
|
||
## Use of Babel to extract translatable strings | ||
|
||
[Babel](http://babel.pocoo.org/en/latest/) allows us to define a set of rules that define | ||
where all the extractable strings are (whether Python, HTML template, or Javascript), what the | ||
extraction methods are (i.e. `_()` for Python or Javascript, and `<% trans >`/`<% endtrans %>` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
for the HTML templates, and do all the necessary extractions to create a single `.pot` in one | ||
easy step. This definition is done by creating a `babel.cfg` file, which looks something like this: | ||
|
||
``` | ||
[python: **/**.py] | ||
[jinja2: notebook/templates/**.html] | ||
encoding = utf-8 | ||
[extractors] | ||
jinja2 = jinja2.ext:babel_extract | ||
[javascript: notebook/static/tree/js/*.js] | ||
extract_messages = $._ | ||
``` | ||
|
||
Once this is defined, message extraction can be performed using the [pybabel -extract](http://babel.pocoo.org/en/latest/cmdline.html) | ||
command. This should be done anytime the English source strings are modified. | ||
|
||
## Translation Files | ||
|
||
We haven't yet determined exactly which set of languages we plan to contribute to the project once | ||
the enablement work is complete, but this framework should allow any other interested parties to | ||
perform the translation and testing work at their discretion. | ||
|
||
|
||
## Pros and Cons | ||
|
||
Pros associated with this implementation include: | ||
* Using established APIs such as `gettext()`, which are stable and have been around for many years. | ||
* Message extraction can be done in such a way that a single file, or perhaps a small set of files | ||
can be delivered to a translator. | ||
* We don't have to have a different file format for each different technology used (Python / HTML template / Javascript ) | ||
* Much of the tooling needed has already been written, we just need to make use of it. | ||
|
||
Cons associated with this implementation include: | ||
* There are many external dependencies: | ||
[Babel](http://babel.pocoo.org/en/latest/), | ||
[Jed](https://slexaxton.github.io/Jed/), | ||
[po2json](https://github.com/mikeedwards/po2json). There may be some difficulties with the licensing. | ||
* Jupyter notebook developers have to be made aware of the proper ways to externalize any new strings | ||
that are added, and to perform message extraction via [Babel](http://babel.pocoo.org/en/latest/) | ||
whenever strings change. | ||
|
||
|
||
## Prototype - Proof of Concept | ||
I have created a **VERY** preliminary prototype of Jupyter notebook at (https://github.com/JCEmmons/notebook/tree/intl) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, awesome! 🍰 |
||
using the concepts presented here. Only a handful of the strings are externalized, but there are some | ||
from each of the 3 types, and I used [Poedit](https://poedit.net/) to create some hopefully reasonable translations | ||
into German, Japanese, and Russian. I haven't yet had time to add the necessary code to the Javascript | ||
in order to do dynamic loading of the JSON, but that would be the next step. | ||
|
||
## Interested Contributors | ||
@twistedhardware @rgbkrk @captainsafia @JCEmmons @srl295 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be wrong, but I think @Carreau's tweet was pointing to a translation of the release announcement, not the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I often translate at least the release announce in French.