Jupyter Notebook Translation and Localization #10

Closed
wants to merge 5 commits into from

TwistedHardware

Please let me know what you think of this approach.

@takluyver
Member

Thanks, I'm glad someone has finally got round to doing something about this.

Procedural note: you've modified the template file - can you instead make your proposals a new file and leave the template unchanged?

@TwistedHardware
Author

Thanks. I guess I didn't quite understand the JEP instructions.


1. Subclass `web.StaticFileHandler` and call it `JupyterStaticFileHandler`
2. Override the `get()` function to make it render static files if they end with .js
3. Use `JupyterStaticFileHandler` instead of `web.StaticFileHandler` in the RequestHandler for static files.
Member

I don't especially like this bit - being able to serve static files with the regular StaticFileHandler has real benefits that we lose if that content becomes dynamically generated.

I wonder about integrating different languages when the JS is built, and shipping 'language packs' with the minified JS containing each language. I imagine we'd run into bugs with the JS version not matching the server, though.

Member

Alternatively, I bet there are libraries to handle translations in the JS. Then we'd have static JS files, and we'd add a handler to the server to fetch the messages file.

Author

I agree with you regarding serving static files after rendering. The real issue is, it will be slower to render static files instead of serving them as-is. But this is the price that we have to pay for putting "human language" in JS files. The good thing is we are only rendering JS files and not all static files.

Regarding your second point, we will not ship minified JS files containing any languages except English. The files will be rendered just before they are served. The language packs are only (.mo) files and not JS files with other languages. Maybe I didn't quite get your second point so if that doesn't answer your concern please let me know.

Member

I was suggesting alternative ways we might handle translated strings in the JS files. I think my last suggestion is roughly the same as what @blink1073 suggested.

Speed is part of the issue with turning static files into templates, but it also has more complex effects on things like caching: using a plain StaticFileHandler, tornado can be quite smart about sending HTTP headers so that the browser does the right thing. Anything we put in between that makes it more awkward, so it's nice to have as much as possible served from static files.

@rgbkrk
Member

rgbkrk commented Feb 5, 2016

@takluyver There are standard JS APIs for simple ones, like Number and Money display:

> new Intl.NumberFormat().format(1000.0)
"1,000"

There's a more general project called formatjs which aligns with the ECMAScript Internationalization API Specification (ECMA 402) among others, adding support for string translations. At least in React land, there are some standard libraries like react-intl. This video has a great introduction to some of these APIs too.

@blink1073
Contributor

Why not use the Config Jupyter REST API for the client instead of trying to put everything in a template? The whole translation table for a consumer could be retrieved at once.

@TwistedHardware
Author

@rgbkrk @blink1073 This could be done from the front-end or the back-end. If you want to do this with no extra libraries or dependencies, Tornado has everything that we need, and the modifications to the templates and Tornado request handlers are going to be minimal. I'm not saying that doing the translation from the back-end is better than the front-end, but I would personally prefer debugging Python over JavaScript.

@sccolbert

@TwistedHardware The UI is going to be moving away from server-side templates and relying much more heavily on dynamic client-side rendering. We'll certainly need an approach for localizing on the client-side.

@Carreau
Member

Carreau commented Feb 5, 2016

For information: previous attempts and related issues:

ipython/ipython#6718
ipython/ipython#5922
jupyter/notebook#870

The reason why it previously did not go through is the lack of manpower, and people who did the implementation and translation.

@rgbkrk
Member

rgbkrk commented Feb 5, 2016

The reason why it previously did not go through is the lack of manpower, and people who did the implementation and translation.

That to me is a sign that we didn't help scope the work into smaller chunks so we don't burn out contributors and leave it more open to multiple people coming in to help.

Will be done like this

```javascript
var text = "{{ _('You are using Jupyter notebook.<br/><br/>') }}";
```

This would definitely be very problematic for caching, since now you have one variant of JS for each language supported, and places that utilize caching will need appropriate purging / vary for those as well. This also is mixing Python (Jinja?) and JS which makes it feel bad/sad. This also makes things very messy for pure JS generated messages.

There's a wide variety of JS localization helper libraries + quick ways of loading them (such as https://github.com/wikimedia/jquery.i18n which wikimedia uses), so I'd rather we use them.

Member

I doubt Tornado translation templates will work for messages in Javascript files. I think this is going to have to be done with a Javascript translation utility.

In fact, as we move forward, I think significantly more of this UI will be produced by Javascript and not tornado templates, so perhaps an exclusively JS solution is the way to go.

It seems like i18n would be able to address messages that come from html templates as well as js messages.

@yuvipanda

Awesome - thank you for starting this :)

@captainsafia
Member

Great stuff, @TwistedHardware! Thanks for opening the PR. As someone who speaks a couple of different languages (albeit some badly) this is an exciting proposal. Feel free to add me under the Potential Contributors and I'll see how I can help out.

Also, can you add the links that @Carreau mentioned under a resource section in the proposal?

@TwistedHardware
Author

@yuvipanda Thank you for your input. I think I can solve the cache problem with one of these approaches:

Disable caching static files if they were rendered.

self.set_header("Cache-control", "no-cache")

This will ensure the server will not maintain the files in cache and the browser will also not maintain it in cache.

But even better, we can actually cache these files (server-side and browser-side) with no problems if we label them by adding a suffix to the file names in HTML with the locale code

<script src="{{static_url("components/requirejs/require.js") }}.{{ locale }}" type="text/javascript" charset="utf-8"></script>

This will make the file name:
require.js.en_US

Or if you prefer not to cache them, add a parameter

<script src="{{static_url("components/requirejs/require.js") }}?locale={{ locale }}" type="text/javascript" charset="utf-8"></script>

This will keep the file name but will make the HTTP get look like this:
.../require.js?locale=en_US

Finally, regarding mixing Jinja with JavaScript, it is a very common practice. Let's look only at examples from the Jupyter code:

    <script>
      require.config({
          {% if version_hash %}
          urlArgs: "v={{version_hash}}",
          {% endif %}

You can see it is being used inside the script and even inside a string. I know it looks sad but think of it as a template and not a JS file and it will make more sense.
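The locale query parameter suggested above can be sketched in Python. This is only an illustration under stated assumptions: `localized_static_url` and its arguments are hypothetical names, it is not Tornado's `static_url`, and the version hash is assumed to be supplied by the caller.

```python
from urllib.parse import urlencode


def localized_static_url(path, version_hash, locale):
    """Append a version hash and a locale to a static path.

    Hypothetical helper: because the locale is part of the URL, a browser
    or proxy cache keeps one entry per language instead of mixing them.
    """
    query = urlencode({"v": version_hash, "locale": locale})
    return f"{path}?{query}"


if __name__ == "__main__":
    print(localized_static_url(
        "/static/components/requirejs/require.js", "abc123", "en_US"
    ))
```

The trade-off raised earlier in the thread still applies: every supported language becomes a separate cached variant of each rendered file.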

@TwistedHardware
Author

@Carreau Thank you for the list. There was a functioning solution that people wanted to make even better, and it ended up failing under its own weight. How many languages would have been implemented if this was finalized back in May 2014?

@Carreau
Member

Carreau commented Feb 6, 2016

That to me is a sign that we didn't help scope the work into smaller chunks so we don't burn out contributors and leave it more open to multiple people coming in to help.

Yes and no: the PR was actually too big, and was committing internationalisation and partial localisation in one giant patch.

We agreed to separate internationalisation from localisation, with translations as sub-packages that each translation team could release independently of IPython (at the time of release). This also responds to the following question:

How many languages would have been implemented if this was finalized back in May 2014?

We don't know. Zero in the core (well, technically one, English), but the point is that you didn't need to implement it in the core to have it. Anyone could have published ipython-cs_cz or ipython-fr_FR, and run ipython notebook --lang=fr to get it in French (or used any other method to select the language).

I think it is important for translation teams to be able to improve translations after a release.
We even tried Transifex to see whether a pure "in browser, I have nothing to install" translation process was possible.

The other reason is that maintaining internationalized code takes more manpower, and might not be worth it if there is no actively maintained translation effort. I just want to make sure that the translation process has a constant stream of people working on it.

@TwistedHardware
Author

@Carreau

We don't know. Zero in the core (well, technically one, English), but the point is that you didn't need to implement it in the core to have it.

I agree with you. My approach is also to have these languages as separate packages, installable as nbextensions. Regarding the maintenance of the translation files, I think it should be controlled by the core team to some level (I know you are all so busy, but hear me out). Most of these web-based crowd-sourced PO editors can be hooked to GitHub to regenerate your PO files as you push new commits or merge pull requests. We can get a free PO editor and host it like nbviewer. This will allow the core team to have control over the process but not do it on their own. I'm sure many people would not mind helping with the translation if it was a simple web-based process where they can translate one or two words (or go for a coffee-powered bench translation night and finish a complete language).

Assuming this gets no traction whatsoever, we will end up with no major changes to anything and English will always be automatically generated.

@jankatins

I'm not a fan of translated technical apps (but I see the need...), so I would appreciate it if this proposal included an easy-to-use switch to disable all translations, resulting in an English UI no matter what my OS says....

@TwistedHardware
Author

@JanSchulz Good point. If you don't actually install any language packages you will never have to deal with this. Actually, to get the system to work in any other language you will have to install a language package and change your configuration file.

I also hate it when I randomly get some websites in Arabic just because of my browser headers.

@minrk
Member

minrk commented Feb 6, 2016

Thanks for picking this up, @TwistedHardware! I think language packs are a great approach to allow sustainability. @JanSchulz we certainly should allow picking the locale, rather than forcing it to be auto-detected. I expect there will be many who live in a non-English locale who will want to stick with the default UI. We can make sure that is the case.
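The opt-in behaviour discussed here (English unless a language pack is installed and explicitly selected) can be sketched with Python's standard `gettext` module. This is a minimal sketch under assumptions: the function name `load_translations`, the `"notebook"` domain, and the locale directory layout are hypothetical, not anything the thread settled on.

```python
import gettext


def load_translations(lang, localedir, domain="notebook"):
    """Return a translations object for an explicitly chosen language.

    `domain` and `localedir` are assumptions for illustration. With
    lang=None, or when no catalogue is installed for `lang`, the result
    is a NullTranslations object that returns strings unchanged, so the
    UI stays in English.
    """
    if not lang:
        return gettext.NullTranslations()
    return gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
```

With this shape, the browser or OS locale is never consulted; only an explicit setting changes the language, and a missing language pack degrades to the untranslated strings.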

@TwistedHardware
Author

@minrk Thank you. Regarding the PO files, I will use the command line to generate them for now, and I'll sign up for a web-based PO editor temporarily until we set up our own PO web editor and GitHub webhooks to work with it.

If we are in agreement for this approach, can we get this "accepted" so I can start coding and stop arguing.

@takluyver
Member

I may be mistaken, but I don't think we have agreement on turning all the JS files into templates. I think we should look into ways to have a separate 'messages' handler that loads all the translated strings the JS needs, and then looking up localised strings from JS code.

@minrk
Member

minrk commented Feb 6, 2016

I don't think we have agreement on turning all the JS files into templates

We certainly don't, and with the js package split, I'm pretty sure this will be impossible. I think this has to be a solution that does not involve templating in js files.

@TwistedHardware
Author

@takluyver @minrk I actually didn't know about the JS package split. I can see why this will not work. Let me work on a workaround.

@Carreau
Member

Carreau commented Feb 6, 2016

I'm not a fan of translated technical apps (but I see the need...), so I would appreciate it if this proposal included an easy-to-use switch to disable all translations, resulting in an English UI no matter what my OS says....

Agreed on all points.

This will allow the core team to have control over the process but not do it on their own. I'm sure many people would not mind helping with the translation if it was a simple web-based process where they can translate one or two words (or go for a coffee-powered bench translation night and finish a complete language).

The point was not to keep control, but to avoid being flooded with PRs. I'd like the translations to live in separate repos/organisations/workflows, so that translators can add or remove contributors and publish translations without having to go through us. But that's not too important, and we can figure out the exact process later.

Translation for JS/TS is definitely tricky. I guess that adding dynamic lookup of translations can add significant overhead, and @sccolbert designed phosphor to be fast.

Maybe the translation process will need some rebuild of the JS for each language on server side.

@minrk
Member

minrk commented Feb 6, 2016

I think leveraging a standard solution, such as tornado's localization or i18n, is much more important than whatever optimal performance we could achieve by implementing our own localization tooling. If we end up writing our own, I think we've failed.

@yuvipanda

https://www.mediawiki.org/wiki/Localisation has more overview information about i18n/l10n in mediawiki - both technical and process.

@takluyver
Member

Thanks @yuvipanda for pointing to how MediaWiki does this. I think that example looks highly relevant to us.

  • What are the processes around updating these JSON files? Most translation tools I've seen work with the mo/po files. Does MediaWiki use a tool that edits the JSON directly, or is there a build step to convert from another format to JSON?
  • Are the JSON files requested whole by the Javascript for the translations it needs? Is the JS framework for using them custom, or a standardised module? If the JS is making an AJAX request to get the translations, does it delay anything that might display text until after the AJAX request returns, or does it have hard-coded fallbacks?

(Apologies if the link you provided already answers these questions - there's quite a lot of information there, and I'm hoping you can efficiently distil some of it into this thread :-)

@JCEmmons

JCEmmons commented May 4, 2016

Hi everyone - John Emmons from the IBM Globalization Foundations Technologies Team. (and chair of the Unicode CLDR TC )....

I've been reviewing this proposal on behalf of my company, and we are interested in seeing this move ahead, and also have some amount of resources ( i.e. some time from yours truly ) to help make it happen. Seems there has been little or no discussion on this topic in the last few months, so I'm wondering exactly where things stand at the moment.

I am kind of surprised that there is no mention in this proposal of Jinja2 templates ( which appear to run the bulk of Jupyter's UI at the moment ), nor of the i18n extensions available there ( see http://jinja.pocoo.org/docs/dev/templates/#i18n-in-templates ). I'm no Python expert, I'm an i18n guy exclusively, but it seems to me that using the i18n extensions in Jinja2, along with converting the messages that are just hard-coded Python strings to use gettext(), should get us very close to a translatable UI, at least for the "classic" UI.

There's been a lot of discussion here about various approaches for doing the enablement of the Javascript side. I would think that we would want to take a serious look at jQuery/globalize for this part, (https://github.com/jquery/globalize), which has a pretty robust API set, particularly for messaging including support for pluralization, and should be able to do everything we would need it to.

@minrk
Member

minrk commented May 4, 2016

@JCEmmons thanks for joining the issue! I think we just don't have enough.

The tricky bit is that we produce UI strings both in Javascript and in Python with jinja, and we just don't have the experience to know how much we can share between the two. Ideally, we will be able to use a single translation source for both, so it doesn't matter which side of the code handled the string. It looks like Jinja's i18n + jquery globalize might let us do this. Does this seem to be true?

If you could help sketch out a concrete proposal for how to start:

  1. single, standard storage format for translations
  2. setting up jinja i18n for applying them Python-side
  3. setting up jQuery/globalize for applying them on the client-side

Then I think we can really get the ball rolling.

@JCEmmons

JCEmmons commented May 6, 2016

I'll try to get this rolling, but it may be a tall order to get a single format that will work for all 3 ( Python, Jinja2, and JavaScript ). I'm doing some work locally here to see how much of this I can get working, and I want to talk to a few more folks on my end to see if jQuery/globalize is our best option, or if there is something else we should be looking at.

@jasongrout
Member

@JCEmmons, thanks very much for looking into this!

@JCEmmons

Also taking a serious look at Babel (http://babel.pocoo.org/en/latest/) - This package apparently is pretty mature and was designed to help localize Python apps. The nice thing is that it claims to be able to handle message extraction from all three types of sources ( Python, Jinja2 templates, and JavaScript ), and do it in a seamless fashion. This may be the "unified approach" that @minrk was looking for. I'm going to do some prototyping to see if I can get this to work on a small scale. If so, then this would be the way to go for Jupyter.

@takluyver
Member

Babel looks very promising. It can certainly extract messages from JS for translation, but it looks like a bit of extra work will be needed to serve the translated strings to the browser for the JS to use them at runtime.

@minrk
Member

minrk commented May 12, 2016

That's great! I don't have a problem with needing some tooling to adapt the translations into one environment or another. I'm mainly interested in having a single format / repository / mechanism for storing and contributing translations, regardless of where the strings ultimately get used. Translators shouldn't have to care where their strings are coming from.

@TwistedHardware
Author

Hello Everyone,
I'm glad this is picking up steam again. Count me in for any development you want to do. I will help with development once you agree on an approach. I can help with Python and JS.

@Carreau
Member

Carreau commented May 12, 2016

Hello Everyone,
I'm glad this is picking up steam again. Count me in for any development you want to do. I will help with development once you agree on an approach. I can help with Python and JS.

Thanks, just FYI we had some more meetings with some organisations that in the long term might also be interested in translation/will need translation.

@yuvipanda

yuvipanda commented May 12, 2016

On Thu, May 12, 2016 at 5:53 PM, Matthias Bussonnier <notifications@github.com> wrote:

Thanks, just FYI we had some more meetings with some organisations that in the long term might also be interested in translation/will need translation.

Count Wikimedia as a big one of those :) (I could still also help get this on translatewiki.net or some other similar crowdsourced translation service)

Yuvi Panda T
http://yuvi.in/blog

@JCEmmons

Did some more digging here. In looking at Babel, I'm able to determine that we can use Babel's tooling for our development purposes ( i.e. message extraction from .py, Jinja2 html templates, or .js ) without necessarily creating a dependency at runtime. However, as @takluyver mentioned previously, we've got to figure out a decent way to serve the proper JS at runtime. I'm looking at 2 possibilities right now, Jed ( https://slexaxton.github.io/Jed/ ) and also jQuery.Globalize ( https://github.com/jquery/globalize ). There are pros and cons to each.... If any of you have experience ( good or bad ) with either of these, I'd love to hear it.

@takluyver
Member

AIUI, there are three pieces needed for the JS-side translations:

  1. Message extraction: it sounds like Babel can do this.
  2. A JS library to look up translated messages - Jed and jquery.globalize look like this piece (and no doubt there are many more). I don't have experience with this.
  3. Some Python server side code to read message catalogues, translate the data to JSON and serve it so that the JS has access to it. I guess we probably need to write this ourselves, but I hope it's also pretty straightforward.
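Piece 3 could look roughly like the following Python sketch. It assumes the list of message ids the JS needs is already known (for example, from the extraction step) and that a `gettext`-style translations object has been loaded. `js_catalog` and `js_catalog_json` are hypothetical names, and the HTTP handler that would serve the result is left out.

```python
import gettext
import json


def js_catalog(translations, msgids):
    """Map each message id the JS needs to its translated string.

    `translations` is any gettext-style object (e.g. GNUTranslations).
    Untranslated ids fall back to the id itself, which is gettext's
    normal behaviour.
    """
    return {msgid: translations.gettext(msgid) for msgid in msgids}


def js_catalog_json(translations, msgids):
    """Serialise the catalogue as JSON text for the browser to fetch."""
    return json.dumps(
        js_catalog(translations, msgids), ensure_ascii=False, sort_keys=True
    )
```

Going through the public `gettext()` method rather than reading the catalogue's internals keeps the sketch independent of the .mo file layout, at the cost of needing the id list up front.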

@minrk
Member

minrk commented May 20, 2016

If 3 is all we have to write ourselves, that seems pretty manageable.

@JCEmmons

Hi @takluyver , I hadn't thought about what you're suggesting, but it may actually be cleaner, although would require a bit of coding.

1. For Python or Jinja2: Babel ( pybabel extract ) to create .pot -> translators translate -> create .po -> compile and create .mo from .po ( probably at install time ). Python or Jinja2 consumes the .mo directly.

For the JS, one of these three:

2a. Babel ( pybabel extract ) to create .pot -> translators translate -> create .po -> create JSON as text from .po ( could be done as a pre-install step ) -> JavaScript loads the JSON text ( could be any mechanism that can load JSON text; I've used the requirejs!text plugin before and it's pretty easy to use ). At that point, use Jed or jQuery/globalize to access the messages accordingly. In this scenario, Jed seems to have a slight advantage, since there is code out there already ( i.e. po2json, see https://github.com/mikeedwards/po2json ) that is designed to convert .po into JSON that is compatible with what Jed is expecting.

or, I think, what you're suggesting:

2b. Babel ( pybabel extract ) to create .pot -> translators translate -> create .po -> compile and create .mo from .po ( probably at install time ). Then we write some piece of code in Python that would read the .mo, convert it to JSON internally, and serve it to the Javascript.

or a third one that I thought of:

2c. Babel ( pybabel extract ) to create .pot -> translators translate -> create .po. Then use po2json's Javascript interface to parse the .po into something that Jed can use for rendering.

I just tried 2a last night (on my own machine...), and it seemed to work OK, although it's a bit clunky. 2c might be the cleanest if I can get it to work well.
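As a rough illustration of the .po to JSON step in 2a, here is a small Python sketch. It is a stand-in written under stated assumptions, not po2json: `po_to_dict` and `po_to_json` are hypothetical names, only plain `msgid`/`msgstr` pairs (including continuation lines) are handled, `msgctxt` is ignored, plural entries are left out, and the output is a flat mapping rather than the structure Jed expects.

```python
import ast
import json


def _unquote(token):
    # PO strings use C-style escapes; literal_eval covers the common ones.
    return ast.literal_eval(token)


def po_to_dict(po_text):
    """Parse simple msgid/msgstr pairs from PO text into a dict."""
    catalog = {}
    fields = {"msgid": None, "msgstr": None}
    current = None

    def flush():
        # Skip the header entry (empty msgid) and untranslated entries.
        if fields["msgid"] and fields["msgstr"]:
            catalog[fields["msgid"]] = fields["msgstr"]
        fields["msgid"] = fields["msgstr"] = None

    for raw in po_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("msgid "):
            flush()  # a new entry starts; store the previous one
            current = "msgid"
            fields[current] = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current = "msgstr"
            fields[current] = _unquote(line[len("msgstr "):])
        elif line.startswith('"') and current:
            # Continuation line: append to the field being read.
            fields[current] += _unquote(line)
    flush()
    return catalog


def po_to_json(po_text):
    """Serialise the parsed catalogue as JSON text."""
    return json.dumps(po_to_dict(po_text), ensure_ascii=False, sort_keys=True)
```

Because this runs at build or install time, the server can keep serving the resulting JSON as an ordinary static file.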

@takluyver
Member

2b was indeed what I was suggesting - I was assuming that we would end up with one .mo file per language, combining the strings that are in Python code, JS code and Jinja templates. But I'm open to any of 2a-c.

@srl295

srl295 commented May 31, 2016

Thanks. I work with @JCEmmons and would like to move ahead on this. We are thinking 2a might be the best approach at this point. @TwistedHardware and all, what's the best way to get an updated JEP going here? Perhaps I can suggest some specific updates to the .md file in this PR?

@JCEmmons

@srl295 is getting involved here primarily because I am going to be on a medical leave of absence starting Thursday this week, and we want to move forward. He's a great colleague and knows his stuff. If we can get the JEP finalized before I get back ( probably about a month ), then we can actually get moving with the implementation.

We had a discussion internally, and are thinking that 2a is the superior approach here, because we can manipulate everything we need at build time, rather than at run time.

@minrk
Member

minrk commented May 31, 2016

@srl295 if you want to make patches to @TwistedHardware's branch, that can update this PR. Alternatively, we can merge this PR as a "provisional draft" (which I just made up) and you can make a new PR to this repo to update the proposal.

@srl295

srl295 commented May 31, 2016

@minrk I'll work on a delta to @TwistedHardware’s branch, that will cover us either way.

srl295 added a commit to srl295/jupyter-enhancement-proposals that referenced this pull request Jun 10, 2016
srl295 added a commit to srl295/jupyter-enhancement-proposals that referenced this pull request Jun 10, 2016
srl295 added a commit to srl295/jupyter-enhancement-proposals that referenced this pull request Jun 10, 2016
@srl295

srl295 commented Jun 10, 2016

Delta made at #16 - fyi @TwistedHardware

@TwistedHardware
Author

Thank you @srl295

@takluyver
Member

Closing this as superseded by #16. Thanks everyone for pushing the internationalisation story forwards.

@takluyver takluyver closed this Feb 6, 2017