Pierre Quentel edited this page Aug 27, 2018 · 69 revisions

Translation from Python to Javascript

A typical Brython-powered HTML page looks like this :

<html>

<head>
<script src="/path/to/brython.js"></script>
</head>

<body onload="brython()">
<script type="text/python">
...
</script>
</body>

</html>

brython.js is the minified concatenation of individual scripts that handle specific tasks, either at compile time (generation of Javascript code from Python source : done in py2js.js) or at run time (for instance, implementation of all Python built-in objects, eg in py_list.js for list and tuple, py_string.js for str, etc). The development is done on the individual scripts ; brython.js is generated by the script /scripts/make_dist.py.

brython.js exposes 2 names in the global Javascript namespace : brython (the function called on page load) and __BRYTHON__, an object that holds all internal objects needed to run Python scripts.

The function brython() inspects all the scripts in the page ; for those that have the type text/python, it reads the Python source code, translates it to Javascript, and runs the resulting code with a Javascript eval().

If the <script> tag has an attribute src, an Ajax call is performed to get the content of the file at the specified url, and its source is converted and executed as above.

The translation to Javascript takes the following steps :

  • a tokenizer reads the tokens in the source code and passes them to an automaton that builds an abstract tree for the code, or raises SyntaxError or IndentationError.
  • this tree is transformed (nodes are added or modified), because some single Python statements are translated into several Javascript statements.
  • if the debug level is set, additional nodes are added to update an internal object that is set to the current script name and line number.
  • the transformed tree supports a method to_js() that returns the Javascript code.

All this is done in the script py2js.js :

  • function brython() is the last one in the script
  • translation is done by function py2js()
  • this function calls the tokenizer : function tokenize()
  • the tokenizer builds a tree made of instances of the class $Node
  • an instance of $Node is created for each new statement, and a context is created for the node
  • new tokens generally change the state of the context by a call such as
context = transition(context, token_type, token_value)
  • context is an instance of one of the classes defined in the script, whose name starts with $ and ends with Ctx : for instance, when the tokenizer encounters the keyword try, the function transition() returns an instance of $TryCtx
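The transition mechanism can be pictured with a toy model. The sketch below is written in Python for readability and is not taken from py2js.js : the classes NodeCtx and TryCtx only mirror the $...Ctx naming, and the token types are assumptions made for the example.

```python
# Toy model of the context-transition idea. The class names mirror the
# $...Ctx naming used in py2js.js, but the logic is purely illustrative.

class NodeCtx:
    """Context attached to a fresh statement node."""
    def __init__(self):
        self.tokens = []


class TryCtx:
    """Context created when the keyword `try` starts a statement."""
    def __init__(self, parent):
        self.parent = parent
        self.closed = False  # becomes True once the ':' has been read


def transition(context, token_type, token_value=None):
    """Return the context that results from feeding one token."""
    if isinstance(context, NodeCtx):
        if token_type == "keyword" and token_value == "try":
            return TryCtx(context)
        context.tokens.append((token_type, token_value))
        return context
    if isinstance(context, TryCtx):
        if token_type == ":" and not context.closed:
            context.closed = True
            return context
        raise SyntaxError(f"unexpected token {token_value!r} after 'try'")
    raise SyntaxError("unknown context")


# Feed the tokens of the statement "try:" one by one
context = NodeCtx()
for token_type, token_value in [("keyword", "try"), (":", ":")]:
    context = transition(context, token_type, token_value)
```

Each token either mutates the current context or replaces it with a more specific one, and an unexpected token raises SyntaxError.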

Builtin objects

__BRYTHON__ has an attribute builtins that stores all the built-in Python names (classes, functions, exceptions, objects), usually with the same name : for instance, the built-in class int is stored as __BRYTHON__.builtins.int. Only names that conflict with Javascript naming rules must be changed, eg super() is implemented as __BRYTHON__.builtins.$$super.

Implementation of Python objects

Objects implemented as native Javascript objects

Python strings are implemented as Javascript strings.

Python lists and tuples are implemented as Javascript arrays.

Python integers are implemented as Javascript numbers if they are in the range of Javascript "safe integers", ie [-(2^53-1), 2^53-1] ; outside of this range they are implemented with an internal class.
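The limit comes from the fact that Javascript numbers are IEEE 754 doubles, which cannot represent every integer beyond 2^53. Python floats are doubles too, so the effect can be observed from Python ; the helper name fits_in_js_number is made up for this illustration.

```python
# Same value as Number.MAX_SAFE_INTEGER in Javascript
MAX_SAFE_INTEGER = 2**53 - 1


def fits_in_js_number(n):
    """True if the integer n lies in the Javascript "safe integer" range."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


print(fits_in_js_number(2**53 - 1))   # inside the range
print(fits_in_js_number(2**53))       # outside : needs the internal class

# Beyond the range, two distinct integers collapse to the same double
print(float(2**53) == float(2**53 + 1))
```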

Python floats are implemented as instances of the Javascript Number class.

Other types

All other Python classes (builtin or user-defined) are implemented as a Javascript object that holds the class attributes and methods.

A minimal implementation of a class is done by such code :

$B.make_class = function(name, factory){
    // Builds a basic class object

    var A = {
        __class__: _b_.type,
        __mro__: [object],
        __name__: name,
        $is_class: true
    }

    A.$factory = factory

    return A
}

factory is the function that creates instances of the class. The instances have an attribute __class__ set to the class object.

The class dictionary has an attribute __mro__, a list of the classes used for attribute resolution on instances of the class.
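As a rough illustration of how such class objects support attribute resolution, here is a Python model in which classes and instances are plain dictionaries. It is a sketch under simplifying assumptions (single inheritance, no descriptors) ; object_, get_attribute and the Point example are hypothetical names, not Brython internals.

```python
# Toy model : classes and instances are plain dicts, standing in for the
# Javascript objects described above. All names here are illustrative.

object_ = {"__name__": "object", "__mro__": [],
           "describe": lambda self: "an object"}


def make_class(name, factory, base=object_):
    """Build a class object whose __mro__ lists its ancestors."""
    return {
        "__name__": name,
        "__mro__": [base] + base["__mro__"],
        "$is_class": True,
        "$factory": factory,
    }


def get_attribute(instance, attr):
    """Look in the instance, then its class, then each class in __mro__."""
    if attr in instance:
        return instance[attr]
    klass = instance["__class__"]
    for candidate in [klass] + klass["__mro__"]:
        if attr in candidate:
            return candidate[attr]
    raise AttributeError(attr)


def point_factory(x, y):
    # The factory sets __class__ on the instance it creates
    return {"__class__": Point, "x": x, "y": y}


Point = make_class("Point", point_factory)
Point["norm1"] = lambda self: abs(self["x"]) + abs(self["y"])

p = Point["$factory"](3, -4)
print(get_attribute(p, "norm1")(p))     # method found on Point
print(get_attribute(p, "describe")(p))  # method found through __mro__
```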

Functions

Python functions are implemented as Javascript functions, but there are many differences, both with function definition and function calls.

To define a Python function, its parameters can be specified in many ways : named parameters, eg def f(x):; with default values : def f(x=1):; holders for additional positional and keyword arguments : def f(*x, **y):

A Python function can be called with positional arguments : f(2), keyword arguments : f(y=1), packed iterables : f(*args) and packed dictionaries : f(**kw).

Javascript also has a variety of ways to handle parameters : named parameters : function f(x), and a way to handle arguments with the object arguments that can be used inside the function, more or less like a list : function f(){var x=arguments[0]}. Function calls can be done with named arguments : f(x), or with the methods call and apply.

For function calls, the arguments passed to the Python function are translated this way :

  • positional arguments are kept unmodified
  • packed iterables are unpacked and added to the positional arguments
  • all keyword arguments (including packed dictionaries) are grouped in a single argument put at the end of the argument list. It is a Javascript object with 2 keys: $nat set to "kw" and kw set to an object indexed by the keyword arguments keys

For instance, the call

f(1, *t, x=2, **d)

where t=['a', 'b'] and d = {'z': 99} is translated to

f(1, 'a', 'b', {$nat: 'kw', kw: {x: 2, z: 99}})
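The grouping rule can be modelled in a few lines of Python. The function translate_call_args is a hypothetical helper written for this page ; it also assumes that the trailing object is omitted when the call has no keyword argument, which the text above does not state.

```python
def translate_call_args(positional=(), packed_iterables=(),
                        keywords=None, packed_dicts=()):
    """Return the argument list passed to the Javascript function."""
    args = list(positional)
    for iterable in packed_iterables:
        args.extend(iterable)          # *t is unpacked in place
    kw = dict(keywords or {})
    for mapping in packed_dicts:
        kw.update(mapping)             # **d is merged with the keywords
    if kw:
        # Assumption : nothing is appended when there are no keywords
        args.append({"$nat": "kw", "kw": kw})
    return args


# Model of the call f(1, *t, x=2, **d)
t = ['a', 'b']
d = {'z': 99}
js_args = translate_call_args([1], [t], {'x': 2}, [d])
print(js_args)
```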

Python function definitions are translated to a Javascript function definition that takes no significant parameters ; the argument values are set at the beginning of the function body, using the Javascript object arguments and the function $B.args defined in py_utils.js. This function takes the following parameters, initialised from the Python function parameters :

$B.args = function($fname, argcount, slots, var_names, $args, $dobj,
    extra_pos_args, extra_kw_args)
  • $fname is the function name
  • argcount is the number of named parameters expected by the function, not counting the holders for extra positional or keyword arguments
  • slots is a Javascript object indexed by the expected named parameters, with each value set to null
  • var_names is a list of expected named parameters. It is the equivalent of Object.keys(slots), but for performance reasons the list is explicitly created in the function body, instead of being created at each function call
  • $args is the iterable holding the arguments passed to the function, generally set to the Javascript built-in arguments
  • $dobj is a Javascript dictionary for the named arguments that take default values ; set to {} if no default value is specified
  • extra_pos_args is the name of the holder for extra positional arguments, or null
  • extra_kw_args is the name of the holder for extra keyword arguments, or null

A few examples :

for def f(x): the Javascript function starts with

var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, null, null)

for def f(x, y=1):

var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
    {y: 1}, null, null)

for def f(x, *t):

var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, "t", null)

for def f(x, y=1, *t, **d):

var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments, 
    {y: 1}, "t", "d")

$B.args checks the arguments passed to the function and raises exceptions if there are missing or unexpected arguments. Otherwise, the object returned is indexed by the name of the arguments passed and, if specified, the name of the holders for extra arguments.

For instance, in the last example above, $ns will have the keys x, y, t and d.
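To make the role of each parameter more concrete, here is a simplified Python model of the checks described above. It is not the code of $B.args : the name parse_args, the error messages and the handling of corner cases are assumptions, and the argument list is expected in the translated form (keyword arguments grouped in a trailing object whose $nat key is "kw").

```python
def parse_args(fname, argcount, slots, var_names, args, defaults,
               extra_pos_args, extra_kw_args):
    """Simplified Python model of the checks described for $B.args."""
    ns = dict(slots)
    args = list(args)
    keywords = {}
    # A trailing {"$nat": "kw", ...} object carries the keyword arguments
    if args and isinstance(args[-1], dict) and args[-1].get("$nat") == "kw":
        keywords = args.pop()["kw"]

    if len(args) > argcount and extra_pos_args is None:
        raise TypeError(f"{fname}() takes {argcount} positional arguments "
                        f"but {len(args)} were given")
    filled = set()
    for name, value in zip(var_names, args):
        ns[name] = value
        filled.add(name)
    if extra_pos_args is not None:
        ns[extra_pos_args] = tuple(args[argcount:])

    extra_keywords = {}
    for key, value in keywords.items():
        if key in filled:
            raise TypeError(f"{fname}() got multiple values for {key!r}")
        if key in slots:
            ns[key] = value
            filled.add(key)
        elif extra_kw_args is not None:
            extra_keywords[key] = value
        else:
            raise TypeError(f"{fname}() got an unexpected keyword {key!r}")
    if extra_kw_args is not None:
        ns[extra_kw_args] = extra_keywords

    # Parameters still unset take their default value, or are missing
    for name in var_names:
        if name not in filled:
            if name not in defaults:
                raise TypeError(f"{fname}() missing argument {name!r}")
            ns[name] = defaults[name]
    return ns


# def f(x, y=1, *t, **d) called as f(1, z=99)
ns = parse_args("f", 2, {"x": None, "y": None}, ["x", "y"],
                [1, {"$nat": "kw", "kw": {"z": 99}}], {"y": 1}, "t", "d")
print(ns)
```

The returned dictionary has the keys x, y, t and d, as in the last example above.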

Name resolution

A Python program is divided into blocks : modules, functions, classes. For each block, Brython defines a Javascript variable that will hold all the names bound in the block (we call it the "block names object").

Based on lexical analysis, including the global and nonlocal keywords, it is generally possible to know in which block a name is bound. It is translated as the attribute of the same name of the block names object.

When the name is referenced (eg print(a)) and not bound (eg a = 1), the translation is actually a call to a function : check_def('a', X['a']) where X is the block names object. check_def(name, obj) is a function that checks if obj is undefined, and if so, throws a NameError or UnboundLocalError for the name. This is done because if a name is bound somewhere in a block, it may not have yet been bound when it is referenced, for instance in examples like :

# example 1 : raises NameError
def f():
    a
a = f()

# example 2 : raises NameError
class A:
    def __init__(self):
        a
a = A()

# example 3 : raises NameError
if False:
    a = 0
a

# example 4 : raises UnboundLocalError
def f():
    if False:
        a = 9
    a
f()

If lexical analysis shows that a referenced name is certainly defined, it is simply translated to X['a'] : this is the case when the name has been bound in a previous line in the block, at the block level, not in an indented level. For instance in this case :

x = 0
print(x)
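The difference between the checked and the unchecked translation can be modelled in Python as follows. The sentinel UNDEFINED stands for the Javascript value undefined, and the in_function flag is an assumption made for the sketch : the page does not say how the choice between NameError and UnboundLocalError is made.

```python
UNDEFINED = object()  # stands in for the Javascript value undefined


def check_def(name, obj, in_function=False):
    """Return obj, or raise if the name has not been bound yet."""
    if obj is UNDEFINED:
        if in_function:
            raise UnboundLocalError(
                f"local variable '{name}' referenced before assignment")
        raise NameError(f"name '{name}' is not defined")
    return obj


X = {}  # the block names object of the module

# `if False: a = 0` binds nothing, so the later reference to `a` is checked
try:
    check_def("a", X.get("a", UNDEFINED))
except NameError as exc:
    error = exc

# `x = 0` at block level, then `print(x)` : a plain lookup is enough
X["x"] = 0
value = X["x"]
```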

The only case when the block can't be determined is when the program imports names by from some_module import *. In this case :

  • it is impossible to know if a name like range referenced in the script is the built-in class range or if it was among the names imported from some_module
  • if a name which is not explicitly bound in the script is referenced, lexical analysis can't determine if it should raise a NameError

In this case, the name is translated to a call to a function that will select at run time the value based on the names actually imported by the module, or raise a NameError.
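A minimal sketch of such a run-time selection, in Python. The function resolve_name and the search order (names bound in the block, then names brought in by the star import, then built-ins) are assumptions chosen for the illustration, not a description of the actual Brython function.

```python
import builtins


def resolve_name(name, block_names, star_imported):
    """Pick the value of a name at run time, or raise NameError."""
    for namespace in (block_names, star_imported, vars(builtins)):
        if name in namespace:
            return namespace[name]
    raise NameError(f"name '{name}' is not defined")


# some_module happens to define its own `range`
shadowed = resolve_name("range", {}, {"range": "range from some_module"})

# some_module does not define `range` : the built-in class is used
builtin_range = resolve_name("range", {}, {})
```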

Execution frames

Brython handles the execution frames in a stack. Each time the program enters a new module or a new function (including lambdas and comprehensions), information about the global and local environment is placed on top of the stack ; when the function or module exits, the element on top of the stack is removed.

This is done by inserting calls to the internal functions enter_frame() and leave_frame() in the generated Javascript code.

The stack is used for instance by built-in functions globals() and locals(), and to build the traceback information in case an exception is raised.
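The stack discipline can be sketched in Python. The signatures below are hypothetical (the page only gives the names enter_frame() and leave_frame()), and the helpers locals_(), globals_() and call_stack() are stand-ins invented for the example.

```python
frames_stack = []


def enter_frame(name, local_ns, global_ns):
    """Push the environment of the block being entered."""
    frames_stack.append({"name": name, "locals": local_ns,
                         "globals": global_ns})


def leave_frame():
    """Pop the environment of the block being left."""
    frames_stack.pop()


def locals_():
    return frames_stack[-1]["locals"]


def globals_():
    return frames_stack[-1]["globals"]


def call_stack():
    """Names of the active blocks, outermost first (as in a traceback)."""
    return [frame["name"] for frame in frames_stack]


# Entering the module : local and global namespaces are the same object
module_ns = {"x": 1}
enter_frame("__main__", module_ns, module_ns)


def f(a):
    local_ns = {"a": a}
    enter_frame("f", local_ns, module_ns)
    try:
        return call_stack(), dict(locals_()), globals_() is module_ns
    finally:
        leave_frame()  # runs even if the body raises


inside = f(2)
outside = call_stack()
leave_frame()
```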

indexedDB cache for standard library modules

This feature is used under 2 conditions :

  • the browser must support the indexedDB database engine (most of them do, including on smartphones)
  • the Brython page must use brython_stdlib.js, or the reduced version brython_modules.js generated by the CPython brython module

The main idea is to store the Javascript translation of stdlib modules in an indexedDB database : the translation is done only once for each new version of Brython ; the generated Javascript is stored on the client side, not sent over the network, and indexedDB can easily handle a few MB of data.

Unfortunately, indexedDB works asynchronously, while import is blocking. With this code:

import datetime
print(datetime.datetime.now())

using indexedDB at runtime to get the datetime module is not possible, because the code that follows the import statement is not in a callback function that could be called when the indexedDB asynchronous request completes.

The solution is to scan the script at translation time. For each import statement in the source code, the name of the module to import is stored in a list. When the translation is finished, the Brython engine enters an execution loop (defined in function loop() in py2js.js) that uses a tasks stack. The possible tasks are:

  • call function inImported() that checks if the module is already in the imported modules. If so, the control returns to loop()
  • if not, add a task to the stack : a call to function idb_get() that makes a request to the indexedDB database to see if the Javascript version of the Python module is already stored ; when the task is added, control returns to loop()
  • in the callback of this request (function idb_load()) :
    • if the Javascript version exists in the database, it is stored in a Brython variable (__BRYTHON__.precompiled) and the control returns to loop()
    • otherwise, the Python source for the module (found in brython_stdlib.js) is translated and another task is added to the stack : a request to store the Javascript code in the indexedDB database. The callback of this request adds another task : a new call to idb_get(), that is sure to succeed this time
  • the last task on the stack is the execution of the original script

At run time, when a module in the standard library is imported, the Javascript translation stored in __BRYTHON__.precompiled is executed : the Python to Javascript translation has been made previously.

Cache update

The indexedDB database is associated with the browser and persists between browser requests, when the browser is closed, when the PC is restarted, etc. The process described above must define a way to update the Javascript version stored in the database when the Python source code in the stdlib is changed, or when the translation engine changes.

To achieve this, cache update relies on a timestamp. Each version of Brython is marked with a timestamp, updated by the script make_dist.py. When a script in the stdlib is precompiled and stored in the indexedDB database, the record in the database has a timestamp field set to this Brython timestamp. If a new version of Brython is used in the HTML page, it has a different timestamp : idb_load() then finds that the timestamp of the stored record does not match, and a new translation is performed.

A complementary timestamp is defined if brython_modules.js is used instead of brython_stdlib.js.
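The freshness test amounts to a simple comparison. In the Python sketch below, the helper name is_up_to_date, the record layout and the timestamp values are illustrative assumptions.

```python
def is_up_to_date(record, brython_timestamp):
    """True if a cached record exists and was produced by this Brython version."""
    return record is not None and record.get("timestamp") == brython_timestamp


cached = {"name": "datetime", "content": "/* js */", "timestamp": 20180827}

print(is_up_to_date(cached, 20180827))  # same version : reuse the cache
print(is_up_to_date(cached, 20180901))  # newer Brython : translate again
print(is_up_to_date(None, 20180901))    # never stored : translate
```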

Limitations

The detection of the modules to import is made by a static code analysis, relying on import moduleX or from moduleY import foo. It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, the previous solution of on-the-fly compilation at each page load is used.

The mechanism is only implemented for modules in the standard library, or those in brython_modules.js. Using it for modules in site-packages or in the application directory is not implemented at the moment.
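The first limitation can be reproduced with any static analysis. The sketch below uses the CPython ast module instead of the Brython tokenizer, so it only illustrates the principle ; find_static_imports is a name chosen for the example.

```python
import ast


def find_static_imports(source):
    """Return the module names found in import statements of the source."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


source = '''
import datetime
from os import path
json = __import__("json")
exec("import random")
'''

# json and random are imported at run time but are invisible to the scan
found = find_static_imports(source)
print(found)
```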

Pseudo-code

Below is a simplified version of the cache implementation, written in Python-like pseudo-code.

def brython():
    <get Brython scripts in the page>
    for script in scripts:
        # Translate Python script source to Javascript
        root = __BRYTHON__.py2js(script.src)
        js = root.to_js()
        if hasattr(__BRYTHON__, "VFS") and __BRYTHON__.has_indexedDB:
            # If brython_stdlib.js is included in the page, the __BRYTHON__
            # object has an attribute VFS (Virtual File System)
            for module in root.imports:
                tasks.append([inImported, module])
        tasks.append(["execute", js])
    loop()

def inImported(module_name):
    if module_name in imported:
        pass
    elif module_name in stdlib:
        tasks.insert(0, [idb_get, module_name])
    loop()

def idb_get(module_name):
    request = database.get(module_name)
    request.bind("success",
        lambda evt: idb_load(evt, module_name))

def idb_load(evt, module_name):
    result = evt.target.result
    if result and result.timestamp == __BRYTHON__.timestamp:
        __BRYTHON__.precompiled[module_name] = result.content
        for subimport in result.imports:
            tasks.insert(0, [inImported, subimport])
    else:
        # Not found or outdated : precompile source code found
        # in __BRYTHON__.VFS
        root = __BRYTHON__.py2js(__BRYTHON__.VFS[module_name])
        tasks.insert(0, [store_precompiled, module_name, root.to_js(),
            root.imports])
    loop()

def store_precompiled(module, js, imports):
    """Store precompiled Javascript in the database, with the Brython
    timestamp and the list of imports checked by idb_load()."""
    request = database.put({"content": js, "name": module,
        "imports": imports, "timestamp": __BRYTHON__.timestamp})

    def restart(evt):
        """When the code is inserted, add a new request to idb_get (this time
        we are sure it will find the precompiled code) and call loop()."""
        tasks.insert(0, [idb_get, module])
        loop()

    request.bind("success", restart)

def loop():
    """Pop the first item from the tasks stack and run it with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        js_script = args[0]
        <execute js_script>
    else:
        func(*args)
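For readers who want to experiment, here is a runnable Python simulation of the same task loop. It is a sketch under strong assumptions : the indexedDB store is replaced by a dictionary, the asynchronous callbacks are run synchronously (idb_get and idb_load are merged), the translation is faked, and names such as load_page and translate are invented for the example.

```python
VFS = {"datetime": "<Python source of datetime>"}  # sources shipped with the page
database = {}       # plays the role of the indexedDB store
translations = []   # records every (fake) Python to Javascript translation


def translate(name):
    translations.append(name)
    return f"/* Javascript for {name} */"


def load_page(modules, timestamp):
    """Run the task loop for one page load ; return the tasks executed."""
    # `imported` stays empty here : the script itself is never run
    imported, precompiled, tasks, trace = set(), {}, [], []

    def in_imported(name):
        if name not in imported and name in VFS:
            tasks.insert(0, (idb_get, name))

    def idb_get(name):
        record = database.get(name)
        if record and record["timestamp"] == timestamp:
            precompiled[name] = record["content"]
        else:
            # Not found or outdated : translate, then store the result
            tasks.insert(0, (store_precompiled, name, translate(name)))

    def store_precompiled(name, js):
        database[name] = {"content": js, "timestamp": timestamp}
        tasks.insert(0, (idb_get, name))  # will succeed this time

    tasks.extend((in_imported, name) for name in modules)
    tasks.append(("execute",))
    while tasks:
        func, *args = tasks.pop(0)
        if func == "execute":
            trace.append("execute")
        else:
            trace.append(func.__name__)
            func(*args)
    return trace


first = load_page(["datetime"], timestamp=1)   # translates and stores
second = load_page(["datetime"], timestamp=1)  # served from the cache
third = load_page(["datetime"], timestamp=2)   # new version : translates again
```

Comparing the three traces shows that the translation happens on the first load and after a timestamp change, but not on the second load.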