Pierre Quentel edited this page Mar 12, 2022 · 69 revisions

Translation from Python to Javascript

A typical Brython-powered HTML page looks like this:

<html>

<head>
<script src="/path/to/brython.js"></script>
</head>

<body onload="brython()">
<script type="text/python">
...
</script>
</body>

</html>

brython.js is the minified concatenation of individual scripts, each handling a specific task either at compile time (generation of Javascript code from Python source) or at run time (for instance, the implementation of all Python built-in objects, e.g. in py_list.js for list and tuple, py_string.js for str, etc.). Development is done on the individual scripts; brython.js is generated from them by the script /scripts/make_dist.py.

brython.js exposes two names in the global Javascript namespace: brython (the function called on page load) and __BRYTHON__, an object that holds all the internal objects needed to run Python scripts.

The function brython() inspects all the scripts in the page; for those of type text/python, it reads the Python source code, translates it to Javascript, and runs the result with either the Javascript eval() or new Function(func_name, source)(module) (the second form avoids memory leaks in some browsers).

If the <script> tag has a src attribute, an Ajax call fetches the content of the file at the specified URL, and that source is translated and executed as above. This only works if the page is served by a web server (http:// or https:// protocol), not if it is opened with the browser's File/Open menu.

These tasks are performed by the following functions:

  • function brython() in py2js.js gets the source code of Python scripts and builds a list of tasks to execute. This list is managed by function loop() in loaders.js
  • the main tasks are to run the scripts; this is done by function _run_script() in py2js.js
  • the translation to Javascript is managed by function py2js() in py2js.js. This function starts by creating a root node for the syntax tree, then calls function dispatch_tokens(root)
  • the source code is split into tokens by function tokenize() in python_tokenizer.js. These tokens are described in the standard Python documentation
  • dispatch_tokens() maintains a context based on the tokens stream. For each new token, if it is valid, dispatch_tokens() creates the next context by context = transition(context, token_type, token_value); otherwise it raises the appropriate error
  • the tree built by dispatch_tokens() is made of nodes (instances of class $Node in py2js.js), one per statement in the source code. Nodes are made of instances of "context classes"; for instance if the statement is a "for" loop, the node contains an instance of class $ForExpr
  • once the tree is created, it is transformed into a Python-compliant AST by calling method .ast() on the root node
  • the script symtable.js, adapted from CPython's symtable.c, analyses the AST. It detects some syntax errors and builds a tree of lexical scopes (modules, classes, functions), each with a set of symbols (cf. the symtable module in the Python standard library)
  • the AST and the symbol table are passed to function js_from_root() in ast_to_js.js, which is in charge of generating the Javascript code
  • at this stage, _run_script() puts the task "execute the Python script with its Javascript translation" on top of the tasks list and calls function loop() in loaders.js
  • this function executes the script by calling new Function(script.js)
  • if an exception is raised during execution, it is managed by function handle_error() in py_exceptions.js
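The compile-time stages listed above (tokens, AST, symbol table) have direct counterparts in CPython's own standard library. As an illustration only, and not Brython code, the sketch below runs CPython's tokenize, ast and symtable modules on a tiny script to show what each stage produces.

```python
import ast
import io
import symtable
import tokenize

source = "x = 1\nfor i in range(3):\n    x += i\n"

# Stage 1: the token stream (in Brython: tokenize() in python_tokenizer.js)
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Stage 2: the abstract syntax tree (in Brython: .ast() on the root node)
tree = ast.parse(source)

# Stage 3: the symbol table, one scope per module/class/function (cf. symtable.js)
table = symtable.symtable(source, "<script>", "exec")

print([tok.string for tok in tokens[:3]])   # ['x', '=', '1']
print(type(tree.body[1]).__name__)          # For
print(sorted(table.get_identifiers()))      # ['i', 'range', 'x']
```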

Builtin objects

__BRYTHON__ has an attribute builtins that stores all the built-in Python names (classes, functions, exceptions, objects), usually under the same name: for instance, the built-in class int is stored as __BRYTHON__.builtins.int. Only names that conflict with Javascript naming rules (such as reserved words) are changed: for instance, super() is implemented as __BRYTHON__.builtins.$$super.
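The renaming rule can be sketched as a one-line helper. This is a minimal sketch assuming the rule is simply "prefix Javascript reserved words with $$"; the helper js_name and the partial reserved-word set are hypothetical, not Brython's actual code.

```python
# Partial list of Javascript reserved words, for illustration only
JS_RESERVED = {
    "break", "case", "catch", "class", "const", "continue", "debugger",
    "default", "delete", "do", "else", "enum", "export", "extends",
    "function", "instanceof", "new", "super", "switch", "this", "throw",
    "typeof", "var", "void", "with",
}


def js_name(name: str) -> str:
    """Hypothetical helper: key under which a builtin would be stored."""
    return "$$" + name if name in JS_RESERVED else name


print(js_name("int"), js_name("super"))  # int $$super
```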

Implementation of Python objects

Objects implemented as native Javascript objects

Python strings are implemented as Javascript strings, except those that contain characters outside the Basic Multilingual Plane (code points above U+FFFF), such as 🐑, which are implemented with a custom class.
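The reason for the special case is that Javascript strings are sequences of UTF-16 code units, so a character above U+FFFF occupies two of them (a surrogate pair), while Python counts it as one character. The Python sketch below shows the difference; js_length and needs_custom_class are hypothetical helper names used only for this illustration.

```python
import struct


def js_length(s: str) -> int:
    """What Javascript's s.length would return: the number of UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2


def needs_custom_class(s: str) -> bool:
    """True if s contains a character outside the BMP (code point > U+FFFF)."""
    return any(ord(c) > 0xFFFF for c in s)


sheep = "\U0001F411"  # 🐑
print(len(sheep), js_length(sheep))  # 1 2
print([hex(u) for u in struct.unpack("<2H", sheep.encode("utf-16-le"))])
# ['0xd83d', '0xdc11']  (the surrogate pair)
```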

Python lists and tuples are implemented as Javascript arrays.

Python integers are implemented as Javascript numbers if they are in the range of Javascript "safe integers", i.e. [-(2^53-1), 2^53-1]; outside of this range they are implemented with an internal class.
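The bound comes from the fact that Javascript numbers are IEEE 754 doubles, which can only guarantee exact integers up to 2^53 - 1 in absolute value. Python floats are doubles too, so the limit can be checked from Python; fits_js_number and survives_double are hypothetical helpers for this illustration.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as Number.MAX_SAFE_INTEGER in Javascript


def fits_js_number(n: int) -> bool:
    """True if n is a 'safe integer' that can be stored as a plain number."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


def survives_double(n: int) -> bool:
    """True if converting n to a double and back gives n again."""
    return int(float(n)) == n


print(fits_js_number(2**53 - 1), survives_double(2**53 - 1))  # True True
print(fits_js_number(2**53 + 1), survives_double(2**53 + 1))  # False False
# 2**53 itself survives, but it is not "safe": 2**53 + 1 rounds to the same double
```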

Python floats are implemented as instances of the Javascript Number class.

Other types

All other Python classes (built-in or user-defined) are implemented as Javascript objects that hold the class attributes and methods.

A minimal class is built by code such as:

$B.make_class = function(name, factory){
    // Builds a basic class object

    var A = {
        __class__: _b_.type,
        __mro__: [_b_.object],  // _b_ is __BRYTHON__.builtins
        __name__: name,
        $is_class: true
    }

    A.$factory = factory

    return A
}

factory is the function that creates instances of the class. The instances have an attribute __class__ set to the class object.

The class object has an attribute __mro__, a list of the classes used for attribute resolution on instances of the class.
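The sketch below models the structure built by $B.make_class with Python dicts standing in for Javascript objects, and adds a simplified attribute lookup that searches the instance, then its class, then each class in __mro__. All names here (OBJECT, make_class, get_attr, Point) are hypothetical; real Python attribute resolution also involves descriptors and __getattr__, which this sketch ignores.

```python
# Plain dicts stand in for the Javascript objects of the code above.
OBJECT = {"__name__": "object", "__str__": lambda self: "<object>"}


def make_class(name, factory, attrs=None):
    """Build a class object shaped like the one returned by $B.make_class."""
    cls = {"__class__": "type",  # placeholder for the type class
           "__mro__": [OBJECT], "__name__": name,
           "$is_class": True, "$factory": factory}
    cls.update(attrs or {})
    return cls


def get_attr(instance, attr):
    """Look in the instance, then in its class, then in each class of __mro__."""
    if attr in instance:
        return instance[attr]
    cls = instance["__class__"]
    for klass in [cls] + cls["__mro__"]:
        if attr in klass:
            return klass[attr]
    raise AttributeError(attr)


def point_factory(x):
    # Instances carry a __class__ attribute set to the class object
    return {"__class__": Point, "x": x}


Point = make_class("Point", point_factory, {"norm": lambda self: abs(self["x"])})
p = Point["$factory"](-3)
print(get_attr(p, "x"), get_attr(p, "norm")(p), get_attr(p, "__str__")(p))
# -3 3 <object>
```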

Functions

Python functions are implemented as Javascript functions, but there are many differences, both with function definition and function calls.

A Python function can declare its parameters in many ways: named parameters (def f(x):), parameters with default values (def f(x=1):) and holders for additional positional and keyword arguments (def f(*x, **y):).

A Python function can be called with positional arguments (f(2)), keyword arguments (f(y=1)), packed iterables (f(*args)) and packed dictionaries (f(**kw)).

Javascript handles parameters differently. A function declares named parameters (function f(x)), and inside its body the array-like object arguments gives access to all the values passed, whether or not they match a declared parameter (function f(){var x = arguments[0]}). Functions are called with positional arguments only (f(x)), either directly or with the methods call and apply; Javascript has no keyword arguments.

For function calls, the arguments passed to the Python function are translated this way :

  • positional arguments are kept unmodified
  • packed iterables are unpacked and added to the positional arguments
  • all keyword arguments (including packed dictionaries) are grouped in a single argument put at the end of the argument list. It is a Javascript object with 2 keys: $nat, set to "kw", and kw, set to a list of objects: the first one is indexed by the explicit keyword argument names, the others are the packed dictionaries

For instance, the call

f(1, *t, x=2, **d)

is translated to

f.apply(null, [1].concat(list(t)).concat([{$nat: "kw", kw: [{x: 2}, d]}]))
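The flattening of call arguments can be sketched in Python, following the shape of the translated call above: positional values in order, starred iterables expanded in place, and all keyword arguments gathered in a final object whose kw list holds the explicit keywords followed by the packed dictionaries. translate_call is a hypothetical name, and appending the keyword object only when the call actually has keyword arguments is an assumption of this sketch.

```python
def translate_call(args, kwargs=None, kw_dicts=()):
    """Hypothetical sketch of how Python call arguments could be flattened.

    args: (value, is_starred) pairs, in call order
    kwargs: explicit keyword arguments, e.g. {"x": 2}
    kw_dicts: dictionaries passed with **
    """
    flat = []
    for value, starred in args:
        if starred:
            flat.extend(value)  # *t: the iterable is unpacked in place
        else:
            flat.append(value)
    if kwargs or kw_dicts:
        # All keyword arguments travel in one object at the end of the list
        flat.append({"$nat": "kw", "kw": [dict(kwargs or {}), *kw_dicts]})
    return flat


# f(1, *t, x=2, **d)
t, d = (5, 6), {"z": 3}
print(translate_call([(1, False), (t, True)], {"x": 2}, [d]))
# [1, 5, 6, {'$nat': 'kw', 'kw': [{'x': 2}, {'z': 3}]}]
```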

Python function definitions are translated to Javascript function definitions that take no significant parameters; the argument values are set at the beginning of the function body, using the object arguments and the function $B.args defined in py_utils.js. This function takes the following parameters, initialised from the Python function parameters:

$B.args = function(fname, argcount, slots, var_names, args, dobj,
                   extra_pos_args, extra_kw_args)
  • fname is the function name
  • argcount is the number of named parameters expected by the function, not counting the holders for extra positional or keyword arguments
  • slots is a Javascript object indexed by the expected named parameters, with all values set to null
  • var_names is a list of expected named parameters. It is the equivalent of Object.keys(slots), but for performance reasons the list is written literally in the generated code instead of being computed from slots at each function call
  • args is the iterable holding the arguments passed to the function, generally set to the Javascript built-in arguments
  • dobj is a Javascript object mapping the parameters that have default values to those values; it is {} if no default value is specified
  • extra_pos_args is the name of the holder for extra positional arguments, or null
  • extra_kw_args is the name of the holder for extra keyword arguments, or null

A few examples:

for def f(x): the Javascript function starts with

var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, null, null)

for def f(x, y=1):

var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
                  {y: 1}, null, null)

for def f(x, *t):

var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, "t", null)

for def f(x, y=1, *t, **d):

var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
                  {y: 1}, "t", "d")

$B.args checks the arguments passed to the function and raises an exception if arguments are missing or unexpected. Otherwise, it returns an object indexed by the names of the parameters (each set to the value passed or to its default value) and, if specified, by the names of the holders for extra arguments.

For instance, in the last example above, $ns will have the keys x, y, t and d.
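A simplified Python version of the checks performed by $B.args may help. This is a sketch under assumptions: the parameter order follows the signature above, args is a Python list whose last item may be the {"$nat": "kw", ...} object used for keyword arguments, and the name b_args, the _MISSING sentinel and the error messages are illustrative. Keyword-only and positional-only parameters are not handled.

```python
_MISSING = object()  # marks a parameter that has not received a value yet


def b_args(fname, argcount, slots, var_names, args, dobj,
           extra_pos_args, extra_kw_args):
    """Simplified sketch of $B.args: returns the namespace of the call."""
    args = list(args)
    kwargs = {}
    # Keyword arguments, if any, come last as {"$nat": "kw", "kw": [...]}
    if args and isinstance(args[-1], dict) and args[-1].get("$nat") == "kw":
        for mapping in args.pop()["kw"]:
            for key, value in mapping.items():
                if key in kwargs:
                    raise TypeError(f"{fname}() got multiple values "
                                    f"for keyword argument '{key}'")
                kwargs[key] = value

    ns = {name: _MISSING for name in var_names}
    # Positional arguments fill the named parameters in order
    for name, value in zip(var_names, args[:argcount]):
        ns[name] = value
    extra = args[argcount:]
    if extra_pos_args is not None:
        ns[extra_pos_args] = tuple(extra)
    elif extra:
        raise TypeError(f"{fname}() takes {argcount} positional "
                        f"arguments but {len(args)} were given")

    extra_kw = {}
    for key, value in kwargs.items():
        if key in slots:
            if ns[key] is not _MISSING:
                raise TypeError(f"{fname}() got multiple values "
                                f"for argument '{key}'")
            ns[key] = value
        elif extra_kw_args is not None:
            extra_kw[key] = value
        else:
            raise TypeError(f"{fname}() got an unexpected "
                            f"keyword argument '{key}'")
    if extra_kw_args is not None:
        ns[extra_kw_args] = extra_kw

    # Parameters still without a value take their default, if any
    for name in var_names:
        if ns[name] is _MISSING:
            if name not in dobj:
                raise TypeError(f"{fname}() missing required "
                                f"argument: '{name}'")
            ns[name] = dobj[name]
    return ns


# def f(x, y=1, *t, **d) called as f(1, 5, 6, z=3)
print(b_args("f", 2, {"x": None, "y": None}, ["x", "y"],
             [1, 5, 6, {"$nat": "kw", "kw": [{"z": 3}]}],
             {"y": 1}, "t", "d"))
# {'x': 1, 'y': 5, 't': (6,), 'd': {'z': 3}}
```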

Name resolution

A Python program is divided into blocks: modules, functions, classes, comprehensions. For each block, Brython defines a Javascript variable that holds all the names bound in the block (we call it the "block names object").

Based on lexical analysis, including the global and nonlocal keywords, it is generally possible to know in which block a name is bound. The name is then translated as the attribute with the same name of that block names object.

When a name is referenced (e.g. print(x)) rather than bound (e.g. x = 1), the translation is actually a call to a function that checks whether the object referenced by the name is undefined and, if so, raises a NameError or UnboundLocalError for that name. This check is needed because a name bound somewhere in a block may not be bound yet when it is referenced, as in the following examples:

# example 1 : raises NameError
def f():
    a
a = f()

# example 2 : raises NameError
class A:
    def __init__(self):
        a
a = A()

# example 3 : raises NameError
if False:
    a = 0
a

# example 4 : raises UnboundLocalError
def f():
    if False:
        a = 9
    a
f()
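The run-time check can be sketched as follows. load_name is a hypothetical stand-in for the function called by the generated code; a Python dict plays the role of the block names object, and local_to_function is the result of lexical analysis (the name is bound somewhere in the current function). As in Python, a name that is not local to a function falls back to the built-in names before NameError is raised.

```python
import builtins


def load_name(block_ns, name, local_to_function):
    """Hypothetical check for a reference to a name that may not be bound yet."""
    if name in block_ns:
        return block_ns[name]
    if local_to_function:
        # Bound somewhere in the function, but not yet: example 4
        raise UnboundLocalError(f"local variable '{name}' referenced before assignment")
    if hasattr(builtins, name):
        return getattr(builtins, name)
    # Examples 1 to 3
    raise NameError(f"name '{name}' is not defined")


for local in (False, True):
    try:
        load_name({}, "a", local_to_function=local)
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        print(type(exc).__name__)
# NameError
# UnboundLocalError
```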

If lexical analysis shows that a referenced name is certainly bound, it is simply translated to a direct access on the block names object, e.g. X['a'] if X is that object. This is the case when the name has been bound on a previous line of the same block, at the block level rather than inside an indented statement. For instance here:

x = 0
print(x)
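The static rule (bound on an earlier line, at block level) can be approximated with CPython's ast module. The hypothetical classify_references function below only treats plain assignments at module level as certain bindings; for loops, imports, augmented assignments and nested scopes are ignored to keep the sketch short.

```python
import ast


def classify_references(source):
    """Return (name, line, bound_earlier) for every name read at module level.

    bound_earlier is True when a plain assignment on an earlier top-level
    statement has already bound the name, so a direct access would be safe.
    """
    bound = set()
    refs = []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                refs.append((node.id, node.lineno, node.id in bound))
        # Bindings count only after their statement and only at block level:
        # the assignment nested under `if False:` is never recorded.
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    bound.add(target.id)
    return refs


example = "x = 0\nprint(x)\nif False:\n    y = 1\nprint(y)\n"
for ref in classify_references(example):
    print(ref)
# ('print', 2, False)  <- not bound in this block: resolved elsewhere (builtins)
# ('x', 2, True)
# ('print', 5, False)
# ('y', 5, False)
```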

The only case where the block cannot be determined is when the program imports names with from some_module import *. In this case:

  • it is impossible to know if a name like range referenced in the script is the built-in class range or if it was among the names imported from some_module
  • if a name which is not explicitly bound in the script is referenced, lexical analysis can't determine if it should raise a NameError

In this case, the name is translated to a call to a function that will select at run time the value based on the names actually imported by the module, or raise a NameError.
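With from some_module import *, the lookup has to be done at run time against the names actually present in the module namespace. The sketch below is hypothetical: resolve_at_runtime and the dict simulating some_module are illustrative, and the import is simulated by copying the public names of that dict (as Python does for a module without __all__).

```python
import builtins


def resolve_at_runtime(name, module_ns):
    """Hypothetical lookup for scripts that contain `from ... import *`."""
    if name in module_ns:          # bound by the script or by the * import
        return module_ns[name]
    if hasattr(builtins, name):    # otherwise, a built-in name
        return getattr(builtins, name)
    raise NameError(f"name '{name}' is not defined")


# Simulated `from some_module import *`, where some_module redefines range
some_module = {"range": lambda n: "range from some_module", "_private": 0}
module_ns = {k: v for k, v in some_module.items() if not k.startswith("_")}

print(resolve_at_runtime("range", module_ns)(3))    # range from some_module
print(resolve_at_runtime("len", module_ns) is len)  # True
```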

Execution frames

Brython handles the execution frames in a stack. Each time the program enters a new module or a new function (including lambdas and comprehensions), information about the global and local environment is placed on top of the stack; when the function or module exits, the element on top of the stack is removed.

This is done by inserting calls to the internal functions enter_frame() and leave_frame() in the generated Javascript code.

The stack is used for instance by built-in functions globals() and locals(), and to build the traceback information in case an exception is raised.
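The frame stack can be sketched in Python. enter_frame and leave_frame below are hypothetical Python stand-ins for the internal functions of the same name, and wrapping each function body in try/finally is an assumption of this sketch (it guarantees the frame is popped even if an exception propagates); the code Brython actually generates may differ.

```python
frames = []  # execution stack: frames[-1] is the frame currently running


def enter_frame(name, local_ns, global_ns):
    frames.append((name, local_ns, global_ns))


def leave_frame():
    frames.pop()


def current_locals():
    """What a locals() implementation could read: the top frame's namespace."""
    return frames[-1][1]


def frame_names():
    """Frame names from outermost to innermost, as a traceback lists them."""
    return [name for name, _, _ in frames]


module_ns = {}


def g(x):
    enter_frame("g", {"x": x}, module_ns)
    try:
        return frame_names(), dict(current_locals())
    finally:
        leave_frame()  # runs even if the body raises an exception


def f(x):
    enter_frame("f", {"x": x}, module_ns)
    try:
        return g(x + 1)
    finally:
        leave_frame()


enter_frame("<module>", module_ns, module_ns)
result = f(1)
leave_frame()
print(result)  # (['<module>', 'f', 'g'], {'x': 2})
print(frames)  # []
```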

indexedDB cache for standard library modules

This feature is used only if two conditions are met:

  • the browser must support the indexedDB database engine (most of them do, including on smartphones)
  • the Brython page must use brython_stdlib.js, or the reduced version brython_modules.js generated by the brython package for CPython

The main idea is to store the Javascript translation of stdlib modules in an indexedDB database: the translation is done only once for each new version of Brython, the generated Javascript is stored on the client side (it is not sent over the network), and indexedDB can easily handle a few MB of data.

Unfortunately, indexedDB works asynchronously, while import is blocking. With this code:

import datetime
print(datetime.datetime.now())

using indexedDB at runtime to get the datetime module is not possible, because the code that follows the import statement is not in a callback function that could be called when the indexedDB asynchronous request completes.

The solution is to scan the script at translation time. For each import statement in the source code, the name of the module to import is stored in a list. When the translation is finished, the Brython engine enters an execution loop (defined in function loop() in loaders.js) that uses a tasks stack. The possible tasks are:

  • call function inImported(), which checks if the module is already among the imported modules. If so, control returns to loop()
  • if not, add a task to the stack: a call to function idb_get(), which makes a request to the indexedDB database to see if the Javascript version of the Python module is already stored; when the task is added, control returns to loop()
  • in the callback of this request (function idb_load()):
    • if the Javascript version exists in the database, it is stored in a Brython variable (__BRYTHON__.precompiled) and control returns to loop()
    • otherwise, the Python source for the module (found in brython_stdlib.js) is translated and another task is added to the stack: a request to store the Javascript code in the indexedDB database. The callback of this request adds another task: a new call to idb_get(), which is sure to succeed this time
  • the last task on the stack is the execution of the original script

At run time, when a module in the standard library is imported, the Javascript translation stored in __BRYTHON__.precompiled is executed : the Python to Javascript translation has been made previously.

Cache update

The indexedDB database is associated with the browser and persists across requests, browser restarts, computer reboots, etc. The process described above must therefore provide a way to update the Javascript version stored in the database when the Python source code of the stdlib changes, or when the translation engine changes.

To achieve this, cache updates rely on a timestamp. Each version of Brython is marked with a timestamp, updated by the script make_dist.py. When a script in the stdlib is precompiled and stored in the indexedDB database, the record has a timestamp field set to this Brython timestamp. If a new version of Brython is used in the HTML page, its timestamp differs from the one in the stored record, so idb_load() discards the record and performs a new translation.

A complementary timestamp is defined if brython_modules.js is used instead of brython_stdlib.js.
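The timestamp check can be sketched with an in-memory dictionary standing in for the indexedDB object store. PrecompiledStore, put and get_fresh are hypothetical names; the record fields (name, content, imports, timestamp) mirror the fields used by the pseudo-code at the end of this page.

```python
class PrecompiledStore:
    """In-memory stand-in for the indexedDB object store (hypothetical)."""

    def __init__(self):
        self.records = {}

    def put(self, name, js, imports, timestamp):
        self.records[name] = {"name": name, "content": js,
                              "imports": imports, "timestamp": timestamp}

    def get_fresh(self, name, timestamp):
        """Return the record only if it was produced by this Brython version."""
        record = self.records.get(name)
        if record is not None and record["timestamp"] == timestamp:
            return record
        return None  # missing or outdated: the module must be translated again


store = PrecompiledStore()
store.put("datetime", "/* js */", ["time", "math"], timestamp=1000)
print(store.get_fresh("datetime", 1000) is not None)  # True: same version
print(store.get_fresh("datetime", 2000))              # None: new Brython version
```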

Limitations

The detection of the modules to import is made by static code analysis, relying on statements such as import moduleX or from moduleY import foo. It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, Brython falls back to the previous solution: on-the-fly translation at each page load.
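The static detection can be illustrated with CPython's ast module: scanning Import and ImportFrom nodes finds the modules named in import statements, while imports done through __import__() or inside a string passed to exec() stay invisible. static_imports is a hypothetical helper; Brython performs this scan during its own translation, not with CPython's ast.

```python
import ast


def static_imports(source):
    """Module names found by scanning the import statements of the source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):                 # import moduleX
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names.append(node.module)                    # from moduleY import foo
    return names


script = """
import datetime
from os.path import join
json = __import__("json")   # dynamic import: not visible to the scan
exec("import random")       # code in a string: not visible either
"""
print(static_imports(script))  # ['datetime', 'os.path']
```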

The mechanism is only implemented for modules in the standard library, or those in brython_modules.js. Using it for modules in site-packages or in the application directory is not implemented at the moment.

Pseudo-code

Below is a simplified version of the cache implementation, written in a Python-like pseudo code.

def brython():
    <get Brython scripts in the page>
    for script in scripts:
        # Translate Python script source to Javascript
        root = __BRYTHON__.py2js(script.src)
        js = root.to_js()
        if hasattr(__BRYTHON__, "VFS") and __BRYTHON__.has_indexedDB:
            # If brython_stdlib.js is included in the page, the __BRYTHON__
            # object has an attribute VFS (Virtual File System)
            for module in root.imports:
                tasks.append([inImported, module])
        tasks.append(["execute", js])
    loop()

def inImported(module_name):
    if module_name in imported:
        pass
    elif module_name in stdlib:
        tasks.insert(0, [idb_get, module_name])
    loop()

def idb_get(module_name):
    request = database.get(module_name)
    request.bind("success",
        lambda evt: idb_load(evt, module_name))

def idb_load(evt, module_name):
    result = evt.target.result
    if result and result.timestamp == __BRYTHON__.timestamp:
        __BRYTHON__.precompiled[module_name] = result.content
        for subimport in result.imports:
            tasks.insert(0, [inImported, subimport])
    else:
        # Not found or outdated : precompile source code found
        # in __BRYTHON__.VFS
        root = __BRYTHON__.py2js(__BRYTHON__.VFS[module_name])
        js = root.to_js()
        tasks.insert(0, [store_precompiled, module_name, js, root.imports])
    loop()

def store_precompiled(module_name, js, imports):
    """Store precompiled Javascript in the database, with the modules it
    imports and the timestamp of the current Brython version."""
    request = database.put({"content": js, "name": module_name,
                            "imports": imports,
                            "timestamp": __BRYTHON__.timestamp})

    def restart(evt):
        """When the code is inserted, add a new request to idb_get (this time
        we are sure it will find the precompiled code) and call loop()."""
        tasks.insert(0, [idb_get, module_name])
        loop()

    request.bind("success", restart)

def loop():
    """Pops the first item in the tasks stack and runs it with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        js_script = args[0]
        <execute js_script>
        loop()  # then go on with the remaining tasks
    else:
        func(*args)