How Brython works
A typical Brython-powered HTML page looks like this:
<html>
<head>
<script src="/path/to/brython.js"></script>
</head>
<body onload="brython()">
<script type="text/python">
...
</script>
</body>
</html>
brython.js is the minified concatenation of individual scripts that handle specific tasks, either at compile time (generation of Javascript code from Python source, done in py2js.js) or at run time (for instance, the implementation of all Python built-in objects, e.g. py_list.js for list and tuple, py_string.js for str, etc.). Development is done on the individual scripts; brython.js is generated by the script /scripts/make_dist.py.
brython.js exposes 2 names in the global Javascript namespace: brython (the function called on page load) and __BRYTHON__, an object that holds all internal objects needed to run Python scripts.
The function brython() inspects all the scripts in the page; for those that have the type text/python, it reads the Python source code, translates it to Javascript, and runs the resulting script with a Javascript eval().
If the <script> tag has an attribute src, an Ajax call is performed to get the content of the file at the specified url, and its source is converted and executed as above.
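The script discovery step can be pictured with a small sketch. It is only an illustration written with Python's standard html.parser module, not the way brython() works internally (in the browser it walks the DOM). The class name PythonScriptFinder and the file name app.py are hypothetical.

```python
from html.parser import HTMLParser


class PythonScriptFinder(HTMLParser):
    """Collects <script type="text/python"> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_python = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "text/python":
            self._in_python = True
            # "src" is None for inline scripts
            self.scripts.append({"src": attrs.get("src"), "source": ""})

    def handle_data(self, data):
        if self._in_python:
            self.scripts[-1]["source"] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_python = False


page = """<html>
<head><script src="/path/to/brython.js"></script></head>
<body onload="brython()">
<script type="text/python">
print("hello")
</script>
<script type="text/python" src="app.py"></script>
</body>
</html>
"""

finder = PythonScriptFinder()
finder.feed(page)
for script in finder.scripts:
    # Inline scripts carry their source; external ones would need a fetch
    print(script["src"], repr(script["source"].strip()))
```

The brython.js script itself is skipped because it has no text/python type; the second Python script has a src attribute and an empty body, which is the case where the Ajax call described above would be needed.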
The translation to Javascript takes the following steps:
- a tokenizer reads the tokens in the source code and passes them to an automaton that builds an abstract tree for the code, or raises SyntaxError or IndentationError
- this tree is transformed (nodes are added or modified) to translate some single Python statements into several Javascript statements
- if the debug level is set, additional nodes are added to update an internal object that is set to the current script name and line number
- the transformed tree supports a method to_js() that returns the Javascript code
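The last two steps can be sketched with a toy tree. This is a minimal model under stated assumptions, not the code in py2js.js: the class Node, the function add_line_info() and the variable name $line_info in the generated text are all hypothetical.

```python
class Node:
    """One statement in the tree; `js` is the Javascript text it emits."""

    def __init__(self, js="", line=None):
        self.js = js
        self.line = line
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def to_js(self):
        parts = [self.js] if self.js else []
        parts += [child.to_js() for child in self.children]
        return "\n".join(parts)


def add_line_info(node, script_name):
    """Debug-mode transformation: insert a bookkeeping node before each
    statement that has a line number."""
    new_children = []
    for child in node.children:
        if child.line is not None:
            new_children.append(
                Node(f'$line_info = "{child.line},{script_name}"'))
        add_line_info(child, script_name)
        new_children.append(child)
    node.children = new_children


root = Node()
root.add(Node("var x = 1", line=1))
root.add(Node("console.log(x)", line=2))
add_line_info(root, "__main__")
print(root.to_js())
```

The transformation only adds nodes; the code generation itself stays a plain recursive walk of the tree.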
All this is done in the script py2js.js:
- function brython() is the last one in the script
- translation is done by function py2js()
- this function calls the tokenizer: function tokenize()
- the tokenizer builds a tree made of instances of the class $Node
- an instance of $Node is created for each new statement, and a context is created for the node
- new tokens generally change the state of the context by a call such as context = transition(context, token_type, token_value)
- context is an instance of one of the classes defined in the script, whose name starts with $ and ends with Ctx: for instance, when the tokenizer encounters the keyword try, the function transition() returns an instance of $TryCtx
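The context mechanism is a state machine in which each token replaces the current context with a new one. The sketch below is a toy model of that idea; the classes NodeCtx and TryCtx are simplified stand-ins for the $...Ctx classes, and the token types "keyword" and "op" are assumptions.

```python
class NodeCtx:
    """Context created for a fresh statement node."""

    def transition(self, token_type, token_value):
        if token_type == "keyword" and token_value == "try":
            return TryCtx(self)
        raise SyntaxError(f"unexpected token {token_value!r}")


class TryCtx:
    """Context reached after the keyword `try`."""

    def __init__(self, parent):
        self.parent = parent

    def transition(self, token_type, token_value):
        if token_type == "op" and token_value == ":":
            return self  # the statement header is complete
        raise SyntaxError("expected ':' after 'try'")


def transition(context, token_type, token_value):
    # Dispatch on the current context, as in the call shown above
    return context.transition(token_type, token_value)


context = NodeCtx()
for token_type, token_value in [("keyword", "try"), ("op", ":")]:
    context = transition(context, token_type, token_value)
print(type(context).__name__)
```

A token that the current context does not accept ends in a SyntaxError, which is how the automaton reports invalid source code.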
__BRYTHON__ has an attribute builtins that stores all the built-in Python names (classes, functions, exceptions, objects), usually with the same name: for instance, the built-in class int is stored as __BRYTHON__.builtins.int. Only names that conflict with Javascript naming rules must be changed, e.g. super() is implemented as __BRYTHON__.$$super.
Python strings are implemented as Javascript strings.
Python lists and tuples are implemented as Javascript arrays.
Python integers are implemented as Javascript numbers if they are in the range of Javascript "safe integers", ie [-(2^53-1), 2^53-1] ; outside of this range they are implemented with an internal class.
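The bounds of the safe range can be checked from Python itself, since Python floats use the same 64-bit format as Javascript numbers. This is a small illustration; the helper name is_safe_integer() is hypothetical and mirrors the idea of Javascript's Number.isSafeInteger.

```python
MAX_SAFE_INTEGER = 2**53 - 1


def is_safe_integer(n):
    """True if n can be stored exactly in a Javascript number."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


print(is_safe_integer(2**53 - 1))
print(is_safe_integer(2**53))
# Above the limit, distinct integers collapse to the same double
print(float(2**53) == float(2**53 + 1))
```

The last comparison shows why a separate internal class is needed outside the range: two different integers would otherwise become the same number.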
Python floats are implemented as instances of the Javascript Number class.
All other Python classes (builtin or user-defined) are implemented as a Javascript object that holds the class attributes and methods.
A minimal implementation of a class is done by such code:
$B.make_class = function(name, factory){
    // Builds a basic class object
    var A = {
        __class__: _b_.type,
        __mro__: [object],
        __name__: name,
        $is_class: true
    }
    A.$factory = factory
    return A
}
factory is the function that creates instances of the class. The instances have an attribute __class__ set to the class object.
The class dictionary has an attribute __mro__, a list of the classes used for attribute resolution on instances of the class.
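The roles of the factory, __class__ and __mro__ can be modelled with plain Python dictionaries. This is a sketch under assumptions, not Brython's code: getattr_model(), OBJECT and the Point example are hypothetical, and the lookup order (instance, then class, then the entries of __mro__) is a simplification.

```python
OBJECT = {"__name__": "object", "__mro__": [],
          "describe": lambda self: "an instance"}


def make_class(name, factory, **methods):
    """Python model of a basic class object."""
    cls = {"__name__": name, "__mro__": [OBJECT], "$is_class": True}
    cls.update(methods)
    cls["$factory"] = factory
    return cls


def getattr_model(obj, attr):
    """Look in the instance, then its class, then the classes in __mro__."""
    if attr in obj:
        return obj[attr]
    klass = obj["__class__"]
    for candidate in [klass] + klass["__mro__"]:
        if attr in candidate:
            return candidate[attr]
    raise AttributeError(attr)


def point_factory(x, y):
    # The instance records its class in __class__
    return {"__class__": Point, "x": x, "y": y}


Point = make_class("Point", point_factory,
                   norm1=lambda self: abs(self["x"]) + abs(self["y"]))
p = Point["$factory"](3, -4)
print(getattr_model(p, "norm1")(p))
print(getattr_model(p, "describe")(p))
```

The method norm1 is found on the class itself, while describe is only found by walking __mro__.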
Python functions are implemented as Javascript functions, but there are many differences, both with function definition and function calls.
To define a Python function, its parameters can be specified in many ways: named parameters, e.g. def f(x): ; with default values: def f(x=1): ; holders for additional positional and keyword arguments: def f(*x, **y):
A Python function can be called with positional arguments: f(2), keyword arguments: f(y=1), packed iterables: f(*args) and packed dictionaries: f(**kw).
Javascript also has a variety of ways to handle parameters: named parameters: function f(x), and a way to handle arguments with the object arguments that can be used inside the function, more or less like a list: function f(){var x=arguments[0]}. Function calls can be done directly: f(x), or with the methods call and apply.
For function calls, the arguments passed to the Python function are translated this way:
- positional arguments are kept unmodified
- packed tuples are unpacked and added to the positional arguments
- all keyword arguments (including packed dictionaries) are grouped in a single argument put at the end of the argument list. It is a Javascript object with 2 keys: $nat set to "kw" and kw set to an object indexed by the keyword arguments keys
For instance, the call
f(1, *t, x=2, **d)
where t = ['a', 'b'] and d = {'z': 99} is translated to
f(1, 'a', 'b', {$nat: 'kw', kw: {x: 2, z: 99}})
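The same packing rule can be written as a short Python function. It is a sketch only: pack_call_args() is a hypothetical helper, the Javascript object is modelled by a dict, and omitting the trailing object when there are no keyword arguments is an assumption.

```python
def pack_call_args(positional=(), star=(), keywords=None, double_star=None):
    """Build the argument list passed to the translated function."""
    args = list(positional) + list(star)  # unpack the packed iterable
    kw = dict(keywords or {})
    for key, value in (double_star or {}).items():
        if key in kw:
            raise TypeError(f"got multiple values for keyword argument '{key}'")
        kw[key] = value
    if kw:
        # All keyword arguments travel in one trailing object
        args.append({"$nat": "kw", "kw": kw})
    return args


t = ['a', 'b']
d = {'z': 99}
# Model of the call f(1, *t, x=2, **d)
print(pack_call_args([1], t, {'x': 2}, d))
```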
Python function definitions are translated to a Javascript function definition that takes no significant parameters; the argument values are set at the beginning of the function body, using the object arguments and the function $B.args defined in py_utils.js. This function takes the following parameters, initialised from the Python function parameters:
$B.args = function($fname, argcount, slots, var_names, $args, $dobj,
extra_pos_args, extra_kw_args)
- $fname is the function name
- argcount is the number of named parameters expected by the function, not counting the holders for extra positional or keyword arguments
- slots is a Javascript object indexed by the expected named parameters, with values set to null
- var_names is a list of expected named parameters. It is the equivalent of Object.keys(slots), but for performance reasons the list is explicitly created in the function body, instead of being created at each function call
- $args is the iterable holding the arguments passed to the function, generally set to the Javascript built-in arguments
- $dobj is a Javascript dictionary for the named arguments that take default values; set to {} if no default value is specified
- extra_pos_args is the name of the holder for extra positional arguments, or null
- extra_kw_args is the name of the holder for extra keyword arguments, or null
A few examples:
for def f(x): the Javascript function starts with
var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, null, null)
for def f(x, y=1):
var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
    {y: 1}, null, null)
for def f(x, *t):
var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, "t", null)
for def f(x, y=1, *t, **d):
var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
    {y: 1}, "t", "d")
$B.args checks the arguments passed to the function and raises exceptions if there are missing or unexpected arguments. Otherwise, the object returned is indexed by the name of the arguments passed and, if specified, the name of the holders for extra arguments.
For instance, in the last example above, $ns will have the keys x, y, t and d.
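The behaviour described for $B.args can be approximated in Python. The function parse_args() below is a simplified model written for this page, not a port of the code in py_utils.js: it takes the same eight parameters, detects the trailing keyword object by its $nat key, and raises TypeError with hypothetical messages.

```python
def parse_args(fname, argcount, slots, var_names, args, defaults,
               extra_pos_args, extra_kw_args):
    args = list(args)
    kwargs = {}
    # The keyword arguments, if any, are grouped in the last argument
    if args and isinstance(args[-1], dict) and args[-1].get("$nat") == "kw":
        kwargs = dict(args.pop()["kw"])
    ns = dict(slots)
    filled = set()
    for name, value in zip(var_names, args):
        ns[name] = value
        filled.add(name)
    extra = args[argcount:]
    if extra_pos_args is not None:
        ns[extra_pos_args] = tuple(extra)
    elif extra:
        raise TypeError(f"{fname}() takes {argcount} positional arguments "
                        f"but {len(args)} were given")
    extra_kw = {}
    for key, value in kwargs.items():
        if key in slots:
            if key in filled:
                raise TypeError(f"{fname}() got multiple values for "
                                f"argument '{key}'")
            ns[key] = value
            filled.add(key)
        elif extra_kw_args is not None:
            extra_kw[key] = value
        else:
            raise TypeError(f"{fname}() got an unexpected keyword "
                            f"argument '{key}'")
    if extra_kw_args is not None:
        ns[extra_kw_args] = extra_kw
    for name in var_names:
        if name not in filled:
            if name not in defaults:
                raise TypeError(f"{fname}() missing required argument '{name}'")
            ns[name] = defaults[name]
    return ns


# Model of def f(x, y=1, *t, **d) called as f(1, 2, 3, z=99)
ns = parse_args("f", 2, {"x": None, "y": None}, ["x", "y"],
                [1, 2, 3, {"$nat": "kw", "kw": {"z": 99}}],
                {"y": 1}, "t", "d")
print(ns)
```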
A Python program is divided in blocks : modules, functions, classes. For each block, Brython defines a Javascript variable that will hold all the names bound in the block (we call it the "block names object").
Based on lexical analysis, including the global and nonlocal keywords, it is generally possible to know in which block a name is bound. It is translated as the attribute of the same name of the block names object.
When the name is referenced (e.g. print(x)) and not bound (e.g. x = 1), the translation is actually a call to a function: check_def('a', X['a']) where X is the block names object. check_def(name, obj) is a function that checks if obj is undefined, and if so, throws a NameError or UnboundLocalError for the name. This is done because if a name is bound somewhere in a block, it may not have yet been bound when it is referenced, for instance in examples like:
# example 1 : raises NameError
def f():
    a
a = f()
# example 2 : raises NameError
class A:
    def __init__(self):
        a
a = A()
# example 3 : raises NameError
if False:
    a = 0
a
# example 4 : raises UnboundLocalError
def f():
    if False:
        a = 9
    a
f()
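The check performed on each reference can be modelled in a few lines of Python. This is a sketch, not Brython's implementation: UNDEFINED stands in for Javascript's undefined, and the in_function flag used to pick UnboundLocalError is an assumption about how the two exceptions are told apart.

```python
UNDEFINED = object()  # stand-in for Javascript's undefined


def check_def(name, obj, in_function=False):
    """Return obj if the name is bound, otherwise raise like Python does."""
    if obj is UNDEFINED:
        if in_function:
            raise UnboundLocalError(
                f"local variable '{name}' referenced before assignment")
        raise NameError(f"name '{name}' is not defined")
    return obj


X = {}  # block names object of the module

# Model of example 3: `a` is bound somewhere in the block, but not yet
try:
    check_def('a', X.get('a', UNDEFINED))
except NameError as exc:
    print(exc)

X['a'] = 0
print(check_def('a', X.get('a', UNDEFINED)))
```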
If lexical analysis shows that a referenced name is certainly defined, it is simply translated to X['a']: this is the case when the name has been bound in a previous line in the block, at the block level, not in an indented level. For instance in this case:
x = 0
print(x)
The only case when the block can't be determined is when the program imports names by from some_module import *. In this case:
- it is impossible to know if a name like range referenced in the script is the built-in class range or if it was among the names imported from some_module
- if a name which is not explicitly bound in the script is referenced, lexical analysis can't determine if it should raise a NameError
In this case, the name is translated to a call to a function that will select at run time the value based on the names actually imported by the module, or raise a NameError.
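A run-time lookup of this kind could look like the following sketch. The function resolve_name() and its search order (names bound in the block, then names brought in by the star import, then built-ins) are assumptions made for the illustration.

```python
import builtins


def resolve_name(name, block_names, star_imported):
    """Pick the value of a name at run time, or raise NameError."""
    for namespace in (block_names, star_imported, vars(builtins)):
        if name in namespace:
            return namespace[name]
    raise NameError(f"name '{name}' is not defined")


def fake_range(*args):
    return "range from some_module"


# some_module defines its own `range`: it shadows the built-in class
print(resolve_name("range", {}, {"range": fake_range}) is fake_range)
# some_module does not define it: the built-in class is used
print(resolve_name("range", {}, {}) is range)
```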
Brython handles the execution frames in a stack. Each time the program enters a new module or a new function (including lambdas and comprehensions), information about the global and local environment is placed on top of the stack ; when the function or module exits, the element on top of the stack is removed.
This is done by inserting calls to the internal functions enter_frame() and leave_frame() in the generated Javascript code.
The stack is used for instance by built-in functions globals() and locals(), and to build the traceback information in case an exception is raised.
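The stack discipline can be sketched as follows. The names enter_frame() and leave_frame() come from the text above, but their signatures, the tuple layout of a frame and the helper functions are assumptions.

```python
frames_stack = []


def enter_frame(name, local_ns, global_ns):
    frames_stack.append((name, local_ns, global_ns))


def leave_frame():
    frames_stack.pop()


def current_locals():
    return frames_stack[-1][1]


def current_globals():
    return frames_stack[-1][2]


def traceback_names():
    return [frame[0] for frame in frames_stack]


module_ns = {"x": 1}
enter_frame("__main__", module_ns, module_ns)


def f_translated(a):
    """Model of the code generated for a Python function f(a)."""
    local_ns = {"a": a}
    enter_frame("f", local_ns, module_ns)
    try:
        return traceback_names(), current_locals(), current_globals()
    finally:
        leave_frame()  # the frame is removed even if an exception is raised


names, loc, glob = f_translated(2)
print(names, loc, glob)
```

In this sketch the try/finally pair guarantees that the stack is back to its previous depth when the function exits.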
Precompiled standard library modules are cached in an indexedDB database under 2 conditions:
- the browser must support the indexedDB database engine (most of them do, including on smartphones)
- the Brython page must use brython_stdlib.js, or the reduced version brython_modules.js generated by the CPython brython module
The main idea is to store the Javascript translation of stdlib modules in an indexedDB database : the translation is done only once for each new version of Brython ; the generated Javascript is stored on the client side, not sent over the network, and indexedDB can easily handle a few Mb of data.
Unfortunately, indexedDB works asynchronously, while import is blocking. With this code:
import datetime
print(datetime.datetime.now())
using indexedDB at runtime to get the datetime module is not possible, because the code that follows the import statement is not in a callback function that could be called when the indexedDB asynchronous request completes.
The solution is to scan the script at translation time. For each import statement in the source code, the name of the module to import is stored in a list. When the translation is finished, the Brython engine enters an execution loop (defined in function loop() in py2js.js) that uses a tasks stack. The possible tasks are:
- call function inImported() that checks if the module is already in the imported modules. If so, the control returns to loop()
- if not, add a task to the stack: a call to function idb_get() that makes a request to the indexedDB database to see if the Javascript version of the Python module is already stored; when the task is added, control returns to loop()
- in the callback of this request (function idb_load()):
    - if the Javascript version exists in the database, it is stored in a Brython variable (__BRYTHON__.precompiled) and the control returns to loop()
    - otherwise, the Python source for the module (found in brython_stdlib.js) is translated and another task is added to the stack: a request to store the Javascript code in the indexedDB database. The callback of this request adds another task: a new call to idb_get(), that is sure to succeed this time
- the last task on the stack is the execution of the original script
At run time, when a module in the standard library is imported, the Javascript translation stored in __BRYTHON__.precompiled is executed: the Python to Javascript translation has been made previously.
Cache update
The indexedDB database is associated with the browser and persists between browser requests, when the browser is closed, when the PC is restarted, etc. The process described above must define a way to update the Javascript version stored in the database when the Python source code in the stdlib is changed, or when the translation engine changes.
To achieve this, cache update relies on a timestamp. Each version of Brython is marked with a timestamp, updated by the script make_dist.py. When a script in the stdlib is precompiled and stored in the indexedDB database, the record in the database has a timestamp field set to this Brython timestamp. If a new version of Brython is used in the HTML page, it has a different timestamp: idb_load() then finds that the stored record is outdated, and a new translation is performed.
A complementary timestamp is defined if brython_modules.js is used instead of brython_stdlib.js.
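The timestamp rule can be illustrated with a synchronous model in which a plain dict stands in for the indexedDB database. The functions is_fresh() and load_module_js(), the record layout and the timestamp values are assumptions made for the sketch; the real mechanism is asynchronous.

```python
def is_fresh(record, brython_timestamp):
    """A stored record is usable only if it was made by this Brython version."""
    return record is not None and record.get("timestamp") == brython_timestamp


def load_module_js(name, database, brython_timestamp, translate):
    record = database.get(name)
    if is_fresh(record, brython_timestamp):
        return record["content"]
    # Not found or outdated: translate again and overwrite the record
    js = translate(name)
    database[name] = {"name": name, "content": js,
                      "timestamp": brython_timestamp}
    return js


calls = []


def fake_translate(name):
    calls.append(name)
    return f"/* Javascript for {name} */"


database = {}
load_module_js("datetime", database, 1000, fake_translate)  # translated
load_module_js("datetime", database, 1000, fake_translate)  # served from cache
load_module_js("datetime", database, 2000, fake_translate)  # new Brython version
print(len(calls))
```

The second load does not call the translator, while the third one does because the Brython timestamp has changed.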
Limitations
The detection of the modules to import is made by static code analysis, relying on import moduleX or from moduleY import foo. It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, the previous solution of on-the-fly compilation at each page load is used.
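The limitation is easy to reproduce with any static analysis. The sketch below uses Python's standard ast module purely as an illustration (Brython has its own parser written in Javascript); find_static_imports() is a hypothetical helper.

```python
import ast


def find_static_imports(source):
    """Collect the module names that appear in import statements."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


source = '''
import datetime
from collections import OrderedDict
json = __import__("json")
exec("import random")
'''

print(find_static_imports(source))
```

Only datetime and collections are reported: json and random are imported by code that is evaluated at run time, so they are invisible to the analysis.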
The mechanism is only implemented for modules in the standard library, or those in brython_modules.js. Using it for modules in site-packages or in the application directory is not implemented at the moment.
Pseudo-code
Below is a simplified version of the cache implementation, written in a Python-like pseudo code.
def brython():
    <get Brython scripts in the page>
    for script in scripts:
        # Translate Python script source to Javascript
        root = __BRYTHON__.py2js(script.src)
        js = root.to_js()
        if hasattr(__BRYTHON__, "VFS") and __BRYTHON__.has_indexedDB:
            # If brython_stdlib.js is included in the page, the __BRYTHON__
            # object has an attribute VFS (Virtual File System)
            for module in root.imports:
                tasks.append([inImported, module])
        tasks.append(["execute", js])
    loop()
def inImported(module_name):
    if module_name in imported:
        pass
    elif module_name in stdlib:
        tasks.insert(0, [idb_get, module_name])
    loop()
def idb_get(module_name):
    request = database.get(module_name)
    request.bind("success",
        lambda evt: idb_load(evt, module_name))
def idb_load(evt, module_name):
    result = evt.target.result
    if result and result.timestamp == __BRYTHON__.timestamp:
        __BRYTHON__.precompiled[module_name] = result.content
        for subimport in result.imports:
            tasks.insert(0, [inImported, subimport])
    else:
        # Not found or outdated : precompile source code found
        # in __BRYTHON__.VFS
        js = __BRYTHON__.py2js(__BRYTHON__.VFS[module_name]).to_js()
        tasks.insert(0, [store_precompiled, module_name, js])
    loop()
def store_precompiled(module, js):
    """Store precompiled Javascript in the database."""
    request = database.put({"content": js, "name": module})
    def restart(evt):
        """When the code is inserted, add a new request to idb_get (this time
        we are sure it will find the precompiled code) and call loop()."""
        tasks.insert(0, [idb_get, module])
        loop()
    request.bind("success", restart)
def loop():
    """Pops first item in tasks stack, run task with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        js_script = args[0]
        <execute js_script>
    else:
        func(*args)