Authors: Ulf Wiger (ulf@wiger.net).
Pluggable parsers for the Erlang compiler
This project started as an experiment with runtime code injection using the parse_trans library.
(Note: 'Toker' is the Swedish name for Dopey the dwarf, but it is also, obviously, a play on words referring to token transformation.)
This library installs itself into the Erlang compiler, allowing modules to switch parser modules as well as install token transformers.
Example: the toker_test module uses a custom syntax:

```erlang
-module(toker_test).
-export([double/1, i2l/1]).
-toker_parser(toker_erl_parse).

double(L) ->
    lists:map(`(X) -> X*2`, L).

i2l(L) ->
    lists:map(`integer_to_list/1, L).
```
This module will not compile with the standard Erlang parser, but toker's own build chain bootstraps itself and installs a hook in the erl_parse module, which then detects the instruction -toker_parser(toker_erl_parse).
Demonstration:

```
Eshell V5.10.3  (abort with ^G)
1> toker_test:double([1,2,3]).
[2,4,6]
2> toker_test:i2l([1,2,3]).
["1","2","3"]
```
In the above example, we assume that toker_test has been compiled with rebar compile. The rebar.config file in toker makes use of the erl_first_files option and a parse transform to bootstrap the compiler patch. Specifically:
- toker_c.erl contains the basic erl_parse modifications, and is compiled first.
- toker_pt.erl is a parse transform which leaves the forms untouched, but ensures that toker is initialized.
- toker_bootstrap.erl is an empty module which only exists to trigger the toker_pt parse transform.
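The idea behind toker_pt, a parse transform that changes nothing but has a useful side effect, can be sketched as below. This is not the actual toker_pt source. It is a minimal illustration, assuming that starting the toker application is enough to install the compiler patch (the README states that the patch can also be installed by starting the toker application). The module name my_toker_pt is hypothetical.

```erlang
%% Hypothetical sketch in the style of toker_pt: an identity parse
%% transform that returns the forms unchanged and, as a side effect,
%% tries to start the toker application.
-module(my_toker_pt).
-export([parse_transform/2]).

parse_transform(Forms, _Options) ->
    %% Assumption: starting toker installs the erl_parse hook.
    %% The result is ignored, since toker may already be running
    %% (or may be absent, in which case nothing is installed).
    _ = application:start(toker),
    Forms.
```

Because the forms pass through untouched, the module that triggers the transform compiles exactly as it would without it; only modules compiled afterwards see the patched parser.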
Other applications can reference the toker_pt parse transform in order to activate toker, but remember that parse transforms are only called after the module has been parsed. If a module contains unconventional grammar, the parse transform must therefore be triggered by a module that is compiled before it.
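For another application, the arrangement could look like the sketch below, mirroring the toker_bootstrap approach. The module name my_app_bootstrap is hypothetical; erl_first_files is the rebar option mentioned above, and toker is assumed to be on the code path (for example as a dependency) so that toker_pt is available when my_app_bootstrap is compiled.

```erlang
%% --- rebar.config (sketch) ---
%% Compile the bootstrap module before any module that uses custom syntax.
{erl_first_files, ["src/my_app_bootstrap.erl"]}.

%% --- src/my_app_bootstrap.erl (hypothetical) ---
%% Otherwise empty; it exists only so that toker_pt runs early.
-module(my_app_bootstrap).
-compile({parse_transform, toker_pt}).
```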
Instructions recognized by toker are:

- -toker_parser(Mod), where Mod must export Mod:parse_form(Tokens), which must return a valid Erlang abstract form.
- -toker_token_transform(Mod), where Mod must export Mod:transform_tokens(Tokens), which must return a list of tokens. The function toker_c:transform_tokens/1 returns the tokens unchanged.
- -toker_reset(Type), where Type is either parser, token_transform or all, restores the relevant settings to the default.
Note that a token transform must return a list of tokens corresponding to a valid Erlang form (possibly after being processed by another parser). The Erlang parser has no support for skipping a part of the token stream.
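As an illustration of the -toker_token_transform(Mod) contract, here is a hypothetical token transformer; the module name yes_no_tt and the yes/no convention are invented for this example. It assumes tokens in the format produced by erl_scan, where an atom token is {atom, Anno, Name}, and it only rewrites tokens in place, so the result is still a list of tokens that the standard parser accepts.

```erlang
%% Hypothetical token transform: lets a module write the atoms `yes`
%% and `no` where it means `true` and `false`. All other tokens pass
%% through unchanged, so the output is still a valid Erlang form.
-module(yes_no_tt).
-export([transform_tokens/1]).

transform_tokens(Tokens) ->
    [rewrite(T) || T <- Tokens].

rewrite({atom, Anno, yes}) -> {atom, Anno, true};
rewrite({atom, Anno, no})  -> {atom, Anno, false};
rewrite(Other)             -> Other.
```

A module would opt in with -toker_token_transform(yes_no_tt). and could switch back to the default with -toker_reset(token_transform).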
The toker compiler patch can also be installed by starting the toker application.
Example:

```
Eshell V5.10.3  (abort with ^G)
1> compile:file("src/toker_test", [{outdir,"ebin"},report]).
src/toker_test.erl:8: syntax error before: '`'
src/toker_test.erl:11: syntax error before: '`'
src/toker_test.erl:3: function double/1 undefined
src/toker_test.erl:3: function i2l/1 undefined
error
2> application:start(toker).
ok
3> compile:file("src/toker_test", [{outdir,"ebin"},report]).
{ok,toker_test}
4> toker_test:double([1,2,3]).
[2,4,6]
```
In order to get erlc to pick up the toker functionality, ensure that toker has been compiled and is in the code path, then set ERLC_EMULATOR="erl -s toker".
Example:

```
toker uwiger$ erlc -o ebin src/toker_test.erl
src/toker_test.erl:8: syntax error before: '`'
src/toker_test.erl:11: syntax error before: '`'
src/toker_test.erl:3: function double/1 undefined
src/toker_test.erl:3: function i2l/1 undefined
toker uwiger$ ERLC_EMULATOR="erl -s toker" erlc -o ebin src/toker_test.erl
toker uwiger$ erl -pa ebin
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.3  (abort with ^G)
1> toker_test:double([1,2,3]).
[2,4,6]
```
A rebar plugin can be found in toker/util/toker_rebar_plugin.erl. It bootstraps the toker functionality in pre_compile and pre_eunit for any application that has toker in its 'deps' list.
Apart from the src/toker_test.erl module, the examples/ directory contains examples of, e.g., token transforms (implementing a very simple macro pre-processor in tt1.erl, used by m1.erl).
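The idea of a macro pre-processor built as a token transform can be sketched as follows. This is not tt1.erl: the module name const_tt, the constant table and its names are all hypothetical. Unlike Erlang's own ?MACRO syntax, this sketch expands bare variable tokens whose names appear in a fixed table, and because each token maps to a list of tokens, one name can expand to several tokens.

```erlang
-module(const_tt).
-export([transform_tokens/1]).

%% Hypothetical constant table: variable name -> replacement tokens.
%% The annotation (line) of the original token is reused, so parse
%% errors still point at the line where the name was written.
constants(Anno) ->
    [{'ANSWER',   [{integer, Anno, 42}]},
     {'GREETING', [{string, Anno, "hello"}]}].

transform_tokens(Tokens) ->
    lists:flatmap(fun expand/1, Tokens).

%% Replace a known variable token by its expansion; keep anything else.
expand({var, Anno, Name} = Tok) ->
    case lists:keyfind(Name, 1, constants(Anno)) of
        {Name, Replacement} -> Replacement;
        false               -> [Tok]
    end;
expand(Tok) ->
    [Tok].
```

After expansion, the tokens for f() -> {ANSWER, GREETING}. parse with the standard parser as if the source had said f() -> {42, "hello"}.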
- Currently, it isn't possible to replace the scanner. Among other things, this means that forms must be terminated by '.'. Replacing the scanner would require an additional patch (which is probably doable).
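The reason '.' matters is that the standard scanner, erl_scan, is what cuts source text into per-form token lists, and it uses the dot token as the boundary; a token transform or parser only ever sees one such list at a time. The sketch below (the module name form_split is hypothetical) reproduces that splitting with erl_scan:tokens/3, returning any text after the last dot as unscanned leftover.

```erlang
-module(form_split).
-export([split/1]).

%% Split source text into one token list per form, the way the standard
%% scanner does it: a form ends at a dot token (e.g. '.' followed by
%% whitespace). Returns {Forms, Leftover}, or the scanner's error tuple.
split(Chars) ->
    split(Chars, 1, []).

split(Chars, Loc, Acc) ->
    case erl_scan:tokens([], Chars, Loc) of
        {done, {ok, Tokens, EndLoc}, Rest} ->
            split(Rest, EndLoc, [Tokens | Acc]);
        {done, {error, _, _} = Error, _Rest} ->
            Error;
        {more, _Cont} ->
            %% No terminating dot in the remaining text: it is not a form yet.
            {lists:reverse(Acc), Chars}
    end.
```

For "f() -> ok.\ng() -> 1.\nh() -> 2" this yields two forms, and the unterminated h() -> 2 is left over rather than handed to the parser.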
Modules:

- toker
- toker_app
- toker_bootstrap
- toker_c
- toker_erl_parse
- toker_pt
- toker_server
- toker_sup