MetaDL is a Datalog language extension that enables analysis of Datalog and Java programs inside Datalog. In addition, the subset of the metalanguage that depends on the analyzed language is mechanically generated from a description of the abstract and concrete grammars of the analyzed language.
Srcs("p/t0.mdl").
metadl ('Srcs) {
Arity(pname, arity) :- <: ... :- ..., $p(...,$i:$x), ... .:>, BIND(arity, $i+1), ID($p, pname).
Arity(pname, arity) :- <: ... :- ..., NOT($p(...,$i:$x)), ... .:>, BIND(arity, $i+1), ID($p, pname).
Arity(pname, arity) :- <: ..., $p(...,$i:$x), ... :- ... .:>, BIND(arity, $i+1), ID($p, pname).
}
OUTPUT('Arity).
MetaDL depends on modified versions of JastAdd and JastAddParser. These dependencies are packaged as submodules, so run git submodule update --init --recursive
to fetch them.
Ensure that the souffle
executable is available in your $PATH
.
For the hybrid
and incremental
evaluation modes, a version a modified version of Souffle is needed. This version exposes an extended JNI interface and uses a different format for SQLite input and output.
git clone https://github.com/alexdura/souffle.git -b javadl
When building, make sure to enable 64-bit domains and SWIG (needed only for the hybrid mode):
./bootstrap ./configure --enable-swig --enable-64bit-domain make
For analyzing Java, MetaDL needs access to a Java 8 runtime library. Set METADL_JAVA_RT=/path/to/jre/lib/rt.jr
in your environment to let MetaDL know where to find this library.
Run ./gradlew :jar
to build MetaDL and ./gradlew :test
to test it. For the tests to run, you need the souffle
executable in your executable path.
We provide a Docker image which contains MetaDL and evaluation scripts, packaged together with all the required dependencies.
cd docker ./build.sh
then run the image using:
docker run -it javadl:oopsla21
Inside the Docker image, run the evaluation scripts using:
cd /work ./run_quality.bash ./run_performance.bash
The analysis quality evaluation script prints its results to standard output. The performance evaluation script produces the data in /work/javadl-inc-eval/data/
and the plots in /work/javadl-inc-eval/{plots_ep,plots_sb}
.
MetaDL support multiple running modes, which represent a trade-off between speed and external dependencies.
- Internal evaluation
Uses the internal semi-naive evaluator and it is reasonably fast when the number of tuples is small (< 1 million).
java -jar compiler.jar --eval metadl program.mdl -F fact_dir -D output_dir
If the strata dependency is graph, parallel evaluation will speed things up. For that, use the internall parallel evaluator:
java -jar compiler.jar --eval metadl-par program.mdl -F fact_dir -D output_dir
This is performed in two-steps:
java -jar compiler.jar --gen-hybrid program.mdl
This will generate a library called libSwigInterface.so
in your current directory.
java -jar compiler.jar --eval hybrid program.mdl --lib /ABS/PATH/TO/libSwigInterface.so -F fact_dir -D output_dir
For incremental evaluation mode, the source predicate reference for an analyze block, e.g. Srcs
in java ('Srcs) { .. }
must contain tuples of the form (FilePath, MDA)
where MDA is one of “M”, “D”, “A” and indicates whether the file was added, modified or removed.
The initial run generates the analysis programs and initializes the cache structure. For initial runs, the MDA component of the source predicate is always interpreted as “A”.
java -jar compiler.jar --incremental init -D output_and_cache_dir -F fact_dir
The update run, recomputes the analysis results based on the information in the source relations. Results for files that were previously analyzed, but are not present in the source relation, are kept in the cache and updated if necessary.
java -jar compiler.jar --incremental update -D output_and_cache_dir -F fact_dir
The fact that the output and the cache share the same directory is an artifact of the current command line interface and they will be separated in the future.
When the number of tuples is becoming large, using a high performance evaluator will make the runs faster. For this, MetaDL can generate a Souffle program and then evaluate it:
java -jar compiler.jar --eval souffle program.mdl -F fact_dir -D output_dir
Datalog is a declarative query language, with roots in logic programming. Relations between tables are expressed as Horn clauses. MetaDL extends Datalog with syntactic patterns and associates side-effects to the following predicates EDB
and OUTPUT
. The order of evaluation is as follows:
- All predicates the
EDB
predicate depends upon are evaluated. For all tuples('P, "file")
in the theEDB
relation, the file is read as a CSV and its tuples are added to the relationP
. - The predicate the
java
ormetadl
block depends upon is evaluated. For an analyze block,java ('P) { ... }
, the P predicate is evaluated. The strings in the P relation are interpreted as file system paths and the source files present at those locations are loaded for analysis. - Fixpoint evaluation.
- For all values
('P)
in theOUTPUT
relation, the contents of relationP
are written out to a fileP.csv
.
The order of evaluation was chosen for convenience reasons, to make the following patterns possible:
- Control the analyzed program from a configuration file
Prog.csv
:
EDB('Progs, "Prog.csv", "csv"). java ('Progs) { ... }
Additional Supported features:
- Stratified negation
NOT(P(x1,...,xn))
- Filtering
LT(expr1, expr2)
,GT(expr1, expr2)
- Object creation
BIND(v, expr)
binds a variable to the result of an expression - Arithmetic expressions (
+
,-
,*
,/
) and string concatenation (cat
) insideBIND
and filtering predicates - Type inference
Datalog: metadl ('P) { }
or Java: java ('P) { }
The only blocks that are allowed to contain patterns, metavariables and gaps. All the patterns and special predicates inside these blocks refer to the analyzed program, \'P
.
Datalog: <:$p(x, 1) :- ..., $q(..., $i:$v, ...) , ... .:>
or Java <: class `c implements .., `i, .. { .. } :>
Patterns are a mechanism to match rules and bind metavariables to terms, expressions and predicate symbols.
The root node of a pattern can be accessed by using a bounded pattern $p <:$x + $y:>
.
Datalog ...
or Java ..
Gaps express missing elements inside a list.
Datalog: $x
, $p
or Java: `c, `i
Variables used inside analyze blocks to connect patterns with other literals in the rule
- Terms:
p($x, $y)
- Predicates:
$p(x, y)
- Arithmetic expressions:
$x + $y
- Index metavariables
p(..., $i:$v, ...)
Special metapredicates are allowed only inside analyze
blocks.
STR(c, "value")
,INT($c, value)
- relate constants to their valueID(v, "name")
- relate identifiers to their nameSRC(n, l, c)
- relate an AST node to its source location
#Import a program that contains BIND(t, x*y + ((1 + 2*3) - 1) / 2) P("bpatterns.in"). metadl ('P) { Expr($p, 0, $q, "+"), Expr($p, 1, $r, "+") :- $p <:$q + $r:>. Expr($p, 0, $q, "*"), Expr($p, 1, $r, "*") :- $p <:$q * $r:>. Expr($p, 0, $q, "-"), Expr($p, 1, $r, "-") :- $p <:$q - $r:>. Expr($p, 0, $q, "/"), Expr($p, 1, $r, "/") :- $p <:$q / $r:>. Eval(e, v) :- INT(e, v). Eval(e, v) :- Expr(e, 0, l, "+"), Expr(e, 1, r, "+"), Eval(l, lv), Eval(r, rv), BIND(v, lv + rv). Eval(e, v) :- Expr(e, 0, l, "*"), Expr(e, 1, r, "*"), Eval(l, lv), Eval(r, rv), BIND(v, lv * rv). Eval(e, v) :- Expr(e, 0, l, "-"), Expr(e, 1, r, "-"), Eval(l, lv), Eval(r, rv), BIND(v, lv - rv). Eval(e, v) :- Expr(e, 0, l, "/"), Expr(e, 1, r, "/"), Eval(l, lv), Eval(r, rv), BIND(v, lv / rv). OurExprEval(v) :- <: ... :- ..., BIND(t, x*y + $e), ... .:>, Eval($e, v). } # OurExprEval = {3}. OUTPUT('OurExprEval, "OurExprEval", "csv").
P("tests/evaluation/withimport/evalTest_15_input.java"). java ('P) { ClassImplementsInterface(c, i) :- <: class `c implements .., `i, .. { .. } :>, ID(`c, c), ID(`i, i). InterfaceExtendsInterface(i, j) :- <: interface `i extends `j { .. } :>, ID(`i, i), ID(`j, j). ClassExtendsClass(c, d) :- <: class `c extends `d { .. } :>, ID(`c, c), ID(`d, d). ClassImplementsInterface(c, i), ClassExtendsClass(c, d) :- <: class `c extends `d implements .., `i, .. { .. } :>, ID(`c, c), ID(`d, d), ID(`i, i). } SuperClass(c, s) :- ClassExtendsClass(c, s). SuperClass(c, s) :- ClassExtendsClass(c, d), SuperClass(d, s). SuperInterface(i, s) :- InterfaceExtendsInterface(i, s). SuperInterface(i, s) :- InterfaceExtendsInterface(i, j), SuperInterface(j, s). Interface(c, i) :- ClassImplementsInterface(c, i). Interface(c, i) :- SuperClass(c, d), Interface(d, i). Interface(c, i) :- Interface(c, j), SuperInterface(j, i). OUTPUT('Interface, "Interface.csv", "csv"). OUTPUT('SuperClass, "SuperClass.csv", "csv"). OUTPUT('SuperInterface, "SuperInterface.csv", "csv").
This repository is covered by a BSD 2-clause license, see LICENSE.
The following commands are useful when debugging MetaDL:
- Pretty print the desugared program in MetaDL format
java -jar compiler.jar --pretty-print metadl program.mdl
- Pretty print the desugared program in Souffle format
java -jar compiler.jar --pretty-print metadl program.mdl
- Enable internal debug printouts by setting
METADL_LOG=debug|time|info
in the environment.
SEP is an Earley parser implementation. We use it to parse the patterns.
JastAdd is a meta-compilation system that supports Reference Attribute Grammars (RAGs). It uses the parser generated from Beaver. In addition it takes an abstract grammar description file as input. The abstract grammar description is used to generate the classes that represent the AST.
ExtendJ is an extensible Java compiler built using JastAdd.
Souffl'e is a high performance Datalog engine that MetaDL uses as backend for evaluating complex queries that are too slow for the internal evaluator.
JUnit is a unit testing framework.
JFlex is a lexical analyzer generator.
Beaver is a LALR(1) parser generator. The parser descriptions are written in EBNF-form.
Based on the Datalog implementation developed by Hampus Balldin for the Project Course in Computer Science, Faculty of Engineering LTH, Lund University.