Skip to content

MetaDL: Datalog analyzing Datalog, Java or any language of your choice

License

Notifications You must be signed in to change notification settings

lu-cs-sde/metadl

Repository files navigation

MetaDL: Datalog analyzing Datalog, Java or any language of your choice

MetaDL is a Datalog language extension that enables analysis of Datalog and Java programs inside Datalog. In addition, the subset of the metalanguage that depends on the analyzed language is mechanically generated from a description of the abstract and concrete grammars of the analyzed language.

Srcs("p/t0.mdl").

metadl ('Srcs) {
  Arity(pname, arity) :- <: ... :- ..., $p(...,$i:$x), ... .:>, BIND(arity, $i+1), ID($p, pname).
  Arity(pname, arity) :- <: ... :- ..., NOT($p(...,$i:$x)), ... .:>, BIND(arity, $i+1), ID($p, pname).
  Arity(pname, arity) :- <: ..., $p(...,$i:$x), ... :- ... .:>, BIND(arity, $i+1), ID($p, pname).
}

OUTPUT('Arity).

Building

MetaDL depends on modified versions of JastAdd and JastAddParser. These dependencies are packaged as submodules, so run git submodule update --init --recursive to fetch them.

Ensure that the souffle executable is available in your $PATH.

For the hybrid and incremental evaluation modes, a version a modified version of Souffle is needed. This version exposes an extended JNI interface and uses a different format for SQLite input and output.

git clone https://github.com/alexdura/souffle.git -b javadl

When building, make sure to enable 64-bit domains and SWIG (needed only for the hybrid mode):

./bootstrap
./configure --enable-swig --enable-64bit-domain
make

For analyzing Java, MetaDL needs access to a Java 8 runtime library. Set METADL_JAVA_RT=/path/to/jre/lib/rt.jr in your environment to let MetaDL know where to find this library. Run ./gradlew :jar to build MetaDL and ./gradlew :test to test it. For the tests to run, you need the souffle executable in your executable path.

Docker

We provide a Docker image which contains MetaDL and evaluation scripts, packaged together with all the required dependencies.

cd docker
./build.sh

then run the image using:

docker run -it javadl:oopsla21

Inside the Docker image, run the evaluation scripts using:

cd /work
./run_quality.bash
./run_performance.bash

The analysis quality evaluation script prints its results to standard output. The performance evaluation script produces the data in /work/javadl-inc-eval/data/ and the plots in /work/javadl-inc-eval/{plots_ep,plots_sb}.

Running

MetaDL support multiple running modes, which represent a trade-off between speed and external dependencies.

  1. Internal evaluation

Uses the internal semi-naive evaluator and it is reasonably fast when the number of tuples is small (< 1 million).

java -jar compiler.jar --eval metadl program.mdl -F fact_dir -D output_dir

Internal parallel evaluation

If the strata dependency is graph, parallel evaluation will speed things up. For that, use the internall parallel evaluator:

java -jar compiler.jar --eval metadl-par program.mdl -F fact_dir -D output_dir

Hybrid MetaDL-Souffle evaluation

This is performed in two-steps:

The MetaDL program is compiled to a native library.

java -jar compiler.jar --gen-hybrid program.mdl

This will generate a library called libSwigInterface.so in your current directory.

Running the compiled program

java -jar compiler.jar --eval hybrid program.mdl --lib /ABS/PATH/TO/libSwigInterface.so -F fact_dir -D output_dir

Incremental evaluation

For incremental evaluation mode, the source predicate reference for an analyze block, e.g. Srcs in java ('Srcs) { .. } must contain tuples of the form (FilePath, MDA) where MDA is one of “M”, “D”, “A” and indicates whether the file was added, modified or removed.

Initial run

The initial run generates the analysis programs and initializes the cache structure. For initial runs, the MDA component of the source predicate is always interpreted as “A”.

java -jar compiler.jar --incremental init -D output_and_cache_dir -F fact_dir

Update run

The update run, recomputes the analysis results based on the information in the source relations. Results for files that were previously analyzed, but are not present in the source relation, are kept in the cache and updated if necessary.

java -jar compiler.jar --incremental update -D output_and_cache_dir -F fact_dir

The fact that the output and the cache share the same directory is an artifact of the current command line interface and they will be separated in the future.

Souffle evaluation (deprecated)

When the number of tuples is becoming large, using a high performance evaluator will make the runs faster. For this, MetaDL can generate a Souffle program and then evaluate it:

java -jar compiler.jar --eval souffle program.mdl -F fact_dir -D output_dir

Language description

Datalog

Datalog is a declarative query language, with roots in logic programming. Relations between tables are expressed as Horn clauses. MetaDL extends Datalog with syntactic patterns and associates side-effects to the following predicates EDB and OUTPUT. The order of evaluation is as follows:

  1. All predicates the EDB predicate depends upon are evaluated. For all tuples ('P, "file") in the the EDB relation, the file is read as a CSV and its tuples are added to the relation P.
  2. The predicate the java or metadl block depends upon is evaluated. For an analyze block, java ('P) { ... }, the P predicate is evaluated. The strings in the P relation are interpreted as file system paths and the source files present at those locations are loaded for analysis.
  3. Fixpoint evaluation.
  4. For all values ('P) in the OUTPUT relation, the contents of relation P are written out to a file P.csv.

The order of evaluation was chosen for convenience reasons, to make the following patterns possible:

  • Control the analyzed program from a configuration file Prog.csv:
EDB('Progs, "Prog.csv", "csv").
java ('Progs) {
  ...
}

Additional Supported features:

  • Stratified negation NOT(P(x1,...,xn))
  • Filtering LT(expr1, expr2), GT(expr1, expr2)
  • Object creation BIND(v, expr) binds a variable to the result of an expression
  • Arithmetic expressions ( +, -, *, /) and string concatenation (cat) inside BIND and filtering predicates
  • Type inference

Metalanguage description

Analyze blocks

Datalog: metadl ('P) { } or Java: java ('P) { }

The only blocks that are allowed to contain patterns, metavariables and gaps. All the patterns and special predicates inside these blocks refer to the analyzed program, \'P.

Patterns

Datalog: <:$p(x, 1) :- ..., $q(..., $i:$v, ...) , ... .:> or Java <: class `c implements .., `i, .. { .. } :>

Patterns are a mechanism to match rules and bind metavariables to terms, expressions and predicate symbols.

Bounded patterns

The root node of a pattern can be accessed by using a bounded pattern $p <:$x + $y:>.

Gaps

Datalog ... or Java ..

Gaps express missing elements inside a list.

Metavariables

Datalog: $x, $p or Java: `c, `i

Variables used inside analyze blocks to connect patterns with other literals in the rule

  • Terms: p($x, $y)
  • Predicates: $p(x, y)
  • Arithmetic expressions: $x + $y
  • Index metavariables p(..., $i:$v, ...)

Special metapredicates

Special metapredicates are allowed only inside analyze blocks.

  • STR(c, "value"), INT($c, value) - relate constants to their value
  • ID(v, "name") - relate identifiers to their name
  • SRC(n, l, c) - relate an AST node to its source location

Examples

Constant folding for Datalog

#Import a program that contains BIND(t, x*y + ((1 + 2*3) - 1) / 2)

P("bpatterns.in").

metadl ('P) {
	    Expr($p, 0, $q, "+"), Expr($p, 1, $r, "+") :- $p <:$q + $r:>.
	    Expr($p, 0, $q, "*"), Expr($p, 1, $r, "*") :- $p <:$q * $r:>.
	    Expr($p, 0, $q, "-"), Expr($p, 1, $r, "-") :- $p <:$q - $r:>.
	    Expr($p, 0, $q, "/"), Expr($p, 1, $r, "/") :- $p <:$q / $r:>.

	    Eval(e, v) :- INT(e, v).
	    Eval(e, v) :- Expr(e, 0, l, "+"), Expr(e, 1, r, "+"), Eval(l, lv), Eval(r, rv), BIND(v, lv + rv).
	    Eval(e, v) :- Expr(e, 0, l, "*"), Expr(e, 1, r, "*"), Eval(l, lv), Eval(r, rv), BIND(v, lv * rv).
	    Eval(e, v) :- Expr(e, 0, l, "-"), Expr(e, 1, r, "-"), Eval(l, lv), Eval(r, rv), BIND(v, lv - rv).
	    Eval(e, v) :- Expr(e, 0, l, "/"), Expr(e, 1, r, "/"), Eval(l, lv), Eval(r, rv), BIND(v, lv / rv).

	    OurExprEval(v) :- <: ... :- ..., BIND(t, x*y + $e), ... .:>, Eval($e, v).
}

# OurExprEval = {3}.
OUTPUT('OurExprEval, "OurExprEval", "csv").

Type hierarchy for Java

P("tests/evaluation/withimport/evalTest_15_input.java").

java ('P) {
	ClassImplementsInterface(c, i) :-
		<: class `c implements .., `i, .. { .. } :>,
		ID(`c, c), ID(`i, i).
	InterfaceExtendsInterface(i, j) :-
		<: interface `i extends `j { .. } :>,
		ID(`i, i), ID(`j, j).
	ClassExtendsClass(c, d) :-
		<: class `c extends `d { .. } :>,
		ID(`c, c), ID(`d, d).
	ClassImplementsInterface(c, i), ClassExtendsClass(c, d) :-
		<: class `c extends `d implements .., `i, .. { .. } :>,
		ID(`c, c), ID(`d, d), ID(`i, i).
}

SuperClass(c, s) :- ClassExtendsClass(c, s).
SuperClass(c, s) :- ClassExtendsClass(c, d), SuperClass(d, s).

SuperInterface(i, s) :- InterfaceExtendsInterface(i, s).
SuperInterface(i, s) :- InterfaceExtendsInterface(i, j), SuperInterface(j, s).

Interface(c, i) :- ClassImplementsInterface(c, i).
Interface(c, i) :- SuperClass(c, d), Interface(d, i).
Interface(c, i) :- Interface(c, j), SuperInterface(j, i).

OUTPUT('Interface, "Interface.csv", "csv").
OUTPUT('SuperClass, "SuperClass.csv", "csv").
OUTPUT('SuperInterface, "SuperInterface.csv", "csv").

License

This repository is covered by a BSD 2-clause license, see LICENSE.

Debugging

The following commands are useful when debugging MetaDL:

  • Pretty print the desugared program in MetaDL format java -jar compiler.jar --pretty-print metadl program.mdl
  • Pretty print the desugared program in Souffle format java -jar compiler.jar --pretty-print metadl program.mdl
  • Enable internal debug printouts by setting METADL_LOG=debug|time|info in the environment.

Dependencies

SEP

SEP is an Earley parser implementation. We use it to parse the patterns.

JastAdd

JastAdd is a meta-compilation system that supports Reference Attribute Grammars (RAGs). It uses the parser generated from Beaver. In addition it takes an abstract grammar description file as input. The abstract grammar description is used to generate the classes that represent the AST.

ExtendJ

ExtendJ is an extensible Java compiler built using JastAdd.

Souffle

Souffl'e is a high performance Datalog engine that MetaDL uses as backend for evaluating complex queries that are too slow for the internal evaluator.

JUnit

JUnit is a unit testing framework.

JFlex

JFlex is a lexical analyzer generator.

Beaver

Beaver is a LALR(1) parser generator. The parser descriptions are written in EBNF-form.

Credits

Based on the Datalog implementation developed by Hampus Balldin for the Project Course in Computer Science, Faculty of Engineering LTH, Lund University.

About

MetaDL: Datalog analyzing Datalog, Java or any language of your choice

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •