TC Optimizing Compiler 0.4.0

Introduction

TC is an automatic source-to-source optimizing compiler for affine loop nests. It generates sequential or parallel tiled code by applying the transitive closure of a loop nest dependence graph, combining the Polyhedral Model and Iteration Space Slicing frameworks. TC builds on a state-of-the-art polyhedral compilation toolchain, namely:

  • Polyhedral Extraction Tool [3] for extracting polyhedral representations of original loop nests,
  • Integer Set Library [1] for performing dependence analysis, manipulating sets and relations as well as generating output code,
  • Barvinok library [2] for calculating set cardinality and processing its representation.

To optimize a loop nest, surround it with the #pragma scop and #pragma endscop directives:

int main()
{
  int N;
  int A[N+2][N+2];
#pragma scop
  for (int i = 1; i <= N; ++i) {
    for (int j = 1; j <= N; ++j) {
S1:   A[i][j] = A[i][j+1] + A[i+1][j] + A[i+1][j-1];
    }
  }
#pragma endscop
}

Note: The source file containing the loop nest should be valid C code, simplified as much as possible. Array accesses must not exceed array bounds. Since version 0.3.0, the iterator of a for loop must be declared inside that for loop itself; otherwise it creates a dependence for the outer loops.
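As a minimal sketch of the iterator rule in the note above, the fragment below declares the loop iterator inside the for statement. The function name shift_left and its parameters are hypothetical and only serve the illustration; the commented-out variant shows the form the note advises against.

```c
#include <assert.h>

/* Hypothetical kernel used only for illustration: shifts a[1..n] one slot left. */
void shift_left(int n, int a[])
{
  assert(n >= 0);
#pragma scop
  /* The iterator i is declared in the for statement itself,
     as required since version 0.3.0. */
  for (int i = 0; i < n; ++i) {
    a[i] = a[i + 1];
  }
#pragma endscop
}

/* Form to avoid: the iterator is declared outside the loop.
 *
 *   int i;
 *   for (i = 0; i < n; ++i) { ... }
 *
 * According to the note above, such an iterator creates a
 * dependence for the outer loops.
 */
```

A compiler that does not know the scop pragmas ignores them, so the same file can still be built and run as ordinary C.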

TC implements a number of tiling transformation algorithms as well as schedulers and code generators (including parallel generators utilizing OpenMP), all selectable through command line options (see the Manual section below). You are encouraged to experiment with various combinations of algorithms, schedulers and code generators, as well as tile sizes and transitive closure algorithms.

Note: TC is primarily used for studying algorithms utilizing transitive closure. Despite being able to generate efficient tiled code, some features are still in development.

For the example loop nest, the correction technique can be applied with a tile size of 32x32:

/* TC Optimizing Compiler 0.4.0 */
/* ./tc ../examples/other/correction.scop.c --correction-tiling --lex-scheduling --serial-codegen -b 32 */
#define min(x,y)    ((x) < (y) ? (x) : (y))
#define max(x,y)    ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))
#pragma scop
for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
  for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
    for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
      for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
        A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
      }
      if (32 * ii1 + 32 >= N) {
        for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      } else {
        for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      }
    }
  }
}
#pragma endscop
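One way to gain confidence in generated code is to run it against the original loop nest on the same input and compare the results. The sketch below does this for the example: kernel_tiled is a copy of the output shown above, kernel_original is the input loop nest, and tiled_matches_original is a hypothetical helper that is not part of TC. The fill pattern for the arrays is arbitrary.

```c
#include <assert.h>
#include <string.h>

#define min(x,y)    ((x) < (y) ? (x) : (y))
#define max(x,y)    ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))

/* Original loop nest from the example. */
static void kernel_original(int N, int A[N + 2][N + 2])
{
  for (int i = 1; i <= N; ++i)
    for (int j = 1; j <= N; ++j)
      A[i][j] = A[i][j + 1] + A[i + 1][j] + A[i + 1][j - 1];
}

/* Tiled loop nest, copied from the TC output (tile size 32x32). */
static void kernel_tiled(int N, int A[N + 2][N + 2])
{
  for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
    for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
      for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
        for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
        if (32 * ii1 + 32 >= N) {
          for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
            A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
          }
        } else {
          for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
            A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
          }
        }
      }
    }
  }
}

/* Hypothetical helper: returns 1 if both versions leave identical arrays. */
int tiled_matches_original(int N)
{
  assert(N >= 1);
  int ref[N + 2][N + 2];
  int out[N + 2][N + 2];
  /* Arbitrary but deterministic input, identical for both versions. */
  for (int i = 0; i < N + 2; ++i)
    for (int j = 0; j < N + 2; ++j)
      ref[i][j] = out[i][j] = (7 * i + 3 * j) % 11;
  kernel_original(N, ref);
  kernel_tiled(N, out);
  return memcmp(ref, out, sizeof ref) == 0;
}
```

Sizes that are not a multiple of 32 are worth including in such a check, since they exercise the partial tiles at the boundary. Note also that floord rounds toward negative infinity, unlike the plain C division operator, which truncates toward zero.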

The code generated by TC for the studied kernels can be found in the results directory of the compiler's repository.

See the CHANGELOG for the latest changes.

Installation

Dependencies:

automake autoconf libtool pkg-config libgmp3-dev libclang-dev
llvm libntl-dev g++ make git clang zlib1g-dev libglpk-dev

Downloading:

git clone https://github.com/piotr-skotnicki/tc-optimizer.git tc
cd tc
git submodule update --init --recursive

Compilation:

./autogen.sh
./configure
make

Manual

Usage:

tc <input.c> <algorithm> <scheduling> <codegen> [<closure>] [<options>...]

Hint: Run the command "source scripts/tc-completion.bash" to enable bash completions.

Algorithms:

--stencil-tiling             Concurrent start tiling for stencils
--regular-tiling             Tiling with regular tile shapes
--correction-tiling          Tiling with LT tiles correction
--correction-inv-tiling      Tiling with GT tiles correction
--merge-tiling               Tiling with tiles merging
--split-tiling               Tiling with tiles splitting
--mod-correction-tiling      Tiling with LT cyclic tiles modified correction

Schedulers:

--lex-scheduling               Lexicographic order execution
--isl-scheduling               Integer set library scheduler
--isl-wave-scheduling          Integer set library scheduler with wavefronting
--feautrier-scheduling         Integer set library scheduler (Feautrier scheduling)
--sfs-single-scheduling        Tiling of synchronization-free slices with single sources
--sfs-multiple-scheduling      Tiling of synchronization-free slices with multiple sources
--sfs-tile-scheduling          Tile-wise synchronization-free slices
--free-scheduling              Free scheduling based on R^+
--free-rk-scheduling           Free scheduling based on R^k
--free-finite-scheduling       Exact free scheduling for finite graphs
--dynamic-free-scheduling      Dynamic free scheduling
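To give a feel for what a free schedule looks like, the sketch below computes one for a fixed-size instance of the example loop nest. It assumes the usual meaning of the term: every iteration runs at the earliest time step at which all iterations it depends on have finished. The function free_schedule and its array layout are hypothetical; TC derives schedules symbolically from the dependence relation rather than by enumerating iterations like this.

```c
#include <assert.h>

/* Computes the earliest time step of each iteration (i,j), 1 <= i,j <= n,
   of the example loop nest. In that nest, iteration (i,j) must run after
   (i,j-1), (i-1,j) and (i-1,j+1) whenever those iterations exist.
   level must hold (n+1)*(n+1) ints; the step of (i,j) is stored in
   level[i*(n+1)+j]. Returns the total number of time steps. */
int free_schedule(int n, int *level)
{
  assert(n >= 1);
  int w = n + 1;
  int last = 0;
  /* Lexicographic order visits every predecessor before its successor,
     so a single pass is enough. */
  for (int i = 1; i <= n; ++i) {
    for (int j = 1; j <= n; ++j) {
      int t = 0; /* iterations without predecessors start at step 0 */
      if (j > 1 && level[i * w + (j - 1)] + 1 > t)
        t = level[i * w + (j - 1)] + 1;
      if (i > 1 && level[(i - 1) * w + j] + 1 > t)
        t = level[(i - 1) * w + j] + 1;
      if (i > 1 && j < n && level[(i - 1) * w + (j + 1)] + 1 > t)
        t = level[(i - 1) * w + (j + 1)] + 1;
      level[i * w + j] = t;
      if (t > last)
        last = t;
    }
  }
  return last + 1;
}
```

Iterations that share a time step have no dependences between them and could run in parallel. For this nest with n >= 2, the step of iteration (i,j) works out to 2(i-1) + (j-1).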

Code generators:

--serial-codegen       Serial code generator
--omp-for-codegen      OpenMP parallel for generator
--omp-task-codegen     OpenMP parallel task generator
--omp-gpu-codegen      OpenMP offloading to GPU target

Transitive closure:

--isl-map-tc           ISL normalized map transitive closure (default)
--isl-union-map-tc     ISL union map transitive closure
--floyd-warshall-tc    Floyd-Warshall algorithm
--iterative-tc         Iterative algorithm
--omega-map-tc         Omega normalized map transitive closure
--omega-union-map-tc   Omega union map transitive closure
--tarjan-tc            Tarjan algorithm for finite graphs
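For readers new to the term, the sketch below computes the transitive closure of the dependence graph of the example loop nest for a small, fixed N, using Warshall's algorithm on an explicit boolean matrix. It is an illustration only: the function names and the matrix layout are hypothetical, and TC itself works on parametric relations rather than on enumerated graphs like this one.

```c
#include <assert.h>

/* Fills r (count x count, row-major, count = n*n) with the direct
   dependences of the example loop nest for a fixed n. With zero-based
   (i,j), node i*n+j stands for iteration (i+1,j+1), and each iteration
   must precede (i,j+1), (i+1,j) and (i+1,j-1) when they exist. */
void build_dependences(int n, unsigned char *r)
{
  assert(n >= 1);
  int count = n * n;
  for (int k = 0; k < count * count; ++k)
    r[k] = 0;
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      int src = i * n + j;
      if (j + 1 < n)
        r[src * count + i * n + (j + 1)] = 1;
      if (i + 1 < n)
        r[src * count + (i + 1) * n + j] = 1;
      if (i + 1 < n && j - 1 >= 0)
        r[src * count + (i + 1) * n + (j - 1)] = 1;
    }
  }
}

/* Warshall's algorithm: afterwards r[a*count+b] is 1 exactly when b is
   reachable from a through one or more dependence edges (R^+). */
void transitive_closure(int count, unsigned char *r)
{
  assert(count >= 0);
  for (int k = 0; k < count; ++k)
    for (int a = 0; a < count; ++a)
      if (r[a * count + k])
        for (int b = 0; b < count; ++b)
          if (r[k * count + b])
            r[a * count + b] = 1;
}
```

For n = 3, for instance, iteration (1,3) reaches (3,1) through (2,2) after the closure, while no iteration reaches one in an earlier row.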

Options:

-b <value>           Tile size, e.g. -b 256 -b S1:128,128 (default: 32)
--debug   | -d       Verbose mode
--report             Generate tile statistics report (use -R for each parameter)
--inline             Always inline loop bounds expressions
-D <name>=<value>    Define parameter value, e.g. -D M=2000 -D N=2600
-R <name>=<value>    Set parameter value for report generation, e.g. --report -R M=2000 -R N=2600
--cache <value>      Cache line length in bytes (default: 64)
--use-macros         Use macro definitions in place of statements
--yes     | -y       Non-interactive mode
--version | -v       Print compiler info
--help    | -h       Print help

Examples

./src/tc ./examples/stencils/heat-1d.scop.c --stencil-tiling --omp-for-codegen -b 150,25000 --debug
./src/tc ./examples/polybench/bicg.scop.c --correction-tiling --sfs-single-scheduling --omp-for-codegen -b 8
./src/tc ./examples/polybench/trisolv.scop.c --merge-tiling --free-scheduling --omp-task-codegen -b S1:16 -b S2:16,8 -b S3:16

Contact

In case of questions/problems/bugs, please contact:

Piotr Skotnicki <pskotnicki@zut.edu.pl>

West Pomeranian University of Technology
Faculty of Computer Science and Information Technology
ul. Zolnierska 49, 71-210 Szczecin, Poland

References

[1] Verdoolaege S (2010) ISL: an integer set library for the polyhedral model. In: Mathematical software – ICMS 2010, Lecture notes in computer science, vol 6327. Springer, Berlin, pp 299–302

[2] Verdoolaege S, Seghir R, Beyls K et al (2007) Counting integer points in parametric polytopes using Barvinok's rational functions. Algorithmica 48:37. https://doi.org/10.1007/s00453-006-1231-0

[3] Verdoolaege S, Grosser T (2012) Polyhedral extraction tool. In: Proceedings of the 2nd international workshop on polyhedral compilation techniques. Paris, France