
Workflow of implementing the OpenMP transformation using LLVM runtime

Anjia Wang edited this page Oct 28, 2020 · 5 revisions

1. Introduction

Currently, the REX compiler lowers OpenMP code to calls into the GOMP or Omni OpenMP runtime. Our goal is to implement the transformation using the LLVM OpenMP runtime and eventually remove support for the other two runtimes.

2. Start with a minimum program

For any unsupported OpenMP construct, we should start with a minimal program that contains only the target pragma, and move on to more complicated cases later. For example, assume we are trying to support the parallel directive. The following code, parallel.c, is a good starting point.

#include <stdio.h>

int main(int argc, char** argv) {

  #pragma omp parallel
    printf("Test.\n");

  return 0;
}

In this example, each thread in the team prints one line, so we can verify that it works by counting the lines of output. Setting the OMP_NUM_THREADS environment variable makes that count predictable. The transformed code is also short enough to review easily. Other clauses, such as private or num_threads, should not be added at this point. They can be addressed later, once parallel itself is well supported.

3. Find relevant information

Depending on how familiar you are with the target OpenMP construct, some or all of the following information may be worth checking.

3.1 OpenMP specs

First, we can check the official OpenMP specification to understand the syntax and semantics of the construct. There is no need to cover every detail at this point.

3.2 ROSE transformed code

We can check the code generated by ROSE to see how its transformation works. This gives us a general idea, although the implementation may differ when using the LLVM runtime. To generate the transformed code of the example above:

rose-compiler -rose:openmp:lowering -rose:skipfinalCompileStep parallel.c

It produces a file named rose_parallel.c and skips the final compilation. In this transformed code, the parallel directive and its parallel region are converted into an outlined function plus a few related function calls. So the basic idea is to create the outlined function and call it properly.

3.3 LLVM transformed code

Unlike ROSE, Clang/LLVM generates LLVM IR instead of C/C++ code, and reading it requires some familiarity with assembly-like code. To get the transformed LLVM IR:

clang -fopenmp -emit-llvm -S parallel.c

This produces a file named parallel.ll. It contains calls to a few LLVM runtime functions, which we will use as well. Besides the function calls, we also need to fully understand the workflow of the transformed code: exactly which parameters are prepared, how and when they are passed, and how and when the functions are called. In this case, there is only one LLVM runtime function: __kmpc_fork_call. The IR also contains a separate outlined function and a few jumps between labeled blocks, which show the control flow.

3.4 LLVM documents

After identifying the required LLVM runtime functions, we need to check their official documentation, or search online for anything it does not cover. A handy official reference is here. However, it has not been updated since 2015, so some information may be out of date.

We can also check LLVM's official repository on GitHub here. Since we are using Clang/LLVM 10.x, please check the release/10.x branch. The OpenMP-related code is mostly located in llvm-project/openmp. For instance, __kmpc_fork_call is defined in llvm-project/openmp/runtime/src/kmp_csupport.cpp.

4. Write the manually transformed code

By checking the LLVM IR and the definition of __kmpc_fork_call, we learn that the function takes three fixed parameters: the source location, the number of arguments to pass to the outlined function, and the outlined function itself. It is variadic: any further arguments are forwarded to the outlined function. In our example, the parallel region uses no variables and only prints a message, so the outlined function needs no extra arguments and the count is 0. The source location information is not needed here, so we pass NULL. This gives the following manually transformed code, llvm_parallel.c.

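#include <stdio.h>

/* Outlined body of the parallel region. The runtime passes pointers to the
   global thread ID and to the thread's ID within the team. */
void outlined_function(int* global_id, int* bound_id) {
    printf("Test.\n");
}

int main(void) {

  /* No source location (NULL) and no arguments for the outlined function (0). */
  __kmpc_fork_call(NULL, 0, outlined_function);

  return 0;
}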

Here, int* global_id and int* bound_id are the two parameters the LLVM runtime passes to every outlined function: the global thread ID and the thread's ID within the team. This information can be found in the reference and in the runtime source code.
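To see how the runtime uses this signature, the following sketch replaces __kmpc_fork_call with a hypothetical, sequential stand-in, mock_fork_call, so it builds without libomp. It assumes kmp_int32 is a 32-bit int and mirrors the runtime's kmpc_micro type. Unlike the real function, its first parameter is a thread count instead of a source location, and the simulated threads run one after another. It also shows how a region that uses a shared variable would receive that variable's address as an extra argument, with the count set to 1.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: kmp_int32 is a 32-bit int, as in the LLVM runtime. */
typedef int kmp_int32;

/* Same shape as the runtime's kmpc_micro type. */
typedef void (*kmpc_micro)(kmp_int32 *global_tid, kmp_int32 *bound_tid, ...);

/* Exact types of the outlined functions below, used to call them safely. */
typedef void (*micro0_t)(kmp_int32 *, kmp_int32 *);
typedef void (*micro1_t)(kmp_int32 *, kmp_int32 *, void *);

/* Hypothetical sequential stand-in for __kmpc_fork_call. It calls the
   outlined function once per simulated thread, forwarding at most one
   extra argument, and returns the number of calls (-1 if argc is not 0 or 1). */
int mock_fork_call(int nthreads, kmp_int32 argc, kmpc_micro microtask, ...) {
    void *arg = NULL;
    int calls = 0;
    if (argc < 0 || argc > 1)
        return -1;
    if (argc == 1) {
        va_list ap;
        va_start(ap, microtask);
        arg = va_arg(ap, void *);
        va_end(ap);
    }
    for (kmp_int32 tid = 0; tid < nthreads; tid++) {
        kmp_int32 gtid = tid; /* simplified: real global IDs are runtime-wide */
        kmp_int32 btid = tid; /* thread number within the team */
        if (argc == 0)
            ((micro0_t)microtask)(&gtid, &btid);
        else
            ((micro1_t)microtask)(&gtid, &btid, arg);
        calls++;
    }
    return calls;
}

/* Outlined body of the parallel.c region: no extra arguments. */
void outlined_function(kmp_int32 *global_id, kmp_int32 *bound_id) {
    (void)global_id;
    (void)bound_id;
    printf("Test.\n");
}

/* Outlined body of a region that updates a shared int: the variable
   arrives by address as an extra argument. */
void outlined_count(kmp_int32 *global_id, kmp_int32 *bound_id, void *shared) {
    int *counter = (int *)shared;
    (void)global_id;
    (void)bound_id;
    (*counter)++; /* no race only because the mock runs threads in turn */
}
```

For example, mock_fork_call(4, 1, (kmpc_micro)outlined_count, (void *)&counter) calls outlined_count four times and leaves counter at 4. The outlined functions are cast to kmpc_micro when passed and cast back to their exact type before being called, and the shared argument is passed as void *, which keeps the calls well defined in C.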

5. Test the manually transformed code

First, we compile the original code with Clang and run it to check the output.

clang -fopenmp parallel.c -o original.out
./original.out

The -fopenmp flag is required to compile OpenMP code.

Then we compile the manually transformed code with Clang or GCC. Assume the LLVM OpenMP runtime libomp.so is installed in ~/Projects/llvm_install/lib.

clang llvm_parallel.c ~/Projects/llvm_install/lib/libomp.so -o llvm.out
./llvm.out

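Since the OpenMP code has already been transformed, the -fopenmp flag is not needed. By comparing the two outputs, we can tell whether the transformed code works. If the dynamic loader cannot find libomp.so when running llvm.out, add its directory to LD_LIBRARY_PATH.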

For this simple example, we did not write a header file declaring __kmpc_fork_call. Clang 10 only warns about the implicit declaration and still compiles the code correctly, but newer Clang releases (16 and later) reject implicit function declarations in C99 and later modes by default. In actual development, all data structures and LLVM runtime functions must be declared in a header file included by the transformed code. In the REX compiler, this header file, rex_kmp.h, can be checked here.
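A minimal header for the example above could look like the following sketch. It is not REX's rex_kmp.h. The declarations follow the LLVM 10 runtime's kmp.h (ident_t layout, kmpc_micro) and kmp_csupport.cpp (__kmpc_fork_call), so they should be checked against the runtime version in use. The guard name MY_KMP_H and the default_loc variable are hypothetical; default_loc is modeled on the default location Clang emits when no source information is available.

```c
#ifndef MY_KMP_H
#define MY_KMP_H

/* Assumption: 32-bit int, as in the runtime's kmp_os.h. */
typedef int kmp_int32;

/* Source-location descriptor, with the field layout of the runtime's ident_t. */
typedef struct ident {
    kmp_int32 reserved_1;
    kmp_int32 flags;      /* KMP_IDENT_* bits; 0x02 is KMP_IDENT_KMPC */
    kmp_int32 reserved_2;
    kmp_int32 reserved_3;
    char const *psource;  /* ";file;function;line;column;;" */
} ident_t;

/* Outlined functions take two thread-ID pointers, then the forwarded arguments. */
typedef void (*kmpc_micro)(kmp_int32 *global_tid, kmp_int32 *bound_tid, ...);

/* Variadic: the argc arguments after microtask are forwarded to it. */
extern void __kmpc_fork_call(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...);

/* Hypothetical default location for code without source information. */
static ident_t default_loc = {0, 0x02, 0, 0, ";unknown;unknown;0;0;;"};

#endif /* MY_KMP_H */
```

With this header included, llvm_parallel.c could call __kmpc_fork_call(&default_loc, 0, (kmpc_micro)outlined_function). The explicit cast avoids an incompatible-pointer-type diagnostic, since outlined_function is not itself variadic.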

6. Implement the transformation in REX

Implementing the transformation involves many details. We pick only the most important step as an example: inserting the LLVM runtime function call. In the REX/ROSE compiler, outlining a function, including handling the necessary parameters, is already supported, so that part usually needs few changes.

We can use the ROSE builder functions (SageBuilder) to insert a function call. Specifically, the following code creates a call to __kmpc_fork_call.

...
/*
  parameter2 is the argument for the outlined function generated by ROSE.
  p_scope is the working scope. In this case, it refers to the function containing the parallel directive.
*/
// Use 0 (NULL) for the source location.
SgExpression* source_location_info = buildIntVal(0);
// ROSE always creates exactly one parameter for the outlined function.
SgExpression* outlined_function_parameter_amount = buildIntVal(1);
// Prepare the four arguments of __kmpc_fork_call: source location, argument count,
// outlined function, and the single argument forwarded to the outlined function.
parameters = buildExprListExp(source_location_info, outlined_function_parameter_amount, buildFunctionRefExp(outlined_func), parameter2);
// Create a statement that calls __kmpc_fork_call with these arguments. The return type is void.
SgExprStatement* s1 = buildFunctionCallStmt("__kmpc_fork_call", buildVoidType(), parameters, p_scope);
...
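At the C level, the statement built above corresponds to a call of the form __kmpc_fork_call(0, 1, outlined function, parameter2). The sketch below illustrates why the argument count can always be 1, assuming the outliner packs the addresses of the variables used in the region into one struct and passes the struct's address as the single argument. The names OUT__1__, OUT__1__data, out_argv, and parallel_region are hypothetical, and the runtime call is replaced by a sequential loop so the sketch builds without libomp.

```c
/* Assumption: 32-bit int, as in the LLVM runtime. */
typedef int kmp_int32;

/* Hypothetical wrapper struct: one field per variable the region uses. */
struct OUT__1__data {
    int *sum_p; /* address of the shared variable "sum" */
};

/* Hypothetical outlined function: two thread-ID pointers plus the single
   packed argument, which it unpacks before running the region body. */
static void OUT__1__(kmp_int32 *global_tid, kmp_int32 *bound_tid, void *out_argv) {
    struct OUT__1__data *data = (struct OUT__1__data *)out_argv;
    (void)global_tid;
    (void)bound_tid;
    /* Unsynchronized update: fine in this sequential sketch, but a real
       team would need an atomic update or a reduction. */
    (*data->sum_p)++;
}

/* Code at the former pragma site: pack the arguments, then fork.
   With libomp, the loop would be replaced by
       __kmpc_fork_call(0, 1, (kmpc_micro)OUT__1__, &out_argv);
   where kmpc_micro is the runtime's outlined-function pointer type. */
int parallel_region(int nthreads) {
    int sum = 0;
    struct OUT__1__data out_argv;
    out_argv.sum_p = &sum;
    for (kmp_int32 tid = 0; tid < nthreads; tid++) {
        kmp_int32 gtid = tid;
        kmp_int32 btid = tid;
        OUT__1__(&gtid, &btid, &out_argv);
    }
    return sum;
}
```

Because every variable travels inside the one struct, adding variables to the region changes the struct, not the argument count passed to __kmpc_fork_call.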

Note that the code above only creates the function call statement; it does not insert the statement into the final code. ROSE offers multiple ways to insert a statement and set up its relationship with other nodes in the context. We must carefully read the original code, understand the context, and then choose the correct insertion method.

Please check the commit here for more details on implementing the parallel directive. Again, DO NOT simply reuse the insertion method from this commit.

7. Test the code transformed by REX

After updating the transformation source code, we need to rebuild the REX compiler. Normally, a full rebuild from scratch is not required unless you run into strange issues that cannot be resolved otherwise.

rose-compiler -rose:openmp:lowering -rose:skipfinalCompileStep parallel.c
clang rose_parallel.c ~/Projects/llvm_install/lib/libomp.so -o rex.out
./rex.out

If everything works as expected, the output should match that of the original program. If not, we can inspect the generated rose_parallel.c, edit it directly for quick debugging, or go back to the REX source code.