Welcome to OptML! This repository is designed for those new to MLIR and machine learning-based optimizations. As a compiler enthusiast, I wanted to create a platform for hobbyists like myself to experiment with and benchmark new optimizations on real ML models in an out-of-tree manner. This project is heavily inspired by mlir-tutorial, which laid the foundation for my learning and development.
- Vision Models
- Benchmarking Options
- Build Instructions
- Usage Guide
- Benchmarking Process
- Files of Interest
The repository includes three vision models generated from TorchScript:
- AlexNet
- VGG11
- ResNet152
OptML supports multiple benchmarking methodologies:
- Google Benchmark
- Hardware Counters (PAPI)
- C++ Chrono library
Before building and running OptML, make sure you have the following installed:
- CMake (version 3.20 or higher)
- PAPI (for hardware counter support)
- Python 3.x (for script execution)
- C/C++ compiler (Clang 17 / GCC 11 or higher)
Ensure that these dependencies are installed and configured correctly before proceeding with the build instructions.
git clone https://github.com/mvvsmk/OptML.git
cd OptML
git submodule update --init --recursive
./build_llvm.sh # Builds the LLVM submodule
# Note: the check-mlir step of the build might fail, but this does not affect the project.
./build_mlir.sh # Builds the project-opt tool with out-of-tree optimizations
./benchmark.sh chrono Alexnet --affine-64-unroll
Let's walk through how to use this repository, specifically using the Affine64Unroll pass I implemented.
- Headers: Located in $rootdir/include/Transform/Affine/
- Implementation: Located under $rootdir/lib/Transform/Affine/
- CMake File: The CMake file in the implementation folder is straightforward; ensure you include the necessary libraries for your use case.
- Register the Pass: Register your pass with the project-opt tool.
Now, you're all set!
To run your own benchmarks, use the command:
$rootdir/benchmark.sh [benchmark_type] [ML_model] [benchmark_flag] [PAPI_event_name]
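If you want to launch several runs from a script, the argument order above can be captured in a small helper. This is a minimal sketch, not part of the repository: `build_benchmark_cmd` and the `/path/to/OptML` root are hypothetical, and the only argument values confirmed by this README are `chrono`, `Alexnet`, and `--affine-64-unroll`.

```python
from typing import List, Optional


def build_benchmark_cmd(
    rootdir: str,
    benchmark_type: str,
    model: str,
    benchmark_flag: str,
    papi_event: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for benchmark.sh in the documented order.

    The PAPI event name is appended only when one is given, mirroring the
    optional last positional argument in the usage line.
    """
    cmd = [f"{rootdir.rstrip('/')}/benchmark.sh", benchmark_type, model, benchmark_flag]
    if papi_event is not None:
        cmd.append(papi_event)
    return cmd


# The one invocation this README shows explicitly.
cmd = build_benchmark_cmd("/path/to/OptML", "chrono", "Alexnet", "--affine-64-unroll")
print(" ".join(cmd))
```

Inside a built checkout, the resulting list could be handed to `subprocess.run(cmd, check=True)`.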
Warning
Before interpreting your benchmark results, it's important to understand how the benchmarking process works. A result may look like a 29% increase while hiding the fact that the object file size grew from 16KB to 2.5MB. This was measured on an Intel(R) Core(TM) Ultra 9 185H in a single-threaded manner, and the measurement also includes array initialization to all zeros.
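To put the size trade-off in numbers, the object file growth quoted above can be worked out directly. The sketch below assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB); the helper names are hypothetical and only the two sizes come from this README.

```python
KIB = 1024
MIB = 1024 * KIB


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


def percent_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Object file sizes quoted in the warning (binary units are an assumption).
size_before = 16 * KIB
size_after = 2.5 * MIB

factor = growth_factor(size_before, size_after)
print(f"object file grew {factor:.0f}x "
      f"({percent_change(size_before, size_after):.0f}% larger)")
```

Reporting the code-size change next to the timing change makes it harder to overlook the cost of an aggressive unroll.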
When the object file is compiled, two important passes are run:
- --rem-forward-func-args-and-return-run-mlir-zero-init: This pass removes the arguments and return values of the forward function to make all functions uniform.
  - You can choose between two variants: one where the arguments are zero-initialized, and another where they remain uninitialized (resulting in undefined behavior).
  - The uninitialized variant takes less memory to compile, because adding the initialization creates a lot of instructions, which uses a lot of RAM. The default behaviour is to zero-initialize the argument (usually the picture for the model), but if you are feeling lucky and want to experiment with undefined behaviour, change the following in make_MLIR_obj.py:
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='Compile MLIR files to object files')
     parser.add_argument('input_folder', type=str, help='Absolute path to the folder with MLIR files or the only MLIR file')
     parser.add_argument('output_folder', type=str, help='Absolute path to the folder where object files will be stored')
-    parser.add_argument('--mlir-flags', required=False,default="--rem-forward-func-args-and-return-run-mlir-zero-init --rem-global-constants-run-mlir" ,type=str, help='Flags to be passed to project-opt')
+    parser.add_argument('--mlir-flags', required=False,default="--rem-forward-func-args-and-return-run-mlir --rem-global-constants-run-mlir" ,type=str, help='Flags to be passed to project-opt')
- --rem-global-constants-run-mlir: This pass removes global constants and places them inside the main function. This prevents some "<stdin>:5:3: error: resource does not exist" errors.
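Because the zero-init choice is only the default value of the --mlir-flags option, it may also be possible to switch variants from the command line instead of editing the file. The sketch below rebuilds the parser exactly as shown in the snippet above and parses two example argument lists; the `/abs/in` and `/abs/out` paths are placeholders, and whether the rest of make_MLIR_obj.py handles an overridden value identically is an assumption not verified here.

```python
import argparse

ZERO_INIT_FLAGS = ("--rem-forward-func-args-and-return-run-mlir-zero-init "
                   "--rem-global-constants-run-mlir")
UNINIT_FLAGS = ("--rem-forward-func-args-and-return-run-mlir "
                "--rem-global-constants-run-mlir")


def make_parser() -> argparse.ArgumentParser:
    """Recreate the argument parser from the README snippet."""
    parser = argparse.ArgumentParser(description="Compile MLIR files to object files")
    parser.add_argument("input_folder", type=str,
                        help="Absolute path to the folder with MLIR files or the only MLIR file")
    parser.add_argument("output_folder", type=str,
                        help="Absolute path to the folder where object files will be stored")
    parser.add_argument("--mlir-flags", required=False, default=ZERO_INIT_FLAGS, type=str,
                        help="Flags to be passed to project-opt")
    return parser


parser = make_parser()

# No override: the zero-initialized variant is used.
default_args = parser.parse_args(["/abs/in", "/abs/out"])

# Override with the "--option=value" form, so argparse does not mistake the
# value (which itself starts with "--") for another option.
override_args = parser.parse_args(["/abs/in", "/abs/out", f"--mlir-flags={UNINIT_FLAGS}"])

print(default_args.mlir_flags)
print(override_args.mlir_flags)
```

On a shell command line the equivalent would be to quote the whole `--mlir-flags=...` argument so the space-separated flags stay in one string.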
After the modified MLIR is generated, it is compiled into an object file without any optimizations (creating either the original or oracle version). For benchmarking, the object file is linked against an empty C++ file that benchmarks the function:
extern "C" void forward();
When you run a pass using the benchmark.sh script, it generates a Modified.mlir file, which is then processed through the same pipeline and linked for benchmarking.
All these benchmarks are run in a single-threaded manner without sudo taskset -c.
You can use the *.cpp and *.h files present in the benchmarks/Hardware_Counters_or_Time folder and add any custom parameters you want to measure.
Go through these files if you want to learn more about how the benchmarks actually compile and execute.
- benchmark.sh: Executes your pass and compares the results against the original.
- make_MLIR_obj.py: Converts the MLIR file to an object file for benchmarking.