CUDA MODE IRL Project -- WIP
Background:
- AOTInductor is a specialized version of TorchInductor, designed to process exported PyTorch models, optimize them, and produce shared libraries along with other relevant artifacts.
- These compiled artifacts are specifically crafted for deployment in non-Python environments, which are frequently used for server-side inference.
- A somewhat little-known feature is that torch.compile can convert a PyTorch program into a C++ .so binary file.
- We can then load the shared library and run model predictions directly within a C++ environment (a loading sketch follows this list).
- Unfortunately, running it today still requires a libtorch dependency (very large file size, especially with CUDA).
- Inspecting the binary's symbol table and other attributes shows that only a fairly limited number of APIs need to be shimmed.
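
To make the "load it from C++" step concrete, here is a minimal sketch assuming the AOTInductorModelContainerCreate entry point that AOTInductor-generated libraries export. Its exact signature is version-dependent, so the function-pointer type below is an illustrative assumption, not the authoritative ABI.

```cpp
// Minimal sketch: load the compiled model.so from C++ without Python.
// The AOTInductorModelContainer* entry points are exported by
// AOTInductor-generated libraries, but their signatures vary across
// PyTorch versions -- treat the typedef below as illustrative only.
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

int main() {
  // RTLD_NOW forces every undefined symbol (including the aoti_torch_*
  // shims discussed below) to resolve immediately, so a missing shim
  // fails loudly at load time rather than mid-inference.
  void* lib = dlopen("./model.so", RTLD_NOW);
  if (!lib) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }
  // Assumed signature for illustration; check the headers shipped with
  // the PyTorch version that produced the binary.
  using CreateFn = int (*)(void** container, std::size_t num_models,
                           bool is_cpu, const char* cubin_dir);
  auto create = reinterpret_cast<CreateFn>(
      dlsym(lib, "AOTInductorModelContainerCreate"));
  if (!create) {
    std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
    return 1;
  }
  void* container = nullptr;
  create(&container, /*num_models=*/1, /*is_cpu=*/false,
         /*cubin_dir=*/nullptr);
  // ...build inputs via the aoti_torch_* tensor APIs, then call the
  // corresponding AOTInductorModelContainerRun entry point...
  return 0;
}
```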
Goal: Compile a PyTorch program into a (dependency-free) binary through torch.compile()
- Started with a simple transpose kernel, then tried to expand support to Bert Maher's llama2.so project
- Touches AOTInductor internals, patching ELF files, dynamic linking, and ABI compatibility (see the shim sketch after the API list below)
- Lots of simplifying assumptions (e.g., no support for dynamic shapes)
APIs to shim:
- aoti_torch_create_cuda_stream_guard
- aoti_torch_create_tensor_from_blob
- aoti_torch_create_tensor_from_blob_v2
- aoti_torch_delete_cuda_stream_guard
- aoti_torch_delete_tensor_object
- aoti_torch_device_type_cpu
- aoti_torch_device_type_cuda
- aoti_torch_dtype_float32
- aoti_torch_empty_strided
- aoti_torch_get_data_ptr
- aoti_torch_get_storage_offset
- aoti_torch_get_storage_size
- aoti_torch_get_strides
- aoti_torch_grad_mode_is_enabled
- aoti_torch_grad_mode_set_enabled
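
As a sketch of what shimming a few of these symbols might look like without libtorch, here is a minimal C++ implementation, assuming contiguous float32 CUDA tensors only (one of the simplifying assumptions above). The signatures approximate torch/csrc/inductor/aoti_torch/c/shim.h, and the returned constants mirror c10's DeviceType/ScalarType enum values; verify both against the PyTorch version that produced the binary before trusting this ABI.

```cpp
// Minimal, libtorch-free shim sketch for a few of the symbols listed
// above. Assumptions: contiguous float32 CUDA tensors only, no detailed
// error reporting, and signatures that only approximate shim.h.
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

using AOTITorchError = int32_t;   // 0 == success by convention
struct ShimTensor;                // our stand-in for at::Tensor
using AtenTensorHandle = ShimTensor*;

// Just enough tensor state for the generated code: a device pointer
// plus sizes/strides metadata.
struct ShimTensor {
  void* data = nullptr;
  bool owns_data = false;
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
};

extern "C" {

// Constant queries: values mirror c10's enums (CUDA == 1, Float == 6);
// verify against your build.
int32_t aoti_torch_device_type_cuda() { return 1; }
int32_t aoti_torch_dtype_float32() { return 6; }

AOTITorchError aoti_torch_empty_strided(
    int64_t ndim, const int64_t* sizes_ptr, const int64_t* strides_ptr,
    int32_t /*dtype*/, int32_t /*device_type*/, int32_t /*device_index*/,
    AtenTensorHandle* ret_new_tensor) {
  // Simplifying assumption: dense float32 allocation on the current
  // CUDA device, ignoring dtype/device arguments.
  int64_t numel = 1;
  for (int64_t i = 0; i < ndim; ++i) numel *= sizes_ptr[i];
  auto* t = new ShimTensor{};
  t->owns_data = true;
  t->sizes.assign(sizes_ptr, sizes_ptr + ndim);
  t->strides.assign(strides_ptr, strides_ptr + ndim);
  if (cudaMalloc(&t->data, numel * sizeof(float)) != cudaSuccess) {
    delete t;
    return 1;  // nonzero == failure
  }
  *ret_new_tensor = t;
  return 0;
}

AOTITorchError aoti_torch_get_data_ptr(AtenTensorHandle t, void** ret) {
  *ret = t->data;
  return 0;
}

AOTITorchError aoti_torch_delete_tensor_object(AtenTensorHandle t) {
  if (t->owns_data) cudaFree(t->data);
  delete t;
  return 0;
}

}  // extern "C"
```

Compiled into a small shim library, something like this can stand in for libtorch at dynamic-link time, e.g. by patching the model binary's ELF dependency entries to point at the shim instead, which is the "patching ELF files, dynamic linking" work mentioned above.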