
comparison to NN-512/ direct (C) code generation #1668

Open
benmkw opened this issue Sep 2, 2022 · 8 comments

@benmkw commented Sep 2, 2022

I was looking into deploying ONNX models as native code and came across NN-512, which was basically exactly what I was looking for, except that the input format is a custom text format instead of a common one like ONNX, and the output is not very portable.

A basic performance comparison to ONNX: https://jonatron.github.io/test_nn512/

Is a direct C backend for ONNX planned/possible as well? This would make compilation easier: one would just need an ONNX -> C compiler and could then use existing tooling to produce the binary. If the resulting C code could optionally be generated without SIMD intrinsics, it would immediately be portable to WASM and embedded targets, because it would basically just be iteration and arithmetic without special operations.
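To illustrate what such output could look like (this is a made-up sketch, not the output of any existing tool), a generated kernel without intrinsics is just a loop nest; all names and shapes below are hypothetical:

```c
/* Hypothetical sketch of portable generated C for one dense layer:
 * no SIMD intrinsics, only loops and arithmetic, so it compiles anywhere
 * a C compiler exists (including WASM targets). Names and shapes are
 * illustrative only. */
void dense_relu(const float in[64], const float w[32][64],
                const float bias[32], float out[32]) {
    for (int i = 0; i < 32; i++) {
        float acc = bias[i];
        for (int j = 0; j < 64; j++)
            acc += w[i][j] * in[j];
        out[i] = acc > 0.0f ? acc : 0.0f; /* fused ReLU */
    }
}
```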

A C backend could also probably be extended relatively easily to Rust/Zig (or maybe Halide, https://github.com/halide/Halide), making it easy to deploy models with minimal friction in each language ecosystem. An ONNX -> C compiler would effectively be something like https://github.com/shinh/elvm, but for ONNX models, with the opcodes being https://github.com/onnx/onnx-mlir/blob/d0b60c3c1e948afd12af5f0e0f0b969c0cee5cfa/docs/Dialects/onnx.md .

Generating C code instead of LLVM IR is not uncommon for language implementations, e.g. https://github.com/pervognsen/bitwise/blob/master/ion/gen.c (even LLVM once had a C backend).

onnx-mlir seems to go one step further in that it targets multiple languages at once and may be able to perform better optimisations, but after looking at NN-512 I got the impression that even a basic/primitive translation to native code could already provide a lot of value.

Maybe this already exists for ONNX and I just did not find it?

My understanding of the current state of standard ONNX inference is that a runtime loads a protobuf file at runtime and then interprets it (much like Python loads a file and interprets it). This seems to waste the opportunity for compile-time optimisations and also makes deployment more difficult, because multiple files and I/O (non-pure computation) are involved. These are similar reasons to why one might choose C over Python in the first place, so it seems counterintuitive to me to call ONNX this way from a compiled language.
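For reference, the load-and-interpret pattern looks roughly like this with ONNX Runtime's C API (a minimal sketch; error handling and tensor plumbing are omitted for brevity):

```c
#include <onnxruntime_c_api.h>

/* Minimal sketch of the interpret-at-runtime pattern: the model arrives as
 * a protobuf file that is parsed and dispatched at runtime. Status checks
 * and input/output OrtValue setup are omitted. */
int main(void) {
    const OrtApi *ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
    OrtEnv *env = NULL;
    ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "demo", &env);
    OrtSessionOptions *opts = NULL;
    ort->CreateSessionOptions(&opts);
    OrtSession *session = NULL;
    /* I/O happens here: the .onnx protobuf is read and interpreted. */
    ort->CreateSession(env, "model.onnx", opts, &session);
    /* ... ort->Run(...) with input/output tensors ... */
    return 0;
}
```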

@AlexandreEichenberger (Collaborator)

I would look into using onnx-mlir's lowering to LLVM (onnx-mlir -EmitLLVMIR) and then seeing if you can feed that to MLIR. I believe they have a C dialect, but no one at IBM has used it, I believe. It is also possible to intercept the output of onnx-mlir before the LLVM dialect; then you may have to deal with a few krnl ops, but not that many.

Let us know how your experiment goes; I am sure others would be interested in the outcome too.

Note that there are efforts to link onnx-mlir to MLIR via HLO or TOSA, and a proposal for Torch-MLIR.

@benmkw (Author) commented Sep 6, 2022

Thanks for the response! The MLIR C backend sounds interesting, although it might generate C++ in many cases, which would require wrapping that once more in C to make interop with other languages possible.
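The wrapping itself would be mechanical; a header along these lines (purely illustrative, the function name and signature are made up) keeps C++-generated code callable from C, Rust, Zig, etc.:

```c
/* Illustrative C-compatible header for a hypothetical C++-generated
 * entry point; extern "C" prevents C++ name mangling so other languages
 * can link against it. */
#ifdef __cplusplus
extern "C" {
#endif

void model_infer(const float *input, float *output);

#ifdef __cplusplus
} /* extern "C" */
#endif
```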

It seems that the current onnx-mlir API is very weakly typed and basically trusts the user to get it right (https://github.com/onnx/onnx-mlir/blob/main/docs/mnist_example/Mnist.java); there is no step which actually inspects the ONNX file's inputs and generates language-specific types from them? (This is the first important step for me.)

Maybe it's still a better idea to write more language-specific wrappers for onnx-mlir, similar to what is already present for e.g. Java. I think the main thing I need to look at here is how to generate proper types, such that the function arguments from e.g. Rust can be properly typed to provide safe/reliable model execution. This could be a build script which generates the object file plus a corresponding C file and Rust file and does the linking automagically. For this it would be really good if onnx-mlir were more easily available as a library rather than a binary, as mentioned in #1597. Another problem I foresee is different LLVM versions between the Rust/Zig/whatever compiler and the onnx-mlir project, which might lead to incompatibilities, maybe also around LTO, which seems to be especially difficult in such mixed scenarios. (This is also a reason why I think it would be a good idea to generate source code in language X rather than linking some object file into a project in language X.)
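As a sketch of what "proper types" could mean on the C side (from which Rust or Zig types could be derived mechanically), imagine the compiler emitting a header with the model's exact shapes baked in; everything here is hypothetical:

```c
/* Hypothetical generated header for an MNIST-like model: the input/output
 * shapes read from the ONNX file become fixed-size C types, so callers
 * cannot pass wrongly-shaped buffers. A bindgen-style step could turn
 * this into equally strict Rust types. Not the output of any real tool. */
typedef struct { float data[1][1][28][28]; } mnist_input_t;
typedef struct { float data[1][10];        } mnist_output_t;

void mnist_main_graph(const mnist_input_t *in, mnist_output_t *out);
```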

The krnl ops (https://github.com/onnx/onnx-mlir/blob/42c543dd73e282049059f23fa97e1d6fbdf8883a/docs/Dialects/krnl.md) look interesting, although I'm not sure how soon I'll have time to actually build the project and modify it to the extent that I could attempt that experiment, but I'd definitely be very interested in trying this. (This would be the second important step for me, although it would probably also include the library-fication of onnx-mlir.) Some ops seem a little ad hoc (find_index) and some a bit high-level (strlen), but something along the lines of the krnl ops would definitely be important. The main question is whether implementing the ONNX ops directly after parsing the protobuf is easier/more or less performant than implementing the krnl ops. I think a smaller set of ops, along the lines of https://github.com/geohot/tinygrad/blob/master/tinygrad/llops/ops_cpu.py, would be preferable.
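A minimal op set in that spirit might only need a handful of C signatures; this is a hypothetical sketch, not krnl's or tinygrad's actual interface:

```c
/* Hypothetical minimal op set in the spirit of tinygrad's llops: a few
 * primitives from which most ONNX ops could be composed. Shapes are
 * passed explicitly; nothing here mirrors an actual onnx-mlir interface. */
typedef enum { UNARY_RELU, UNARY_EXP, UNARY_NEG } unary_op_t;
typedef enum { BIN_ADD, BIN_MUL, BIN_MAX } binary_op_t;

void op_unary(unary_op_t op, const float *x, float *y, int n);
void op_binary(binary_op_t op, const float *a, const float *b,
               float *y, int n);
void op_reduce_sum(const float *x, float *y, int n);
void op_matmul(const float *a, const float *b, float *c,
               int m, int k, int n);
```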

> Note that there are efforts to link onnx-mlir to MLIR via HLO or TOSA, and a proposal for Torch-MLIR.

Given that MLIR does not yet seem to have widespread use compared to traditional LLVM IR, I think converting the models directly into existing languages would make widespread adoption easier. (Although MLIR seems very promising in the long run, for sure.)

If there is a meeting I could attend where some of these ideas could be discussed, I'd try to join as well: #1666

@AlexandreEichenberger (Collaborator)

Added #1676 for meeting invite.

@benmkw (Author) commented Sep 6, 2022

Thanks, it's great to see that the meeting is so open. Unfortunately it seems to fall in the middle of the night in Germany (where I'm from), so I'll have to pass. If you happen to have any feedback I look forward to hearing it, though, and I will keep following this project 👍

@benmkw (Author) commented Sep 15, 2022

I figured I would be able to attend on Sept. 20th and made some preliminary slides. I'd be especially interested in feedback/discussion after the (short) presentation.

onnx_benedikt_mandelkow.pdf

cc @AlexandreEichenberger

@AlexandreEichenberger (Collaborator)

Thanks, added a pointer to this issue to our wiki agenda.

@benmkw (Author) commented Sep 29, 2022

I managed to attend the last meeting.

The conclusion was that a language-specific codegen option would be too large a maintenance burden for ONNX and would thus need to be added to MLIR directly. The existing C codegen option of MLIR should be explored. This could then be used to generate code which only requires a C compiler for each user of a model, instead of the onnx-mlir compiler.

Regarding better language bindings with stronger typedefs, it was said that the object files which onnx-mlir already produces contain a function which, when called, produces a JSON description of the types of the model's inputs and outputs. This information could be extracted at compile time and used to generate more precise types for the language-specific bindings. A separate issue with a more concrete proposal on this would be welcome.
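A sketch of how that query function might be consumed at build time; the entry-point names follow onnx-mlir's runtime docs, but treat the exact declarations as assumptions to verify against your onnx-mlir version:

```c
#include <stdio.h>

/* Assumed declarations for the signature-query entry points that onnx-mlir
 * embeds in the generated object file; verify the exact names/signatures
 * against the onnx-mlir runtime docs for your version. */
extern const char *omInputSignature(const char *entryPointName);
extern const char *omOutputSignature(const char *entryPointName);

int main(void) {
    /* "run_main_graph" is onnx-mlir's default entry point name. */
    printf("inputs:  %s\n", omInputSignature("run_main_graph"));
    printf("outputs: %s\n", omOutputSignature("run_main_graph"));
    return 0;
}
```

A build script could run this once, parse the JSON, and emit the typed bindings discussed above.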

@benmkw (Author) commented Oct 5, 2022

Regarding the portability concerns, sonos/tract#580 and sonos/tract#393 (comment) are relevant user stories that fit into this issue.

I just tried to build onnx-mlir natively on macOS and it's rather difficult: compiling LLVM myself or using Docker is quite a bit of overhead, which would be imposed on every developer interacting with a library that uses NN inference via onnx-mlir. (This could be solved within onnx-mlir if the C backend worked, but I did not have enough time to test that yet.)
