How GGML is different from ONNX #3022
Comments
I also want to ask this question.
I think the question is a little ambiguous. GGML could mean the machine learning library itself, the file format (now called GGUF), or even an implementation built on GGML that runs inference on models (llama.cpp). On the library side, there isn't really a "format" for the graph; there's an API you use to construct the graph. Likewise, the weights don't have to come from a GGML/GGUF file at all. For example, my little Rust RWKV implementation over here only loads models from PyTorch or SafeTensors files and dynamically quantizes the tensors.
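To make the "API, not a format" point a bit more concrete, here is a toy Python sketch of a graph that is built by function calls and only evaluated afterwards. This is not the GGML API: `Node`, `leaf`, `add`, `mul`, and `compute` are hypothetical names made up for illustration, and a real library works on tensors rather than single floats.

```python
# Toy define-then-compute graph. Every name here is hypothetical;
# this is NOT the GGML API, just an illustration of the idea.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    op: str                          # "leaf", "add", or "mul"
    inputs: Tuple["Node", ...] = ()  # upstream nodes (empty for leaves)
    value: Optional[float] = None    # only set for leaves


def leaf(value: float) -> Node:
    """Wrap a constant; in a real library this would be a weight or input tensor."""
    return Node(op="leaf", value=value)


def add(a: Node, b: Node) -> Node:
    """Record an addition in the graph without evaluating it."""
    return Node(op="add", inputs=(a, b))


def mul(a: Node, b: Node) -> Node:
    """Record a multiplication in the graph without evaluating it."""
    return Node(op="mul", inputs=(a, b))


def compute(node: Node) -> float:
    """Walk the graph recursively and evaluate it."""
    if node.op == "leaf":
        return node.value
    args = [compute(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {node.op}")


# Build y = w * x + b. The values could come from any file format, or none.
x, w, b = leaf(2.0), leaf(3.0), leaf(1.0)
y = add(mul(w, x), b)

result = compute(y)  # 3.0 * 2.0 + 1.0 = 7.0
```

The graph here exists only as in-memory objects created by the calling code, which is the contrast with a format that serializes the graph into a file.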
Glancing through the ONNX GitHub README, my understanding is that ONNX is just a "model container" format without any specific associated inference engine, whereas GGML/GGUF are part of an inference ecosystem together with ggml/llama.cpp. So the difference would be roughly like that between a 3D model and an Unreal Engine asset.
@staviq Sorry for not being clear, but for inference ONNX can use onnxruntime, which supports multiple backends and optimisations.
I see, thank you for the clarification.
This issue was closed because it has been inactive for 14 days since being marked as stale.
I am looking to create an exhaustive pros and cons list for ONNX vs GGML, and would like some help if someone can describe or give pointers on how GGML is different from ONNX.
Currently I am aware that GGML supports 4-bit quantization and follows a no-dependency approach (as mentioned here), and that it differs in how it builds the computation graph and stores the weights (with optimizations, if any).
Apart from this, what are the differentiating factors?
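For anyone unfamiliar with what 4-bit quantization means in practice, here is a minimal Python sketch of block-wise quantization: each block of floats is stored as small integers in the range [-8, 7] plus one floating-point scale. It is a simplified illustration under my own assumptions, not GGML's actual quantization scheme or on-disk layout, and `quantize_block` and `dequantize_block` are hypothetical helper names.

```python
# Simplified symmetric 4-bit block quantization.
# Illustration only: NOT GGML's actual scheme or storage layout.
from typing import List, Tuple


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Map a block of floats to signed 4-bit integers plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Pick the scale so the largest magnitude maps to 7; avoid dividing by zero.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 4-bit range.
    quants = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quants]


block = [0.1, -0.7, 0.35, 1.4]
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)

# Rounding to the nearest integer bounds the per-value error by half a step.
max_error = max(abs(a - b) for a, b in zip(block, restored))
```

The trade-off this shows is the general one: each weight needs only 4 bits plus a share of the per-block scale, at the cost of a reconstruction error of at most half a quantization step per value.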