Hi @un-certainty, yes, if you are using CUDAExecutionProvider, IO Binding is probably helpful. I don't have a proper benchmark at hand, though.
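For reference, here is a minimal sketch of what IO Binding looks like with onnxruntime's Python API. The model path, input shape, and the input/output names (`input_ids`, `logits`) are illustrative assumptions, not tied to a specific export:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical decoder export; replace with your own model path.
session = ort.InferenceSession(
    "decoder_model.onnx", providers=["CUDAExecutionProvider"]
)

# Dummy batch of token ids (shape and vocab size are arbitrary here).
input_ids = np.random.randint(0, 50257, size=(1, 16), dtype=np.int64)

io_binding = session.io_binding()

# Copy the input to the GPU once, then bind it so ORT reads it in place.
input_on_gpu = ort.OrtValue.ortvalue_from_numpy(input_ids, "cuda", 0)
io_binding.bind_ortvalue_input("input_ids", input_on_gpu)

# Bind the output to the GPU so ORT allocates it there instead of copying
# the result back to host memory after every run.
io_binding.bind_output("logits", "cuda", 0)

session.run_with_iobinding(io_binding)
logits = io_binding.get_outputs()[0]  # an OrtValue still living on the GPU
```

The point of the binding is to avoid the host-to-device and device-to-host copies that `session.run()` would otherwise do on every call, which matters most when you feed the outputs of one step back in as inputs (as in autoregressive decoding).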
Also, I wonder: if the caches are preserved on GPU, could that cause a memory explosion? When the QPS is high and sequences are long, there will be many intermediate tensors, and I'm not sure whether this could lead to OOM.
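As an aside on the OOM concern, the CUDA execution provider does expose options to bound its memory arena. A minimal sketch, assuming a local `decoder_model.onnx` and an arbitrary 2 GiB cap:

```python
import onnxruntime as ort

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # cap the arena at 2 GiB
    "arena_extend_strategy": "kSameAsRequested",  # grow only as much as needed
}
session = ort.InferenceSession(
    "decoder_model.onnx",
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```

This bounds the provider's allocations rather than the size of your bound tensors, so with high QPS and long sequences you would still need to manage how long the cache OrtValues are kept alive.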
Feature request
PR #647 was merged, adding support for a merged without/with-past decoder as a single ONNX file, along with inference in ORTModelForCausalLM (a usage sketch follows the list below).
Some key steps still remain:
- `CleanUnusedInitializersAndNodeArgs` warnings are printed only with subgraphs, tracked in microsoft/onnxruntime#14694:

        2023-02-10 16:29:24.868007832 [W:onnxruntime:, graph.cc:3487 CleanUnusedInitializersAndNodeArgs] Removing initializer '/transformer/h.4/attn/Constant_18_output_0'. It is not used by any node and should be removed from the model.

- `bloom` […] that is currently ugly
- `codegen` does not support `-with-past` in `tasks.py`
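For context, a rough sketch of how the merged decoder is meant to be consumed through ORTModelForCausalLM. The exact `from_pretrained` flags (`export`, `use_cache`) may differ across optimum versions, so treat this as an assumption rather than the canonical API:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Export gpt2 to ONNX on the fly and run it with ONNX Runtime; with the
# merged decoder, the without-past and with-past branches live in one file.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_cache=True)

inputs = tokenizer("Hello, my dog is", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

The first generation step runs the without-past branch of the merged graph, and subsequent steps run the with-past branch, so only one ONNX file needs to be loaded instead of two.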
Motivation
Reduce memory usage
Your contribution
/