
Can't export GPT2-XL to ONNX: ModelProto exceeds maximum protobuf size of 2GB #4707

Closed
klimentij opened this issue Aug 4, 2020 · 5 comments

klimentij commented Aug 4, 2020

Describe the bug
I use the onnxruntime/onnxruntime/python/tools/transformers/benchmark_gpt2.py script to benchmark GPT2-XL (1.5B parameters), export it to ONNX, and apply optimizations:

python benchmark_gpt2.py \
--model_name "gpt2-xl" \
--cache_dir "./cache_models" \
--onnx_dir="./gpt2_xl_onnx_past" \
--test_times 10 \
--precision fp16 \
--optimize_onnx \
--use_gpu \
--batch_sizes "1" \
--past_sequence_lengths 1000 \
--result_csv gpt2_results.csv

When I use it for gpt2-large, it works without a problem. When I switch model_name to gpt2-xl, it shows that optimizations are being applied, but it fails to save the optimized model to disk:

...
Output model to ./gpt2_xl_onnx_past/_past_fp16.onnx
Traceback (most recent call last):
  File "benchmark_gpt2.py", line 258, in <module>
    main()
  File "benchmark_gpt2.py", line 152, in main
    model.config.num_attention_heads, model.config.hidden_size)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/gpt2_helper.py", line 252, in optimize_onnx
    m.save_model_to_file(optimized_model_path)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/onnx_model.py", line 668, in save_model_to_file
    save_model(self.model, output_path, format=None)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 186, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 67, in _serialize
    result = proto.SerializeToString()
ValueError: Message ONNX_REL_1_7.ModelProto exceeds maximum protobuf size of 2GB: 3276141735 

I also added use_external_data_format=True to the torch.onnx.export() call in gpt2_helper.py and expected that to help, but it did not. I can't use the other scripts (benchmark.py) because I need GPT2 to be exported with past state support.
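
Concretely, a minimal toy sketch of that flag (the real call in gpt2_helper.py exports the GPT-2 model with its past-state inputs; the tiny model here is only for illustration):

    import torch

    # Toy stand-in for GPT-2. use_external_data_format=True tells PyTorch to
    # write the weights to external files next to the .onnx file instead of
    # embedding them in the protobuf message.
    model = torch.nn.Linear(4, 4)
    dummy_input = torch.randn(1, 4)

    torch.onnx.export(model, dummy_input, "model.onnx",
                      use_external_data_format=True)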

Urgency
I'm blocked on my current GPT2 deployment project because of this issue. Without ONNX optimizations, the model is approximately 4x slower and more expensive to run in our production.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 9
  • ONNX Runtime installed from (source or binary): Binary
  • ONNX Runtime version: 1.4.0
  • Python version: 3.7.6
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Nvidia T4 (16 GB) + 51 GB RAM
  • PyTorch version: 1.5.0+cu101 (must support the use_external_data_format flag in torch.onnx.export())

To Reproduce

  1. Add use_external_data_format=True to the torch.onnx.export() call in gpt2_helper.py
  2. Add gpt2-xl to the PRETRAINED_MODELS list in benchmark_gpt2.py
  3. Run:
python benchmark_gpt2.py \
--model_name "gpt2-xl" \
--cache_dir "./cache_models" \
--onnx_dir="./gpt2_xl_onnx_past" \
--test_times 10 \
--precision fp16 \
--optimize_onnx \
--use_gpu \
--batch_sizes "1" \
--past_sequence_lengths 1000 \
--result_csv gpt2_results.csv

Expected behavior
gpt2-xl is benchmarked and exported to ONNX without errors.

liqunfu (Contributor) commented Aug 5, 2020

According to onnx.save_model, if you pass a path for 'f', it will save external tensors, which avoids the 2 GB limit. Could you give onnx_model_path a folder name and see what happens?

klimentij (Author) commented

@liqunfu unfortunately it didn't seem to help. I edited the onnx_model.py file so that the save_model_to_file function passes a folder name instead of a file path to onnx.save_model:

def save_model_to_file(self, output_path):
    output_folder = os.path.dirname(output_path)

    logger.info(f"Output model to {output_path}")
    logger.info(f"Output model to folder {output_folder}")

    if output_path.endswith(".json"):
        assert isinstance(self.model, ModelProto)
        with open(output_path, "w") as out:
            out.write(str(self.model))
    else:
        # Pass the folder instead of the file path, per the suggestion above.
        save_model(self.model, output_folder, format=None)
        # Also tried converting the tensors to external data before serializing:
        # external_data_helper.convert_model_to_external_data(self.model, all_tensors_to_one_file=True, location=output_path + ".data")
        # with open(output_path, "wb") as out:
        #     out.write(self.model.SerializeToString())

I also made sure f is the second parameter in save_model:

save_model(proto, f, format=None)

According to the log, this method is indeed called and output_folder is indeed a folder, but it didn't help; the traceback shows that save_model still serializes the whole ModelProto to a string:

Output model to ./gpt2_xl_onnx_past/_past_fp16.onnx
Output model to folder ./gpt2_xl_onnx_past
Traceback (most recent call last):
  File "benchmark_gpt2.py", line 258, in <module>
    main()
  File "benchmark_gpt2.py", line 152, in main
    model.config.num_attention_heads, model.config.hidden_size)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/gpt2_helper.py", line 252, in optimize_onnx
    m.save_model_to_file(optimized_model_path)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/onnx_model.py", line 674, in save_model_to_file
    save_model(self.model, output_folder, format=None)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 186, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 67, in _serialize
    result = proto.SerializeToString()
ValueError: Message ONNX_REL_1_7.ModelProto exceeds maximum protobuf size of 2GB: 3276141735

tianleiwu (Contributor) commented

@klimentij, I can reproduce the problem. I will try modifying the save_model_to_file function and let you know when there is progress.

tianleiwu (Contributor) commented Aug 6, 2020

@klimentij, it seems that the following change could help export large models to ONNX:

def save_model_to_file(self, output_path):
    from pathlib import Path
    # Convert all tensors to a single external data file stored next to the model.
    external_data_helper.convert_model_to_external_data(self.model,
                                                        all_tensors_to_one_file=True,
                                                        location=Path(output_path).name + ".data")
    save_model(self.model, output_path)

The output model will consist of two files, like name.onnx and name.onnx.data. I'll send a pull request after more testing.

The model is very large, so the benchmark gets an out-of-memory exception when both the PyTorch model and the ONNX model are loaded on a V100 GPU (16 GB memory). After the ONNX model is exported, using only the ONNX model might avoid the problem.
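
For reference, once the two files exist, the exported model can be consumed without loading the PyTorch model at all (the path below is just an example; keep name.onnx and name.onnx.data in the same folder, since the external data file is resolved relative to the .onnx file):

    import onnxruntime

    # ONNX Runtime picks up the .onnx.data file next to the .onnx file automatically.
    session = onnxruntime.InferenceSession("./gpt2_xl_onnx_past/gpt2_xl_past_fp16.onnx")
    print([i.name for i in session.get_inputs()])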

klimentij (Author) commented

Thank you @tianleiwu! I managed to export it to .onnx and .onnx.data files using the edit you suggested.
