
[CUDA EP] Fix BeamSearch on T5 with sequence_as_input_ids (#20667) #20668

Merged: 10 commits into microsoft:main from amancini-N:20667-fix-beam-search-gpu on Dec 11, 2024

Conversation

amancini-N
Contributor

Description

Changes the implementation of the BeamSearch op on the CUDA EP: for T5 models, when the decoder input_ids are sequences, the sequences are now copied device-to-device instead of host-to-device.
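The core of the fix can be sketched as follows. This is a minimal, CPU-only illustration of the copy-direction choice, not the actual ONNX Runtime implementation; the enum and helper name are hypothetical stand-ins for `cudaMemcpyKind` and the real dispatch logic:

```cpp
#include <cstddef>

// Stand-ins for the CUDA memcpy kinds; the real code would use
// cudaMemcpyHostToDevice / cudaMemcpyDeviceToDevice with cudaMemcpyAsync.
enum class CopyKind { kHostToDevice, kDeviceToDevice };

// Hypothetical helper illustrating the fix: when the decoder input_ids are
// sequences produced on the GPU, the source buffer already lives in device
// memory, so the copy must be device-to-device rather than host-to-device.
inline CopyKind ChooseInputIdsCopyKind(bool sequences_as_input_ids) {
  return sequences_as_input_ids ? CopyKind::kDeviceToDevice
                                : CopyKind::kHostToDevice;
}
```

In the real kernel the selected kind would then be passed to `cudaMemcpyAsync` on the op's stream; copying a device buffer with the host-to-device kind is what produced the wrong results reported in #20667.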

Motivation and Context

- Fixes #20667

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).


@tianleiwu
Contributor

@amancini-N, could you take a look at the build and test errors in the CI pipeline? Let me know if you need help resolving them.

@amancini-N
Contributor Author

/azp run Linux CPU CI Pipeline


Commenter does not have sufficient privileges for PR 20668 in repo microsoft/onnxruntime

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline


Azure Pipelines successfully started running 6 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

- Fixing initialization of encoder_hidden_states feed in T5 decoder
- Re-generating data for T5 BeamSearch test
@amancini-N
Contributor Author

@tianleiwu I had time to come back to this. The test should be fixed now. The issue was that the encoder_hidden_states input of the decoder was not prepared correctly. I fixed that part and re-generated the test sample.
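For context on the feed-preparation fix: in beam search the decoder's encoder_hidden_states feed must cover every beam, so the encoder output of shape [batch, seq_len, hidden] is replicated per beam to [batch * num_beams, seq_len, hidden]. A minimal CPU sketch of that expansion, with illustrative names rather than the actual ONNX Runtime internals:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch of preparing the decoder's encoder_hidden_states
// feed: each [seq_len, hidden] slice of the batch is replicated num_beams
// times, so every beam of a batch entry attends over the same encoder
// states. The real op performs the equivalent copy on the GPU.
std::vector<float> ExpandEncoderHiddenStates(const std::vector<float>& enc,
                                             int batch, int seq_len,
                                             int hidden, int num_beams) {
  const size_t row = static_cast<size_t>(seq_len) * hidden;
  std::vector<float> out;
  out.reserve(static_cast<size_t>(batch) * num_beams * row);
  for (int b = 0; b < batch; ++b) {
    const float* src = enc.data() + static_cast<size_t>(b) * row;
    for (int beam = 0; beam < num_beams; ++beam) {
      out.insert(out.end(), src, src + row);  // one copy per beam
    }
  }
  return out;
}
```

Getting this replication wrong (or feeding unexpanded encoder states) would make each beam attend over the wrong batch entry's encoder output, which is consistent with the failing test described here.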

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline


Azure Pipelines successfully started running 6 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu
Contributor

/azp run Windows CPU CI Pipeline, ONNX Runtime Web CI Pipeline, Windows GPU CUDA CI Pipeline, Linux ROCm CI Pipeline, Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models

@tianleiwu
Contributor

/azp run Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 7 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu
Contributor

@amancini-N, added some suggestions using static_cast to avoid code-scan warnings. The rest looks good to me.
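For readers unfamiliar with the suggestion, this is the general pattern (an illustrative example, not the PR's actual diff): code scanners flag implicit narrowing conversions, and an explicit static_cast documents that the narrowing is intentional.

```cpp
#include <cstdint>
#include <vector>

// Illustrative example of the kind of static_cast suggested to silence
// code-scan warnings about implicit narrowing: converting a 64-bit size
// to a 32-bit value explicitly. The function name is hypothetical.
int32_t SequenceLengthAsInt32(const std::vector<int64_t>& sequence) {
  // The explicit cast makes the narrowing intentional and visible;
  // the caller is expected to guarantee the length fits in 32 bits.
  return static_cast<int32_t>(sequence.size());
}
```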

@amancini-N
Contributor Author

@tianleiwu sure thing, done

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu
Contributor

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models

@tianleiwu
Contributor

/azp run Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 7 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu tianleiwu merged commit d8de3c4 into microsoft:main Dec 11, 2024
85 checks passed
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
…20667) (microsoft#20668)

### Description
Change the implementation of BeamSearch op when using CUDA EP: in case
of T5 model, and in case the decoder input_ids are sequences, copy the
sequences device-to-device instead of host-to-device

### Motivation and Context
- Fixes microsoft#20667
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
…20667) (microsoft#20668)
@amancini-N amancini-N deleted the 20667-fix-beam-search-gpu branch December 11, 2024 10:27
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
…20667) (microsoft#20668)
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
…20667) (microsoft#20668)
Successfully merging this pull request may close these issues.

BeamSearch op returning wrong results on CUDA execution provider when sequence is used as input_ids
2 participants