[CUDA, DML] MatMul does not properly handle matrices with inner dim == 0 #21483
Labels
core runtime
ep:CUDA
ep:DML
platform:windows
stale
Describe the issue
MatMul is expected to produce a valid result when multiplying matrices whose inner dimension is zero.
For example, operands of shapes {16, 0} x {0, 16} should produce a zero-filled matrix of shape {16, 16}.
This is properly supported in the CPU EP, but it is confirmed not to work in the CUDA and DML providers.
This feature is necessary to support the current design of LoRA adapters in GenAI, as well as for correctness.
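The expected result follows from the definition of matrix multiplication: each output element is a sum over the inner dimension, and a sum over zero terms is zero. A minimal reference sketch of that behavior is shown here. `ReferenceMatMul` is a hypothetical helper written for illustration only; it is not part of the ONNX Runtime API and is not how any execution provider implements MatMul.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (illustration only, not ONNX Runtime code).
// Computes C (m x n) = A (m x k) * B (k x n) on row-major buffers.
std::vector<float> ReferenceMatMul(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t m, std::size_t k, std::size_t n) {
  // Buffers must match the declared shapes; both are empty when k == 0.
  assert(a.size() == m * k && b.size() == k * n);

  // The output is allocated from the outer dimensions and zero-initialized.
  std::vector<float> c(m * n, 0.0f);

  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // When k == 0 this loop body never runs, so c[i * n + j] stays 0.
      for (std::size_t p = 0; p < k; ++p) {
        c[i * n + j] += a[i * k + p] * b[p * n + j];
      }
    }
  }
  return c;
}
```

Calling `ReferenceMatMul(a, b, 16, 0, 16)` with two empty input buffers returns 256 elements, all zero, which matches the {16, 16} zero-filled output described above.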
To reproduce
Run MatMul with operands of shapes {16, 0} and {0, 16} on the CUDA EP. CUDA complains about dimensions equal to zero.
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
No response