
RuntimeError: _int_mm_out_cuda not compiled for this platform. #130928

Closed
mattiadg opened this issue Jul 17, 2024 · 9 comments
Assignees
Labels
module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general module: windows Windows support for PyTorch triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


mattiadg commented Jul 17, 2024

🐛 Describe the bug

Hi all, I have encountered this issue while trying to work with models quantized to 8 bits. For instance, I want to add an example to optimum-quanto, and when running the quantized model I get the error in the subject,
RuntimeError: _int_mm_out_cuda not compiled for this platform., which happens when calling torch._int_mm.
There are multiple tests in the project using this function, and all of them fail with the same error.
I guess it should just work, so I probably have something wrong in my setup.

Versions

PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 551.83
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2100
DeviceID=CPU0
Family=198
L2CacheSize=12288
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2100
Name=12th Gen Intel(R) Core(TM) i7-12700F
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.4
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @seemethere @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite @ptrblck

@malfet malfet added module: windows Windows support for PyTorch module: cuda Related to torch.cuda, and CUDA support in general high priority labels Jul 17, 2024
@malfet malfet added the module: build Build system issues label Jul 17, 2024
@malfet malfet self-assigned this Jul 17, 2024

malfet commented Jul 17, 2024

Tentatively grabbing this for myself to get a repro, as there are no platform-specific guards in this code, just one hiding the code behind the CUDA version.

mattiadg (Author) commented:

People are clearly using this, and I'm confused because nowhere is it reported that anything special is needed. Could it maybe depend on the C++ compiler?
No compiler information was collected here for any of them. I'll check again what's going on.


dacorvo commented Jul 17, 2024

@mattiadg it seems torch._int_mm is only available for CUDA cards whose compute capability is higher than 8.0 (yours is 7.5).
It is strange that I did not get that error myself, because I run unit tests on a T4 from time to time: please create an issue in quanto as well, as I might be able to catch this earlier and avoid calling torch._int_mm.
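A guard along these lines could skip the int8 path on unsupported cards. This is only a sketch: the (8, 0) threshold is taken from the comment above, and in a real program the tuple would come from `torch.cuda.get_device_capability`, which is PyTorch's standard way to query it.

```python
def int_mm_capability_ok(capability, threshold=(8, 0)):
    """Compare a (major, minor) CUDA compute capability tuple against
    the minimum mentioned above for torch._int_mm.
    Tuples compare lexicographically, so (8, 6) >= (8, 0) is True."""
    return tuple(capability) >= threshold

# In a real program one would query the device, e.g.:
#   capability = torch.cuda.get_device_capability(0)
# and fall back to a float matmul when the check returns False.
print(int_mm_capability_ok((7, 5)))  # T4-class card -> False
print(int_mm_capability_ok((8, 6)))  # RTX 3060-class card -> True
```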

mattiadg (Author) commented:

New output of collect_env.py, still the same result:

PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: (Rev3, Built by MSYS2 project) 14.1.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 551.83
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2100
DeviceID=CPU0
Family=198
L2CacheSize=12288
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2100
Name=12th Gen Intel(R) Core(TM) i7-12700F
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.4
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect

mattiadg (Author) commented:

The discussion continued a bit in huggingface/optimum-quanto#245, and @dacorvo suggested that the operation may not be compiled on Windows.


albanD commented Jul 22, 2024

Given the ifdef in the code:

#if (defined(CUDA_VERSION) && (CUDA_VERSION >= 11070)) || defined(USE_ROCM)

The issue is most likely that this was compiled for an old version of CUDA on Windows.

cc @eqy this still looks suspicious, maybe this condition doesn't work?

@malfet malfet added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Jul 22, 2024

eqy commented Jul 22, 2024

Is this with a wheel or a source build? Since it's showing the second message, it looks like CUDA_VERSION isn't even defined during the build.

mattiadg (Author) commented:

from pip


malfet commented Jul 23, 2024

Hmm, I cannot reproduce it using the 2.4 release candidate:

 python -c "import torch;print(torch.__version__,  torch._int_mm(torch.randint(0, 127, (32, 32), device='cuda', dtype=torch.int8),  torch.randint(0, 32, (32, 32), device='cuda', dtype=torch.int8)))"
2.4.0+cu118 tensor([[30834, 32776, 32329,  ..., 28246, 25706, 27117],
        [31315, 34551, 38485,  ..., 31454, 30362, 31866],
        [28472, 30010, 33893,  ..., 27359, 28720, 27821],
        ...,
        [32690, 40828, 40961,  ..., 34232, 28498, 37512],
        [33119, 33277, 37838,  ..., 30230, 29147, 30507],
        [30075, 33998, 31835,  ..., 25740, 23120, 25182]], device='cuda:0',
       dtype=torch.int32)

And in 2.3 it was indeed disabled on the Windows platform:

#if !defined(USE_ROCM) && !defined(_MSC_VER) && defined(CUDA_VERSION) && CUDA_VERSION >= 11070

But this constraint was lifted by #125792.
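Until a build with that fix is available, the operation's semantics can be emulated in plain Python. This is a reference sketch of the math only, not quanto's actual fallback: torch._int_mm returns the int32 product of two int8 matrices, which is an ordinary matrix multiply with widened accumulation.

```python
def int_mm_reference(a, b):
    """Reference semantics of an int8 matmul on nested lists:
    int8-range inputs, int32-style accumulation (Python ints never
    overflow, so this documents the math, not the performance)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(len(a))]

print(int_mm_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

A float matmul followed by rounding gives the same values for small inputs, but an integer reference like this avoids any precision question when checking a quantized path.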
