
Commit 288eca6

[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (vllm-project#13632)

Signed-off-by: Michael Glass <mrglass@us.ibm.com>
kaixih authored and michaelrglass committed Feb 21, 2025
1 parent 1a40a97 commit 288eca6
Showing 1 changed file with 1 addition and 4 deletions.
5 changes: 1 addition & 4 deletions csrc/quantization/fp4/nvfp4_quant_kernels.cu
@@ -348,10 +348,7 @@ void scaled_fp4_quant_sm100a(torch::Tensor const& output,
   auto sf_out = static_cast<int32_t*>(output_sf.data_ptr());
   auto output_ptr = static_cast<int64_t*>(output.data_ptr());
   at::cuda::CUDAGuard device_guard{(char)input.get_device()};
-  auto stream = at::cuda::getStreamFromPool(false, input.get_device());
-  if (stream == nullptr) {
-    std::cerr << "Warning: Null CUDA stream" << std::endl;
-  }
+  auto stream = at::cuda::getCurrentCUDAStream(input.get_device());
 
   // We don't support e8m0 scales at this moment.
   bool useUE8M0 = false;
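
For context, here is a minimal, self-contained sketch of the pattern after this commit: the host code asks ATen for the stream PyTorch is already recording work on (at::cuda::getCurrentCUDAStream) and launches the kernel there, so the quantization runs in order with the ops that produced its inputs, without extra synchronization. The kernel and function names below (dummy_quant_kernel, launch_on_current_stream) are illustrative stand-ins, not vLLM code; only the device-guard and stream calls mirror the patched file.

// Illustrative sketch, not the vLLM kernel. Only the CUDAGuard and
// getCurrentCUDAStream calls mirror the patched nvfp4 launch code.
#include <torch/all.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

// Hypothetical stand-in for the nvfp4 quantization kernel.
__global__ void dummy_quant_kernel(const float* in, uint8_t* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = static_cast<uint8_t>(in[i]);  // placeholder transform
}

void launch_on_current_stream(torch::Tensor const& input, torch::Tensor& output) {
  // Make the input's device the current device for this scope.
  at::cuda::CUDAGuard device_guard{(char)input.get_device()};

  // After this commit: reuse PyTorch's current stream for the device instead
  // of pulling a fresh stream from the pool, so the kernel is ordered after
  // the ops that produced `input`.
  auto stream = at::cuda::getCurrentCUDAStream(input.get_device());

  int n = static_cast<int>(input.numel());
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  // c10::cuda::CUDAStream converts implicitly to cudaStream_t here.
  dummy_quant_kernel<<<blocks, threads, 0, stream>>>(
      input.data_ptr<float>(), output.data_ptr<uint8_t>(), n);
}

By contrast, at::cuda::getStreamFromPool() hands back a separate side stream whose work is not automatically ordered with respect to the current stream, so a launch there could start before the producing ops finish unless events or explicit synchronization are added. Switching to the current stream removes that hazard, and the accompanying null-stream check is dropped along with the pooled stream.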
