Commit

fix _chunk_cumsum_fwd_kernel invalid address
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
fabianlim committed Dec 8, 2024
1 parent 02943e7 commit cf8ad1b
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py
@@ -49,7 +49,10 @@ def _chunk_cumsum_fwd_kernel(
     BLOCK_SIZE_H: tl.constexpr, BLOCK_SIZE_CHUNK: tl.constexpr,
 ):
     pid_b = tl.program_id(axis=0)
-    pid_c = tl.program_id(axis=1)
+
+    # if dt is long, may cause problems, so use 64 bit
+    # https://github.com/triton-lang/triton/issues/1058
+    pid_c = tl.program_id(axis=1).to(tl.int64)
     pid_h = tl.program_id(axis=2)
     dt_ptr += pid_b * stride_dt_batch + pid_c * chunk_size * stride_dt_seqlen
     dt_out_ptr += pid_b * stride_dt_out_batch + pid_c * stride_dt_out_chunk
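
The commit comment only says a long dt "may cause problems"; one plausible reading is that the 32-bit `tl.program_id` result makes the product `pid_c * chunk_size * stride_dt_seqlen` wrap around before it is added to `dt_ptr`. The sketch below emulates that wraparound in plain Python (no Triton or GPU needed). The helper names and the sample values are hypothetical and chosen only to make the product exceed the int32 range; they are not taken from vLLM.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Emulate two's-complement 32-bit wraparound of a Python int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def dt_offset_int32(pid_c: int, chunk_size: int, stride_dt_seqlen: int) -> int:
    """Offset as it would come out if every intermediate stayed 32-bit."""
    return wrap_int32(wrap_int32(pid_c * chunk_size) * stride_dt_seqlen)


def dt_offset_int64(pid_c: int, chunk_size: int, stride_dt_seqlen: int) -> int:
    """Offset with pid_c promoted to 64 bits, as the fix does via .to(tl.int64)."""
    return pid_c * chunk_size * stride_dt_seqlen


# Hypothetical sizes: a very long sequence split into 256-element chunks.
pid_c, chunk_size, stride_dt_seqlen = 9000, 256, 1024

wide = dt_offset_int64(pid_c, chunk_size, stride_dt_seqlen)
narrow = dt_offset_int32(pid_c, chunk_size, stride_dt_seqlen)

print(wide > INT32_MAX)  # True: the true offset does not fit in int32
print(wide)              # 2359296000
print(narrow)            # -1935671296, a negative (invalid) pointer offset
```

With small chunk indices the two functions agree, which would explain why the fault only shows up for long inputs. Casting `pid_c` alone is enough because the remaining factors are then promoted to 64 bits when multiplied with it.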
