[Kernel] Support running GPTQ 8-bit models in Marlin (vllm-project#4533)
alexm-redhat authored and dtrifiro committed May 7, 2024
1 parent 49e083c commit 81c0f04
Showing 7 changed files with 553 additions and 324 deletions.
4 changes: 3 additions & 1 deletion csrc/ops.h

@@ -132,6 +132,7 @@ torch::Tensor gptq_marlin_gemm(
     torch::Tensor &g_idx,
     torch::Tensor &perm,
     torch::Tensor &workspace,
+    int64_t num_bits,
     int64_t size_m,
     int64_t size_n,
     int64_t size_k,
@@ -141,7 +142,8 @@ torch::Tensor gptq_marlin_repack(
     torch::Tensor &b_q_weight,
     torch::Tensor &perm,
     int64_t size_k,
-    int64_t size_n);
+    int64_t size_n,
+    int64_t num_bits);
 #endif
 
 void squeezellm_gemm(
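The new `num_bits` argument lets `gptq_marlin_gemm` and `gptq_marlin_repack` distinguish 4-bit from 8-bit GPTQ weights. As an illustration only (not vLLM code), here is a minimal sketch of why the kernels need this: GPTQ packs quantized values into 32-bit words, so the pack factor — and therefore the packed weight layout both functions must agree on — follows directly from `num_bits`. The helper names below are hypothetical.

```python
def pack_factor(num_bits: int) -> int:
    """Quantized values packed per 32-bit word for a given bit width."""
    if 32 % num_bits != 0:
        raise ValueError("num_bits must divide 32")
    return 32 // num_bits

def packed_weight_shape(size_k: int, size_n: int, num_bits: int) -> tuple:
    """Shape of the packed int32 weight matrix for a (size_k, size_n) layer,
    assuming packing along the K (input) dimension."""
    return (size_k // pack_factor(num_bits), size_n)

# 4-bit weights pack 8 values per word; 8-bit weights pack only 4,
# so the packed K dimension doubles when moving from 4-bit to 8-bit.
print(pack_factor(4), pack_factor(8))       # 8 4
print(packed_weight_shape(4096, 11008, 8))  # (1024, 11008)
```

Passing `num_bits` explicitly, rather than baking the width into the kernel, lets one repack/GEMM entry point serve both quantization widths.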
