LLM: support quantized kv cache for Mistral in transformers >=4.36.0 #10326

lalalapotter · 2024-03-05T07:59:10Z

Description

This PR enables the quantized KV cache for Mistral models in transformers >= 4.36.0.
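For readers unfamiliar with the idea, here is a minimal, dependency-free sketch of what quantizing a KV cache means: each key/value vector is stored as low-precision integer codes plus a scale, and is dequantized when attention needs it. This is not the code from this PR. The int8 symmetric scheme and the names `quantize_row`, `dequantize_row`, and `QuantizedKVCache` are assumptions chosen for illustration; the storage format and kernels the PR actually uses are not shown on this page.

```python
from typing import List, Tuple


def quantize_row(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one vector: returns (codes, scale)."""
    max_abs = max((abs(x) for x in row), default=0.0)
    # Avoid a zero scale for an all-zero (or empty) vector.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from integer codes and their scale."""
    return [c * scale for c in codes]


class QuantizedKVCache:
    """Toy append-only cache (hypothetical): one (codes, scale) pair per token."""

    def __init__(self) -> None:
        self.keys: List[Tuple[List[int], float]] = []
        self.values: List[Tuple[List[int], float]] = []

    def append(self, key: List[float], value: List[float]) -> None:
        # Quantize on write so only integer codes and scales are retained.
        self.keys.append(quantize_row(key))
        self.values.append(quantize_row(value))

    def dequantized(self) -> Tuple[List[List[float]], List[List[float]]]:
        # Dequantize on read, e.g. right before computing attention.
        keys = [dequantize_row(c, s) for c, s in self.keys]
        values = [dequantize_row(c, s) for c, s in self.values]
        return keys, values
```

The trade-off in a scheme like this is memory against precision: each stored element shrinks to one byte plus a shared scale, at the cost of a rounding error of at most about half a scale step per element.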

How to test?

MTL test
Arc test

lalalapotter added 4 commits March 5, 2024 15:30

support quantize kv for mistral in transformers 4.36

8606564

Merge branch 'main' into mistral-quantize-kv-4-36

54d7928

update mistral support.

d365f9c

fix style.

a1045f2

lalalapotter added the llm label Mar 5, 2024

lalalapotter requested a review from MeouSker77 March 5, 2024 07:59

lalalapotter self-assigned this Mar 5, 2024

MeouSker77 approved these changes Mar 5, 2024

lalalapotter merged commit cc5dbfe into intel:main Mar 5, 2024
19 checks passed

liu-shaojun pushed a commit that referenced this pull request Mar 25, 2024

LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (#…

30d009b

…10326) * support quantize kv for mistral in transformers 4.36 * update mistral support. * fix style.
