LLM: support quantized kv cache for Mistral in transformers >=4.36.0 #4244
| Job | Run time |
|---|---|
| | 1s |
| | 2s |
| | 4s |
| | 4s |
| | 3s |
| | 7s |
| | 2s |
| | 2m 15s |
| | 2m 1s |
| | 3m 16s |
| | 1m 2s |
| | 59s |
| | 45s |
| | 15m 50s |
| | 11m 8s |
| | 12m 25s |
| | 14m 16s |
| Total | 1h 4m 20s |