Commit

Allow int8
ilya-lavrenov committed Jan 30, 2025
1 parent e293a33 commit 7d37f5e
Showing 1 changed file with 0 additions and 5 deletions.
5 changes: 0 additions & 5 deletions src/cpp/src/continuous_batching_impl.cpp
@@ -53,11 +53,6 @@ void apply_kv_cache_precision(const std::shared_ptr<ov::Model>& model, const std
             // x86 and ARM have different default kv cache type, take this information from the plugin
             m_kv_cache_type = core.get_property(device, ov::hint::kv_cache_precision);
         }
-
-        // TEMP WA: currently FP16 / BF16 KV cache is faster than U8 for PagedAttention
-        if (m_kv_cache_type == ov::element::u8) {
-            m_kv_cache_type = inference_precision == ov::element::bf16 ? ov::element::bf16 : ov::element::f16;
-        }
     } else if (device.find("GPU") != std::string::npos) {
         if (accuracy_mode) {
             inference_precision = ov::element::f32;
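The effect of the five deleted lines can be sketched in isolation. The snippet below is a minimal, self-contained illustration and not the code in continuous_batching_impl.cpp: `Precision` is a hypothetical stand-in for `ov::element::Type`, and `kv_cache_type_before` / `kv_cache_type_after` are made-up names that model only the branch shown in the hunk, before and after this commit.

```cpp
// Hypothetical stand-in for ov::element::Type; this is not the OpenVINO API.
enum class Precision { f32, f16, bf16, u8 };

// Before this commit (assumed from the removed "TEMP WA" block): a u8 KV cache
// type was overridden with bf16 when inference ran in bf16, and with f16 otherwise.
constexpr Precision kv_cache_type_before(Precision kv_cache_type,
                                         Precision inference_precision) {
    return kv_cache_type != Precision::u8 ? kv_cache_type
         : inference_precision == Precision::bf16 ? Precision::bf16
         : Precision::f16;
}

// After this commit: the value is left unchanged, so u8 (int8) is allowed through.
constexpr Precision kv_cache_type_after(Precision kv_cache_type,
                                        Precision /*inference_precision*/) {
    return kv_cache_type;
}
```

Under these assumptions, a platform whose plugin reports `u8` as the default KV cache precision now keeps `u8`, whereas previously it was silently replaced by `f16` or `bf16`. Non-`u8` values behave the same before and after.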
