[Snippets][CPU] Added KVCacheMatcher check for LLM in MHATokenization #28812
base: master
Conversation
ov::op::util::has_op_with_type<intel_cpu::ScaledDotProductAttentionWithKVCache>(model) ||
ov::op::util::has_op_with_type<ov::op::PagedAttentionExtension>(model);
Do we still need these checks here? If we do, can we move them inside is_large_language_model? Since we created a dedicated helper for this check, all the related logic should live inside it. What do you think?
At first I just wanted to align with the GPU check: I was not sure that PA should be part of the is_llm check in the GPU Plugin. But I found that they updated the condition to include PA. So I moved the PA check to the common helper, leaving only SDPAWithKVCache here because it's a CPU-specific op.
Thanks!
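For reference, a minimal sketch (not the exact code from this PR) of what the resulting CPU-side condition could look like after this change. The helper name ov::op::util::is_large_language_model, the wrapper function name, and the assumption that the helper already covers ov::op::PagedAttentionExtension are illustrative; only the has_op_with_type call and the op types come from the diff above.

```cpp
// Illustrative sketch only, not the exact code from this PR.
// Assumptions: the common helper is exposed as ov::op::util::is_large_language_model()
// and already detects ov::op::PagedAttentionExtension; the relevant OpenVINO and
// CPU-plugin headers are assumed to be included.
#include <memory>

bool looks_like_llm_for_mha_tokenization(const std::shared_ptr<const ov::Model>& model) {
    // The stateful SDPA op is CPU-specific, so it stays as a plugin-local check...
    const bool has_sdpa_with_kv_cache =
        ov::op::util::has_op_with_type<intel_cpu::ScaledDotProductAttentionWithKVCache>(model);
    // ...while the PagedAttention case is handled inside the shared helper.
    return has_sdpa_with_kv_cache || ov::op::util::is_large_language_model(model);
}
```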
Details:
- Moved the is_LLM check from the GPU plugin to the common part so it can be reused by other plugins
- Used the is_large_language_model function in the check for LLM in MHATokenization in the CPU Plugin

Tickets: