changing dimensions of batch size, kv cache and num_input_heads #793

Conversation

@lamiayous (Contributor) commented Aug 23, 2024

Once KV-cache tensors are exposed from the stateful model, they should be reshaped to a static size. The current implementation of the reshape function assumes that the KV-cache dimension is always 2 and the batch dimension is always 0. For chatglm and Qwen this is not the case. This PR identifies the KV-cache and batch dimensions by reading the model's config.json file.
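For illustration, here is a minimal sketch of the idea, assuming nlohmann::json for parsing. The function name is hypothetical; only the chatglm mapping is taken verbatim from the reviewed hunk below, and the qwen axes shown are an assumption for illustration:

```cpp
// Minimal sketch (not the PR's exact code): pick the batch and KV-cache
// (sequence-length) axis positions based on "model_type" from config.json.
// Assumes nlohmann::json; detect_kv_axes is a hypothetical name.
#include <cstdint>
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

struct KVAxesPosition {
    uint32_t batch = 0u;    // default layout puts batch at dim 0
    uint32_t seq_len = 2u;  // default layout puts the KV-cache dim at 2
};

KVAxesPosition detect_kv_axes(const std::string& config_json_path) {
    std::ifstream file(config_json_path);
    const auto config = nlohmann::json::parse(file);
    const std::string model_type = config.at("model_type");

    KVAxesPosition axes;  // defaults cover most decoder-only models
    if (model_type == "chatglm") {
        axes.batch = 1u;    // chatglm stores KV as [seq, batch, ...]
        axes.seq_len = 0u;
    } else if (model_type == "qwen") {
        axes.seq_len = 1u;  // assumption for illustration: qwen keeps batch at 0
    }
    return axes;
}
```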

@TolyaTalamanov (Collaborator) left a comment

Great, thanks @lamiayous!

LGTM 👍

@Wovchena (Collaborator) commented Sep 3, 2024

jenkins_build


@Wovchena (Collaborator) commented Sep 3, 2024

The failing checks are caused by a broken master branch. Ignore them until #807 is merged.

Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
@ilya-lavrenov self-assigned this Sep 3, 2024
A review thread was opened on this hunk:

```cpp
KVAxesPosition axes;
if (model_type == "chatglm") {
    axes.batch = 1u;
    axes.seq_len = 0u;
```
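For context, a sketch of how such axis positions might be applied to pin the KV-cache inputs to static sizes (the helper name and the "past_key_values" naming convention are assumptions, not the PR's code):

```cpp
// Sketch only: apply detected axes to make KV-cache inputs static.
// Assumes KV inputs can be recognized by a "past_key_values" name prefix.
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <openvino/openvino.hpp>

struct KVAxesPosition { uint32_t batch; uint32_t seq_len; };  // as in the hunk above

void reshape_kv_to_static(const std::shared_ptr<ov::Model>& model,
                          const KVAxesPosition& axes,
                          int64_t batch_size, int64_t max_seq_len) {
    std::map<std::string, ov::PartialShape> new_shapes;
    for (const auto& input : model->inputs()) {
        const std::string name = input.get_any_name();
        if (name.find("past_key_values") == std::string::npos)
            continue;  // leave non-KV inputs untouched
        ov::PartialShape shape = input.get_partial_shape();
        shape[axes.batch] = batch_size;     // e.g. 1
        shape[axes.seq_len] = max_seq_len;  // e.g. the configured prompt limit
        new_shapes.emplace(name, shape);
    }
    model->reshape(new_shapes);
}
```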
A Collaborator replied:
@ilya-lavrenov Yes, you're right. The only drawback is that this function cannot be used for the .blob case, since those models will be stateless + static.

A Contributor asked:
How is one supposed to pass a compiled blob via the current LLMPipeline API?

A Collaborator replied:
It's still under review: #811

@TolyaTalamanov (Collaborator) commented Sep 4, 2024
The current solution aims to handle the qwen / chatglm cases, which are the crucial ones for now.

But in general, I'd prefer your approach, implemented somewhere inside the StatefulToStateless transformation, so that it could save the necessary metadata in a form available from both the xml and blob formats.
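One way such metadata could travel with the model is through its runtime info. A rough sketch under that assumption (the "kv_cache"/"*_axis" key names are hypothetical, and whether compiled blobs preserve rt_info is not confirmed here):

```cpp
// Sketch (not the PR's implementation): stash the detected KV axes in the
// model's rt_info during the transformation, and read them back later.
#include <cstdint>
#include <memory>
#include <openvino/openvino.hpp>

void save_kv_axes(const std::shared_ptr<ov::Model>& model,
                  uint32_t batch_axis, uint32_t seq_len_axis) {
    model->set_rt_info(batch_axis, "kv_cache", "batch_axis");
    model->set_rt_info(seq_len_axis, "kv_cache", "seq_len_axis");
}

uint32_t load_seq_len_axis(const std::shared_ptr<ov::Model>& model) {
    // Fall back to the conventional dim 2 when the metadata is absent.
    if (model->has_rt_info({"kv_cache", "seq_len_axis"}))
        return model->get_rt_info<uint32_t>("kv_cache", "seq_len_axis");
    return 2u;
}
```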

@ilya-lavrenov (Contributor) commented:
build_jenkins

@ilya-lavrenov added this to the 2024.5 milestone Sep 6, 2024
@ilya-lavrenov added this pull request to the merge queue Sep 6, 2024
Merged via the queue into openvinotoolkit:master with commit 72730a4 Sep 6, 2024
34 checks passed
@ilya-lavrenov added the category: LLM (LLM pipeline: stateful, static) label Oct 15, 2024