changing dimensions of batch size, kv cache and num_input_heads #793

Conversation

@lamiayous (Contributor) commented Aug 23, 2024

Once KV-cache tensors are exposed from the stateful model, they should be reshaped to a static size. The current implementation of the reshape function assumes that the KV-cache dimension is always 2 and the batch dimension is always 0. For chatglm and Qwen this is not the case. This PR identifies the KV-cache and batch dimensions by reading the model's config.json file.
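For illustration, here is a minimal sketch of the idea, assuming nlohmann::json for parsing. The function name is hypothetical; only the chatglm mapping is taken verbatim from the reviewed hunk below, and the qwen axes shown are an assumption for illustration:

```cpp
// Minimal sketch (not the PR's exact code): pick the batch and KV-cache
// (sequence-length) axis positions based on "model_type" from config.json.
// Assumes nlohmann::json; detect_kv_axes is a hypothetical name.
#include <cstdint>
#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

struct KVAxesPosition {
    uint32_t batch = 0u;    // default layout puts batch at dim 0
    uint32_t seq_len = 2u;  // default layout puts the KV-cache dim at 2
};

KVAxesPosition detect_kv_axes(const std::string& config_json_path) {
    std::ifstream file(config_json_path);
    const auto config = nlohmann::json::parse(file);
    const std::string model_type = config.at("model_type");

    KVAxesPosition axes;  // defaults cover most decoder-only models
    if (model_type == "chatglm") {
        axes.batch = 1u;    // chatglm stores KV as [seq, batch, ...]
        axes.seq_len = 0u;
    } else if (model_type == "qwen") {
        axes.seq_len = 1u;  // assumption for illustration: qwen keeps batch at 0
    }
    return axes;
}
```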

@TolyaTalamanov (Collaborator) left a comment

Great, thanks @lamiayous!

LGTM 👍

@Wovchena (Collaborator) commented Sep 3, 2024

jenkins_build


@Wovchena (Collaborator) commented Sep 3, 2024

The failing checks are caused by a broken master branch. Ignore them until #807 is merged.

Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
@ilya-lavrenov self-assigned this Sep 3, 2024
A review thread was opened on this hunk:

```cpp
KVAxesPosition axes;
if (model_type == "chatglm") {
    axes.batch = 1u;
    axes.seq_len = 0u;
```
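For context, a sketch of how such axis positions might be applied to pin the KV-cache inputs to static sizes (the helper name and the "past_key_values" naming convention are assumptions, not the PR's code):

```cpp
// Sketch only: apply detected axes to make KV-cache inputs static.
// Assumes KV inputs can be recognized by a "past_key_values" name prefix.
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <openvino/openvino.hpp>

struct KVAxesPosition { uint32_t batch; uint32_t seq_len; };  // as in the hunk above

void reshape_kv_to_static(const std::shared_ptr<ov::Model>& model,
                          const KVAxesPosition& axes,
                          int64_t batch_size, int64_t max_seq_len) {
    std::map<std::string, ov::PartialShape> new_shapes;
    for (const auto& input : model->inputs()) {
        const std::string name = input.get_any_name();
        if (name.find("past_key_values") == std::string::npos)
            continue;  // leave non-KV inputs untouched
        ov::PartialShape shape = input.get_partial_shape();
        shape[axes.batch] = batch_size;     // e.g. 1
        shape[axes.seq_len] = max_seq_len;  // e.g. the configured prompt limit
        new_shapes.emplace(name, shape);
    }
    model->reshape(new_shapes);
}
```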
A Collaborator replied:
@ilya-lavrenov Yes, you're right. The only drawback is that this function cannot be used for the .blob case, since those models will be stateless + static.

A Contributor asked:
How is one supposed to pass a compiled blob via the current LLMPipeline API?

A Collaborator replied:
It's still under review: #811

@TolyaTalamanov (Collaborator) commented Sep 4, 2024
The current solution aims to handle the qwen / chatglm cases, which are the crucial ones for now.

But in general, I'd prefer your approach, implemented somewhere inside the StatefulToStateless transformation, so that it could save the necessary metadata in a form available from both the xml and blob formats.
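One way such metadata could travel with the model is through its runtime info. A rough sketch under that assumption (the "kv_cache"/"*_axis" key names are hypothetical, and whether compiled blobs preserve rt_info is not confirmed here):

```cpp
// Sketch (not the PR's implementation): stash the detected KV axes in the
// model's rt_info during the transformation, and read them back later.
#include <cstdint>
#include <memory>
#include <openvino/openvino.hpp>

void save_kv_axes(const std::shared_ptr<ov::Model>& model,
                  uint32_t batch_axis, uint32_t seq_len_axis) {
    model->set_rt_info(batch_axis, "kv_cache", "batch_axis");
    model->set_rt_info(seq_len_axis, "kv_cache", "seq_len_axis");
}

uint32_t load_seq_len_axis(const std::shared_ptr<ov::Model>& model) {
    // Fall back to the conventional dim 2 when the metadata is absent.
    if (model->has_rt_info({"kv_cache", "seq_len_axis"}))
        return model->get_rt_info<uint32_t>("kv_cache", "seq_len_axis");
    return 2u;
}
```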

@ilya-lavrenov (Contributor) commented:
build_jenkins

@ilya-lavrenov added this to the 2024.5 milestone Sep 6, 2024
@ilya-lavrenov added this pull request to the merge queue Sep 6, 2024
Merged via the queue into openvinotoolkit:master with commit 72730a4 Sep 6, 2024
34 checks passed
@ilya-lavrenov added the category: LLM (LLM pipeline: stateful, static) label Oct 15, 2024