[KVCache] Attention func accepting over-padded qkv and output NDArray #17401

Merged: 1 commit, Sep 22, 2024

MasterJH5574
Contributor

This PR enhances the `AttentionWithFusedQKV` function of `PagedKVCache` so that it accepts input `qkv_data` and output `o_data` NDArrays that are padded along the sequence dimension.

This gives callers of `PagedKVCache` the flexibility to decide whether or not to pad the input qkv/o NDArrays.
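
To illustrate what "padding along the sequence dimension" means from the caller's side, here is a minimal pure-Python sketch. It is not the TVM implementation: `apply_on_valid_prefix`, `total_seq_len`, and `double_rows` are hypothetical names, and plain lists of rows stand in for NDArrays. The idea it assumes is that only the first `total_seq_len` rows of the buffers carry real tokens, and any rows beyond that are padding that is neither read nor written.

```python
from typing import Callable, List

Row = List[float]


def apply_on_valid_prefix(
    qkv_data: List[Row],
    o_data: List[Row],
    total_seq_len: int,
    attend: Callable[[List[Row]], List[Row]],
) -> None:
    """Hypothetical helper: run `attend` on the valid rows of padded buffers."""
    if len(qkv_data) < total_seq_len or len(o_data) < total_seq_len:
        raise ValueError("buffers must hold at least total_seq_len rows")
    # Only the first total_seq_len rows are real tokens; the rest is padding.
    valid_qkv = qkv_data[:total_seq_len]
    result = attend(valid_qkv)
    if len(result) != total_seq_len:
        raise ValueError("attend must return one output row per valid input row")
    # Write back the valid rows only; padded output rows are left untouched.
    o_data[:total_seq_len] = result


def double_rows(rows: List[Row]) -> List[Row]:
    """Stand-in for the real attention computation (assumption for the demo)."""
    return [[2.0 * x for x in row] for row in rows]


# Two valid rows followed by two rows of padding.
qkv = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
out = [[-1.0, -1.0] for _ in range(4)]
apply_on_valid_prefix(qkv, out, 2, double_rows)
```

With this shape, a caller can allocate `qkv` and `out` once at a fixed maximum length and reuse them across steps, passing only the number of valid rows each time.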

tqchen merged commit ce46185 into apache:main on Sep 22, 2024
18 of 19 checks passed