[KVCache] Attention func accepting over-padded qkv and output NDArray #17401

MasterJH5574 · 2024-09-22T05:27:23Z

This PR enhances the `AttentionWithFusedQKV` function of `PagedKVCache` so that it can now accept input `qkv_data` and `o_data` that have padding along the sequence dimension.

We introduce this enhancement to give the caller of `PagedKVCache` more flexibility in deciding whether to pad the input qkv/o NDArrays.
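
To illustrate what accepting over-padded inputs means for a caller, here is a minimal pure-Python sketch. It is not the TVM implementation: the function `padded_attention`, its parameters `total_len` and `head_dim`, and the single-head, list-based layout are all assumptions made for illustration. The sketch only shows the intended contract: rows beyond the valid sequence length are never read from `qkv_data` and never written in `o_data`.

```python
import math


def padded_attention(qkv_data, o_data, total_len, head_dim):
    """Toy causal single-head attention (hypothetical helper, not TVM's API).

    qkv_data: list of rows, each row laid out as [q..., k..., v...]
              with `head_dim` values per part; may hold extra padded rows.
    o_data:   list of output rows; may also hold extra padded rows.
    Only the first `total_len` rows of either buffer are touched.
    """
    assert len(qkv_data) >= total_len and len(o_data) >= total_len
    d = head_dim
    # Split the valid fused rows into q, k, v; padded rows are ignored.
    q = [row[0:d] for row in qkv_data[:total_len]]
    k = [row[d:2 * d] for row in qkv_data[:total_len]]
    v = [row[2 * d:3 * d] for row in qkv_data[:total_len]]
    for i in range(total_len):
        # Causal mask: position i attends to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        o_data[i] = [
            sum(w * v[j][c] for j, w in enumerate(weights)) / z
            for c in range(d)
        ]
    return o_data


# Two valid rows followed by two padded rows (arbitrary filler values).
valid = [[1.0, 0.0, 0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 0.0, 4.0, 5.0]]
padded = valid + [[9.0] * 6, [9.0] * 6]
out = [[-1.0, -1.0] for _ in range(4)]
padded_attention(padded, out, total_len=2, head_dim=2)
```

With this contract, the caller can allocate `qkv_data` and `o_data` once at a fixed maximum length and reuse them across calls, while the padded tail of `out` keeps whatever values it held before the call.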

tqchen approved these changes Sep 22, 2024

tqchen merged commit ce46185 into apache:main Sep 22, 2024
18 of 19 checks passed

ysh329 mentioned this pull request Oct 16, 2024

[Release] v0.18.0 Release Candidate Notes #17468

Closed