[ARM CPU] Add rotary embedding fp16 kernel #23013

Merged: 8 commits, Dec 6, 2024

Conversation

fajin-corp
Contributor

Description

Add an fp16 kernel for rotary embedding on ARM CPUs to improve performance.

Motivation and Context

Part of the performance optimization work for group query attention.
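For readers unfamiliar with the operation, here is a minimal scalar sketch of what a rotary embedding kernel computes for one head. It is written in plain `float` for readability and is not the fp16 kernel added by this PR. The function name `RotaryEmbeddingRef`, the parameter names, and the half-split (non-interleaved) layout are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical scalar reference, NOT the PR's fp16 kernel.
// Rotates the first rotary_emb_dim elements of one head using the
// half-split layout (element i is paired with element i + half) and
// copies the remaining head_size - rotary_emb_dim elements unchanged.
inline void RotaryEmbeddingRef(const float* input,
                               const float* sin_data,
                               const float* cos_data,
                               float* output,
                               std::size_t rotary_emb_dim,
                               std::size_t head_size) {
  const std::size_t half = rotary_emb_dim / 2;
  for (std::size_t i = 0; i < half; ++i) {
    const float x1 = input[i];
    const float x2 = input[i + half];
    // Standard 2-D rotation of the pair (x1, x2) by the cached angle.
    output[i] = x1 * cos_data[i] - x2 * sin_data[i];
    output[i + half] = x2 * cos_data[i] + x1 * sin_data[i];
  }
  if (rotary_emb_dim < head_size) {
    // Partial rotary: the tail of the head passes through untouched.
    std::memcpy(output + rotary_emb_dim,
                input + rotary_emb_dim,
                (head_size - rotary_emb_dim) * sizeof(float));
  }
}
```

The sizes are `std::size_t` here, so the tail length is computed in the wide type from the start.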

@fajin-corp fajin-corp requested a review from a team as a code owner December 5, 2024 00:01
if (rotary_emb_dim < head_size) {
  std::memcpy(output_data + rotary_emb_dim,
              input_data + rotary_emb_dim,
              (head_size - rotary_emb_dim) * sizeof(T));

Check warning (Code scanning / PREfast)

Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2).
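One way to address this warning is to widen both operands before subtracting, so the subtraction is done in the wider type. The sketch below assumes `rotary_emb_dim` and `head_size` are `int` (the warning only says they are 4-byte values). The helper name `CopyUnrotatedTail` is hypothetical and does not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not from the PR: copies the non-rotated tail of one
// head from input_data to output_data.
template <typename T>
void CopyUnrotatedTail(const T* input_data, T* output_data,
                       int rotary_emb_dim, int head_size) {
  // Assumed precondition: 0 <= rotary_emb_dim <= head_size.
  assert(rotary_emb_dim >= 0 && rotary_emb_dim <= head_size);
  if (rotary_emb_dim < head_size) {
    // Cast each operand to size_t first, so operator '-' runs on the
    // wider type instead of on 4-byte int.
    const std::size_t tail = static_cast<std::size_t>(head_size) -
                             static_cast<std::size_t>(rotary_emb_dim);
    std::memcpy(output_data + rotary_emb_dim,
                input_data + rotary_emb_dim,
                tail * sizeof(T));
  }
}
```

With the guard `rotary_emb_dim < head_size` in place the difference is always positive, so the cast mainly serves to satisfy the analyzer and make the intent explicit.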
if (rotary_emb_dim < head_size) {
  std::memcpy(output_data + rotary_emb_dim,
              input_data + rotary_emb_dim,
              (head_size - rotary_emb_dim) * sizeof(T));

Check warning (Code scanning / PREfast)

Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2).
@fajin-corp fajin-corp merged commit bd5a759 into main Dec 6, 2024
95 checks passed
@fajin-corp fajin-corp deleted the fajin/gqa-rotary branch December 6, 2024 21:25
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
2 participants