[LLM Runtime] enable MHA fusion for gptneox&dolly&starcoder&llama2-70b #567

intellinjun · 2023-10-27T07:13:30Z

Type of Change

feature or bug fix or documentation or others
API changed or not: no

Enable MHA fusion for gptneox, dolly, starcoder, and llama2-70b.

Description

detail description
JIRA ticket: xxx

Expected Behavior & Potential Risk

the expected behavior that is triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

intellinjun · 2023-10-27T07:40:21Z

https://inteltf-jenk.sh.intel.com/job/ITREX-cpp-graph-extension/158/

intellinjun · 2023-10-30T05:18:31Z

intellinjun · 2023-10-30T05:18:52Z

intellinjun · 2023-10-30T05:20:41Z

intellinjun · 2023-10-30T05:21:02Z

DDEle

LGTM

airMeng · 2023-10-30T05:28:50Z

Can you summarize the comparison, before and after?

intel_extension_for_transformers/llm/runtime/graph/models/llama/llama.cpp

a32543254

LGTM

intellinjun · 2023-10-31T07:04:34Z

airMeng wrote: "Can you summarize the comparison, before and after?"

I ran the CI tests to generate the summary; the results should be available tomorrow.

intellinjun · 2023-10-31T10:48:10Z

starcoder still has accuracy issues; please don't merge yet.

intellinjun · 2023-11-01T03:41:17Z

https://inteltf-jenk.sh.intel.com/job/ITREX-cpp-graph-extension/169/artifact/report.html

[LLM Runtime] enable MHA fusion for gptneox&dolly&starcoder&llama2-70b

6dc2d74

Signed-off-by: intellinjun <jun.lin@intel.com>

intellinjun added the draft label Oct 27, 2023

intellinjun requested a review from airMeng as a code owner October 27, 2023 07:13

intellinjun and others added 2 commits October 27, 2023 15:35

[LLM Runtime] enable llama2-7b mha fusion

e6c3221

Signed-off-by: intellinjun <jun.lin@intel.com>

Merge branch 'main' into mha_fusion

1f58eee

Signed-off-by: intellinjun <105184542+intellinjun@users.noreply.github.com>

intellinjun added 3 commits October 27, 2023 15:50

[LLM Runtime] fix format error

b3f5adf

Signed-off-by: intellinjun <jun.lin@intel.com>

Merge branch 'main' into mha_fusion

bf8e6f6

Merge branch 'mha_fusion' of https://github.com/intel/intel-extension-for-transformers into mha_fusion

72fdb0e

airMeng marked this pull request as draft October 27, 2023 07:54

intellinjun requested review from DDEle and a32543254 October 27, 2023 08:01

DDEle added ITREX.cpp and removed draft labels Oct 30, 2023

intellinjun and others added 4 commits October 30, 2023 10:04

Merge branch 'main' into mha_fusion

4c81854

Signed-off-by: intellinjun <105184542+intellinjun@users.noreply.github.com>

Merge branch 'main' of https://github.com/intel/intel-extension-for-transformers into mha_fusion

f51a21c

[LLM Runtime] enable gqa_fusion for mistral-7b

50e9e01

Signed-off-by: intellinjun <jun.lin@intel.com>

Merge branch 'mha_fusion' of https://github.com/intel/intel-extension-for-transformers into mha_fusion

150ba7d

DDEle approved these changes Oct 30, 2023

Merge branch 'main' into mha_fusion

bfc7bc5

Signed-off-by: intellinjun <105184542+intellinjun@users.noreply.github.com>

airMeng marked this pull request as ready for review October 31, 2023 06:32

airMeng approved these changes Oct 31, 2023

airMeng requested a review from ClarkChin08 October 31, 2023 06:34

ClarkChin08 reviewed Oct 31, 2023

intel_extension_for_transformers/llm/runtime/graph/models/llama/llama.cpp

a32543254 approved these changes Oct 31, 2023

intellinjun added 3 commits October 31, 2023 14:50

[LLM Runtime] fix format error

c0f215d

Signed-off-by: intellinjun <jun.lin@intel.com>

[LLM Runtime] resolve conflict

f0a30d8

Signed-off-by: intellinjun <jun.lin@intel.com>

[LLM Runtime] Replace n_embd / n_head to head_dim

3e5b76f

Signed-off-by: intellinjun <jun.lin@intel.com>

fix starcoder accuracy error

9b2e040

Signed-off-by: intellinjun <jun.lin@intel.com>

intellinjun and others added 2 commits November 1, 2023 10:48

[LLM Runtime] fix starcoder accuracy issue

eb18278

Signed-off-by: intellinjun <jun.lin@intel.com>

Merge branch 'main' into mha_fusion

3b8bd28

VincyZhang merged commit 81dde20 into main Nov 1, 2023

VincyZhang deleted the mha_fusion branch November 1, 2023 03:45
