
fix: disable tensor parallel for falcon7b #755

Merged 4 commits into kaito-project:main on Dec 5, 2024

Conversation

@zhuangqh (Collaborator) commented Dec 4, 2024

Reason for Change:

vLLM requires the model's attention head count to be exactly divisible by
the number of GPUs (the tensor parallel size).
falcon-7b-instruct has 71 attention heads, which is a prime number, so no
tensor parallel size greater than 1 divides it evenly. This change therefore
disables tensor parallel inference for that model, as sketched below.
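
For context, here is a minimal sketch of the divisibility constraint described above. The helper name and fallback policy are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical helper illustrating the vLLM constraint: attention heads
# are sharded across GPUs, so the head count must be divisible by the
# tensor parallel size.
from transformers import AutoConfig


def pick_tensor_parallel_size(model_id: str, gpu_count: int) -> int:
    config = AutoConfig.from_pretrained(model_id)
    num_heads = config.num_attention_heads  # 71 for tiiuae/falcon-7b-instruct
    if gpu_count > 1 and num_heads % gpu_count != 0:
        # 71 is prime, so any gpu_count > 1 fails this check;
        # fall back to single-GPU (no tensor parallel) inference.
        return 1
    return gpu_count
```

The resulting value would be passed as `tensor_parallel_size` to vLLM's `LLM` constructor, or as `--tensor-parallel-size` to the vLLM server.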

Requirements

  • added unit tests and e2e tests (if applicable).

Signed-off-by: jerryzhuang <zhuangqhc@gmail.com>
@Fei-Guo merged commit 3882218 into kaito-project:main on Dec 5, 2024
3 of 6 checks passed