
[DO NOT MERGE] Orca prefix sharing benchmark #41

Closed · wants to merge 108 commits

Conversation

@suquark (Contributor) commented Apr 17, 2023

No description provided.

@zhuohan123 (Member) commented:

Closing this PR since it has diverged too far from the current main branch.

@zhuohan123 closed this Jun 17, 2023
@zhuohan123 deleted the orca_prefix branch Jun 18, 2023 07:30
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Fix mixtral hidden states layout to fit into habana model runner

* Add static moe op to mixtral

* Add mark_step to static_fused_moe

* Update __init__.py

* Fix code indentation

* Make code compatible with non HPU devices

* Move static_fused_moe to vllm.hpu.ops

* Update mixtral.py

* Move op import from forward to top of the file

* Remove circular import
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request Jun 12, 2024
* Use rocm_flash_attention, which supports bias computed from ALiBi slopes; use the attn_fwd Triton kernel from ROCm/triton main_perf, which does not cause the Triton compiler to hang

* Fix an uninitialized variable
joerunde added a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
Adds one more metric from TGIS.

---------

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
@alixiaodi mentioned this pull request Aug 2, 2024
bigPYJ1151 added a commit to bigPYJ1151/vllm that referenced this pull request Dec 10, 2024
Labels: none · Projects: none
3 participants