forked from vllm-project/vllm
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Enable HPU support in vLLM (vllm-project#1)
* Porting vllm to HPU * add hpu cache allocate * move slot_mapping to cpu and add is_prompt in cache_ops.reshape_and_cache * add bucket to input metadata * 1. limit max block number for lazy mode (TODO) 2. set some inpu metadata from cuda to cpu * remove bucket for block tables * add run bash script and change benchmark config * 1. modify kv cache structure to tensors 2. update hpu paged attention API (for hpu graph compatibility) * add attention mask for generation * add multi_query_kv_attention attn_bias * Temp commit * Integrate fused kernels for RMSNorm and RoPE * Resolve merge conflicts * Minor Gaudi workarounds, add debugging to stock vLLM API server * Fix post-merge pinned memory segfaults * Re-enable sequence decode * Maintain GPU compatibility in cache_engine * Adjust HPU RoPE for non-query runs * Integrate HPU primitive implementations * Add xops bindings * Cast paged attention inputs to bfloat16 * Remove leftover debug calls * Update comments on HPU ops * Restoring NVIDIA compatibility in setup.py * vllm.hpu cleanup * Added HPU-specific requirements * Restored full functionality on NVIDIA * vllm.core cleanup * vllm init cleanup * vllm.hpu cleanup * vllm.benchmarks cleanup * vllm.entrypoint cleanup * Changed is_hpu logic * vllm.benchmark cleanup * Fixed importing condition * tests cleanup * removed dummy printings * Update test_api_server.py * restored attention and logprobs tests functionality on Nvidia * throughput benchmark cleanup * Changed Habana copyright header * Restored alibi in bloom * Added BSD license header --------- Co-authored-by: Xiaotong Chen <xchen@habana.ai> Co-authored-by: Jinyan Chen <jychen@habana.ai> Co-authored-by: Mikhail Dvoretckii <mdvoretckii@habana.ai> Co-authored-by: Sebastian Urwan <surwan@habana.ai>
- Loading branch information
1 parent
bd29cf3
commit 512c414
Showing
33 changed files
with
1,796 additions
and
204 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
ninja # For faster builds. | ||
psutil | ||
ray >= 2.5.1 | ||
pandas # Required for Ray data. | ||
pyarrow # Required for Ray data. | ||
sentencepiece # Required for LLaMA tokenizer. | ||
numpy | ||
#torch == 2.1.2 | ||
transformers >= 4.36.0 # Required for Mixtral. | ||
#xformers == 0.0.23.post1 # Required for CUDA 12.1. | ||
fastapi | ||
uvicorn[standard] | ||
pydantic == 1.10.13 # Required for OpenAI server. | ||
aioprometheus[starlette] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.