forked from vllm-project/vllm
Enable HPU support in vLLM #1
Merged
44 commits
e528d06  Porting vllm to HPU
d8da01f  add hpu cache allocate
4d1538f  move slot_mapping to cpu and add is_prompt in cache_ops.reshape_and_c…
c336824  add bucket to input metadata
068c748  1. limit max block number for lazy mode (TODO)
9a042f7  remove bucket for block tables
1e7e16d  add run bash script and change benchmark config
153eb71  1. modify kv cache structure to tensors
9b7e0a7  add attention mask for generation
c99eefc  add multi_query_kv_attention attn_bias
1327be8  Temp commit
de7799f  Integrate fused kernels for RMSNorm and RoPE
b839181  Resolve merge conflicts
00df486  Minor Gaudi workarounds, add debugging to stock vLLM API server
8b20664  Merge remote-tracking branch 'origin/main' into mdvoretc/prototype (kzawora-intel)
16b5557  Fix post-merge pinned memory segfaults (kzawora-intel)
2b6ec4e  Re-enable sequence decode (kzawora-intel)
9d4bd9f  Maintain GPU compatibility in cache_engine (kzawora-intel)
7a0337a  Adjust HPU RoPE for non-query runs (kzawora-intel)
6351d41  Integrate HPU primitive implementations
c0d3c69  Add xops bindings
48b26d1  Cast paged attention inputs to bfloat16
aefa573  Remove leftover debug calls
c49b68e  Update comments on HPU ops
c5c2a99  Restoring NVIDIA compatibility in setup.py
1c66908  vllm.hpu cleanup
5725b31  Added HPU-specific requirements (kzawora-intel)
97d31b0  Restored full functionality on NVIDIA
07671d7  vllm.core cleanup
413fb60  vllm init cleanup
a38686e  vllm.hpu cleanup
bed7da6  vllm.benchmarks cleanup
0baa2ef  vllm.entrypoint cleanup
1f22aa1  Changed is_hpu logic
eb2c22a  vllm.benchmark cleanup
e69fca6  Fixed importing condition
38cc53b  tests cleanup
54d499a  removed dummy printings
c0ea99c  Update test_api_server.py
ea3ea44  restored attention and logprobs tests functionality on Nvidia
5543642  throughput benchmark cleanup
a2acb86  Changed Habana copyright header
956bab7  Restored alibi in bloom
702d8a7  Added BSD license header
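
Several of the commits above ("Changed is_hpu logic", "Fixed importing condition", "Restored full functionality on NVIDIA") revolve around detecting whether an HPU backend is present without breaking stock CUDA installs. The PR's actual implementation is not reproduced on this page; the helper below is a hypothetical sketch of one common way to write such a guard, checking that the Habana PyTorch bridge is importable before touching it.

# Hypothetical sketch, not the code merged in this PR: detect an HPU backend
# by checking whether the Habana PyTorch bridge package is importable.
import importlib.util

def is_hpu() -> bool:
    """True if the habana_frameworks package can be imported."""
    return importlib.util.find_spec("habana_frameworks") is not None

if is_hpu():
    # Only import Habana-specific modules once we know the bridge exists.
    import habana_frameworks.torch.core as htcore  # noqa: F401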
New file (+14 lines):
@@ -0,0 +1,14 @@
ninja # For faster builds.
psutil
ray >= 2.5.1
pandas # Required for Ray data.
pyarrow # Required for Ray data.
sentencepiece # Required for LLaMA tokenizer.
numpy
#torch == 2.1.2
transformers >= 4.36.0 # Required for Mixtral.
#xformers == 0.0.23.post1 # Required for CUDA 12.1.
fastapi
uvicorn[standard]
pydantic == 1.10.13 # Required for OpenAI server.
aioprometheus[starlette]
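
The torch and xformers pins above are commented out: on Gaudi the PyTorch build comes from the Habana software stack rather than PyPI, and the xformers wheel is CUDA-specific (as its own comment notes). Together with the "Restoring NVIDIA compatibility in setup.py" commit, this suggests the build picks its dependencies per backend. The snippet below is only a rough sketch of that idea; the file names and the helper are assumptions for illustration, not this PR's actual setup.py logic.

# Hypothetical sketch (file names and helper are assumptions, not this PR's
# setup.py): pick the HPU requirements file when Habana's bridge is present,
# otherwise fall back to the stock CUDA requirements.
import importlib.util
from pathlib import Path
from typing import List

def _is_hpu() -> bool:
    return importlib.util.find_spec("habana_frameworks") is not None

def read_requirements() -> List[str]:
    name = "requirements-hpu.txt" if _is_hpu() else "requirements.txt"
    lines = Path(name).read_text().splitlines()
    # Strip comments and blank lines before handing the list to setuptools.
    stripped = (line.split("#", 1)[0].strip() for line in lines)
    return [req for req in stripped if req]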
Conversations
nit: May want to rephrase the comment here to mention that required functionality is integrated for HPU.
Yes, that's true. I have some local changes because I'm still testing compatibility in all possible places (tests, benchmarks).
Can the comment thread resolutions be withheld until the changes land on the PR? The current state makes it harder to track which issues are still open, since threads may be resolved without a visible change.