[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend #6143
Merged

Commits (showing changes from 250 of 442 commits)
d2e2854 fix rotary embedding (jikunshang)
97bd0fd Avoiding torch.index_select for embedding LoRA–B (SanjuCSudhakaran)
ededdaf Remove special handling of no-LoRA case (SanjuCSudhakaran)
b507cc4 Update test (SanjuCSudhakaran)
016f343 Fix formatting (SanjuCSudhakaran)
d9fa7cf Dispersed dummy slots (#243) (madamczykhabana)
7488c58 Use PT_COMPILE_ONLY_MODE during warmup (#227) (mfylcek)
17447ed Do not pass warmup_mode to execute_model_kwargs (#229) (kzawora-intel)
b50aa14 Add error handling for PT_COMPILE_ONLY_MODE (#251) (kzawora-intel)
00f1333 Hardcode fastapi version due to pydantic error (#255) (hlahkar)
b764610 Mask based BGMV implementation for LoRA Embedding (#247) (vivekgoe)
73af823 Eliminate graph breaks for torch.compile mode (#202) (yuwenzho)
5cf8441 Port flat PA from habana_next to habana_main (#169) (dolszewska)
2fed15b Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
f74fe23 Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
e2c8b5a format.sh (kzawora-intel)
4194195 i did not drink my afternoon coffee and made an oopsie (kzawora-intel)
4052bdb Add disable_tensor_cache=True to HPUGraph capture (#252) (kzawora-intel)
c9bf908 do not build core ext on hpu (kzawora-intel)
69df1e7 Fix dispersed slots (#261) (madamczykhabana)
53f96b7 Skip compilation warnings during warmup phase (#262) (jkaniecki)
d436d38 fix tensor parallelism (kzawora-intel)
61b6fbb add missing functions (kzawora-intel)
2091161 Port PT Profiler to habana_main (#256) (adobrzyniewicz-habana)
c9bdcbe Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
8e41fb5 Merge remote-tracking branch 'upstream/main' into private/kzawora/vll… (kzawora-intel)
68e0f57 Reduce frequency of garbage collector (kwisniewski98)
b776d5e Fix LoRA test by handling mask creation inside the test (SanjuCSudhakaran)
c0ff22f Fix LoRA test by handling mask creation inside the test (#270) (vivekgoe)
f858d43 Attn MetaData dtype should be same as model dtype (#271) (hlahkar)
acf7d54 Support Mixtral quantization using INC (#267) (dudilester)
6a734f4 Fixed ALiBi (#254) (itaraban)
543bb6d Update gaudi-installation.rst (#279) (dolszewska)
c2c1e0f Move setting gc threshold to separate function (kwisniewski98)
6b3503c Fix mypy issues (kwisniewski98)
8535d53 Fix line too long (kwisniewski98)
27b618a Format files (kwisniewski98)
35a4a98 Remove hardcoded value from softmax in flat_pa (#280) (madamczykhabana)
046cb25 Fix yapf detected format issue (xuechendi)
aa4c59c some update to vision model (xuechendi)
181babf resolve conflicts (xuechendi)
88b06c2 Increase garbage collector's threshold (#281) (kwisniewski98)
54c1688 [Bugfix][Habana_main] fix guided_decode HPU failing issue (#236) (michalkuligowski)
8a92591 fix rotary embedding `rotary_dim` not equal `head_size` case (#245) (michalkuligowski)
ffa7174 [Bugfix][Habana_main] - dbrx model and arctic model codes fix to remo… (michalkuligowski)
f4ac1f9 Add Dockerfile.hpu (#200) (michalkuligowski)
1a35da2 fix ruff detected format error (xuechendi)
3b710a6 fix mypy format error (xuechendi)
5abe4d7 Move ALiBi to supported features in README_GAUDI.md (kwisniewski98)
4c1ca3a optimized topp/topk calculation (#195) (michalkuligowski)
1a712d5 Move ALiBi to supported features in gaudi-installation.rst (kwisniewski98)
44c4f93 [Bugfix][Habana_main] fix multi-modal model inference - tested with l… (michalkuligowski)
a9de5ba Add fake HPU mode to Habana components with dummy habana_frameworks m… (jmaksymczuk)
d39298c Update documentation on support of fp8 (#288) (michalkuligowski)
ed19acd Reduce default value of VLLM_GRAPH_RESERVED_MEM to 0.1 (kzawora-intel)
6a96d9b Removed vllm.hpu directory and changed relevant imports (#291) (tzielinski-habana)
47a89be Reduce default value of VLLM_GRAPH_RESERVED_MEM to 0.1 (#292) (michalkuligowski)
18d6339 fix minor logging issue (schoi-habana)
83b54e9 Fix minor logging issue in habana_model_runner.py (#294) (michalkuligowski)
b62fba8 Fix blocks number calculation for Flat PA (#269) (iboiko-habana)
347f9c7 Merge branch 'habana_main' into private/kwisniewski/alibi_readme_update (kwisniewski98)
cd7b1c1 Remove dummy seq group data creation from loop (#301) (iboiko-habana)
12d7033 optimize qwen2 model on Gaudi (#233) (czhu15)
bc39baa fix bug: device_str in initialize_ray_cluster requires uppercase stri… (hlin99)
b2653ab Fix Lora Rebase (#290) (hlahkar)
82960d8 Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
f4d2097 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
9f8b8e7 add missing files (kzawora-intel)
346139d format.sh (kzawora-intel)
6d45443 more format.sh (kzawora-intel)
3a0ff3b gha update (kzawora-intel)
6502b91 Separate LoRA algorithms (kzawora-intel)
7057da5 yapf is being a headache (kzawora-intel)
43df762 oh come on now (kzawora-intel)
3134b8a fix fakehpu mode (kzawora-intel)
f92ffc1 Fix calculating slots for warmup (#310) (madamczykhabana)
63fae51 Removed padding block from a list of available blocks in allocators (… (tzielinski-habana)
aa507d4 Fix seq_len for padding sequences (#318) (madamczykhabana)
b70a8c2 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
a844837 Fix lora specific conditions in profile-run (SanjuCSudhakaran)
084db0f Fix lora specific conditions in profile-run (#317) (vivekgoe)
a9f94be TP fixes (kzawora-intel)
9bb65b7 Run with HPU graphs even when warmup was skipped (#320) (madamczykhabana)
2a499c7 mixtral api fixes (kzawora-intel)
9372734 revert debug prints (kzawora-intel)
c15ddd2 format.sh (kzawora-intel)
f5d254d Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
e00ab5a Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
3bb593a use ray for hpu distributed inference (kzawora-intel)
f9b222e vLLM 0.6.1 rebase (#311) (kzawora-intel)
2f23cb7 prune the easy parts (kzawora-intel)
28df6fd prune more easy parts (kzawora-intel)
c6d2d5a prune lora files (kzawora-intel)
97c398e prune unnecessary docs (kzawora-intel)
6a913b3 revert requirements-build.txt changes (kzawora-intel)
c64dc83 Move profilers to vllm-hpu-extension (#323) (kzawora-intel)
f56953f Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
c562b02 Revert "Add fake HPU mode to Habana components with dummy habana_fram… (kzawora-intel)
cf3bbd2 fix revert (kzawora-intel)
09357b4 Revert "Initial commit" (kzawora-intel)
3713da8 cleanup (kzawora-intel)
bb6564a remove redundant import (kzawora-intel)
c968320 Restore upstream requirements-build.txt (#324) (kzawora-intel)
58d5cde Remove reminder_comment.yml workflow (#325) (kzawora-intel)
cf4c3e5 Don't throw "Failed to import from vllm._C" warning on HPU (#326) (kzawora-intel)
aa5edcc Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
f6ff4a7 restore reminder_comment.yml (kzawora-intel)
a000e62 Revert "[Doc][BugFix] Update setup instructions and reference links (… (kzawora-intel)
41217cf Fix doc build warnings (#330) (kzawora-intel)
4eb9809 fix qwen2 model issue (#329) (jikunshang)
c1232e9 Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
20c87dd update docs (kzawora-intel)
9be37a3 Remove vllm.utils.is_hpu() (#331) (kzawora-intel)
c90e153 Merge remote-trackng branch 'origin/habana_main' into private/kzawora… (kzawora-intel)
874f3d8 remove get_device (kzawora-intel)
e16918d Remove logger from layernorm (#332) (kzawora-intel)
18b0e98 Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
347380f Fix INC FP8 inference after rebase (kzawora-intel)
73f4b48 Fix INC FP8 inference after rebase (#333) (kzawora-intel)
fc1cf5e Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
e2f72e3 Merge remote-tracking branch 'upstream/main' into private/kzawora/pru… (kzawora-intel)
b582d77 Make weights_load_device not change EngineArgs.create_load_config() (kzawora-intel)
b90adac More robust load device autodetection (kzawora-intel)
d853eeb WA for none load device (kzawora-intel)
9111a80 Make weights_load_device not change EngineArgs.create_load_config() (… (kzawora-intel)
db8dbce device type (kzawora-intel)
c337e93 Revert "fix guided_decode HPU failing issue" (kzawora-intel)
e8e369f load device fix (kzawora-intel)
8c6dcae Refine INC shutdown code (#335) (kzawora-intel)
cef2f54 Setting enough cache_size_limit for torch.compile warmup (#238) (zehao-intel)
45ee586 Change default values for decode bucket flags (#316) (iboiko-habana)
29fb5ed Support loading checkpoints quantized using Autofp8 (#286) (Yantom1)
4c8a6c6 Fix torch.compile issue of dispatch key set mismatch (#299) (yuwenzho)
1c6bada Chunk prefill cache writes, remove div_i32 from insert_or_update_cach… (kzawora-intel)
fccaca0 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
5ffcfa3 Update cpu-test.yml (kzawora-intel)
c3577af Fix runtime errors reported when using long input sequence lengths wi… (vivekgoe)
f347a84 vLLM 0.6.2 rebase (#340) (kzawora-intel)
ed85058 Enable Async output process for HPU (#342) (zhouyu5)
b611e20 Port last_bucket change from v1.18.0 (#347) (iboiko-habana)
3010f8c Add setuptools_scm to requirements-hpu.txt (#349) (kzawora-intel)
44d8173 test_lora_manager fix (rsshaik1)
188bd3a Added both hpu and gpu specific changes confest (rsshaik1)
f59495a Added the changes to conftest to fix test_lora_manager (rsshaik1)
b0a9d02 Applied the format changes in conftest (rsshaik1)
70f544c Resolved format issues in conftest (rsshaik1)
ec34f88 Added changes of HPU flags (rsshaik1)
c7b1509 Fixed lora manager tests (#315) (vivekgoe)
cafff17 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
25f4ed9 Oct 01 rebase (#353) (kzawora-intel)
da03d8b Lora Mask based on lora index (#348) (hlahkar)
f848d27 Add rope_scaling support for LLama3.1 (#356) (kdamaszk)
d8ba780 [Core] Support Torch profiler in Habana Worker (#357) (mswiniarsk)
250487b [Refactor] Rename components *Habana* -> *HPU* (kzawora-intel)
eb095b3 oopsie (kzawora-intel)
65fa6f6 format.sh (kzawora-intel)
0576360 make yapf happy (kzawora-intel)
7f73cc9 Merge remote-tracking branch 'upstream/main' into private/kzawora/hab… (kzawora-intel)
b4e26d3 fix sampler metadata generation (kzawora-intel)
cfe231d [Refactor] Rename components *Habana* -> *HPU* (#359) (kzawora-intel)
38e60f4 Oct 04 rebase (#360) (kzawora-intel)
76cbbb5 Use BF16 on HPU by default (kzawora-intel)
95a7ece Merge remote-tracking branch 'origin/habana_main' into private/kzawor… (kzawora-intel)
d7d609f Revert "Support loading checkpoints quantized using Autofp8 (#286)" (kzawora-intel)
c07cbc6 remove lora test (kzawora-intel)
d90bbce revert FP8 changes (kzawora-intel)
84dc6c5 remove leftover fp8 code (kzawora-intel)
f7288de remove weights_load_device stuff (kzawora-intel)
6899c3f remove weights_load_device (kzawora-intel)
e5d640e fp8 leftovers (kzawora-intel)
25388e2 Update vllm/model_executor/layers/logits_processor.py (kzawora-intel)
b4f7ffa Rename HabanaAttention -> HPUAttention (kzawora-intel)
43959db oopsie (kzawora-intel)
b8404ad format.sh (kzawora-intel)
d38564f fix comment length (kzawora-intel)
eed1b05 Merge remote-tracking branch 'origin/private/kzawora/hpu_attn' into p… (kzawora-intel)
5c3e29c Merge remote-tracking branch 'origin/private/kzawora/hpu_bf16_default… (kzawora-intel)
33c1db0 fix comment (kzawora-intel)
05777e0 Lazily import HPU-dependent components (kzawora-intel)
1f6de5d Lazily import HPU-dependent components (#363) (kzawora-intel)
ad08dd4 [Refactor] Rename HabanaAttention -> HPUAttention (#362) (kzawora-intel)
e00750e Use BF16 on HPU by default (#361) (kzawora-intel)
db5aed6 Set vllm-hpu-extension to 36c7f9c (#365) (madamczykhabana)
902f575 Add AliBi to supported features in README_GAUDI.md (#287) (kzawora-intel)
27c05e1 Merge remote-tracking branch 'upstream/main' into habana_main (kzawora-intel)
bb4c23e format.sh (kzawora-intel)
563184a Fix hpu_set_env call in load_model in vllm (#364) (Yantom1)
0e46492 Update offline_inference_fakehpu.py (michalkuligowski)
6028354 Timeout adjusted in MLLMEngine (#368) (jczaja)
64369fd Add Jenkins test definitions (#369) (kzawora-intel)
69fb91c Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
1ee20c5 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
388e500 Make workaround for SW-204785 broader (#374) (kzawora-intel)
8f79b6e Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
ca98dae Fix LoRA tests by handling broken imports (SanjuCSudhakaran)
4030216 Fix LoRA tests by handling broken import (#376) (vivekgoe)
b70c1a5 [CI] Report test name, add properties to JUnitXML (#377) (kzawora-intel)
49444bc Disable performance counters if profiler is not enabled (#383) (kdamaszk)
d6bd375 Remove constraints for bucket creation during warmup in LoRA (SanjuCSudhakaran)
4f1787b Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
6cd4694 Remove constraints for bucket creation during warmup in LoRA (#382) (vivekgoe)
d8f2aa7 seed_everything function doesn't handle HPU (#384) (SanjuCSudhakaran)
03b407b Fixed lora_manager tests with hpu_model_runner (#386) (rsshaik1)
ebd42c4 Reformat README_GAUDI.md (#389) (kzawora-intel)
2d2bf7a [CI] Prepare separate Jenkins tests for torch compile mode (#388) (anko-intel)
9df1d4a Remove workaround added to resolve multi-card stall issue (#387) (SanjuCSudhakaran)
9777c9f Update SynapseAI version in README & Dockerfile (#390) (kzawora-intel)
5ceda69 Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
3e6a2d4 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
9ac52ab fix attention backend selector: (kzawora-intel)
57bc31d Oct 7 rebase (#367) (kzawora-intel)
55dd07e enable mixtral quantization using INC (#372) (dudilester)
401f5ae [CI] Temporarily increase test tolerances (#392) (kzawora-intel)
e598f3f Add quickstart section to READMEs (#391) (kzawora-intel)
f77435d Softmax: add weighted-sum normalization (#378) (madamczykhabana)
0783d18 Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
2fa46cd remove jenkins files (kzawora-intel)
3683db6 restore README.md (kzawora-intel)
91af5da remove fakehpu (kzawora-intel)
d2ce468 use sentinel in model runner base WA (kzawora-intel)
b6428cd remove leftovers from habana_main (kzawora-intel)
5149278 remove leftovers from habana_main (kzawora-intel)
f4b356f remove HPUExecutorAsync import (kzawora-intel)
3eee00d remove hpu fused_moe (kzawora-intel)
a59fc7b Remove HPU changes from cache_engine.py (#400) (kzawora-intel)
c07951b Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
398c5c3 Merge remote-tracking branch 'origin' into HEAD (kzawora-intel)
f79d454 Merge remote-tracking branch 'origin/habana_main' into HEAD (kzawora-intel)
8b6e30d remove hpuexecutor import (kzawora-intel)
05bcdf5 [bucketing overhaul 1/n] Add padding-aware scheduling and option to l… (kzawora-intel)
c11f23a Add forward_hpu to RotaryEmbedding, remove custom module (kzawora-intel)
78a816c add missing mark step in test (kzawora-intel)
640f0be Merge branch 'private/kzawora/rope_rework' into HEAD (kzawora-intel)
e894746 Merge branch 'private/kzawora/oct_16_rebase' into HEAD (kzawora-intel)
5bc3985 cleanup (kzawora-intel)
14f8af4 padding-aware scheduler cleanup (kzawora-intel)
65e34f6 fix sentinel usage in model runner base (kzawora-intel)
4757350 doc fixes (kzawora-intel)
ef6603c Update requirements-hpu.txt (kzawora-intel)
4c306cf Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
f0d6c5c Merge remote-tracking branch 'origin/habana_upstream' into HEAD (kzawora-intel)
3043141 Merge branch 'main' into habana_upstream (kzawora-intel)
92e23fe Merge branch 'main' into habana_upstream (kzawora-intel)
397405b Merge branch 'main' into habana_upstream (kzawora-intel)
acec97b Merge branch 'main' into habana_upstream (kzawora-intel)
bc0bf43 Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
01b190e Update docs/source/index.rst (kzawora-intel)
ede1280 Merge branch 'main' into habana_upstream (kzawora-intel)
bb512dd Merge remote-tracking branch 'upstream/main' into HEAD (kzawora-intel)
c9ce231 Conform to new worker/model_runner APIs (kzawora-intel)
Dockerfile.hpu (new file, +18 lines):

FROM vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest

COPY ./ /workspace/vllm

WORKDIR /workspace/vllm

RUN pip install -v -r requirements-hpu.txt

ENV no_proxy=localhost,127.0.0.1
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true

RUN VLLM_TARGET_DEVICE=hpu python3 setup.py install

WORKDIR /workspace/

RUN ln -s /workspace/vllm/tests && ln -s /workspace/vllm/examples && ln -s /workspace/vllm/benchmarks

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
requirements-hpu.txt (new file, +9 lines):

# Common dependencies
-r requirements-common.txt

# Dependencies for HPU code
ray == 2.32.0
triton
pandas
tabulate
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@0a7adab
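For a bare-metal install outside Docker, the Dockerfile above implies the equivalent two steps; a sketch, assuming a working SynapseAI/PyTorch environment is already in place:

# Install HPU dependencies, then build vLLM targeting the HPU backend.
pip install -v -r requirements-hpu.txt
VLLM_TARGET_DEVICE=hpu python3 setup.py install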
Review discussion:

QQ: Does Gaudi support PyTorch 2.4 or later?

#8932

An image with PyTorch 2.4 is now available; 2.5 will also be supported shortly.