[MFM-20250115] Merge from ROCm/main to llama_fp8 #360

Merged: 537 commits into ROCm:llama_fp8_12062024 on Jan 15, 2025

Conversation

@tjtanaa tjtanaa commented Jan 15, 2025

This is a PR to merge from ROCm/main to llama_fp8

Notes:
The test cases in tests/kernels/test_moe.py have been updated to match the main branch. Unlike the versions on the llama_fp8 branch, they do not take the MOE_SHUFFLE environment variable into account. If shuffling is required, the test cases will need to be reverted to the llama_fp8 versions.
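
As an alternative to reverting, a test could branch on the flag itself. The following is only a minimal sketch, not the code from either branch: it assumes MOE_SHUFFLE is read as a string such as "1", and the helper names `moe_shuffle_enabled` and `prepare_expert_weights`, as well as the `shuffle_fn` argument, are hypothetical stand-ins for whatever the llama_fp8 branch actually uses.

```python
import os


def moe_shuffle_enabled(env=None) -> bool:
    """Return True when MOE_SHUFFLE is set to a truthy value.

    Assumption: the flag is a string such as "1" or "true"; the real
    parsing on the llama_fp8 branch may differ.
    """
    env = os.environ if env is None else env
    value = env.get("MOE_SHUFFLE", "0").strip().lower()
    return value in {"1", "true", "yes", "on"}


def prepare_expert_weights(weights, shuffle_fn, env=None):
    """Apply shuffle_fn only when shuffling is enabled (hypothetical helper).

    A test would pass its expert weights through this function before
    calling the kernel under test, so one test body covers both modes.
    """
    return shuffle_fn(weights) if moe_shuffle_enabled(env) else weights
```

Passing the environment in as a mapping keeps the helper easy to exercise in a test without mutating the process-wide `os.environ`.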

jeejeelee and others added 30 commits December 24, 2024 13:05
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…project#11494)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
…ct#11509)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…sampler (vllm-project#10681)

Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…#11521)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
…zation (vllm-project#11523)

Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
…11465)

Signed-off-by: Alex He <alehe@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
noemotiovon and others added 21 commits January 13, 2025 15:47
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…ect#11998)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…llm-project#11935)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
* Committing the *multilingual* P3L test.

* Created a *multi-lingual* P3L test.

* Making ruff happy.

* .

* Added a reference to the language-scripture Confluence table.

* Typo fixing.

* Harmonizing naming.

* Fixing comments in the header.

---------

Co-authored-by: Alexei V. Ivanov <alivanov@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
* [Bugfix][V1] Fix molmo text-only inputs (vllm-project#11676)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Kernel] Move attn_type to Attention.__init__() (vllm-project#11690)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685)

Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (vllm-project#11772)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Model] Future-proof Qwen2-Audio multi-modal processor (vllm-project#11776)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [XPU] Make pp group initialized for pipeline-parallelism (vllm-project#11648)

Signed-off-by: yisheng <yi.sheng@intel.com>

* [Doc][3/N] Reorganize Serving section (vllm-project#11766)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Kernel][LoRA]Punica prefill  kernels fusion (vllm-project#11234)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>

* [Bugfix] Update attention interface in `Whisper` (vllm-project#11784)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [CI] Fix neuron CI and run offline tests (vllm-project#11779)

Signed-off-by: Liangfu Chen <liangfc@amazon.com>

* fix init error for MessageQueue when n_local_reader is zero (vllm-project#11768)

* [Doc] Create a vulnerability management team (vllm-project#9925)

Signed-off-by: Russell Bryant <rbryant@redhat.com>

* [CI][CPU] adding build number to docker image name (vllm-project#11788)

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* [V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (vllm-project#11798)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (vllm-project#11800)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [doc] add doc to explain how to use uv (vllm-project#11773)

Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [V1] Support audio language models on V1 (vllm-project#11733)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [doc] update how pip can install nightly wheels (vllm-project#11806)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Doc] Add note to `gte-Qwen2` models (vllm-project#11808)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [optimization] remove python function call for custom op (vllm-project#11750)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Bugfix] update the prefix for qwen2 (vllm-project#11795)

Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com>

* [Doc]Add documentation for using EAGLE in vLLM (vllm-project#11417)

Signed-off-by: Sourashis Roy <sroy@roblox.com>

* [Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (vllm-project#11794)

* [Doc] Group examples into categories (vllm-project#11782)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Fix image input for Pixtral-HF (vllm-project#11741)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] sort torch profiler table by kernel timing (vllm-project#11813)

* Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (vllm-project#11824)

* Fixed docker build for ppc64le (vllm-project#11518)

Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>

* [OpenVINO] Fixed Docker.openvino build (vllm-project#11732)

Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>

* [Bugfix] Add checks for LoRA and CPU offload (vllm-project#11810)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [Docs] reorganize sponsorship page (vllm-project#11639)

Signed-off-by: simon-mo <simon.mo@hey.com>

* [Bug] Fix pickling of `ModelConfig` when RunAI Model Streamer is used (vllm-project#11825)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [misc] improve memory profiling (vllm-project#11809)

Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [doc] update wheels url (vllm-project#11830)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Docs] Update sponsor name: 'Novita' to 'Novita AI' (vllm-project#11833)

* [Hardware][Apple] Native support for macOS Apple Silicon (vllm-project#11696)

Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

* [torch.compile] consider relevant code in compilation cache (vllm-project#11614)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [VLM] Reorganize profiling/processing-related code (vllm-project#11812)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Move examples into categories (vllm-project#11840)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Doc][4/N] Reorganize API Reference (vllm-project#11843)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [CI/Build][Bugfix] Fix CPU CI image clean up (vllm-project#11836)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Bugfix][XPU] fix silu_and_mul (vllm-project#11823)

Signed-off-by: yan ma <yan.ma@intel.com>

* [Misc] Move some model utils into vision file (vllm-project#11848)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Expand Multimodal API Reference (vllm-project#11852)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc]add some explanations for BlockHashType (vllm-project#11847)

* [TPU][Quantization] TPU `W8A8` (vllm-project#11785)

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

* [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* [Docs] Add Google Cloud Meetup (vllm-project#11864)

* [CI] Turn on basic correctness tests for V1 (vllm-project#10864)

* treat do_lower_case in the same way as the sentence-transformers library (vllm-project#11815)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

* [Doc] Recommend uv and python 3.12 for quickstart guide (vllm-project#11849)

Signed-off-by: mgoin <michael@neuralmagic.com>

* [Misc] Move `print_*_once` from utils to logger (vllm-project#11298)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>

* [Doc] Intended links Python multiprocessing library (vllm-project#11878)

* [perf]fix current stream (vllm-project#11870)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Bugfix] Override dunder methods of placeholder modules (vllm-project#11882)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] fix beam search input errors and latency benchmark script (vllm-project#11875)

Signed-off-by: Ye Qi <yeq@meta.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>

* [Doc] Add model development API Reference (vllm-project#11884)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [platform] Allow platform specify attention backend (vllm-project#11609)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>

* [ci]try to fix flaky multi-step tests (vllm-project#11894)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Misc] Provide correct Pixtral-HF chat template (vllm-project#11891)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Docs] Add Modal to deployment frameworks (vllm-project#11907)

* [Doc][5/N] Move Community and API Reference to the bottom (vllm-project#11896)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Simon Mo <simon.mo@hey.com>

* [VLM] Enable tokenized inputs for merged multi-modal processor (vllm-project#11900)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Show default pooling method in a table (vllm-project#11904)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [torch.compile] Hide KV cache behind torch.compile boundary (vllm-project#11677)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [Bugfix] Validate lora adapters to avoid crashing server (vllm-project#11727)

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

* [BUGFIX] Fix `UnspecifiedPlatform` package name (vllm-project#11916)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* [ci] fix gh200 tests (vllm-project#11919)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [misc] remove python function call for custom activation op (vllm-project#11885)

Co-authored-by: youkaichao <youkaichao@gmail.com>

* [platform] support pytorch custom op pluggable (vllm-project#11328)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

* Replace "online inference" with "online serving" (vllm-project#11923)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [ci] Fix sampler tests (vllm-project#11922)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Doc] [1/N] Initial guide for merged multi-modal processor (vllm-project#11925)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [platform] support custom torch.compile backend key (vllm-project#11318)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

* [Doc] Rename offline inference examples (vllm-project#11927)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Docs] Fix docstring in `get_ip` function (vllm-project#11932)

Signed-off-by: Kuntai Du <kuntai@uchicago.edu>

* Doc fix in `benchmark_long_document_qa_throughput.py` (vllm-project#11933)

Signed-off-by: Kuntai Du <kuntai@uchicago.edu>

* [Hardware][CPU] Support MOE models on x86 CPU (vllm-project#11831)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Misc] Clean up debug code in Deepseek-V3 (vllm-project#11930)

Signed-off-by: Isotr0py <2037008807@qq.com>

* [Misc] Update benchmark_prefix_caching.py fixed example usage (vllm-project#11920)

Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn>
Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>

* [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* [mypy] Fix mypy warnings in api_server.py (vllm-project#11941)

Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

* [ci] fix broken distributed-tests-4-gpus (vllm-project#11937)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672)

Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>

* [Bugfix] fused_experts_impl wrong compute type for float32 (vllm-project#11921)

Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>

* [CI/Build] Move model-specific multi-modal processing tests (vllm-project#11934)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Doc] Basic guide for writing unit tests for new models (vllm-project#11951)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Bugfix] Fix RobertaModel loading (vllm-project#11940)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Model] Add cogagent model support vLLM (vllm-project#11742)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>

* [V1] Avoid sending text prompt to core engine (vllm-project#11963)

Signed-off-by: Roger Wang <ywang@roblox.com>

* [CI/Build] Add markdown linter (vllm-project#11857)

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

* [Model] Initialize support for Deepseek-VL2 models (vllm-project#11578)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [Hardware][CPU] Multi-LoRA implementation for the CPU backend (vllm-project#11100)

Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>

* [Hardware][TPU] workaround fix for MoE on TPU (vllm-project#11764)

* [V1][Core][1/n] Logging and Metrics (vllm-project#11962)

Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>

* [Model] Support GGUF models newly added in `transformers` 4.46.0 (vllm-project#9685)

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

* [V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (vllm-project#11973)

Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>

* [MISC] fix typo in kv transfer send recv test (vllm-project#11983)

* [Bug] Fix usage of `.transpose()` and `.view()` consecutively. (vllm-project#11979)

* [CI][Spec Decode] fix: broken test for EAGLE model (vllm-project#11972)

Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>

* [Misc] Fix Deepseek V2 fp8 kv-scale remapping (vllm-project#11947)

Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>

* [Misc]Minor Changes about Worker (vllm-project#11555)

Signed-off-by: Chenguang Li <757486878@qq.com>

* [platform] add ray_device_key (vllm-project#11948)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* Fix Max Token ID for Qwen-VL-Chat (vllm-project#11980)

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* [Kernel] unified_attention for Attention.forward (vllm-project#11967)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [Doc][V1] Update model implementation guide for V1 support (vllm-project#11998)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [Doc] Organise installation documentation into categories and tabs (vllm-project#11935)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [platform] add device_control env var (vllm-project#12009)

Signed-off-by: youkaichao <youkaichao@gmail.com>

* [Platform] Move get_punica_wrapper() function to Platform (vllm-project#11516)

Signed-off-by: Shanshan Shen <467638484@qq.com>

* bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (vllm-project#11982)

Signed-off-by: elijah <f1renze.142857@gmail.com>

* Using list

* Revert "[misc] improve memory profiling (vllm-project#11809)"

This reverts commit 889e662.

* Trying to make scales work with compileable attention

* Docs lint

---------

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yisheng <yi.sheng@intel.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Signed-off-by: Ye Qi <yeq@meta.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: YiSheng5 <yi.sheng@intel.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: XiaobingZhang <xiaobingzhangupc@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Yuan <yuan.zhou@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: jiangjiadi <34134495+jiangjiadi@users.noreply.github.com>
Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com>
Co-authored-by: sroy745 <142070531+sroy745@users.noreply.github.com>
Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: WangErXiao <863579016@qq.com>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Wallas Henrique <wallashss@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Charles Frye <cfrye59@gmail.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: cennn <61925104+cennn@users.noreply.github.com>
Co-authored-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: minmin <rmm0811@gmail.com>
Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Fred Reiss <frreiss@us.ibm.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
@tjtanaa tjtanaa marked this pull request as ready for review January 15, 2025 06:09
@tjtanaa tjtanaa marked this pull request as draft January 15, 2025 06:09
@tjtanaa tjtanaa marked this pull request as ready for review January 15, 2025 07:56
@tjtanaa tjtanaa marked this pull request as draft January 15, 2025 07:59
@tjtanaa tjtanaa marked this pull request as ready for review January 15, 2025 08:04
@hongxiayang hongxiayang left a comment

thanks!

@hongxiayang hongxiayang merged commit d9385b4 into ROCm:llama_fp8_12062024 Jan 15, 2025
11 checks passed