fix readme typos for hunyuanvideo #390

Merged · 1 commit · Dec 9, 2024
16 changes: 9 additions & 7 deletions README.md
@@ -20,6 +20,7 @@
- [📢 Updates](#updates)
- [🎯 Supported DiTs](#support-dits)
- [📈 Performance](#perf)
+- [HunyuanVideo](#perf_hunyuanvideo)
- [Mochi-1](#perf_mochi1)
- [CogVideoX](#perf_cogvideox)
- [Flux.1](#perf_flux)
@@ -54,7 +55,7 @@ Consequently, multi-GPU and multi-machine deployments are essential to meet the
<h3 id="meet-xdit-parallel">Parallel Inference</h3>

To meet real-time demand for DiTs applications, parallel inference is a must.
-xDiT is an inference engine designed for the parallel deployment of DiTs on large scale.
+xDiT is an inference engine designed for the parallel deployment of DiTs on a large scale.
xDiT provides a suite of efficient parallel approaches for Diffusion Models, as well as computation accelerations.

The overview of xDiT is shown as follows.
@@ -77,23 +78,23 @@ The four parallel methods in xDiT can be configured in a hybrid manner, optimizi
As shown in the following picture, xDiT offers a set of APIs to adapt DiT models in [huggingface/diffusers](https://github.com/huggingface/diffusers) to hybrid parallel implementation through simple wrappers.
If the model you require is not available in the model zoo, developing it yourself is straightforward; please refer to our [Dev Guide](#dev-guide).
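The hybrid configuration mentioned above composes several parallel dimensions across one GPU pool, so the per-dimension degrees must multiply out to the total number of GPUs. A minimal sketch of that constraint, using hypothetical names rather than the actual xDiT API:

```python
# Sketch of the hybrid-parallel degree constraint: the product of the
# per-dimension parallel degrees must equal the number of available GPUs.
# Class and field names here are illustrative, not the xDiT API.
from dataclasses import dataclass


@dataclass
class HybridParallelConfig:
    cfg_degree: int = 1   # classifier-free-guidance parallel
    sp_degree: int = 1    # sequence parallel
    pp_degree: int = 1    # PipeFusion (patch-level pipeline) parallel
    tp_degree: int = 1    # tensor parallel

    def world_size(self) -> int:
        # Every combination of ranks along the four dimensions is one GPU.
        return self.cfg_degree * self.sp_degree * self.pp_degree * self.tp_degree

    def validate(self, available_gpus: int) -> None:
        if self.world_size() != available_gpus:
            raise ValueError(
                f"degrees multiply to {self.world_size()}, "
                f"but {available_gpus} GPUs are available"
            )


cfg = HybridParallelConfig(cfg_degree=2, sp_degree=2, pp_degree=2)
cfg.validate(8)  # 2 * 2 * 2 * 1 == 8, so this passes
```

Any subset of the dimensions can be left at degree 1; the check simply rules out configurations that over- or under-subscribe the GPU pool.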

-We also have implemented the following parallel stategies for reference:
+We also have implemented the following parallel strategies for reference:

1. Tensor Parallelism
2. [DistriFusion](https://arxiv.org/abs/2402.19481)


<h3 id="meet-xdit-perf">Computing Acceleration</h3>

-Optimization orthogonal to parallel focuses on accelerating single GPU performance.
+Optimization is orthogonal to parallel focuses on accelerating single GPU performance.

First, xDiT employs a series of kernel acceleration methods. In addition to utilizing well-known Attention optimization libraries, we leverage compilation acceleration technologies such as `torch.compile` and `onediff`.

Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn), which exploits computational redundancies between different steps of the Diffusion Model to accelerate inference on a single GPU.
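As a rough illustration of the step-redundancy idea (a toy stand-in, not DiTFastAttn's actual algorithm): consecutive diffusion steps often feed an expensive operator inputs that barely change, so its previous output can be reused when the input drift stays below a threshold:

```python
# Toy sketch of exploiting redundancy between diffusion steps: reuse the
# previous step's output of an expensive function (here standing in for
# attention) when the new input is nearly identical. Illustrative only;
# this is not DiTFastAttn's actual method.
def make_cached_attention(attention_fn, tolerance=1e-3):
    state = {"last_input": None, "last_output": None}

    def cached(x):
        prev = state["last_input"]
        if prev is not None and len(prev) == len(x):
            drift = max(abs(a - b) for a, b in zip(prev, x))
            if drift < tolerance:
                return state["last_output"]  # skip recomputation
        out = attention_fn(x)
        state["last_input"] = list(x)
        state["last_output"] = out
        return out

    return cached


calls = []


def expensive_attention(x):
    calls.append(1)  # count real invocations
    return [v * 2 for v in x]


attn = make_cached_attention(expensive_attention)
attn([1.0, 2.0])  # computed
attn([1.0, 2.0])  # served from cache, no second invocation
assert len(calls) == 1
```

The real technique operates on attention maps inside the transformer and decides per layer and per step what to reuse; the cache-with-tolerance pattern above is only the core intuition.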

<h2 id="updates">📢 Updates</h2>

-* 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo), reducing 5-sec video generation latency from 31 minutes to 5 minutes!
+* 🎉**December 7, 2024**: xDiT is the official parallel inference engine for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), reducing 5-sec video generation latency from 31 minutes to 5 minutes!
* 🎉**November 28, 2024**: xDiT achieves 1.6 sec end-to-end latency for 28-step [Flux.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) inference on 4xH100!
* 🎉**November 20, 2024**: xDiT supports [CogVideoX-1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) and achieved a 6.12x speedup compared to the implementation in diffusers!
* 🎉**November 11, 2024**: xDiT has been applied to [mochi-1](https://github.com/xdit-project/mochi-xdit) and achieved a 3.54x speedup compared to the official open source implementation!
Expand All @@ -116,6 +117,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

| Model Name | CFG | SP | PipeFusion |
| --- | --- | --- | --- |
+| [🎬 HunyuanVideo](https://github.com/Tencent/HunyuanVideo) | NA | ✔️ | ❎ |
| [🎬 CogVideoX1.5](https://huggingface.co/THUDM/CogVideoX1.5-5B) | ✔️ | ✔️ | ❎ |
| [🎬 Mochi-1](https://github.com/xdit-project/mochi-xdit) | ✔️ | ✔️ | ❎ |
| [🎬 CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) | ✔️ | ✔️ | ❎ |
@@ -128,7 +130,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

</div>

-### Supported by legacy version only, including DistriFusion and Tensor Parallel as the standalong parallel strategies:
+### Supported by legacy version only, including DistriFusion and Tensor Parallel as the standalone parallel strategies:

<div align="center">

@@ -157,11 +159,11 @@ Currently, if you need the parallel version of ComfyUI, please fill in this [app

<h2 id="perf">📈 Performance</h2>

-<h3 id="perf_mochi1">Mochi1</h3>
+<h3 id="perf_hunyuanvideo">HunyuanVideo</h3>

1. [HunyuanVideo Performance Report](./docs/performance/hunyuanvideo.md)

-<h3 id="perf_cogvideox">CogVideo</h3>
+<h3 id="perf_cogvideox">Mochi1</h3>

2. [mochi1-xdit: Reducing the Inference Latency by 3.54x Compare to the Official Open Souce Implementation!](https://github.com/xdit-project/mochi-xdit)
