Commit

correct perf numbers and add details (#414)
chengzeyi authored Dec 27, 2024
1 parent 70b156d commit 57eb27f
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion docs/performance/hunyuanvideo.md
@@ -2,6 +2,9 @@

xDiT is [HunyuanVideo](https://github.com/Tencent/HunyuanVideo?tab=readme-ov-file#-parallel-inference-on-multiple-gpus-by-xdit)'s official parallel inference engine. On H100 and H20 GPUs, xDiT reduces the generation time of 1280x720 videos from 31 minutes to 5 minutes, and 960x960 videos from 28 minutes to 6 minutes.

+The H100 and H20 performance benchmarks are done with the official HunyuanVideo repository. The L20 performance benchmarks are done with the `diffusers` implementation.
+The L20 performance benchmarks are measured using this [script](examples/hunyuan_video_usp_example.py), along with `flash-attn==2.7.2.post1` and CUDA 12.4.

### 1280x720 Resolution (129 frames, 50 steps) - Ulysses Latency (seconds)

<center>
@@ -22,6 +25,6 @@ xDiT is [HunyuanVideo](https://github.com/Tencent/HunyuanVideo?tab=readme-ov-fil
|----------|--------|---------|---------|---------|
| H100 | 1,735.01 | 934.09 | 645.45 | 367.02 |
| H20 | 6,621.46 | 3,400.55 | 2,310.48 | 1,214.67 |
-| L20 | 6,039.08 | 3,260.62 | 2,070.96 | |
+| L20 | 6,039.08 | 3,260.62 | 2,284.74 | |

</center>
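
To make the effect of the corrected L20 figure easier to see, the latencies in this hunk can be turned into speedups relative to the first column. The sketch below is illustrative only and is not part of the commit. The column headers are collapsed in this diff, so columns are referred to by position rather than by GPU count, and the `latencies` dictionary and `speedups` helper are hypothetical names introduced here.

```python
# Latencies in seconds, copied left to right from the table rows in this hunk.
# Column headers are not visible in the diff, so columns are indexed by position.
latencies = {
    "H100": [1735.01, 934.09, 645.45, 367.02],
    "H20": [6621.46, 3400.55, 2310.48, 1214.67],
    "L20": [6039.08, 3260.62, 2284.74],  # corrected value; last cell is empty
}


def speedups(row):
    """Return each latency's speedup relative to the first column."""
    base = row[0]
    return [round(base / t, 2) for t in row]


if __name__ == "__main__":
    for gpu, row in latencies.items():
        print(gpu, speedups(row))
```

Because the L20 row has no value in its last column, its speedup list is one entry shorter than the others.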
