From e08e01eb2d446db7460350e2320b0f590c42908c Mon Sep 17 00:00:00 2001
From: Jiarui Fang
Date: Fri, 11 Oct 2024 15:08:56 +0800
Subject: [PATCH] [doc] polish readme (#302)

---
 README.md                      | 19 +++++++++++++------
 docs/methods/ditfastattn.md    |  2 +-
 docs/methods/ditfastattn_zh.md |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index b930e15..4bc7ef7 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

📢 Updates

-* 🎉**October 10, 2024**: xDiT applied DiTFastAttn to accelerate single GPU inference for Pixart Models! The scripst is [./scripts/run_fast_pixart.py](./scripts/run_fast_pixart.py).
+* 🎉**October 10, 2024**: xDiT applied DiTFastAttn to accelerate single GPU inference for Pixart Models!
 * 🎉**September 26, 2024**: xDiT has been officially used by [THUDM/CogVideo](https://github.com/THUDM/CogVideo)! The inference scripts are placed in [parallel_inference/](https://github.com/THUDM/CogVideo/blob/main/tools/parallel_inference) at their repository.
 * 🎉**September 23, 2024**: Support CogVideoX. The inference scripts are [examples/cogvideox_example.py](examples/cogvideox_example.py).
 * 🎉**August 26, 2024**: We apply torch.compile and [onediff](https://github.com/siliconflow/onediff) nexfort backend to accelerate GPU kernels speed.
@@ -157,7 +157,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

🚀 QuickStart

-### 1. Install from pip (current [version](./xfuser/__version__.py))
+### 1. Install from pip
 
 ```
 pip install xfuser
@@ -189,7 +189,10 @@ You can easily modify the model type, model directory, and parallel options in t
 bash examples/run.sh
 ```
 
-To inspect the available options for the PixArt-alpha example, use the following command:
+---
+
+<details>
+<summary>Click to see available options for the PixArt-alpha example</summary>
 
 ```bash
 python ./examples/pixartalpha_example.py -h
@@ -249,10 +252,14 @@ Input Options:
                         Number of inference steps.
 ```
 
+</details>
+
+---
+
 Hybriding multiple parallelism techniques togather is essential for efficiently scaling.
 It's important that the product of all parallel degrees matches the number of devices.
-For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism.
-Here ulysses_degree * pipefusion_parallel_degree * cfg_degree(use_split_batch) == number of devices == 8.
+Note use_cfg_parallel means cfg_parallel=2. For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism.
+Here ulysses_degree * pipefusion_parallel_degree * cfg_degree(use_cfg_parallel) == number of devices == 8.
 
 
 ```bash
@@ -376,7 +383,7 @@ For usage instructions, refer to the [example/run.sh](./examples/run.sh). Simply
 xDiT also provides DiTFastAttn for single GPU acceleration. It can reduce computation cost of attention layer by leveraging redundancies between different steps of the Diffusion Model.
 
-[DiTFastAttn](./docs/methods/dittfastattn.md)
+[DiTFastAttn: Attention Compression for Diffusion Transformer Models](./docs/methods/ditfastattn.md)

📚 Develop Guide

diff --git a/docs/methods/ditfastattn.md b/docs/methods/ditfastattn.md
index 6cc2df4..442e50e 100644
--- a/docs/methods/ditfastattn.md
+++ b/docs/methods/ditfastattn.md
@@ -1,4 +1,4 @@
-### DiTFastAttn
+### DiTFastAttn: Attention Compression for Diffusion Transformer Models
 
 [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn) is an acceleration solution for single-GPU DiTs inference, utilizing Input Temporal Reduction to reduce computational complexity through the following three methods:
 
diff --git a/docs/methods/ditfastattn_zh.md b/docs/methods/ditfastattn_zh.md
index 469cb08..78d2a43 100644
--- a/docs/methods/ditfastattn_zh.md
+++ b/docs/methods/ditfastattn_zh.md
@@ -1,4 +1,4 @@
-### DiTFastAttn
+### DiTFastAttn: Attention Compression for Diffusion Transformer Models
 
 [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn)是一种针对单卡DiTs推理的加速方案，利用Input Temperal Reduction通过如下三种方式来减少计算量：
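
For readers skimming this patch, the hybrid-parallelism note added to the README above boils down to one rule: the product of the parallel degrees must equal the number of GPUs, and enabling use_cfg_parallel contributes a fixed factor of 2. The sketch below only illustrates that arithmetic; the launcher, flag names, and model path are assumptions modeled on the option names quoted in the patch, not taken from the patch itself, so check the repository's examples before copying it.

```bash
# Hedged sketch of an 8-GPU hybrid-parallel launch (not part of this patch).
# The flag names and model path are assumed from the option names in the README text.
# Degree-product rule:
#   ulysses_degree (2) * pipefusion_parallel_degree (2) * cfg_degree (2, via --use_cfg_parallel) == 8 GPUs
torchrun --nproc_per_node=8 ./examples/pixartalpha_example.py \
  --model /path/to/PixArt-alpha-checkpoint \
  --ulysses_degree 2 \
  --pipefusion_parallel_degree 2 \
  --use_cfg_parallel \
  --num_inference_steps 20 \
  --prompt "A cute dog"
```

If --use_cfg_parallel were omitted, the remaining degrees would have to multiply to the full device count instead (for example, ulysses_degree 4 with pipefusion_parallel_degree 2 on the same 8 GPUs).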