From e08e01eb2d446db7460350e2320b0f590c42908c Mon Sep 17 00:00:00 2001
From: Jiarui Fang
Date: Fri, 11 Oct 2024 15:08:56 +0800
Subject: [PATCH] [doc] polish readme (#302)

---
 README.md                      | 19 +++++++++++++------
 docs/methods/ditfastattn.md    |  2 +-
 docs/methods/ditfastattn_zh.md |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index b930e15..4bc7ef7 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

📢 Updates

-* 🎉**October 10, 2024**: xDiT applied DiTFastAttn to accelerate single GPU inference for Pixart Models! The scripst is [./scripts/run_fast_pixart.py](./scripts/run_fast_pixart.py).
+* 🎉**October 10, 2024**: xDiT applied DiTFastAttn to accelerate single GPU inference for Pixart Models!
 * 🎉**September 26, 2024**: xDiT has been officially used by [THUDM/CogVideo](https://github.com/THUDM/CogVideo)! The inference scripts are placed in [parallel_inference/](https://github.com/THUDM/CogVideo/blob/main/tools/parallel_inference) at their repository.
 * 🎉**September 23, 2024**: Support CogVideoX. The inference scripts are [examples/cogvideox_example.py](examples/cogvideox_example.py).
 * 🎉**August 26, 2024**: We apply torch.compile and [onediff](https://github.com/siliconflow/onediff) nexfort backend to accelerate GPU kernels speed.
@@ -157,7 +157,7 @@ Furthermore, xDiT incorporates optimization techniques from [DiTFastAttn](https:

🚀 QuickStart

-### 1. Install from pip (current [version](./xfuser/__version__.py))
+### 1. Install from pip
 
 ```
 pip install xfuser
@@ -189,7 +189,10 @@ You can easily modify the model type, model directory, and parallel options in t
 bash examples/run.sh
 ```
 
-To inspect the available options for the PixArt-alpha example, use the following command:
+---
+
+<details>
+<summary>Click to see available options for the PixArt-alpha example</summary>
 
 ```bash
 python ./examples/pixartalpha_example.py -h
@@ -249,10 +252,14 @@ Input Options:
                         Number of inference steps.
 ```
 
+</details>
+
+---
+
 Hybriding multiple parallelism techniques togather is essential for efficiently scaling.
 It's important that the product of all parallel degrees matches the number of devices.
-For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism.
-Here ulysses_degree * pipefusion_parallel_degree * cfg_degree(use_split_batch) == number of devices == 8.
+Note use_cfg_parallel means cfg_parallel=2. For instance, you can combine CFG, PipeFusion, and sequence parallelism with the command below to generate an image of a cute dog through hybrid parallelism.
+Here ulysses_degree * pipefusion_parallel_degree * cfg_degree(use_cfg_parallel) == number of devices == 8.
 
 
 ```bash
@@ -376,7 +383,7 @@ For usage instructions, refer to the [example/run.sh](./examples/run.sh). Simply
 xDiT also provides DiTFastAttn for single GPU acceleration. It can reduce computation cost of attention layer by leveraging redundancies between different steps of the Diffusion Model.
 
-[DiTFastAttn](./docs/methods/dittfastattn.md)
+[DiTFastAttn: Attention Compression for Diffusion Transformer Models](./docs/methods/ditfastattn.md)

📚 Develop Guide

diff --git a/docs/methods/ditfastattn.md b/docs/methods/ditfastattn.md
index 6cc2df4..442e50e 100644
--- a/docs/methods/ditfastattn.md
+++ b/docs/methods/ditfastattn.md
@@ -1,4 +1,4 @@
-### DiTFastAttn
+### DiTFastAttn: Attention Compression for Diffusion Transformer Models
 
 [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn) is an acceleration solution for single-GPU DiTs inference, utilizing Input Temporal Reduction to reduce computational complexity through the following three methods:
 
diff --git a/docs/methods/ditfastattn_zh.md b/docs/methods/ditfastattn_zh.md
index 469cb08..78d2a43 100644
--- a/docs/methods/ditfastattn_zh.md
+++ b/docs/methods/ditfastattn_zh.md
@@ -1,4 +1,4 @@
-### DiTFastAttn
+### DiTFastAttn: Attention Compression for Diffusion Transformer Models
 
 [DiTFastAttn](https://github.com/thu-nics/DiTFastAttn)是一种针对单卡DiTs推理的加速方案，利用Input Temperal Reduction通过如下三种方式来减少计算量：
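
For readers skimming this patch, the hybrid-parallelism note added to the README above boils down to one rule: the product of the parallel degrees must equal the number of GPUs, and enabling use_cfg_parallel contributes a fixed factor of 2. The sketch below only illustrates that arithmetic; the launcher, flag names, and model path are assumptions modeled on the option names quoted in the patch, not taken from the patch itself, so check the repository's examples before copying it.

```bash
# Hedged sketch of an 8-GPU hybrid-parallel launch (not part of this patch).
# The flag names and model path are assumed from the option names in the README text.
# Degree-product rule:
#   ulysses_degree (2) * pipefusion_parallel_degree (2) * cfg_degree (2, via --use_cfg_parallel) == 8 GPUs
torchrun --nproc_per_node=8 ./examples/pixartalpha_example.py \
  --model /path/to/PixArt-alpha-checkpoint \
  --ulysses_degree 2 \
  --pipefusion_parallel_degree 2 \
  --use_cfg_parallel \
  --num_inference_steps 20 \
  --prompt "A cute dog"
```

If --use_cfg_parallel were omitted, the remaining degrees would have to multiply to the full device count instead (for example, ulysses_degree 4 with pipefusion_parallel_degree 2 on the same 8 GPUs).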