FLUX.1 text-to-image pipeline for Gaudi
* Enabled and tested FLUX pipeline on Gaudi for FLUX.1 class of models
* Enabled HPU graphs mode
* Enabled batching in inference
* Added support for quantization (fp8 and hybrid)
* Incorporated the Gaudi profiler and HPU synchronization for performance analysis
* Boosted performance with Fused SDPA
* Added Fused RoPE
* Documented FLUX.1 samples
* Upgraded and pinned diffusers in Optimum-Habana to official release 0.31.0
* Resolved issues in other pipelines caused by the diffusers upgrade
* Added CI tests (2 unit tests, 1 slow test for perf and quality)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
5 people committed Nov 25, 2024
1 parent 82a1c96 commit f6abe4b
Showing 16 changed files with 1,204 additions and 136 deletions.
123 changes: 117 additions & 6 deletions examples/stable-diffusion/README.md
@@ -28,12 +28,12 @@ First, you should install the requirements:
pip install -r requirements.txt
```


## Text-to-image Generation

### Single Prompt

Here is how to generate images with one prompt:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -51,10 +51,10 @@ python text_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.
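
For reference, here is a minimal Python sketch of what the script does under the hood. It assumes the `GaudiStableDiffusionPipeline` API from `optimum.habana.diffusers`; the prompt and file names are arbitrary examples, not values from this repository:

```python
# Minimal sketch of the Gaudi pipeline behind text_to_image_generation.py.
# Assumes optimum-habana is installed and an HPU is available.
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_habana=True,       # run on HPU
    use_hpu_graphs=True,   # capture HPU graphs: the first batch pays the compile cost
    gaudi_config="Habana/stable-diffusion",
)
outputs = pipeline(
    prompt="A serene lake at sunrise",  # arbitrary example prompt
    num_images_per_prompt=4,
    batch_size=2,
)
for i, image in enumerate(outputs.images):
    image.save(f"image_{i}.png")
```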

### Multiple Prompts

Here is how to generate images with several prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -69,7 +69,9 @@ python text_to_image_generation.py \
```

### Distributed inference with multiple HPUs

Here is how to generate images with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -109,10 +111,10 @@ python text_to_image_generation.py \
```

> There are two different checkpoints for Stable Diffusion 2:
>
> - use [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) for generating 768x768 images
> - use [stabilityai/stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) for generating 512x512 images

### Latent Diffusion Model for 3D (LDM3D)

[LDM3D](https://arxiv.org/abs/2305.10853) generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.
@@ -135,7 +137,9 @@ python text_to_image_generation.py \
--ldm3d \
--bf16
```

Here is how to generate images and depth maps with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -154,6 +158,7 @@ python ../gaudi_spawn.py \
```

> There are three different checkpoints for LDM3D:
>
> - use [original checkpoint](https://huggingface.co/Intel/ldm3d) to generate outputs from the paper
> - use [the latest checkpoint](https://huggingface.co/Intel/ldm3d-4c) for generating improved results
> - use [the pano checkpoint](https://huggingface.co/Intel/ldm3d-pano) to generate panoramic view
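
As a rough illustration, a minimal Python sketch of an LDM3D run follows. It assumes a `GaudiStableDiffusionLDM3DPipeline` class in `optimum.habana.diffusers` whose output carries separate RGB and depth fields, mirroring the upstream diffusers LDM3D pipeline:

```python
# Hedged sketch: class name and output fields are assumed to mirror
# diffusers' StableDiffusionLDM3DPipeline on Gaudi.
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline

pipe = GaudiStableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
output = pipe(prompt="A picture of some lemons on a table")
output.rgb[0].save("lemons_rgb.png")      # RGB image
output.depth[0].save("lemons_depth.png")  # corresponding depth map
```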
@@ -163,6 +168,7 @@ python ../gaudi_spawn.py \
Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/pdf/2307.01952.pdf) by the Stability AI team.

Here is how to generate SDXL images with a single prompt:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -182,6 +188,7 @@ python text_to_image_generation.py \
> You can enable this mode with `--use_hpu_graphs`.

Here is how to generate SDXL images with several prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -199,6 +206,7 @@ python text_to_image_generation.py \
SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder to significantly
increase the number of parameters. Here is how to generate images with several prompts for both `prompt`
and `prompt_2` (2nd text encoder), as well as their negative prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -217,6 +225,7 @@ python text_to_image_generation.py \
```

Here is how to generate SDXL images with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -235,14 +244,17 @@ python ../gaudi_spawn.py \
--bf16 \
--distributed
```

> HPU graphs are recommended when generating images by batches to get the fastest possible generations.
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### SDXL-Turbo

SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.

Here is how to generate images with multiple prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/sdxl-turbo \
@@ -275,11 +287,13 @@ Before running the SD3 pipeline, you need to:

1. Agree to the Terms and Conditions for using the SD3 model at the [Hugging Face model page](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
2. Authenticate with Hugging Face using your HF token by running:

```bash
huggingface-cli login
```

Here is how to generate SD3 images with a single prompt:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python text_to_image_generation.py \
@@ -299,12 +313,100 @@ python text_to_image_generation.py \
> For improved performance of the SD3 pipeline on Gaudi, it is recommended to configure the environment
> by setting `PT_HPU_MAX_COMPOUND_OP_SIZE` to 1.

### FLUX.1

FLUX.1 was introduced by Black Forest Labs [here](https://blackforestlabs.ai/announcing-black-forest-labs/).

Here is how to run the FLUX.1-schnell model (the fast version of FLUX.1):

```bash
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-schnell \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 4 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```
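
Under the hood, this commit routes FLUX models through a dedicated Gaudi pipeline. Below is a minimal Python sketch, assuming a `GaudiFluxPipeline` class in `optimum.habana.diffusers` that follows the same conventions as the other Gaudi pipelines:

```python
# Hedged sketch: GaudiFluxPipeline is assumed to follow the same
# constructor conventions as the other Gaudi diffusers pipelines.
import torch
from optimum.habana.diffusers import GaudiFluxPipeline

pipe = GaudiFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
image = pipe(
    prompt="A cat holding a sign that says hello world",
    num_inference_steps=4,  # schnell is distilled for very few steps
).images[0]
image.save("flux_schnell.png")
```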

Before running the FLUX.1-dev model, you need to:

1. Agree to the Terms and Conditions for using the FLUX.1-dev model at the [Hugging Face model page](https://huggingface.co/black-forest-labs/FLUX.1-dev)
2. Authenticate with Hugging Face using your HF token by running:

```bash
huggingface-cli login
```

Here is how to run the FLUX.1-dev model:

```bash
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

This model can also be quantized with some ops running in FP8 precision.

Before quantizing, collect statistics by running the pipeline in measure mode:

```bash
QUANT_CONFIG=quantization/flux/measure_config.json \
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--quant_mode measure
```

After the statistics are collected, here is how to run FLUX.1-dev in quantization mode:

```bash
QUANT_CONFIG=quantization/flux/quantize_config.json \
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--quant_mode quantize
```

## ControlNet

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
It is a type of model for controlling Stable Diffusion by conditioning it with an additional input image.
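
The conditioning input is typically an edge map or pose map rather than a plain photo. As an illustration, here is one common way to prepare a canny-edge conditioning image with `opencv-python` (already listed in `requirements.txt`); the file names are arbitrary placeholders:

```python
# Sketch: build a canny edge map to use as a ControlNet conditioning image.
import cv2
import numpy as np
from PIL import Image

img = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(img, 100, 200)          # low/high hysteresis thresholds
edges = np.stack([edges] * 3, axis=-1)    # replicate to 3 channels
Image.fromarray(edges).save("canny_condition.png")
```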

Here is how to generate images conditioned by the canny edge model:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -321,6 +423,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model with multiple prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -337,6 +440,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -355,6 +459,7 @@ python ../gaudi_spawn.py \
```

Here is how to generate images conditioned by the open pose model:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -372,6 +477,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model using Stable Diffusion 2:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-2-1 \
@@ -395,6 +501,7 @@ Inpainting replaces or edits specific areas of an image. For more details,
please refer to the [Hugging Face Diffusers documentation](https://huggingface.co/docs/diffusers/en/using-diffusers/inpaint).
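
A minimal Python sketch of an inpainting call follows; it assumes a `GaudiStableDiffusionInpaintPipeline` class that follows the standard diffusers inpainting API, and the image and mask paths are placeholders:

```python
# Hedged sketch: inpainting repaints only the white region of the mask.
from PIL import Image
from optimum.habana.diffusers import GaudiStableDiffusionInpaintPipeline

pipe = GaudiStableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
result = pipe(
    prompt="a white cat sitting on a park bench",  # arbitrary prompt
    image=Image.open("photo.png"),                 # original image
    mask_image=Image.open("mask.png"),             # white = region to repaint
).images[0]
result.save("inpainted.png")
```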

### Stable Diffusion Inpainting

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-2-inpainting \
@@ -412,6 +519,7 @@ python text_to_image_generation.py \
```

### Stable Diffusion XL Inpainting

```bash
python text_to_image_generation.py \
--model_name_or_path diffusers/stable-diffusion-xl-1.0-inpainting-0.1 \
@@ -457,10 +565,10 @@ python image_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### Multiple Prompts

Here is how to generate images with several prompts and one image:

```bash
python image_to_image_generation.py \
--model_name_or_path "timbrooks/instruct-pix2pix" \
Expand All @@ -482,10 +590,10 @@ python image_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### Stable Diffusion XL Refiner

Here is how to generate SDXL images with a single prompt and one image:

```bash
python image_to_image_generation.py \
--model_name_or_path "stabilityai/stable-diffusion-xl-refiner-1.0" \
@@ -505,6 +613,7 @@ python image_to_image_generation.py \
### Stable Diffusion Image Variations

Here is how to generate image variations from one input image; this pipeline does not accept a text prompt:

```bash
python image_to_image_generation.py \
--model_name_or_path "lambdalabs/sd-image-variations-diffusers" \
@@ -625,6 +734,7 @@ Script `image_to_video_generation.py` showcases how to perform image-to-video generation.
### Single Image Prompt

Here is how to generate a video with one image prompt:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python image_to_video_generation.py \
@@ -645,6 +755,7 @@ python image_to_video_generation.py \
### Multiple Image Prompts

Here is how to generate videos with several image prompts:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python image_to_video_generation.py \
5 changes: 5 additions & 0 deletions examples/stable-diffusion/quantization/flux/measure_config.json
@@ -0,0 +1,5 @@
{
"method": "HOOKS",
"mode": "MEASURE",
"dump_stats_path": "quantization/flux/measure_all/fp8"
}
6 changes: 6 additions & 0 deletions examples/stable-diffusion/quantization/flux/quantize_config.json
@@ -0,0 +1,6 @@
{
"method": "HOOKS",
"mode": "QUANTIZE",
"scale_method": "maxabs_hw_opt_weight",
"dump_stats_path": "quantization/flux/measure_all/fp8"
}
3 changes: 2 additions & 1 deletion examples/stable-diffusion/requirements.txt
@@ -1,2 +1,3 @@
opencv-python
compel
sentencepiece