FLUX.1 text-to-image pipeline for Gaudi
* Enabled and tested FLUX pipeline on Gaudi for FLUX.1 class of models
* Enabled HPU graphs mode
* Enabled batching in inference
* Added support for quantization (fp8 and hybrid)
* Incorporated the Gaudi profiler and HPU synchronization for performance analysis
* Boosted performance with Fused SDPA
* Added Fused RoPE
* Documented FLUX.1 samples
* Upgraded and pinned diffusers in Optimum-Habana to official release 0.31.0
* Resolved issues in other pipelines caused by the diffusers upgrade
* Added CI tests (2 unit tests, 1 slow test for perf and quality)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
5 people committed Nov 25, 2024
1 parent 82a1c96 commit f6abe4b
Showing 16 changed files with 1,204 additions and 136 deletions.
123 changes: 117 additions & 6 deletions examples/stable-diffusion/README.md
@@ -28,12 +28,12 @@ First, you should install the requirements:
pip install -r requirements.txt
```


## Text-to-image Generation

### Single Prompt

Here is how to generate images with one prompt:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -51,10 +51,10 @@ python text_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.
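
For reference, here is a minimal Python sketch of what the script does under the hood. It assumes the `GaudiStableDiffusionPipeline` API from `optimum.habana.diffusers`; the prompt and file names are arbitrary examples, not values from this repository:

```python
# Minimal sketch of the Gaudi pipeline behind text_to_image_generation.py.
# Assumes optimum-habana is installed and an HPU is available.
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_habana=True,       # run on HPU
    use_hpu_graphs=True,   # capture HPU graphs: the first batch pays the compile cost
    gaudi_config="Habana/stable-diffusion",
)
outputs = pipeline(
    prompt="A serene lake at sunrise",  # arbitrary example prompt
    num_images_per_prompt=4,
    batch_size=2,
)
for i, image in enumerate(outputs.images):
    image.save(f"image_{i}.png")
```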

### Multiple Prompts

Here is how to generate images with several prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -69,7 +69,9 @@ python text_to_image_generation.py \
```

### Distributed inference with multiple HPUs

Here is how to generate images with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -109,10 +111,10 @@ python text_to_image_generation.py \
```

> There are two different checkpoints for Stable Diffusion 2:
>
> - use [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) for generating 768x768 images
> - use [stabilityai/stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) for generating 512x512 images

### Latent Diffusion Model for 3D (LDM3D)

[LDM3D](https://arxiv.org/abs/2305.10853) generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.
@@ -135,7 +137,9 @@ python text_to_image_generation.py \
--ldm3d \
--bf16
```

Here is how to generate images and depth maps with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -154,6 +158,7 @@ python ../gaudi_spawn.py \
```

> There are three different checkpoints for LDM3D:
>
> - use [original checkpoint](https://huggingface.co/Intel/ldm3d) to generate outputs from the paper
> - use [the latest checkpoint](https://huggingface.co/Intel/ldm3d-4c) for generating improved results
> - use [the pano checkpoint](https://huggingface.co/Intel/ldm3d-pano) to generate panoramic view
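
As a rough illustration, a minimal Python sketch of an LDM3D run follows. It assumes a `GaudiStableDiffusionLDM3DPipeline` class in `optimum.habana.diffusers` whose output carries separate RGB and depth fields, mirroring the upstream diffusers LDM3D pipeline:

```python
# Hedged sketch: class name and output fields are assumed to mirror
# diffusers' StableDiffusionLDM3DPipeline on Gaudi.
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline

pipe = GaudiStableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
output = pipe(prompt="A picture of some lemons on a table")
output.rgb[0].save("lemons_rgb.png")      # RGB image
output.depth[0].save("lemons_depth.png")  # corresponding depth map
```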
@@ -163,6 +168,7 @@ python ../gaudi_spawn.py \
Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/pdf/2307.01952.pdf) by the Stability AI team.

Here is how to generate SDXL images with a single prompt:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -182,6 +188,7 @@ python text_to_image_generation.py \
> You can enable this mode with `--use_hpu_graphs`.

Here is how to generate SDXL images with several prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -199,6 +206,7 @@ python text_to_image_generation.py \
SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder to significantly
increase the number of parameters. Here is how to generate images with several prompts for both `prompt`
and `prompt_2` (2nd text encoder), as well as their negative prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
@@ -217,6 +225,7 @@ python text_to_image_generation.py \
```

Here is how to generate SDXL images with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -235,14 +244,17 @@ python ../gaudi_spawn.py \
--bf16 \
--distributed
```

> HPU graphs are recommended when generating images by batches to get the fastest possible generations.
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### SDXL-Turbo

SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.

Here is how to generate images with multiple prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/sdxl-turbo \
@@ -275,11 +287,13 @@ Before running the SD3 pipeline, you need to:

1. Agree to the Terms and Conditions for using the SD3 model at the [Hugging Face model page](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
2. Authenticate with Hugging Face using your HF token by running:

```bash
huggingface-cli login
```

Here is how to generate SD3 images with a single prompt:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python text_to_image_generation.py \
@@ -299,12 +313,100 @@ python text_to_image_generation.py \
> For improved performance of the SD3 pipeline on Gaudi, it is recommended to configure the environment
> by setting `PT_HPU_MAX_COMPOUND_OP_SIZE` to 1.

### FLUX.1

FLUX.1 was introduced by Black Forest Labs [here](https://blackforestlabs.ai/announcing-black-forest-labs/).

Here is how to run the FLUX.1-schnell model (the fast version of FLUX.1):

```bash
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-schnell \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 4 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```
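
Under the hood, this commit routes FLUX models through a dedicated Gaudi pipeline. Below is a minimal Python sketch, assuming a `GaudiFluxPipeline` class in `optimum.habana.diffusers` that follows the same conventions as the other Gaudi pipelines:

```python
# Hedged sketch: GaudiFluxPipeline is assumed to follow the same
# constructor conventions as the other Gaudi diffusers pipelines.
import torch
from optimum.habana.diffusers import GaudiFluxPipeline

pipe = GaudiFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
image = pipe(
    prompt="A cat holding a sign that says hello world",
    num_inference_steps=4,  # schnell is distilled for very few steps
).images[0]
image.save("flux_schnell.png")
```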

Before running the FLUX.1-dev model, you need to:

1. Agree to the Terms and Conditions for using the FLUX.1-dev model at the [Hugging Face model page](https://huggingface.co/black-forest-labs/FLUX.1-dev)
2. Authenticate with Hugging Face using your HF token by running:

```bash
huggingface-cli login
```

Here is how to run the FLUX.1-dev model:

```bash
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16
```

This model can also be quantized with some ops running in FP8 precision.

Before quantizing, collect statistics by running the pipeline in measure mode:

```bash
QUANT_CONFIG=quantization/flux/measure_config.json \
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--quant_mode measure
```

After the statistics are collected, here is how to run FLUX.1-dev in quantization mode:

```bash
QUANT_CONFIG=quantization/flux/quantize_config.json \
python text_to_image_generation.py \
--model_name_or_path black-forest-labs/FLUX.1-dev \
--prompts "A cat holding a sign that says hello world" \
--num_images_per_prompt 10 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_1_images \
--scheduler flow_match_euler_discrete \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--bf16 \
--quant_mode quantize
```

## ControlNet

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
It is a type of model for controlling Stable Diffusion by conditioning it with an additional input image.
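
The conditioning input is typically an edge map or pose map rather than a plain photo. As an illustration, here is one common way to prepare a canny-edge conditioning image with `opencv-python` (already listed in `requirements.txt`); the file names are arbitrary placeholders:

```python
# Sketch: build a canny edge map to use as a ControlNet conditioning image.
import cv2
import numpy as np
from PIL import Image

img = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(img, 100, 200)          # low/high hysteresis thresholds
edges = np.stack([edges] * 3, axis=-1)    # replicate to 3 channels
Image.fromarray(edges).save("canny_condition.png")
```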

Here is how to generate images conditioned by the canny edge model:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -321,6 +423,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model with multiple prompts:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -337,6 +440,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model with two prompts on two HPUs:

```bash
python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
@@ -355,6 +459,7 @@ python ../gaudi_spawn.py \
```

Here is how to generate images conditioned by the open pose model:

```bash
python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
@@ -372,6 +477,7 @@ python text_to_image_generation.py \
```

Here is how to generate images conditioned by the canny edge model using Stable Diffusion 2:

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-2-1 \
@@ -395,6 +501,7 @@ Inpainting replaces or edits specific areas of an image. For more details,
please refer to the [Hugging Face Diffusers documentation](https://huggingface.co/docs/diffusers/en/using-diffusers/inpaint).
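
A minimal Python sketch of an inpainting call follows; it assumes a `GaudiStableDiffusionInpaintPipeline` class that follows the standard diffusers inpainting API, and the image and mask paths are placeholders:

```python
# Hedged sketch: inpainting repaints only the white region of the mask.
from PIL import Image
from optimum.habana.diffusers import GaudiStableDiffusionInpaintPipeline

pipe = GaudiStableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
result = pipe(
    prompt="a white cat sitting on a park bench",  # arbitrary prompt
    image=Image.open("photo.png"),                 # original image
    mask_image=Image.open("mask.png"),             # white = region to repaint
).images[0]
result.save("inpainted.png")
```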

### Stable Diffusion Inpainting

```bash
python text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-2-inpainting \
@@ -412,6 +519,7 @@ python text_to_image_generation.py \
```

### Stable Diffusion XL Inpainting

```bash
python text_to_image_generation.py \
--model_name_or_path diffusers/stable-diffusion-xl-1.0-inpainting-0.1 \
@@ -457,10 +565,10 @@ python image_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### Multiple Prompts

Here is how to generate images with several prompts and one image:

```bash
python image_to_image_generation.py \
--model_name_or_path "timbrooks/instruct-pix2pix" \
Expand All @@ -482,10 +590,10 @@ python image_to_image_generation.py \
> The first batch of images entails a performance penalty. All subsequent batches will be generated much faster.
> You can enable this mode with `--use_hpu_graphs`.

### Stable Diffusion XL Refiner

Here is how to generate SDXL images with a single prompt and one image:

```bash
python image_to_image_generation.py \
--model_name_or_path "stabilityai/stable-diffusion-xl-refiner-1.0" \
@@ -505,6 +613,7 @@ python image_to_image_generation.py \
### Stable Diffusion Image Variations

Here is how to generate image variations from one input image; this pipeline does not accept a text prompt:

```bash
python image_to_image_generation.py \
--model_name_or_path "lambdalabs/sd-image-variations-diffusers" \
@@ -625,6 +734,7 @@ Script `image_to_video_generation.py` showcases how to perform image-to-video generation.
### Single Image Prompt

Here is how to generate a video with one image prompt:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python image_to_video_generation.py \
@@ -645,6 +755,7 @@ python image_to_video_generation.py \
### Multiple Image Prompts

Here is how to generate videos with several image prompts:

```bash
PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
python image_to_video_generation.py \
5 changes: 5 additions & 0 deletions examples/stable-diffusion/quantization/flux/measure_config.json
@@ -0,0 +1,5 @@
{
"method": "HOOKS",
"mode": "MEASURE",
"dump_stats_path": "quantization/flux/measure_all/fp8"
}
6 changes: 6 additions & 0 deletions examples/stable-diffusion/quantization/flux/quantize_config.json
@@ -0,0 +1,6 @@
{
"method": "HOOKS",
"mode": "QUANTIZE",
"scale_method": "maxabs_hw_opt_weight",
"dump_stats_path": "quantization/flux/measure_all/fp8"
}
3 changes: 2 additions & 1 deletion examples/stable-diffusion/requirements.txt
@@ -1,2 +1,3 @@
opencv-python
compel
sentencepiece