
[Community Pipelines] Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU #6683

Merged: 4 commits, merged Feb 19, 2024

Conversation

@ustcuna (Contributor) commented Jan 23, 2024

Hi, this pipeline aims to speed up inference of Stable Diffusion XL on Intel Xeon CPUs on Linux. It is much like the previous one merged in #3105 for Stable Diffusion.
By using this optimized pipeline, we can get about 1.4-2x performance acceleration with BFloat16 on fourth-generation Intel Xeon CPUs, code-named Sapphire Rapids.
It is also recommended to run on PyTorch/IPEX v2.0 and above to get the best performance boost.
The main benefits are listed below and are the same as in our previous PR:
- For PyTorch/IPEX v2.0 and above, it benefits from the MHA optimization with Flash Attention and TorchScript mode optimization in IPEX.
- For PyTorch/IPEX v1.13, it benefits from TorchScript mode optimization in IPEX.
Below are the tables showing the test results for SDXL-Turbo (a distilled version of SDXL 1.0, trained for real-time synthesis) with 1 step/4 steps on an Intel® Xeon® Platinum 8468V processor (48 cores/socket, 1 socket):

[Two tables attached as images: benchmark results for 1-step and 4-step SDXL-Turbo inference.]

Based on the code and test results, could you please help review? Thanks!
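
For context, a minimal usage sketch of what this pipeline enables on CPU; the model id, prompt, and `custom_pipeline` identifier below are illustrative assumptions, not taken from this PR:

```python
import torch
from diffusers import DiffusionPipeline

# Assumed model id and community pipeline name, for illustration only.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",
    custom_pipeline="stable_diffusion_xl_ipex",
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

prompt = "sailing ship in storm by Leonardo da Vinci"

# Trace and optimize the model with IPEX for BFloat16.
# The height/width passed here must match the values used at inference time.
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)

# Run inference under CPU autocast so BFloat16 kernels are used.
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=4, height=512, width=512, guidance_scale=0.0).images[0]
```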

)

if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
Review comment (Contributor):

better to change dtype = torch.float32
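
If I read this right, the idea is to keep the dummy latents used for tracing in float32 regardless of the incoming dtype, since BFloat16 is applied later through autocast. A minimal sketch of the changed branch, assuming `shape`, `generator`, `device`, and `latents` come from the surrounding `prepare_for_ipex()` method:

```python
# Sketch only: force float32 for the traced dummy latents; BFloat16 is handled
# by torch.cpu.amp.autocast at inference time. randn_tensor is the diffusers
# helper already imported in the pipeline file.
dtype = torch.float32
if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
```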

@ustcuna ustcuna requested a review from linlifan January 31, 2024 05:37
@ustcuna (Contributor Author) commented Jan 31, 2024

Hi @patrickvonplaten and @pcuenca, could you please help review this PR? It is much like the previous one we proposed, [Community Pipelines] Accelerate inference of stable diffusion by IPEX on CPU [https://github.com//pull/3105]. Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@patrickvonplaten (Contributor) commented:

Can we try to fix the quality test? I think running make style should be enough here :-)

@ustcuna (Contributor Author) commented Feb 18, 2024

> Can we try to fix the quality test? I think running make style should be enough here :-)

Hi @patrickvonplaten, thanks for the approval and sorry for the late response, since we just came back from the Lunar New Year holiday. I have fixed the code format with make style and re-committed the code. It seems that all checks have passed. Thanks!

@pcuenca (Member) left a comment:

Looks good to me, made a couple of suggestions about the docs.

To use this pipeline, you need to:
1. Install [IPEX](https://github.com/intel/intel-extension-for-pytorch)

**Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
Review suggestion (Member):
Suggested change:
- **Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
+ **Note:** For each PyTorch release, there is a corresponding release of IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.


2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
Review suggestion (Member):
Suggested change
**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.

############## bf16 inference performance ###############

# 1. IPEX Pipeline initialization
pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
Review suggestion (Member):
Suggested change:
- pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
+ pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)

Do we need use_auth_token?

image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
```

The following code compares the performance of the original Stable Diffusion XL pipeline with the IPEX-optimized pipeline.
Review comment (Member):
Can we get a brief qualitative comment about the kind of speedup we should expect?

Reply (Contributor Author):

Hi @pcuenca, thanks so much for the detailed suggestions. I have modified all parts as you suggested. Apart from some wording changes, the most important modifications are:

1. data_type changed to torch.float32/torch.bfloat16 accordingly
2. use_auth_token is not needed in this pipeline, so I simply removed all of it
3. an estimated performance boost (1.4-2x) is given in the comment, tested using BFloat16 on the fourth-generation Intel Xeon CPU code-named SPR
4. the long-line comments are removed and now appear only once at the beginning

Please help review the updated code, thanks!
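
To illustrate the kind of before/after comparison the README describes, here is a rough timing sketch; the model id, prompt, step count, and `custom_pipeline` name are placeholders, and the actual snippet in this PR may differ:

```python
import time

import torch
from diffusers import DiffusionPipeline, StableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"  # placeholder model id
prompt = "sailing ship in storm by Leonardo da Vinci"
steps = 4

# Original pipeline, BFloat16 via autocast only.
pipe_ref = StableDiffusionXLPipeline.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    t0 = time.time()
    pipe_ref(prompt, num_inference_steps=steps, height=512, width=512, guidance_scale=0.0)
    print(f"original pipeline: {time.time() - t0:.2f} s")

# IPEX-optimized community pipeline (custom_pipeline name assumed).
pipe_ipex = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline="stable_diffusion_xl_ipex",
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
pipe_ipex.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    t0 = time.time()
    pipe_ipex(prompt, num_inference_steps=steps, height=512, width=512, guidance_scale=0.0)
    print(f"ipex pipeline: {time.time() - t0:.2f} s")
```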


2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
Review suggestion (Member):
Suggested change:
- **Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
+ **Note:** The values of `height` and `width` used during preparation with `prepare_for_ipex()` should be the same when running inference with the prepared pipeline.

Comment on lines 1740 to 1742
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
# For BFloat16
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
Review comment (Member):
The two lines are identical. Maybe:

- Spell out the data types torch.bfloat16, torch.float32 instead of using the same variable name.
- Remove the long comments in the invocation lines; we could put them once at the beginning of the snippet.
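
Applying both suggestions, the snippet might look roughly like this; a sketch only, with `pipe` and `prompt` assumed from the pipeline initialization earlier in the example:

```python
# NOTE: the height/width passed to prepare_for_ipex() must match the values
# used later at inference time.

# For Float32
pipe.prepare_for_ipex(torch.float32, prompt, height=512, width=512)

# For BFloat16
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
```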

Comment on lines 1748 to 1751
image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
# For BFloat16
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
Review comment (Member):
Same here, the long lines with the comments can be distracting

Comment on lines 88 to 91
>>> # For Float32
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
>>> # For BFloat16
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
Review comment (Member):
Same comments as in the docs

@pcuenca pcuenca merged commit 12004bf into huggingface:main Feb 19, 2024
13 checks passed