
[Community Pipelines] Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU #6683

Merged: 4 commits, merged Feb 19, 2024

Conversation

@ustcuna (Contributor) commented Jan 23, 2024

Hi, this pipeline aims to speed up inference of Stable Diffusion XL on Intel Xeon CPUs on Linux. It is much like the previous one merged in #3105 for Stable Diffusion.
By using this optimized pipeline, we can get about 1.4-2x performance acceleration with BFloat16 on fourth-generation Intel Xeon CPUs, code-named Sapphire Rapids.
It is also recommended to run on PyTorch/IPEX v2.0 and above to get the best performance boost.
The main benefits are listed below and are the same as in our previous PR:
- For PyTorch/IPEX v2.0 and above, it benefits from the MHA optimization with Flash Attention and TorchScript mode optimization in IPEX.
- For PyTorch/IPEX v1.13, it benefits from TorchScript mode optimization in IPEX.
Below are the tables showing the test results for SDXL-Turbo (a distilled version of SDXL 1.0, trained for real-time synthesis) with 1 step/4 steps on an Intel® Xeon® Platinum 8468V processor (48 cores/socket, 1 socket):

[Two tables attached as images: benchmark results for 1-step and 4-step SDXL-Turbo inference.]

Based on the code and test results, could you please help review? Thanks!
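
For context, a minimal usage sketch of what this pipeline enables on CPU; the model id, prompt, and `custom_pipeline` identifier below are illustrative assumptions, not taken from this PR:

```python
import torch
from diffusers import DiffusionPipeline

# Assumed model id and community pipeline name, for illustration only.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",
    custom_pipeline="stable_diffusion_xl_ipex",
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

prompt = "sailing ship in storm by Leonardo da Vinci"

# Trace and optimize the model with IPEX for BFloat16.
# The height/width passed here must match the values used at inference time.
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)

# Run inference under CPU autocast so BFloat16 kernels are used.
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=4, height=512, width=512, guidance_scale=0.0).images[0]
```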

)

if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
Review comment (Contributor):

better to change dtype = torch.float32
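
If I read this right, the idea is to keep the dummy latents used for tracing in float32 regardless of the incoming dtype, since BFloat16 is applied later through autocast. A minimal sketch of the changed branch, assuming `shape`, `generator`, `device`, and `latents` come from the surrounding `prepare_for_ipex()` method:

```python
# Sketch only: force float32 for the traced dummy latents; BFloat16 is handled
# by torch.cpu.amp.autocast at inference time. randn_tensor is the diffusers
# helper already imported in the pipeline file.
dtype = torch.float32
if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
```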

@ustcuna ustcuna requested a review from linlifan January 31, 2024 05:37
@ustcuna (Contributor Author) commented Jan 31, 2024

Hi @patrickvonplaten and @pcuenca, could you please help review this PR? It is much like the previous one we proposed, [Community Pipelines] Accelerate inference of stable diffusion by IPEX on CPU [https://github.com//pull/3105]. Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@patrickvonplaten (Contributor) commented:

Can we try to fix the quality test? I think running make style should be enough here :-)

@ustcuna (Contributor Author) commented Feb 18, 2024

> Can we try to fix the quality test? I think running make style should be enough here :-)

Hi @patrickvonplaten, thanks for the approval and sorry for the late response, since we just came back from the Lunar New Year holiday. I have fixed the code format with make style and re-committed the code. It seems that all checks have passed. Thanks!

@pcuenca (Member) left a comment:

Looks good to me, made a couple of suggestions about the docs.

To use this pipeline, you need to:
1. Install [IPEX](https://github.com/intel/intel-extension-for-pytorch)

**Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
Review suggestion (Member):
Suggested change:
- **Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
+ **Note:** For each PyTorch release, there is a corresponding release of IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.


2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
Review suggestion (Member):
Suggested change
**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.

############## bf16 inference performance ###############

# 1. IPEX Pipeline initialization
pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
Review suggestion (Member):
Suggested change:
- pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
+ pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)

Do we need use_auth_token?

image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
```

The following code compares the performance of the original Stable Diffusion XL pipeline with the IPEX-optimized pipeline.
Review comment (Member):
Can we get a brief qualitative comment about the kind of speedup we should expect?

Reply (Contributor Author):

Hi @pcuenca, thanks so much for the detailed suggestions. I have modified all parts as you suggested. Apart from some wording changes, the most important modifications are:

1. data_type changed to torch.float32/torch.bfloat16 accordingly
2. use_auth_token is not needed in this pipeline, so I simply removed all of it
3. an estimated performance boost (1.4-2x) is given in the comment, tested using BFloat16 on the fourth-generation Intel Xeon CPU code-named SPR
4. the long-line comments are removed and now appear only once at the beginning

Please help review the updated code, thanks!
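
To illustrate the kind of before/after comparison the README describes, here is a rough timing sketch; the model id, prompt, step count, and `custom_pipeline` name are placeholders, and the actual snippet in this PR may differ:

```python
import time

import torch
from diffusers import DiffusionPipeline, StableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"  # placeholder model id
prompt = "sailing ship in storm by Leonardo da Vinci"
steps = 4

# Original pipeline, BFloat16 via autocast only.
pipe_ref = StableDiffusionXLPipeline.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    t0 = time.time()
    pipe_ref(prompt, num_inference_steps=steps, height=512, width=512, guidance_scale=0.0)
    print(f"original pipeline: {time.time() - t0:.2f} s")

# IPEX-optimized community pipeline (custom_pipeline name assumed).
pipe_ipex = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline="stable_diffusion_xl_ipex",
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
pipe_ipex.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    t0 = time.time()
    pipe_ipex(prompt, num_inference_steps=steps, height=512, width=512, guidance_scale=0.0)
    print(f"ipex pipeline: {time.time() - t0:.2f} s")
```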


2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
Review suggestion (Member):
Suggested change:
- **Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
+ **Note:** The values of `height` and `width` used during preparation with `prepare_for_ipex()` should be the same when running inference with the prepared pipeline.

Comment on lines 1740 to 1742
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
# For BFloat16
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
Review comment (Member):
The two lines are identical. Maybe:

- Spell out the data types torch.bfloat16, torch.float32 instead of using the same variable name.
- Remove the long comments in the invocation lines; we could put them once at the beginning of the snippet.
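
Applying both suggestions, the snippet might look roughly like this; a sketch only, with `pipe` and `prompt` assumed from the pipeline initialization earlier in the example:

```python
# NOTE: the height/width passed to prepare_for_ipex() must match the values
# used later at inference time.

# For Float32
pipe.prepare_for_ipex(torch.float32, prompt, height=512, width=512)

# For BFloat16
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
```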

Comment on lines 1748 to 1751
image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
# For BFloat16
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()'
Review comment (Member):
Same here, the long lines with the comments can be distracting

Comment on lines 88 to 91
>>> # For Float32
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
>>> # For BFloat16
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
Review comment (Member):
Same comments as in the docs

@pcuenca pcuenca merged commit 12004bf into huggingface:main Feb 19, 2024
13 checks passed