[Community Pipelines] Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU #6683
Conversation
```python
if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
```
Better to change to `dtype = torch.float32`.
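For illustration, the change would presumably look like this in the hunk above (a sketch, not the PR's final code):

```python
import torch
from diffusers.utils.torch_utils import randn_tensor

# Sketch of the suggested change: create the initial latents in float32,
# regardless of the dtype passed in.
if latents is None:
    latents = randn_tensor(shape, generator=generator, device=device, dtype=torch.float32)
```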
Hi @patrickvonplaten and @pcuenca, could you please help review this PR? It is very similar to the previous one we proposed: [Community Pipelines] Accelerate inference of stable diffusion by IPEX on CPU (https://github.com//pull/3105). Thanks!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can we try to fix the quality test? I think running the repo's style formatter should fix it.
Hi @patrickvonplaten, thanks for the approval, and sorry for the late response; we just came back from the Lunar New Year holiday. I have fixed the code format with the style command you mentioned.
Looks good to me, made a couple of suggestions about the docs.
examples/community/README.md
To use this pipeline, you need to:

1. Install [IPEX](https://github.com/intel/intel-extension-for-pytorch)

**Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
Suggested change:
- **Note:** For each PyTorch release, there is a corresponding release of the IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
+ **Note:** For each PyTorch release, there is a corresponding release of IPEX. Here is the mapping relationship. It is recommended to install Pytorch/IPEX2.0 to get the best performance.
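As an illustrative aside (not part of the PR), the pairing can be checked at runtime; the version strings below are examples, not prescriptions:

```python
# Check that the installed PyTorch and IPEX releases line up, since each IPEX
# release targets one specific PyTorch release.
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)  # e.g. '2.0.1+cpu'
print(ipex.__version__)   # e.g. '2.0.100+cpu', the matching IPEX release
```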
examples/community/README.md
2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
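As a usage illustration of that note (a sketch: the model id and the community pipeline name "stable_diffusion_xl_ipex" are assumptions, and 512x512 is just an example resolution), preparation and inference must use the same height/width:

```python
import torch
from diffusers import DiffusionPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"  # example model
# "stable_diffusion_xl_ipex" is the assumed registry name of this community pipeline.
pipe = DiffusionPipeline.from_pretrained(
    model_id, custom_pipeline="stable_diffusion_xl_ipex",
    low_cpu_mem_usage=True, use_safetensors=True,
)

prompt = "sailing ship in storm by Rembrandt"

# Prepare at 512x512 ...
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)

# ... and run inference at exactly the same 512x512.
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0]
```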
examples/community/README.md
```python
############## bf16 inference performance ###############

# 1. IPEX Pipeline initialization
pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
```
Suggested change:
- pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, use_auth_token=True, low_cpu_mem_usage=True, use_safetensors=True)
+ pipe = StableDiffusionXLPipelineIpex.from_pretrained(model_id, low_cpu_mem_usage=True, use_safetensors=True)
Do we need `use_auth_token`?
```python
image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0]  # value of image height/width should be consistent with 'prepare_for_ipex()'
```

The following code compares the performance of the original stable diffusion xl pipeline with the ipex-optimized pipeline.
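That comparison code isn't quoted in this thread; as a rough stand-in, a timing sketch along these lines (the model id, step count, and the community pipeline name "stable_diffusion_xl_ipex" are assumptions) illustrates the idea:

```python
import time

import torch
from diffusers import DiffusionPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"  # example model
prompt = "sailing ship in storm by Rembrandt"

def bench(pipe):
    # One warm-up run, then time a single 512x512 generation.
    pipe(prompt, num_inference_steps=20, height=512, width=512)
    start = time.time()
    pipe(prompt, num_inference_steps=20, height=512, width=512)
    return time.time() - start

# Original SDXL pipeline on CPU.
pipe_base = DiffusionPipeline.from_pretrained(model_id, use_safetensors=True)
print(f"baseline: {bench(pipe_base):.1f}s")

# IPEX-optimized community pipeline, prepared for bf16 at the same resolution.
pipe_ipex = DiffusionPipeline.from_pretrained(
    model_id, custom_pipeline="stable_diffusion_xl_ipex", use_safetensors=True
)
pipe_ipex.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    print(f"ipex bf16: {bench(pipe_ipex):.1f}s")
```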
Can we get a brief qualitative comment about the kind of speedup we should expect?
Hi @pcuenca, thanks so much for the detailed suggestions. I have modified all parts as you suggested. Apart from some wording changes, the most important modifications are:

- `data_type` changed to `torch.float32`/`torch.bfloat16` accordingly
- `use_auth_token` is not needed in this pipeline, so I simply removed all of it
- gave an estimated performance boost (1.4-2x) in the comment, tested using BFloat16 on fourth-generation Intel Xeon CPUs, code-named SPR
- removed the long-line comments so the comment appears only once at the beginning

Please help review the updated code, thanks!
examples/community/README.md
2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.

**Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
Suggested change:
- **Note:** The setting of generated image height/width for `prepare_for_ipex()` should be same as the setting of pipeline inference.
+ **Note:** The values of `height` and `width` used during preparation with `prepare_for_ipex()` should be the same when running inference with the prepared pipeline.
examples/community/README.md
```python
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512)  # value of image height/width should be consistent with the pipeline inference
# For BFloat16
pipe.prepare_for_ipex(data_type, prompt, height=512, width=512)  # value of image height/width should be consistent with the pipeline inference
```
The two lines are identical. Maybe:

- Spell out the data types (`torch.bfloat16`, `torch.float32`) instead of using the same variable name.
- Remove the long comments in the invocation lines; we could put it once at the beginning of the snippet.

An illustrative revision follows below.
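Applied to the snippet above (assuming `pipe` and `prompt` from the surrounding README example), the revision might read:

```python
# Note: the height/width values below must be consistent with the ones used at pipeline inference.

# For Float32
pipe.prepare_for_ipex(torch.float32, prompt, height=512, width=512)
# For BFloat16
pipe.prepare_for_ipex(torch.bfloat16, prompt, height=512, width=512)
```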
examples/community/README.md
```python
image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0]  # value of image height/width should be consistent with 'prepare_for_ipex()'
# For BFloat16
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=guidance_scale).images[0]  # value of image height/width should be consistent with 'prepare_for_ipex()'
```
Same here, the long lines with the comments can be distracting
```python
>>> # For Float32
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512)  # value of image height/width should be consistent with the pipeline inference
>>> # For BFloat16
>>> pipe.prepare_for_ipex(data_type, prompt, height=512, width=512)  # value of image height/width should be consistent with the pipeline inference
```
Same comments as in the docs
Hi, this pipeline aims to speed up the inference of Stable Diffusion XL on Intel Xeon CPUs on Linux. It is very similar to the previous one merged in #3105 for Stable Diffusion.

By using this optimized pipeline, we can get about a 1.4-2x performance acceleration with BFloat16 on fourth-generation Intel Xeon CPUs, code-named Sapphire Rapids. It is also recommended to run on PyTorch/IPEX v2.0 and above to get the best performance boost.

The main benefits, the same as in our previous PR, are listed below; a rough sketch of the optimization pattern follows the list.

- For PyTorch/IPEX v2.0 and above, it benefits from MHA optimization with Flash Attention and TorchScript mode optimization in IPEX.
- For PyTorch/IPEX v1.13, it benefits from TorchScript mode optimization in IPEX.
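A rough sketch of that optimization pattern, for reviewers unfamiliar with IPEX (`pipe.unet` and `example_inputs` are placeholders, not the PR's exact code):

```python
import torch
import intel_extension_for_pytorch as ipex

# Operator-level optimizations (e.g. weight prepacking) on the eval-mode UNet;
# `pipe` is assumed to be an already-loaded SDXL pipeline.
unet = ipex.optimize(pipe.unet.eval(), dtype=torch.bfloat16, inplace=True)

# TorchScript-mode optimization: trace and freeze the module so IPEX graph
# fusions (including the MHA/Flash Attention path on PyTorch/IPEX 2.0+) apply.
example_inputs = ...  # placeholder: sample latents, timestep, and text embeddings
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16), torch.no_grad():
    traced_unet = torch.jit.trace(unet, example_inputs, strict=False)
    traced_unet = torch.jit.freeze(traced_unet)
```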
Below are the tables which show the test results for SDXL-Turbo (a distilled version of SDXL 1.0, trained for real-time synthesis) with 1 step/4 steps on an Intel® Xeon® Platinum 8468V Processor (48 cores/socket, 1 socket):

Based on the code and test results, could you please help review? Thanks!