
Add kolors compile #1007

Merged: 19 commits, Jul 23, 2024

lixiang007666 (Contributor) commented Jul 11, 2024

This PR:

  • Adds Kolors compile support and a README.

lixiang007666 (Contributor, Author):

The Kolors support in diffusers that this PR currently relies on is in:
huggingface/diffusers#8812

That PR has not been merged yet.

lixiang007666 requested a review from strint July 11, 2024 08:35
clackhan closed this Jul 16, 2024
lixiang007666 reopened this Jul 16, 2024
lixiang007666 (Contributor, Author):

Dynamic shape logs:

oneflow:

oneflow backend compile...
Starting warmup...
Warmup complete.
Warmup time: 39.30 seconds
Generated image saved to kolors_oneflow_compile.png in 3.50 seconds.
Max used CUDA memory : 20.627GiB
Test run with multiple resolutions...
Running at resolution: 1024x1024
Inference time: 4.17 seconds
Running at resolution: 1024x768
Inference time: 2.86 seconds
Running at resolution: 1024x576
Inference time: 2.40 seconds
Running at resolution: 1024x512
Inference time: 2.20 seconds
Running at resolution: 1024x256
Inference time: 1.30 seconds
Running at resolution: 768x1024
Inference time: 2.89 seconds
Running at resolution: 768x768
Inference time: 2.42 seconds
Running at resolution: 768x576
Inference time: 2.10 seconds
Running at resolution: 768x512
Inference time: 1.68 seconds
Running at resolution: 768x256
Inference time: 1.03 seconds
Running at resolution: 576x1024
Inference time: 2.43 seconds
Running at resolution: 576x768
Inference time: 2.10 seconds
Running at resolution: 576x576
Inference time: 1.53 seconds
Running at resolution: 576x512
Inference time: 1.48 seconds
Running at resolution: 576x256
Inference time: 0.93 seconds
Running at resolution: 512x1024
Inference time: 2.22 seconds
Running at resolution: 512x768
Inference time: 1.72 seconds
Running at resolution: 512x576
Inference time: 1.47 seconds
Running at resolution: 512x512
Inference time: 1.34 seconds
Running at resolution: 512x256
Inference time: 0.86 seconds
Running at resolution: 256x1024
Inference time: 1.33 seconds
Running at resolution: 256x768
Inference time: 1.04 seconds
Running at resolution: 256x576
Inference time: 0.93 seconds
Running at resolution: 256x512
Inference time: 0.86 seconds
Running at resolution: 256x256
Inference time: 0.78 seconds

nexfort:

nexfort backend compile...
Starting warmup...
Warmup complete.
Warmup time: 314.58 seconds
Generated image saved to kolors_nexfort_compile.png in 2.31 seconds.
Max used CUDA memory : 19.435GiB
Test run with multiple resolutions...
Running at resolution: 1024x1024
Inference time: 5.95 seconds
Running at resolution: 1024x768
Inference time: 4.04 seconds
Running at resolution: 1024x576
Inference time: 3.48 seconds
Running at resolution: 1024x512
Inference time: 3.13 seconds
Running at resolution: 1024x256
Inference time: 2.39 seconds
Running at resolution: 768x1024
Inference time: 4.09 seconds
Running at resolution: 768x768
Inference time: 3.50 seconds
Running at resolution: 768x576
Inference time: 3.07 seconds
Running at resolution: 768x512
Inference time: 2.45 seconds
Running at resolution: 768x256
Inference time: 2.40 seconds
Running at resolution: 576x1024
Inference time: 3.57 seconds
Running at resolution: 576x768
Inference time: 3.07 seconds
Running at resolution: 576x576
Inference time: 2.49 seconds
Running at resolution: 576x512
Inference time: 2.48 seconds
Running at resolution: 576x256
Inference time: 2.46 seconds
Running at resolution: 512x1024
Inference time: 3.18 seconds
Running at resolution: 512x768
Inference time: 2.51 seconds
Running at resolution: 512x576
Inference time: 2.49 seconds
Running at resolution: 512x512
Inference time: 2.49 seconds
Running at resolution: 512x256
Inference time: 2.46 seconds
Running at resolution: 256x1024
Inference time: 2.49 seconds
Running at resolution: 256x768
Inference time: 2.50 seconds
Running at resolution: 256x576
Inference time: 2.48 seconds
Running at resolution: 256x512
Inference time: 2.47 seconds
Running at resolution: 256x256
Inference time: 2.49 seconds
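Both logs follow the same pattern: a warmup pass, then one timed generation for every pair of sizes drawn from 1024, 768, 576, 512 and 256, in that order. The benchmark script itself is not shown in this PR, so the following is only a sketch of a harness that would print logs in this format. `benchmark_resolutions` and `run` are hypothetical names; `run` stands in for a pipeline call, and because the log does not say which number in "AxB" is height and which is width, the sketch just passes them through in logged order.

```python
import itertools
import time
from typing import Callable, Dict, Sequence, Tuple

# Side lengths that appear in the dynamic-shape logs.
SIZES = (1024, 768, 576, 512, 256)


def benchmark_resolutions(
    run: Callable[[int, int], object],
    sizes: Sequence[int] = SIZES,
    warmup_runs: int = 1,
) -> Dict[Tuple[int, int], float]:
    """Hypothetical harness: warm up, then time one call per size pair.

    `run(a, b)` is an assumed stand-in for a pipeline call at resolution "AxB".
    """
    # Warmup at the first (largest) size; with a compiler backend this is
    # where most compilation cost would be paid.
    start = time.perf_counter()
    for _ in range(warmup_runs):
        run(sizes[0], sizes[0])
    print(f"Warmup time: {time.perf_counter() - start:.2f} seconds")

    timings: Dict[Tuple[int, int], float] = {}
    # itertools.product yields (1024, 1024), (1024, 768), ... matching the log order.
    for a, b in itertools.product(sizes, repeat=2):
        print(f"Running at resolution: {a}x{b}")
        t0 = time.perf_counter()
        run(a, b)
        elapsed = time.perf_counter() - t0
        print(f"Inference time: {elapsed:.2f} seconds")
        timings[(a, b)] = elapsed
    return timings


if __name__ == "__main__":
    # Dummy workload in place of a real pipeline call.
    benchmark_resolutions(lambda a, b: sum(range(a * b // 64)))
```

If `run` can return before queued GPU work finishes, call `torch.cuda.synchronize()` before reading the clock; otherwise the timings undercount.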

lixiang007666 (Contributor, Author):

I found that some optimizations in a recent nexfort update have made it much slower. I'm tracking down which commit is responsible.
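To put the two dynamic-shape logs side by side, here is a small sketch that computes the nexfort/oneflow time ratio for each logged resolution. The numbers are copied from the logs in this thread; `slowdown_ratios` is just an illustrative helper name.

```python
# Per-resolution inference times (seconds), copied from the dynamic-shape
# logs in this thread. Keys are the resolution labels exactly as logged.
ONEFLOW = {
    "1024x1024": 4.17, "1024x768": 2.86, "1024x576": 2.40, "1024x512": 2.20, "1024x256": 1.30,
    "768x1024": 2.89, "768x768": 2.42, "768x576": 2.10, "768x512": 1.68, "768x256": 1.03,
    "576x1024": 2.43, "576x768": 2.10, "576x576": 1.53, "576x512": 1.48, "576x256": 0.93,
    "512x1024": 2.22, "512x768": 1.72, "512x576": 1.47, "512x512": 1.34, "512x256": 0.86,
    "256x1024": 1.33, "256x768": 1.04, "256x576": 0.93, "256x512": 0.86, "256x256": 0.78,
}
NEXFORT = {
    "1024x1024": 5.95, "1024x768": 4.04, "1024x576": 3.48, "1024x512": 3.13, "1024x256": 2.39,
    "768x1024": 4.09, "768x768": 3.50, "768x576": 3.07, "768x512": 2.45, "768x256": 2.40,
    "576x1024": 3.57, "576x768": 3.07, "576x576": 2.49, "576x512": 2.48, "576x256": 2.46,
    "512x1024": 3.18, "512x768": 2.51, "512x576": 2.49, "512x512": 2.49, "512x256": 2.46,
    "256x1024": 2.49, "256x768": 2.50, "256x576": 2.48, "256x512": 2.47, "256x256": 2.49,
}


def slowdown_ratios(baseline, other):
    """Return other/baseline time per resolution (> 1 means `other` is slower)."""
    return {res: other[res] / baseline[res] for res in baseline}


ratios = slowdown_ratios(ONEFLOW, NEXFORT)
for res, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{res:>9}: nexfort/oneflow = {r:.2f}x")

# Warmup times from the same logs: 314.58 s (nexfort) vs 39.30 s (oneflow).
warmup_ratio = 314.58 / 39.30
print(f"warmup: nexfort/oneflow = {warmup_ratio:.1f}x")
```

For this particular run, nexfort is about 1.4x slower at the 1024-class resolutions and about 3.2x slower at 256x256. Its times stay near 2.4 to 2.5 s as the resolution shrinks, while oneflow's keep dropping. Its warmup also took about 8x longer.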

lixiang007666 (Contributor, Author) commented Jul 19, 2024


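from onediffx import compile_pipe

# `options` holds the nexfort compile options from the benchmark script (not shown here).
pipe = compile_pipe(
    pipe, backend="nexfort", options=options, ignores=['text_encoder'], fuse_qkv_projections=True
)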

After adding ignores=['text_encoder'], performance got much worse, which doesn't seem reasonable.

I traced it to https://github.com/siliconflow/nexfort/pull/91. Before that commit, the slowdown does not occur.

However, this issue does not block merging this PR.

lixiang007666 (Contributor, Author) commented Jul 19, 2024

TODO:

  • Benchmark speed on A100.
  • Quality report.

strint (Collaborator) commented Jul 19, 2024

Re: "After adding ignores=['text_encoder'], performance got much worse, which doesn't seem reasonable."

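That is because Kolors uses ChatGLM as its text encoder. As I recall, it is larger than SDXL's text encoder, so the cost of leaving it uncompiled should be noticeable.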
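strint's point can be made concrete with an Amdahl's-law estimate. Time spent in a component that is excluded from compilation does not shrink, so the larger that component is, the more of the end-to-end speedup is lost. This sketch assumes compilation simply speeds up the compiled components by a uniform factor `s`. The function name and all numbers are hypothetical and are not measurements from this PR.

```python
def end_to_end_speedup(t_skipped: float, t_compiled: float, s: float) -> float:
    """Amdahl-style estimate of whole-pipeline speedup.

    t_skipped:  eager time of components excluded from compilation
                (for example, a text encoder listed in `ignores`)
    t_compiled: eager time of the components that do get compiled
    s:          assumed uniform speedup that compilation gives those components
    """
    before = t_skipped + t_compiled
    after = t_skipped + t_compiled / s  # only the compiled part gets faster
    return before / after


# Hypothetical numbers: the same compiled workload (4.0 s eager, 2x faster when
# compiled) paired with a small vs. a large text encoder left uncompiled.
small_encoder = end_to_end_speedup(t_skipped=0.1, t_compiled=4.0, s=2.0)
large_encoder = end_to_end_speedup(t_skipped=1.0, t_compiled=4.0, s=2.0)
print(f"small encoder left eager: {small_encoder:.2f}x")
print(f"large encoder left eager: {large_encoder:.2f}x")
```

With these made-up inputs, the pipeline with the larger uncompiled encoder gets a visibly smaller overall speedup, even though the compiled part is identical.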

lixiang007666 merged commit a489df5 into main Jul 23, 2024
7 checks passed
lixiang007666 deleted the Add_kolors_compile branch July 23, 2024 12:04