Is there a reference for the model/architecture used by diffusers anywhere? It doesn't seem to match to the original stable-diffusion repo #901
Comments
Hey @tonetechnician, Thanks a lot for the write-up! Could you by any chance add a reproducible code snippet that compares the two? In our experiments, two months ago, Just to be clear - we want to ensure 100% 1-to-1 the same output as the original repo and if that's not the case currently it's clearly a bug! Thanks for bringing this clearer to our attention!
Regarding the architecture - it's one-to-one the same as the original architecture. We just renamed some keys to make the naming clearer and the overall architecture more general, so that we're able to add more model architectures than just stable diffusion in the future :-) Apart from this there are two main differences:
I'll try to solve the differences and keep you updated here :-) Any additional code snippets I could test would be extremely useful!
No worries! Thanks for the detailed response. I'd be super keen to help however I can to figure it out. I'll give a bit more info and code to compare the two implementations. So far I've just been testing Automatic's and NMKD SD-GUI's implementations against diffusers, which I think could be a place to start if you want to confirm the differences in output.
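As a starting point for that kind of comparison, here is a minimal sketch of how two outputs could be compared numerically rather than by eye. It assumes both images have already been decoded into flat sequences of 8-bit pixel values; the function name `image_diff_stats` and the sample values are hypothetical and not taken from either codebase.

```python
def image_diff_stats(a, b):
    """Return (max_abs_diff, mean_abs_diff) for two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixel values")
    if not a:
        raise ValueError("images must not be empty")
    # Per-pixel absolute difference between the two outputs.
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return max(diffs), sum(diffs) / len(diffs)


# Hypothetical flattened pixel values from two implementations (illustration only).
reference_pixels = [0, 128, 255, 64]
candidate_pixels = [0, 130, 255, 60]

max_diff, mean_diff = image_diff_stats(reference_pixels, candidate_pixels)
print(max_diff, mean_diff)
```

With the same seed, prompt, scheduler and step count on both sides, a maximum difference of zero would indicate a true 1-to-1 match; small non-zero values may just come from floating-point or hardware differences.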
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What API design would you like to have changed or added to the library? Why?
Hey there!
I'm not sure if this is the right section to post this, but I have a request/question for a write-up on the inference configuration used by diffusers, similar to a config.yaml in other model repos.
Recently I have been digging into diffusers quite a bit and comparing its outputs with those of other stable diffusion implementations (see post here).
I've noticed that there are quite noticeable differences (both in output and code) between diffusers and the regular stable-diffusion inference model https://github.com/CompVis/stable-diffusion/blob/main/configs/stable-diffusion/v1-inference.yaml as implemented in both Automatic1111 and SD-GUI. Those two give the same results as one another, but diffusers is an outlier.
I dug deeper into the model architecture in diffusers and noticed a few differences in the default values set for each block, for just about all steps of the stable diffusion process. However, my knowledge of the architecture itself isn't as good as I'd like it to be, so I'm mostly comparing the stable diffusion architecture and trying to match it with diffusers. That being said, I did try to match the settings as best I could in order to get a one-to-one result. Modifying parameters given in the VAE encoder seems to have quite an effect on what image gets output, and it's led me to believe there must be a fundamental difference between the inference model and the base stable diffusion model.
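To make "differences in the default values" concrete, here is a rough sketch of how two configurations could be diffed once each is loaded into a nested dict. The helper names (`flatten`, `diff_configs`) and every key and value in the example are made up for illustration; they are not the real defaults of either repo.

```python
MISSING = "<missing>"  # marker for a setting present on only one side


def flatten(config, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every setting that differs."""
    a, b = flatten(left), flatten(right)
    result = {}
    for path in sorted(set(a) | set(b)):
        left_value = a.get(path, MISSING)
        right_value = b.get(path, MISSING)
        if left_value != right_value:
            result[path] = (left_value, right_value)
    return result


# Hypothetical configs: names and numbers are placeholders, not real defaults.
original_cfg = {"encoder": {"norm_eps": 1e-6, "channels": 128}}
converted_cfg = {"encoder": {"norm_eps": 1e-5, "channels": 128}, "extra_flag": True}

for path, (old, new) in diff_configs(original_cfg, converted_cfg).items():
    print(f"{path}: {old} -> {new}")
```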
I did find this script https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py and ran through the procedure, but I wasn't entirely sure what it actually does or how its variables map onto the models used in diffusers, though I do see that some defaults differ. I figure @patil-suraj may have a bit more info on the architecture within diffusers, and how it differs from the original stable-diffusion repo.
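My rough mental model of a checkpoint conversion is that it mostly renames weight keys while leaving the values untouched. The sketch below illustrates that idea only; it is not the logic of the linked script, and the function `rename_keys`, the prefixes, and the key names are all hypothetical.

```python
def rename_keys(state_dict, prefix_map):
    """Return a new dict with key prefixes rewritten; values are left untouched."""
    renamed = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for old_prefix, new_prefix in prefix_map.items():
            if old_key.startswith(old_prefix):
                # Swap only the matching prefix and keep the rest of the key.
                new_key = new_prefix + old_key[len(old_prefix):]
                break
        if new_key in renamed:
            raise ValueError(f"two keys map to the same name: {new_key}")
        renamed[new_key] = value
    return renamed


# Hypothetical key names and values, for illustration only.
original = {
    "old_model.block0.weight": [0.1, 0.2],
    "old_model.block0.bias": [0.0],
    "other.scale": [1.0],
}
mapping = {"old_model.block0.": "new_model.first_block."}

converted = rename_keys(original, mapping)
print(sorted(converted))
```

If a conversion really is just a renaming like this, the weights are identical and any output difference would have to come from defaults or inference code rather than from the model itself.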
The largest differences seem to be in the img2img pipelines, where I believe the output is not as crisp and sharp as with the base stable-diffusion library, and I feel this is something that should probably be solved one way or another.
Would love to know if a config file, or a write-up on the usage of the conversion scripts in the /scripts directory, would be possible!