SDXL Inpainting VAE Normalization #7225
Conversation
```python
@@ -308,6 +308,25 @@ def retrieve_timesteps(
    return timesteps, num_inference_steps


def requires_vae_latents_normalization(vae):
```
I think we can definitely do this in place without having to delegate it to a method. That way the readability of the code stays linear, and the reader doesn't have to refer to another method to see what's going on.
```python
def normalize_vae_latents(latents, latents_mean, latents_std):
    latents_mean = latents_mean.to(device=latents.device, dtype=latents.dtype)
    latents_std = latents_std.to(device=latents.device, dtype=latents.dtype)
    latents = (latents - latents_mean) / latents_std
    return latents
```
Same as above.
```python
def denormalize_vae_latents(latents, latents_mean, latents_std):
    latents_mean = latents_mean.to(device=latents.device, dtype=latents.dtype)
    latents_std = latents_std.to(device=latents.device, dtype=latents.dtype)
    latents = latents * latents_std + latents_mean
    return latents
```
Same as above.
Do you have some results for us to see how the Playground v2.5 checkpoint plays out with inpainting?

I am okay with the changes you're introducing, given the comments are addressed.
Cc: @patil-suraj as well.
@sayakpaul Thanks for commenting! Don't you think the code should be DRY (Don't Repeat Yourself)? Overall, these 3 functions could be used anywhere in the SDXL pipelines that work with this VAE. Maybe it is reasonable to move them to "utils"/"common" or into a base SDXL pipeline. I don't have a checkpoint yet. I trained one but figured out this normalisation too late...

Please refer to https://huggingface.co/docs/diffusers/en/conceptual/philosophy. It's okay to trade away DRY in the interest of readability, as our pipelines are also read for educational purposes.

Addressed all the comments. Please let me know if I can be helpful. Btw, while you are replying: if it is possible, could you by any chance share the training args used for this PR to train the inpainting model? That would be very helpful, thank you!

Will defer that to the training ninja @patil-suraj here :)
```python
@@ -939,6 +939,17 @@ def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):
        else:
            image_latents = retrieve_latents(self.vae.encode(image), generator=generator)

        has_latents_mean = hasattr(self.vae.config, "latents_mean") and self.vae.config.latents_mean is not None
```
Why do we need it here? We didn't normalize the latents in SDXL text-to-image.
I think this would be needed because here we are encoding the input image. This is what we do in the SDXL DreamBooth LoRA training script.
But I agree that #7132 tackles it from a comprehensive perspective.
OK, I re-opened it and I'm happy to be proven wrong.

Once we merge #7132, you can update the branch, then test it out and show some results with and without the normalization.
Works for me!
Just saw this PR: #7132
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@yiyixuxu could you give this another look?
@yiyixuxu a gentle ping.
Safe to close it now.
What does this PR do?
Since the release of Playground v2.5 with its "custom" VAE, we should normalize and denormalize the latents. This is already implemented in `StableDiffusionXLPipeline`:

diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py, lines 1230 to 1243 in 687bc27
Who can review?
cc: @sayakpaul @patil-suraj