add inpainting example script #241
Conversation
It may be worth adding insights from the RePaint paper to avoid semantic artifacts in the inpainting. All it would take is additional backtracking (x_t -> x_{t-1} -> masked addition -> noising -> x_t -> x_{t-1}); see section 4.2 (Resampling) in the paper.
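For reference, a rough sketch of what that backtracking could look like inside a diffusers-style denoising loop. This is only a sketch under assumptions: `unet`, `scheduler`, `text_embeddings`, `latents`, `init_latents_orig`, and `mask` are the names used in the pipeline, `scheduler.step`/`scheduler.add_noise` follow the usual diffusers API, and the single-step re-noising at the end is written out by hand.

```python
import torch

num_resample = 10  # "r" in the RePaint paper: resampling passes per timestep

for i, t in enumerate(scheduler.timesteps):
    for u in range(num_resample):
        # reverse step: x_t -> x_{t-1}
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

        # masked addition: paste the known region back in, noised to (roughly) the same level
        # mask convention here: 1 = known region, 0 = region to inpaint
        noise = torch.randn_like(init_latents_orig)
        init_latents_noisy = scheduler.add_noise(init_latents_orig, noise, t)
        latents = init_latents_noisy * mask + latents * (1 - mask)

        # noising: jump back x_{t-1} -> x_t with one forward step, except on the last pass
        if u < num_resample - 1:
            beta = scheduler.betas[t]
            latents = torch.sqrt(1 - beta) * latents + torch.sqrt(beta) * torch.randn_like(latents)
```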
I believe this inpainting attempt may be defective: it adds the masked initial latent at every time step, when it should be adding noised latents.
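In code, the suggested change would be roughly the following (a minimal fragment that goes inside the denoising loop; `scheduler.add_noise` is the diffusers API, the other names come from the pipeline):

```python
# instead of pasting the clean init latents directly at each step:
noise = torch.randn_like(init_latents_orig)
init_latents_proper = scheduler.add_noise(init_latents_orig, noise, t)  # noise to the current level
latents = init_latents_proper * mask + latents * (1 - mask)             # mask: 1 = known region
```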
Results are still semantically poor, whereas before it was all blurry. RePaint may be the only way to go.
Thanks! That change does improve the results a lot. I will try to read the RePaint paper when I get a chance (although it looks like work on that is ongoing). This was just me trying the first thing that seemed to work.
Thanks a lot @nagolinc for adding this example! I just left some nits. Let me know if you want to tackle those, otherwise I would be happy to do it :)
Also, as @jackloomen said, the results are not perfect but are very interesting nonetheless. This would be a cool simple example for in-painting.
I've added a colab here to play with it. https://colab.research.google.com/drive/196L1Kfodck2ZXkdIdLXPCGP2PMwJ2d5z?usp=sharing.
Think we can merge this as an initial simple example for in-painting @anton-l @patrickvonplaten
Super nice addition @nagolinc!
The only thing that I'd feel quite strongly about is "API" consistency here, in the sense that both `mask_image` and `init_image` are processed inside the pipeline.
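For instance, the in-pipeline preprocessing could look roughly like the sketch below. The helper names and the exact mask handling are illustrative, mirroring what image_to_image.py does for `init_image`; the mask is downscaled to the latent resolution (VAE factor 8) and tiled over the 4 latent channels.

```python
import numpy as np
import PIL
import torch


def preprocess_image(image: PIL.Image.Image) -> torch.Tensor:
    # resize to multiples of 32 and map pixel values to [-1, 1]
    w, h = image.size
    w, h = map(lambda x: x - x % 32, (w, h))
    image = image.resize((w, h))
    image = np.array(image).astype(np.float32) / 255.0
    image = image[None].transpose(0, 3, 1, 2)
    return 2.0 * torch.from_numpy(image) - 1.0


def preprocess_mask(mask: PIL.Image.Image) -> torch.Tensor:
    # downscale the mask to latent resolution and tile it over the 4 latent channels
    mask = mask.convert("L")
    w, h = mask.size
    w, h = map(lambda x: x - x % 32, (w, h))
    mask = mask.resize((w // 8, h // 8), resample=PIL.Image.NEAREST)
    mask = np.array(mask).astype(np.float32) / 255.0
    mask = np.tile(mask, (4, 1, 1))
    return torch.from_numpy(mask[None])
```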
Is it ok if we tweet about your contribution here as it's the first major "community" contribution to the repo? I'd maybe tag you in a tweet (https://twitter.com/nagolinc) if that's ok?
Proposition: could we merge all the different stable diffusion pipelines into a single one? Currently there are separate scripts for text-to-image, image-to-image, and now this inpainting example.
Wouldn't it make more sense to have a single pipeline doing all of the above (including inpainting and img2img at the same time if needed)? This pipeline would take the init image and mask as optional arguments.
There would be 4 ways to use it, depending on which of these inputs are provided.
If the schedulers could implement an undo_step() operation, we could get RePaint working easily, I think.
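A minimal standalone sketch of what such an undo_step could do, assuming a DDPM-style scheduler that exposes its `betas` schedule: one forward diffusion step x_{t-1} -> x_t, i.e. a sample from q(x_t | x_{t-1}).

```python
import torch


def undo_step(scheduler, sample: torch.Tensor, timestep: int) -> torch.Tensor:
    # re-noise a partially denoised sample by one step: x_{t-1} -> x_t
    beta = scheduler.betas[timestep]
    noise = torch.randn_like(sample)
    return torch.sqrt(1.0 - beta) * sample + torch.sqrt(beta) * noise
```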
It looks like @anton-l has written a RePaint scheduler here (https://github.com/huggingface/diffusers/blob/aa6da5ad722ff361a599d2196e2be91f06744813/src/diffusers/schedulers/scheduling_repaint.py), so after this is merged, I will see if I can get that working with latent-diffusion.
Sounds great!
@jackloomen We (@EteriaAI) integrated #243 into the pipeline in this pull request with 3 different implementations, and none of them seem to produce good results. It's possible there's an error in our implementations though, so I am not attempting to discourage further experimentation here. EDIT: the noise-sampled technique in this pipeline works okay; we posted a sample image here.
This is awesome, thanks @nagolinc. Just a thought: the approach here seems similar to https://arxiv.org/pdf/2206.02779.pdf and I wonder if borrowing the insight around fine-tuning the decoder's weights is worthwhile to explore in the future here?
I'm getting the same results, but there should be no reason for it to happen; worst case scenario, nothing meaningful would change in the results. I can only assume that undo_step was done incorrectly, or there is some mismatch in what noise level is applied.
@jackloomen undo_step seems to be correct, because you can apply the same at any timestep t (in a regular non-RePaint loop) to get more noise in the whole image, and it doesn't collapse into a single color. I output some debug images at each step in the inference process: you can slowly see the noise in the known area decreasing, but the noise in the unknown area just collapses to a single color really fast. Increasing the 'r' or 'j' values exacerbates the problem. It looks like the jump and undo somehow cause a mismatch in the noise levels between the masked and unmasked parts of the image, and that probably causes some kind of issue with the model.
I added a unified pipeline (examples/inference/unified.py) that does plain diffusion if you supply neither an init_image nor a mask, img2img if you just give it an init_image, and inpainting if you provide both.
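As a rough illustration of that branching (argument names assumed from the description above, not the exact signature in unified.py):

```python
def select_mode(init_image=None, mask_image=None):
    """Pick the generation mode from which optional inputs were provided."""
    if mask_image is not None and init_image is None:
        raise ValueError("inpainting needs an init_image to take the known pixels from")
    if init_image is None:
        return "text2img"    # plain diffusion from random latents
    if mask_image is None:
        return "img2img"     # start from noised init_image latents
    return "inpainting"      # img2img plus masked addition of the known region
```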
examples/inference/unified.py
```python
mask = torch.cat([mask] * batch_size)
# check sizes
if not mask.shape == init_latents.shape:
    raise ValueError(f"The mask and init_image should be the same size!")
```
flake8 reports: `F541 f-string is missing placeholders`; the `f` prefix is not necessary.
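i.e. the line can simply become:

```python
raise ValueError("The mask and init_image should be the same size!")
```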
That's great! It works well, but you can't do pure inpainting now (without image help). I was thinking of having three images as input (init_image (a drawing, for example), inpainting_image, and inpainting_mask).
To help everyone test this, attached is the same code as a script with arguments, which can be executed directly.
@leszekhanusz I’m not sure I follow. By pure inpainting, do you mean something different than inpainting with strength=1 (so the init image has no effect on inpainted regions)?
Hi @leszekhanusz, those are really good points! As stated in the , I'm very much in favor of adding the in-painting example; it's a really nice example, but I'm not in favor of a unified pipeline. This PR is good to merge once we remove the new example :) Thanks a lot for working on this @nagolinc, great work!
Also, very nice discussion here. We have a Discord channel for such discussions, feel free to join if you are interested.
@patil-suraj Okay, I've moved the unified pipeline to a different branch (main...nagolinc:diffusers:unified).
Awesome! @nagolinc do you have any colab that we could link from the readme? No worries if not, we could link the colab that I shared if you want :)
@patil-suraj this is the notebook I have been using for testing: https://github.com/nagolinc/notebooks/blob/main/inpainting.ipynb
Great, would it be alright if we use the notebook I shared? I want to use something simpler :)
@patil-suraj For sure
Looking closer at it, shouldn't the original latent be noised to t-1 instead of t, and for t=1 not noised at all?
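Roughly, that adjustment would look like this (a sketch only; names and timestep indexing follow the earlier snippets, and `scheduler.timesteps` runs from high to low so the next entry is the t-1 level):

```python
# pick the noise level that `latents` is actually at after scheduler.step (i.e. t-1)
t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else None
if t_prev is not None:
    noise = torch.randn_like(init_latents_orig)
    init_latents_known = scheduler.add_noise(init_latents_orig, noise, t_prev)
else:
    init_latents_known = init_latents_orig  # final step: no noise at all
latents = init_latents_known * mask + latents * (1 - mask)
```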
Yes, that's what I meant. I see now that your approach with only 2 images is perfectly fine.
Thanks, but it appears that this link is not available for me. Should I join somewhere first?
Some results, using some modifications:
Prompt: golden retriever, dog, depth of field, centered, photo. Original: generated from the prompt; the mask is the dog's face and 2 grass spots on either side. Inpainted: using the same prompt. No shared seeds, 60-130 steps. This was done without resampling anything, just the standard approach. I suspect results highly depend on whether the original image came from the model; when trying custom, non-generated images, results are always poorer. Using the same prompt is probably also important. If there were a way to come up with a text embedding based on an arbitrary image, it might improve results for non-SD-generated images.
* add inpainting
* added proper noising of init_latent as recommended by jackloomen (#241 (comment))
* move image preprocessing inside pipeline and allow non-512x512 mask
This script is a copy of https://github.com/huggingface/diffusers/blob/main/examples/inference/image_to_image.py but for inpainting.
Example usage:
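A hedged sketch of what running the script could look like; the class name, import path, argument names, and file names below are assumptions made by analogy with image_to_image.py, not necessarily the exact interface of the new script.

```python
import torch
from PIL import Image

# hypothetical import path / class name, by analogy with image_to_image.py
from inpainting import StableDiffusionInpaintingPipeline

pipe = StableDiffusionInpaintingPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", use_auth_token=True
).to("cuda")

init_image = Image.open("dog.png").convert("RGB").resize((512, 512))     # the known image
mask_image = Image.open("dog_mask.png").convert("L").resize((512, 512))  # region to repaint

prompt = "golden retriever, dog, depth of field, centered, photo"
with torch.autocast("cuda"):
    result = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)

result["sample"][0].save("inpainted.png")
```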