VQ-diffusion #658
Conversation
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from 12e0d2a to f242bf8
Wooow - this looks amazing! @patil-suraj mind taking a look here?
Thanks! Would love to know whether the minimum mergeable unit is the completed pipeline, or whether smaller chunks like this are acceptable to merge.
Hey @williamberman and @345ishaan, sorry for the delay - I have more time going forward and think we can merge this model by next week! It's a super nice PR - extremely easy to understand and to follow, thanks a bunch! I've fiddled a bit with the PR to make it a bit more lightweight :-) We don't really need a new attention layer: Conv2D layers, when used for attention, are just like linear layers, for which an attention class already exists. That's why I removed it. Now I think the next step is to implement the U-Net and scheduler for the forward pass, no? Do you need help/guidance here?
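(Illustration only, not code from the PR: a quick sanity check that a 1x1 Conv2d projection is numerically equivalent to a Linear layer applied over flattened spatial positions, which is why a separate conv-based attention class isn't needed.)

```python
import torch
import torch.nn as nn

channels, height, width = 4, 8, 8
x = torch.randn(1, channels, height, width)

conv = nn.Conv2d(channels, channels, kernel_size=1)
linear = nn.Linear(channels, channels)

# Copy the conv weights into the linear layer so both compute the same projection.
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(channels, channels))
    linear.bias.copy_(conv.bias)

out_conv = conv(x)

# Flatten spatial dims to (batch, height * width, channels), apply the linear
# layer, then reshape back to the conv layout.
out_linear = (
    linear(x.flatten(2).transpose(1, 2))
    .transpose(1, 2)
    .reshape(1, channels, height, width)
)

assert torch.allclose(out_conv, out_linear, atol=1e-6)
```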
Thanks! Pinged on Discord as well, but the model uses a transformer (just the encoder, IIRC) for the reverse diffusion process instead of a U-Net. I have the transformer ported on another branch. I think the open question is: would you prefer to have it in this PR, or to merge this PR first and then merge the transformer in a separate PR?
Hey @williamberman, great to hear that you already have it on a branch. Could you maybe add it directly to this PR? Then, as a next step, we could verify that a forward pass through the transformer (the replacement for the U-Net) gives identical results to the official implementation. If that works, we can integrate the scheduler, and then in a last step make the whole denoising process work :-) Overall, everything should ideally be in this PR. Since VQ-diffusion will be one of our first community pipeline contributions, there are lots of new things in this PR and I'm more than happy to help you with it (don't hesitate to ping me :-))
SG! Follow up:
Force-pushed from c564b04 to 496524c
This sounds like a great plan!
The PR description is updated to reflect progress on merging the full model. A notebook that compares outputs from the autoencoder, transformer, and text embedder is linked in the PR description: https://github.com/williamberman/vq-diffusion-notebook. Once the scheduler is complete, I will also add it to the notebook :)
Force-pushed from a8c9074 to d14dff7
@williamberman let me know if you'd like me to review already now or better once the scheduler is integrated as well :-)
Great progress!
Let's wait until the scheduler is integrated. I cleaned up some non-scheduler components while working on it that I'd like to add to this branch first :)
Force-pushed from 9d33a33 to 6b2bc69
```python
torch.cuda.empty_cache()

def test_vq_diffusion(self):
    expected_image = load_image(
```
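The diff context above is truncated. Purely as a hedged sketch (the class name `VQDiffusionPipeline`, URL, prompt, and tolerances below are illustrative assumptions, not the test actually added in this PR), a slow integration test of this kind typically continues roughly like:

```python
# Illustrative sketch only. Assumes: import torch; import numpy as np;
# from diffusers import VQDiffusionPipeline; plus diffusers' test utilities
# load_image and torch_device, inside an integration test class.
def test_vq_diffusion(self):
    expected_image = load_image(
        "https://example.com/vq_diffusion/expected_output.png"  # placeholder URL
    )

    pipeline = VQDiffusionPipeline.from_pretrained("microsoft/vq-diffusion-ithq")
    pipeline = pipeline.to(torch_device)
    pipeline.set_progress_bar_config(disable=None)

    generator = torch.Generator(device=torch_device).manual_seed(0)
    output = pipeline(
        "teddy bear playing in the pool",  # placeholder prompt
        generator=generator,
        output_type="np",
    )
    image = output.images[0]

    # Compare the generated image against the stored reference within a tolerance.
    assert image.shape == (256, 256, 3)
    assert np.abs(np.asarray(expected_image) / 255.0 - image).max() < 1e-2
```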
added a test here @williamberman - pipeline works like a charm and fits nicely with the existing API :-)
Great, exciting!
Hey @williamberman, you've really done an amazing job here! Very impressed by how you were able to add such a complex new model into the existing API! The conversion script and your notebook are very easy to follow :-)

I've uploaded the ithq model to the microsoft org here: https://huggingface.co/microsoft/vq-diffusion-ithq and added a slow test that makes sure the model works as expected. The failing tests are unrelated, and we can merge this PR more or less as is. If it's ok with you, I would do some final renaming changes tomorrow morning (Paris time) to make it fit a bit better with existing configuration names (will have to sync with @patil-suraj @anton-l and @pcuenca), and then we can merge this to be in the next release IMO.

@patil-suraj @pcuenca @anton-l could you maybe already review this PR? IMO, besides some renaming, it's ready! I've also made sure that all existing slow tests are passing!

@williamberman, if you're interested, we could do the following follow-up projects to promote this model a bit more:
Obviously no need to do any of the above points if you don't want to or are too busy! Regarding this PR, I think we can merge it tomorrow morning! Really great job here 🚀
Awesome, that all sounds good! I think a blog post sounds great; sent you an email :)
@williamberman I am wondering, did you try training it, or did you just verify it in inference mode?
Just inference using the weights Microsoft published. Training would have been a good amount more work 😅
Awesome integration @williamberman, thank you for working on it!
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Amazing work!
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Just happened to have this card on my Windows machine and verified that the SD demo works on it.

```
Average step time: 144.26142692565918ms/it
Clip Inference Avg time (ms) = (205.001 + 44.000) / 2 = 124.501
VAE Inference time (ms): 281.001
Total image generation time: 7.856997728347778sec
```

I'd love to add an API upstream to derive compiler tuning flags from a host device.
* Changes for VQ-diffusion VQVAE

  Add the ability to specify the dimension of embeddings to VQModel: `VQModel` will by default set the dimension of embeddings to the number of latent channels. The VQ-diffusion VQVAE has a smaller embedding dimension (128) than its number of latent channels (256).

  Add AttnDownEncoderBlock2D and AttnUpDecoderBlock2D to the up and down unet block helpers. VQ-diffusion's VQVAE uses those two block types.

* Changes for VQ-diffusion transformer

  Modify attention.py so SpatialTransformer can be used for VQ-diffusion's transformer.

  SpatialTransformer:
  - Can now operate over discrete inputs (classes of vector embeddings) as well as continuous.
  - `in_channels` was made optional in the constructor, so two locations where it was passed as a positional arg were moved to kwargs.
  - Modified forward pass to take optional timestep embeddings.

  ImagePositionalEmbeddings:
  - Added to provide positional embeddings to discrete inputs for latent pixels.

  BasicTransformerBlock:
  - Norm layers were made configurable so that VQ-diffusion can use AdaLayerNorm with timestep embeddings.
  - Modified forward pass to take optional timestep embeddings.

  CrossAttention:
  - May now optionally take a bias parameter for its query, key, and value linear layers.

  FeedForward:
  - Internal layers are now configurable.

  ApproximateGELU:
  - Activation function used in VQ-diffusion's feedforward layer.

  AdaLayerNorm:
  - Norm layer modified to incorporate timestep embeddings.

* Add VQ-diffusion scheduler
* Add VQ-diffusion pipeline
* Add VQ-diffusion convert script to diffusers
* Add VQ-diffusion dummy objects
* Add VQ-diffusion markdown docs
* Add VQ-diffusion tests
* some renaming
* some fixes
* more renaming
* correct
* fix typo
* correct weights
* finalize
* fix tests
* Apply suggestions from code review (Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>)
* Apply suggestions from code review (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* finish
* finish
* up

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
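For readers skimming the commit message, here is a minimal sketch of two of the smaller pieces named above, reconstructed from their descriptions rather than copied from the PR (the exact formulas and signatures are my reading of the commit message, not the library's definitive implementation):

```python
import torch
import torch.nn as nn

# Minimal sketches, reconstructed from the commit message above.

class ApproximateGELU(nn.Module):
    """Sigmoid-based GELU approximation: x * sigmoid(1.702 * x)."""

    def forward(self, x):
        return x * torch.sigmoid(1.702 * x)


class AdaLayerNorm(nn.Module):
    """LayerNorm whose scale and shift are predicted from a timestep embedding."""

    def __init__(self, embedding_dim, num_embeddings):
        super().__init__()
        self.emb = nn.Embedding(num_embeddings, embedding_dim)
        self.silu = nn.SiLU()
        self.linear = nn.Linear(embedding_dim, embedding_dim * 2)
        # The affine parameters come from the timestep, so the norm itself has none.
        self.norm = nn.LayerNorm(embedding_dim, elementwise_affine=False)

    def forward(self, x, timestep):
        # x: (batch, seq_len, embedding_dim); timestep: (batch,) long tensor
        emb = self.linear(self.silu(self.emb(timestep)))  # (batch, 2 * embedding_dim)
        scale, shift = torch.chunk(emb, 2, dim=-1)        # (batch, embedding_dim) each
        return self.norm(x) * (1 + scale[:, None]) + shift[:, None]
```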
VQ-diffusion
original issue: #319
So far, this PR only focuses on the vq-diffusion model for the ITHQ dataset. It is possible that the models for the other datasets require additional changes.
The commits are broken out by the model component they port. See the individual commit messages for more in-depth write-ups on the changes.
Example usage
See the linked notebook for how to produce the pretrained model for diffusers.
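For orientation, a minimal usage sketch of the finished pipeline with the ithq checkpoint mentioned earlier in the thread; the class name, call signature, and prompt here are assumptions for illustration, not quoted from the PR:

```python
import torch
from diffusers import VQDiffusionPipeline  # assumed class name

# Checkpoint uploaded to the microsoft org, as mentioned earlier in the thread.
pipeline = VQDiffusionPipeline.from_pretrained("microsoft/vq-diffusion-ithq")
pipeline = pipeline.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipeline("a teddy bear playing in the pool", generator=generator).images[0]
image.save("teddy_bear.png")
```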
Progress
Comparing against original model
I put together a Python notebook that compares the currently ported components against each other. It can be run on Google Colab and will handle installing dependencies and model weights. See its README for more details.
https://github.com/williamberman/vq-diffusion-notebook