VQ-Diffusion #319

patrickvonplaten · 2022-09-01T12:55:15Z

Model/Pipeline/Scheduler description

VQ-Diffusion is based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). It produces significantly better text-to-image generation results when compared with Autoregressive models with similar numbers of parameters. Compared with previous GAN-based methods, VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin.

https://github.com/microsoft/VQ-Diffusion

Open source status

The model implementation is available
The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

VQ-Diffusion would be a super cool addition to diffusers. cc @cientgu and @zzctan .

Also cc @patil-suraj here

The text was updated successfully, but these errors were encountered:

unography · 2022-09-06T18:55:30Z

Hi @patrickvonplaten, would love to take this up!

patrickvonplaten · 2022-09-09T15:27:31Z

This would be great! Let me know if you need any help :-) To begin with I think we should try to get it running with the original codebase and then port the code to diffusers

patil-suraj · 2022-09-09T15:58:37Z

Hey @unography awesome! Happy to help here if you have any questions.

patrickvonplaten · 2022-09-16T13:17:28Z

Any progress here @unography ? Do you already have an open PR :-) Otherwise let's maybe open it again to the community

345ishaan · 2022-09-19T06:49:51Z

Hi, I will be happy to contribute / collaborate on this :)

unography · 2022-09-19T14:27:24Z

Hi @patrickvonplaten, unfortunately, I've been unable to spend time on this right now due to some other commitments, we can open this up again to the community

patrickvonplaten · 2022-09-22T15:00:03Z

No worries! @345ishaan would you be interested in giving it a go?

345ishaan · 2022-09-23T04:28:00Z

@patrickvonplaten Yes, happy to start with this. Do you have any documentation / suggestions / reference CLs on how to quickstart?

345ishaan · 2022-09-26T06:19:44Z

Update: I was getting familiarized with the paper and author's code. I also checked how other models are integrated into diffuser's pipeline for inference only mode, so plan to do the same for VQ-Diffusion as next step using original code impln.

pcuenca · 2022-09-27T10:27:49Z

That's awesome, @345ishaan! Let us know if you need any help :)

williamberman · 2022-09-27T19:29:48Z

Hello, super sorry wasn't aware someone was already working on this! I ported the VQVAE for the ITHQ dataset. Would love to help contribute if possible :)

I put up a draft PR #658 for the VQVAE with docs on how to compare it against VQ-diffusion. Is the standard to wait until the whole pipeline is complete before merging anything, or is it ok to incrementally merge functionality? I.e. for VQ-diffusion, it might be easier to get the individual autoencoders to work one at a time in their own commits before moving on to the rest of the model.

Any advice is appreciated, thanks!

345ishaan · 2022-09-28T05:06:59Z

Hmm ok, if you have crossed the finish line, then go ahead! I was mostly working on adding the implentation to diffusers in inference mode. If you need any help further, happy to collaborate.

Going forward, what is the best way to avoid such overlaps? I thought it was via proposing/updating through issues.

williamberman · 2022-09-28T05:16:11Z

@345ishaan definitely not over the finish line, just ported the autoencoder for one of the models! Happy to collaborate :)

345ishaan · 2022-09-28T07:50:59Z

SG! I will check your CL. Do you want to chat over discord?

williamberman · 2022-09-30T17:04:17Z

@cientgu @zzctan

Could I have some help parsing q_posterior?

https://github.com/microsoft/VQ-Diffusion/blob/3c98e77f721db7c787b76304fa2c96a36c7b00af/image_synthesis/modeling/transformers/diffusion_transformer.py#L235-L267

I believe it's computing equation 11 in log space, but I still have a few questions. I understand it's adapted from https://github.com/ehoogeboom/multinomial_diffusion/blob/9d907a60536ad793efd6d2a6067b3c3d6ba9fce7/diffusion_utils/diffusion_multinomial.py#L171-L193 which provides the initial derivation that makes sense.

        # q(xt-1 | xt, x0) = q(xt | xt-1, x0) * q(xt-1 | x0) / q(xt | x0)
        # where q(xt | xt-1, x0) = q(xt | xt-1).

However, the later comment is a bit vague :)

        # Note: _NOT_ x_tmin1, which is how the formula is typically used!!!
        # Not very easy to see why this is true. But it is :)
        unnormed_logprobs = log_EV_qxtmin_x0 + self.q_pred_one_timestep(log_x_t, t)

Because it seems like the actual equation it's using is q(xt+1 | xt) * q(xt-1 | x0) / q(xt | x0).

Additional questions,

Some context on how you're handling masks in q_posterior would be helpful
What is the summation over in equation 11 and how does it map to q_posterior?
I don't see an analog for the lines starting from 262 onward in multinomial diffusion, could you provide some additional context there as well?

Lmk if any of that wasn't clear, thank you!

345ishaan · 2022-10-01T19:19:52Z

@williamberman I will be able to take some tasks today and tomorrow. I just checked your CL, it seems like you ported the vq-vae encoder there. Do you want to chat over discord to split tasks? My username is 345ishaan#9676

williamberman · 2022-10-01T19:42:00Z

pinged you in discord @345ishaan!

Zeqiang-Lai · 2023-03-02T08:25:20Z

@cientgu @zzctan

Could I have some help parsing q_posterior?

https://github.com/microsoft/VQ-Diffusion/blob/3c98e77f721db7c787b76304fa2c96a36c7b00af/image_synthesis/modeling/transformers/diffusion_transformer.py#L235-L267

I believe it's computing equation 11 in log space, but I still have a few questions. I understand it's adapted from https://github.com/ehoogeboom/multinomial_diffusion/blob/9d907a60536ad793efd6d2a6067b3c3d6ba9fce7/diffusion_utils/diffusion_multinomial.py#L171-L193 which provides the initial derivation that makes sense.
        # q(xt-1 | xt, x0) = q(xt | xt-1, x0) * q(xt-1 | x0) / q(xt | x0)
        # where q(xt | xt-1, x0) = q(xt | xt-1).
However, the later comment is a bit vague :)
        # Note: _NOT_ x_tmin1, which is how the formula is typically used!!!
        # Not very easy to see why this is true. But it is :)
        unnormed_logprobs = log_EV_qxtmin_x0 + self.q_pred_one_timestep(log_x_t, t)
Because it seems like the actual equation it's using is q(xt+1 | xt) * q(xt-1 | x0) / q(xt | x0).

Additional questions,

Some context on how you're handling masks in q_posterior would be helpful

What is the summation over in equation 11 and how does it map to q_posterior?

I don't see an analog for the lines starting from 262 onward in multinomial diffusion, could you provide some additional context there as well?

Lmk if any of that wasn't clear, thank you!

Have you figure out questions here ? I am also confused about that the actual computation seems to be q(xt+1 | xt) * q(xt-1 | x0) / q(xt | x0)

williamberman · 2023-04-26T23:44:32Z

Hey @Zeqiang-Lai I did actually figure out what was going on here!

This class is heavily commented

diffusers/src/diffusers/schedulers/scheduling_vq_diffusion.py

Line 299 in e0a2bd1

    
           # p_0(x_0=C_0 | x_t) / q(x_t | x_0=C_0)          ...      p_n(x_0=C_0 | x_t) / q(x_t | x_0=C_0)

I reverse engineered it through trial and error and a lot of whiteboard markers!

I don't remember all of it exactly, but the main components are doing the calculation in log space for numerical stability and avoiding a logspace matmul for which there's no memory efficient pytorch kernel. A few of the other components are just cheeky linear algebra.

I later discovered that there's an explanation in the appendix of the multinomial diffusion paper. I didn't read it exhaustively but from skimming, it looks like it's on similar material. https://arxiv.org/pdf/2102.05379.pdf

williamberman · 2023-04-26T23:46:19Z

@Zeqiang-Lai if you have any other questions on the math, feel free to shoot me an email wlbberman@gmail.com

patrickvonplaten added the New pipeline/model label Sep 1, 2022

williamberman mentioned this issue Oct 12, 2022

VQ-diffusion #658

Merged

12 tasks

williamberman closed this as completed Feb 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VQ-Diffusion #319

VQ-Diffusion #319

patrickvonplaten commented Sep 1, 2022

unography commented Sep 6, 2022

patrickvonplaten commented Sep 9, 2022

patil-suraj commented Sep 9, 2022

patrickvonplaten commented Sep 16, 2022

345ishaan commented Sep 19, 2022

unography commented Sep 19, 2022

patrickvonplaten commented Sep 22, 2022

345ishaan commented Sep 23, 2022

345ishaan commented Sep 26, 2022

pcuenca commented Sep 27, 2022

williamberman commented Sep 27, 2022

345ishaan commented Sep 28, 2022

williamberman commented Sep 28, 2022

345ishaan commented Sep 28, 2022

williamberman commented Sep 30, 2022

345ishaan commented Oct 1, 2022

williamberman commented Oct 1, 2022

Zeqiang-Lai commented Mar 2, 2023

williamberman commented Apr 26, 2023

williamberman commented Apr 26, 2023

VQ-Diffusion #319

VQ-Diffusion #319

Comments

patrickvonplaten commented Sep 1, 2022

Model/Pipeline/Scheduler description

Open source status

Provide useful links for the implementation

unography commented Sep 6, 2022

patrickvonplaten commented Sep 9, 2022

patil-suraj commented Sep 9, 2022

patrickvonplaten commented Sep 16, 2022

345ishaan commented Sep 19, 2022

unography commented Sep 19, 2022

patrickvonplaten commented Sep 22, 2022

345ishaan commented Sep 23, 2022

345ishaan commented Sep 26, 2022

pcuenca commented Sep 27, 2022

williamberman commented Sep 27, 2022

345ishaan commented Sep 28, 2022

williamberman commented Sep 28, 2022

345ishaan commented Sep 28, 2022

williamberman commented Sep 30, 2022

345ishaan commented Oct 1, 2022

williamberman commented Oct 1, 2022

Zeqiang-Lai commented Mar 2, 2023

williamberman commented Apr 26, 2023

williamberman commented Apr 26, 2023