LoRA from train_dreambooth_lora_sdxl.py is not working in A1111 anymore #6894

Closed
patryk-bartkowiak-nitid opened this issue Feb 7, 2024 · 32 comments
Assignees
Labels
bug Something isn't working conversion script stale Issues that haven't received updates


@patryk-bartkowiak-nitid

patryk-bartkowiak-nitid commented Feb 7, 2024

Describe the bug

I have been using train_dreambooth_lora_sdxl.py and convert_diffusers_sdxl_lora_to_webui.py to train a LoRA for a specific character. It was working until about a week ago. I am using the same baseline model and the same data.

I noticed that all my previous LoRA files were 29967176 bytes, while the new ones are 29889672 bytes and have fewer keys in the dict when I load them as plain .safetensors files.
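The size and key-count difference described above can be checked without loading any tensors. The following is a minimal sketch, assuming only the safetensors file layout (an 8-byte little-endian header length followed by a JSON header). The function names `read_safetensors_header` and `summarize` are made up for illustration, and the file names in the usage comment are hypothetical.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict of tensor entries."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Count tensor keys and the bytes occupied by tensor data."""
    header = read_safetensors_header(path)
    tensor_bytes = sum(
        entry["data_offsets"][1] - entry["data_offsets"][0]
        for entry in header.values()
    )
    return {"num_keys": len(header), "tensor_bytes": tensor_bytes}


# Hypothetical usage:
# print(summarize("old_lora.safetensors"))
# print(summarize("new_lora.safetensors"))
```

Comparing the two summaries shows whether the size change comes from missing tensors or from something else, such as metadata.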

The LoRA works fine with the inference guide in the README:

import torch
from diffusers import DiffusionPipeline

pretrained_model = "./pretrained_models/dreamshaper-xl"
lora_weights = "./outputs/dreamshaper-xl_claire/checkpoint-2000/"

prompt = "photo of wff woman, sitting in train"
negative_prompt = "text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated"

pipe = DiffusionPipeline.from_pretrained(pretrained_model, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.load_lora_weights(lora_weights)

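# The pipeline call takes a `generator` rather than a `seed` argument.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(420),
).images[0]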

image.save("lora_inference.png")

But after I convert it and load it into A1111 (it loads without errors), it doesn't work anymore; it looks like it only adds some noise to the output.

I already tried rolling back to previous versions of diffusers, torch and torchvision, but nothing helps. I am still not able to use the LoRA in A1111.

Reproduction

Code to train LoRA:

export MODEL_NAME="pretrained_models/dreamshaper-xl"
export INSTANCE_DIR="data/claire"
export MAX_TRAIN_STEPS=5000
export CHECKPOINTING_STEPS=500


export OUTPUT_DIR="outputs/$(basename ${MODEL_NAME})_$(basename ${INSTANCE_DIR})_tmp"
export CUDA_LAUNCH_BLOCKING=1
export TORCH_USE_CUDA_DSA=1

printf "\n\nTraining Claire model with $MODEL_NAME on $INSTANCE_DIR, saving to $OUTPUT_DIR\n\n"

accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
	--instance_prompt="photo of wff woman, isolated on white background" \
	--pretrained_model_name_or_path=$MODEL_NAME \
	--instance_data_dir=$INSTANCE_DIR \
	--output_dir=$OUTPUT_DIR \
	--resolution=1024 \
	--train_batch_size=2 \
	--gradient_accumulation_steps=4 \
	--learning_rate=1e-4 \
	--lr_scheduler="constant" \
	--lr_warmup_steps=0 \
	--max_train_steps=$MAX_TRAIN_STEPS \
	--seed="0" \
	--train_text_encoder \
	--enable_xformers_memory_efficient_attention \
	--gradient_checkpointing \
	--use_8bit_adam \
	--checkpointing_steps=$CHECKPOINTING_STEPS

Code to convert to A1111 format:

python /project/diffusers/scripts/convert_diffusers_sdxl_lora_to_webui.py {input_path} {output_path}

Logs

I can't really post any errors: generation looks like a typical run, and there are no errors or warnings during training or conversion.

System Info

- `diffusers` version: 0.26.0.dev0
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.27
- Python version: 3.10.9
- PyTorch version (GPU?): 2.0.0 (True)
- Huggingface_hub version: 0.20.3
- Transformers version: 4.37.2
- Accelerate version: 0.26.1
- xFormers version: 0.0.19
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@yiyixuxu @sayakpaul @DN6 @patrickvonplaten

@patryk-bartkowiak-nitid patryk-bartkowiak-nitid added the bug Something isn't working label Feb 7, 2024
@sayakpaul
Member

Thanks for the detailed thread. Can you pin me a version that was working as expected for you?

I am asking because none of those scripts went through significant logical changes in the past 7 days.

@patryk-bartkowiak-nitid
Author

Yeah that's the thing, I am unable to restore the environment perfectly and I'm blocked right now, not sure where the issue is :/

@sayakpaul
Member

Ah then it's a bit of a pity. In any case, please do ping me here if you're able to give me a pinpointed version. I am happy to look further from there :-)

@patryk-bartkowiak-nitid
Author

Anyway, following the README guide doesn't work properly. I am happy to meet or whatever to solve this issue :)

@sayakpaul
Member

README guide? Do you mean the commands from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md don't work? Can you provide a fully reproducible snippet for me?

> I am happy to meet or whatever to solve this issue :)

Sorry, we cannot do that. As maintainers, we need to be cognizant of our time and keep the discussions as open as possible.

@patryk-bartkowiak-nitid
Author

patryk-bartkowiak-nitid commented Feb 8, 2024

I mean command from
https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md
combined with
https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_sdxl_lora_to_webui.py

I'm not sure which part of the pipeline has the issue. Like I said, I am able to use the LoRA with the inference code you provide in the README, but I can't convert it correctly. It might be the conversion itself, or the LoRA might have some different properties that the conversion script can't handle.

Let me send you the full pipeline so you can reproduce the issue; I will try to include as many details as possible:

  1. Create VM with this docker image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
  2. Install dependencies:
apt update
apt install vim git tmux ffmpeg libsm6 libxext6 wget python3 python3-venv libgl1 libglib2.0-0 google-perftools -y

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
cd examples/dreambooth
pip install -r requirements.txt
accelerate config default
pip install bitsandbytes xformers==0.0.19
  3. Download baseline SDXL model:
wget https://civitai.com/api/download/models/333449 -O DreamShaperXL.safetensors
  4. Convert .safetensors to suitable format using python:
import diffusers
pipe = diffusers.StableDiffusionXLPipeline.from_single_file("DreamShaperXL.safetensors")
pipe.save_pretrained("DreamShaperXL")
  5. Train LoRA (6 images with the same woman on white background):
export MODEL_NAME="DreamShaperXL"
export INSTANCE_DIR="data/claire"
export MAX_TRAIN_STEPS=5000
export CHECKPOINTING_STEPS=500


export OUTPUT_DIR="outputs/$(basename ${MODEL_NAME})_$(basename ${INSTANCE_DIR})"
export CUDA_LAUNCH_BLOCKING=1
export TORCH_USE_CUDA_DSA=1

printf "\n\nTraining Claire model with $MODEL_NAME on $INSTANCE_DIR, saving to $OUTPUT_DIR\n\n"

accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
	--instance_prompt="photo of wff woman, isolated on white background" \
	--pretrained_model_name_or_path=$MODEL_NAME \
	--instance_data_dir=$INSTANCE_DIR \
	--output_dir=$OUTPUT_DIR \
	--resolution=1024 \
	--train_batch_size=2 \
	--gradient_accumulation_steps=4 \
	--learning_rate=1e-4 \
	--lr_scheduler="constant" \
	--lr_warmup_steps=0 \
	--max_train_steps=$MAX_TRAIN_STEPS \
	--seed="0" \
	--train_text_encoder \
	--enable_xformers_memory_efficient_attention \
	--gradient_checkpointing \
	--use_8bit_adam \
	--checkpointing_steps=$CHECKPOINTING_STEPS
  6. Convert to Kohya format:
python /diffusers/scripts/convert_diffusers_sdxl_lora_to_webui.py outputs/DreamShaperXL_claire/pytorch_lora_weights.safetensors test.safetensors
  7. Move to A1111:
mv test.safetensors stable-diffusion-webui/models/Lora/

@sayakpaul
Member

As mentioned I need to know a version that was working as expected for you.

CC: @linoytsaban @apolinario here.

@patryk-bartkowiak-nitid
Author

Well, because I can't really provide it, can we just focus on the current version that is probably not working properly?

I also considered that A1111 itself might be broken, but my previous LoRAs still work there, so I think it has to be something in this pipeline.

@sayakpaul
Member

sayakpaul commented Feb 8, 2024

That makes it a thousand times more difficult for us to make progress here actually, hence I am a bit adamant on it. To be able to pinpoint the issue: can we say the trained LoRA provides expected results when the inference is done from diffusers?

Your initial issue description suggests so. So, I quite suspect that it's the conversion script that's the culprit here.

@patryk-bartkowiak-nitid
Author

Yes, LoRA provides expected results when the inference is done from diffusers.

When it's done in A1111, it does change the output image (same seed), but not in the way it should; it looks like it's just adding some noise at the beginning of the generation process. I will send an example in 3 minutes.

@sayakpaul
Member

Then it's quite likely that the conversion script is the problem as mentioned. So, I will let @apolinario and @linoytsaban comment further (as they are the developers of that script).

@patryk-bartkowiak-nitid
Author

A1111 Config:

photo of wff woman, rides gondola in Venice,
Negative prompt: text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated, BadDream, UnrealisticDream
Steps: 7, Sampler: DPM++ SDE Karras, CFG scale: 2, Seed: 420, Size: 1024x1024, Model hash: 676f0d60c8, Model: DreamShaperXL, Version: v1.7.0

Image without any LoRA:
image
Image with previously trained LoRA that works - trained for 8000 iterations with batch_size=1:
image
Image with new LoRA - trained for 4000 iterations with batch_size=2:
image

@patryk-bartkowiak-nitid
Author

Also adding an image generated locally with new LoRA that doesn't work in A1111 - trained for 4000 iterations with batch_size=2

Code to generate:

import torch
from diffusers import DiffusionPipeline

pretrained_model = "DreamShaperXL"
lora_weights = "./outputs/DreamShaperXL_claire/checkpoint-4000/"

prompt = "photo of wff woman, rides gondola in Venice,"
negative_prompt = "text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated text, watermark, low quality, medium quality, blurry, censored, wrinkles, deformed, mutated"

pipe = DiffusionPipeline.from_pretrained(pretrained_model, torch_dtype=torch.float32)
pipe = pipe.to("cuda")
pipe.load_lora_weights(lora_weights)

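# The pipeline call takes a `generator` rather than a `seed` argument.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(420),
).images[0]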

image.save("lora_inference.png")

Image:

image

Note

As you can see, it's much closer. Of course the quality is not as good, because AUTOMATIC1111 has some additional things that make the result look better, like negative embeddings.

@patryk-bartkowiak-nitid
Author

Update

I tried loading the exact same converted model in ComfyUI and it works properly, but I found this issue from a week ago: #6777

Do you think it's related? Did any of the LoRA keys change? It looks like A1111 does not support it yet.

@sayakpaul
Member

Could be related but the LoRA keys didn’t change. We have got multiple tests ensuring that.

@linoytsaban
Collaborator

Hey @patryk-bartkowiak-nitid, thanks for creating this issue! Just to make sure I understand, right now ComfyUI conversion works fine but A1111 doesn't?

@patryk-bartkowiak-nitid
Author

> Hey @patryk-bartkowiak-nitid, thanks for creating this issue! Just to make sure I understand, right now ComfyUI conversion works fine but A1111 doesn't?

Exactly

@linoytsaban
Collaborator

Hmm, I'm not sure what has caused this, since we haven't made any changes to the conversion script, and the changes made to the training script should not affect that. @sayakpaul was there any change in the peft keys maybe that would make the conversion script incompatible?

@sayakpaul
Member

No, I don’t think so. There were no changes to the training script or the underlying utils that would lead to key incompatibilities.

@patryk-bartkowiak-nitid
Author

Could this have had an impact? #6895

@sayakpaul
Member

Pretty sure not as it only touches the model card which has nothing to do with the state dict.

@patryk-bartkowiak-nitid
Author

Any ideas @sayakpaul @linoytsaban ? Still trying to figure this out

@sayakpaul
Member

Sorry but I don't work with A1111 or ComfyUI either. And I cannot offer any help related to conversion to non-diffusers formats right now.

@linoytsaban
Collaborator

@patryk-bartkowiak-nitid can you check the state_dict of the previous LoRAs that worked fine on A1111 and of the new ones, and see whether there are differences (presumably there are, if the new ones are incompatible) and what they are?
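The comparison requested here could be sketched as follows. This is only an illustration: `diff_state_dict_keys` and `shapes_of` are hypothetical helper names, the example keys are invented, and `shapes_of` assumes the `safetensors` and `torch` packages are installed.

```python
def diff_state_dict_keys(old_shapes, new_shapes):
    """Compare two {key: shape} mappings and report where they differ."""
    old_keys, new_keys = set(old_shapes), set(new_shapes)
    shared = old_keys & new_keys
    return {
        "only_in_old": sorted(old_keys - new_keys),
        "only_in_new": sorted(new_keys - old_keys),
        "shape_mismatch": sorted(
            k for k in shared if tuple(old_shapes[k]) != tuple(new_shapes[k])
        ),
    }


def shapes_of(path):
    """Load a .safetensors file and map each key to its tensor shape."""
    from safetensors.torch import load_file  # needs safetensors and torch

    return {k: tuple(v.shape) for k, v in load_file(path).items()}


# Invented keys, for illustration only.
old = {"m.lora_down.weight": (4, 320), "m.lora_up.weight": (320, 4), "m.alpha": ()}
new = {"m_lora.down.weight": (4, 320), "m_lora.up.weight": (320, 4)}
report = diff_state_dict_keys(old, new)
print(report["only_in_old"])
print(report["only_in_new"])
```

With real files, the inputs would be `shapes_of("old.safetensors")` and `shapes_of("new.safetensors")`; keys that appear on only one side point to naming differences rather than differences in the trained weights.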

@patryk-bartkowiak-nitid
Author

patryk-bartkowiak-nitid commented Feb 9, 2024

I compared the converted .safetensors files and have already restored the exact same structure. This is how I restored it, so you can see the difference between them:

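import torch
from safetensors.torch import load_file

# Converted LoRA that works in A1111 (before) and the newly converted one (after).
before = load_file("claire.safetensors")
after = load_file("test.safetensors")

# Rename the new keys to match the old naming scheme.
# Iterate over a snapshot of the keys, because the dict is modified in the loop.
for k in list(after.keys()):
    v = after.pop(k)

    k = k.replace("lora.down", "lora_down")
    k = k.replace("lora.up", "lora_up")
    k = k.replace("to_k_lora", "to_k.lora")
    k = k.replace("_lora_down", ".lora_down")
    k = k.replace("_lora_up", ".lora_up")

    after[k] = v

# Add an alpha entry for every LoRA module, as in the old file.
for layer_name in [x for x in after.keys() if x.endswith("lora_up.weight")]:
    layer_name = layer_name.replace("lora_up.weight", "alpha")
    layer_name = layer_name.replace("_alpha", ".alpha")
    after[layer_name] = torch.tensor(4)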

Now I have two .safetensors files with exactly the same keys and shapes, but of course different weight values.
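To make the renaming easier to follow, here is a small trace of the same replacements applied to a single key. The key name is hypothetical and chosen only for illustration, and `to_old_style` and `alpha_key` are made-up helper names wrapping the replacements from the loop above.

```python
def to_old_style(key):
    """Apply the same string replacements as the loop above to one key name."""
    replacements = [
        ("lora.down", "lora_down"),
        ("lora.up", "lora_up"),
        ("to_k_lora", "to_k.lora"),
        ("_lora_down", ".lora_down"),
        ("_lora_up", ".lora_up"),
    ]
    for src, dst in replacements:
        key = key.replace(src, dst)
    return key


def alpha_key(up_key):
    """Derive the alpha key name that belongs to a lora_up.weight key."""
    return up_key.replace("lora_up.weight", "alpha").replace("_alpha", ".alpha")


# Hypothetical key, for illustration only.
new_key = "lora_unet_block_attn1_to_q_lora.up.weight"
old_key = to_old_style(new_key)
print(old_key)             # lora_unet_block_attn1_to_q.lora_up.weight
print(alpha_key(old_key))  # lora_unet_block_attn1_to_q.alpha
```

The net effect is that the separator before `lora_down` / `lora_up` becomes a dot, the separator inside them becomes an underscore, and each module gains a matching `.alpha` entry.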

@patryk-bartkowiak-nitid
Author

patryk-bartkowiak-nitid commented Feb 9, 2024

Before:

intersection = set(before.keys()) & set(after.keys())

len(before), len(after), len(intersection)
(2208, 1648, 528)

After:

intersection = set(before.keys()) & set(after.keys())

len(before), len(after), len(intersection)
(2208, 2208, 2208)

@qwerdf4

qwerdf4 commented Mar 3, 2024

I also encountered the same problem.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 27, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Mar 27, 2024
@yiyixuxu
Collaborator

@sayakpaul
is this the fix? #7435

@sayakpaul
Member

Yeah could be.


@github-actions github-actions bot added the stale Issues that haven't received updates label Apr 21, 2024
@yiyixuxu
Collaborator

Assuming this was fixed in #7435.
Let us know if it is still an issue, and we will reopen this!
