[advanced dreambooth lora sdxl script]: cannot train --with_prior_preservation, shape mismatch #6967

Closed
To-jak opened this issue Feb 13, 2024 · 1 comment

To-jak commented Feb 13, 2024

Describe the bug

I came across this while testing new features from #6691 (many thanks for supporting micro-conditioning!)

Running train_dreambooth_lora_sdxl_advanced.py with --with_prior_preservation fails at the first UNet forward pass with a shape mismatch that involves the unet_added_conditions['time_ids'] tensor.

It may be related to the way the class_time_ids are computed.
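
To make the suspicion concrete, here is a minimal sketch of what batch-consistent time ids could look like. It assumes that SDXL micro-conditioning uses six values per sample (original size, crop top-left, target size) and that prior preservation adds one class sample for each instance sample. The helpers `build_time_ids` and `batch_time_ids` are hypothetical and are not taken from the training script.

```python
from typing import List, Tuple

Size = Tuple[int, int]


def build_time_ids(original_size: Size, crop_top_left: Size, target_size: Size) -> List[int]:
    """Return the six micro-conditioning values for one sample (hypothetical helper)."""
    return [*original_size, *crop_top_left, *target_size]


def batch_time_ids(instance: List[List[int]], class_: List[List[int]]) -> List[List[int]]:
    """Stack instance rows first, then class rows, so there is one row per sample."""
    rows = instance + class_
    if any(len(row) != 6 for row in rows):
        raise ValueError("each sample needs exactly six time id values")
    return rows


resolution = 1024  # matches --resolution in the reproduction command
instance_ids = [build_time_ids((resolution, resolution), (0, 0), (resolution, resolution))]
class_ids = [build_time_ids((resolution, resolution), (0, 0), (resolution, resolution))]

time_ids = batch_time_ids(instance_ids, class_ids)
print(len(time_ids), len(time_ids[0]))  # prints "2 6"
```

If the class rows were missing, the time ids would cover only one sample while the rest of the batch covers two, which is one way a shape mismatch could arise.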

Reproduction

Follow the instructions from the advanced_diffusion_training README:

  • Install from source
  • Download dataset for testing:
from huggingface_hub import snapshot_download

local_dir = "./3d_icon"
snapshot_download(
    "LinoyTsaban/3d_icon",
    local_dir=local_dir, repo_type="dataset",
    ignore_patterns=".gitattributes",
)

Run training with prior preservation enabled (see the last four arguments):

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export DATASET_NAME="./3d_icon"
export OUTPUT_DIR="3d-icon-SDXL-LoRA"
export CLASS_DATA_DIR="./class_data_dir/icons"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --dataset_name=$DATASET_NAME \
  --instance_prompt="3d icon in the style of ohwx" \
  --validation_prompt="a ohwx icon of an astronaut riding a horse, in the style of ohwx" \
  --output_dir=$OUTPUT_DIR \
  --caption_column="prompt" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --optimizer="prodigy" \
  --train_text_encoder \
  --train_text_encoder_frac=0.5 \
  --snr_gamma=5.0 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=8 \
  --max_train_steps=1000 \
  --checkpointing_steps=2000 \
  --seed="0" \
  --with_prior_preservation \
  --class_prompt="icon" \
  --class_data_dir=$CLASS_DATA_DIR \
  --num_class_images=5

Logs

02/13/2024 16:04:39 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

{'image_encoder', 'feature_extractor'} was not found in config. Values will be initialized to default values.
Loading pipeline components...:   0%|                                                                            | 0/7 [00:00<?, ?it/s]{'rescale_betas_zero_snr', 'sigma_max', 'timestep_type', 'sigma_min'} was not found in config. Values will be initialized to default values.
Loaded scheduler as EulerDiscreteScheduler from `scheduler` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Loaded unet as UNet2DConditionModel from `unet` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  29%|███████████████████▍                                                | 2/7 [00:04<00:11,  2.29s/it]Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  43%|█████████████████████████████▏                                      | 3/7 [00:05<00:07,  1.89s/it]Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  57%|██████████████████████████████████████▊                             | 4/7 [00:06<00:03,  1.22s/it]Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded vae as AutoencoderKL from `vae` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...:  86%|██████████████████████████████████████████████████████████▎         | 6/7 [00:06<00:00,  1.53it/s]Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.06it/s]
02/13/2024 16:04:47 - INFO - __main__ - Number of class images to sample: 5.
Generating class images: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:27<00:00, 13.99s/it]
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'thresholding', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py:1534: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
02/13/2024 16:05:30 - WARNING - __main__ - Learning rates were provided both for the unet and the text encoder- e.g. text_encoder_lr: 1.0 and learning_rate: 1.0. When using prodigy only learning_rate is used as the initial learning rate.
Using decoupled weight decay
02/13/2024 16:05:30 - INFO - datasets - PyTorch version 2.2.0 available.
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 50773.15it/s]
Generating train split: 22 examples [00:00, 2150.22 examples/s]
/home/thomas/code/temp/venv/lib/python3.10/site-packages/PIL/Image.py:3186: DecompressionBombWarning: Image size (122880000 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
  warnings.warn(
02/13/2024 16:05:40 - INFO - __main__ - ***** Running training *****
02/13/2024 16:05:40 - INFO - __main__ -   Num examples = 22
02/13/2024 16:05:40 - INFO - __main__ -   Num batches each epoch = 22
02/13/2024 16:05:40 - INFO - __main__ -   Num Epochs = 46
02/13/2024 16:05:40 - INFO - __main__ -   Instantaneous batch size per device = 1
02/13/2024 16:05:40 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
02/13/2024 16:05:40 - INFO - __main__ -   Gradient Accumulation steps = 1
02/13/2024 16:05:40 - INFO - __main__ -   Total optimization steps = 1000
Steps:   0%|                                                                                                  | 0/1000 [00:00<?, ?it/s]/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 2196, in <module>
    main(args)
  File "/home/thomas/code/temp/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 1872, in main
    model_pred = unet(
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 817, in forward
    return model_forward(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 805, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/thomas/code/temp/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 1027, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/diffusers/src/diffusers/models/embeddings.py", line 228, in forward
    sample = self.linear_1(sample)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2048 and 2816x1280)
Steps:   0%|                                                                                                  | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/thomas/code/temp/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/thomas/code/temp/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/thomas/code/temp/venv/bin/python', 'train_dreambooth_lora_sdxl_advanced.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--dataset_name=./3d_icon', '--instance_prompt=3d icon in the style of ohwx', '--validation_prompt=a ohwx icon of an astronaut riding a horse, in the style of ohwx', '--output_dir=3d-icon-SDXL-LoRA', '--caption_column=prompt', '--mixed_precision=bf16', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=1.0', '--text_encoder_lr=1.0', '--optimizer=prodigy', '--train_text_encoder', '--train_text_encoder_frac=0.5', '--snr_gamma=5.0', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--rank=8', '--max_train_steps=1000', '--checkpointing_steps=2000', '--seed=0', '--with_prior_preservation', '--class_prompt=icon', '--class_data_dir=./class_data_dir/icons', '--num_class_images=5']' returned non-zero exit status 1.
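
One possible reading of the shapes in the RuntimeError above (2x2048 against 2816x1280), offered as a sketch rather than a confirmed diagnosis. It assumes the SDXL base dimensions: a 1280-wide pooled text embedding, a 256-wide embedding per time id value, and six time id values per sample. It also assumes the UNet flattens all time id values, embeds each one, and reshapes the result to one row per batch element before concatenating it with the pooled text embedding. None of these numbers are stated in the issue itself, and the function name is hypothetical.

```python
# Assumed SDXL base dimensions (not stated in the issue itself).
POOLED_TEXT_DIM = 1280  # width of the pooled text embedding
TIME_EMBED_DIM = 256    # embedding width per time id value
IDS_PER_SAMPLE = 6      # original size (2) + crop top-left (2) + target size (2)


def add_embedding_input_width(num_time_id_values: int, batch_size: int) -> int:
    """Width of the tensor fed to the added-conditions linear layer.

    Mimics embedding every time id value, reshaping to (batch_size, -1),
    and concatenating the result with the pooled text embedding.
    """
    total = num_time_id_values * TIME_EMBED_DIM
    if total % batch_size:
        raise ValueError("time id embeddings cannot be split evenly across the batch")
    return POOLED_TEXT_DIM + total // batch_size


# Prior preservation doubles the batch: 1 instance sample + 1 class sample.
batch_size = 2
expected = add_embedding_input_width(IDS_PER_SAMPLE * batch_size, batch_size)
observed = add_embedding_input_width(IDS_PER_SAMPLE * 1, batch_size)
print(expected, observed)  # prints "2816 2048"
```

Under these assumptions, the 2048 in the error matches a batch of two samples that received time ids for only one of them, which fits the suspicion that the class time ids are not reaching the UNet.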

System Info

  • Installed diffusers from source with advanced dreambooth lora sdxl script requirements.
  • Python 3.10.12

Who can help?

@linoytsaban It may have been introduced with your last PR? (Thanks again!)

@To-jak To-jak added the bug Something isn't working label Feb 13, 2024
linoytsaban (Collaborator) commented

Hey @To-jak! Thanks, yes, I think it's related to the micro-conditioning. I'll check now :)
