
Issue running Stable Diffusion DreamBooth on Mac M3 Max (Apple silicon) #7498

Closed
sagargulabani opened this issue Mar 27, 2024 · 35 comments
Labels
bug (Something isn't working) · stale (Issues that haven't received updates)

Comments

@sagargulabani

Describe the bug

I am trying to run DreamBooth Stable Diffusion training on an M3 Max.
However, whenever I try to generate the class images for the concepts, it fails.

Reproduction

To reproduce the error, set up the Dreambooth extension on an M3 Max (Apple silicon).
Then try to generate class images. It will fail.

As per this issue, someone suggested that we open an issue in this repository.

Please help us. Thank you.

Logs

400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 2003, in main
    return inner_loop()
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 380, in inner_loop
    count, instance_prompts, class_prompts = generate_classifiers(
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/utils/gen_utils.py", line 211, in generate_classifiers
    new_images = builder.generate_images(prompts, pbar)
  File "/Users/sagargulabani/dev/automatic1111/stable-diffusion-webui/extensions/sd_dreambooth_extension/helpers/image_builder.py", line 235, in generate_images
    with self.accelerator.autocast(), torch.inference_mode():
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/accelerate/accelerator.py", line 2907, in autocast
    autocast_context = get_mixed_precision_context_manager(self.native_amp, cache_enabled=cache_enabled)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1372, in get_mixed_precision_context_manager
    return torch.autocast(device_type=state.device.type, dtype=torch.float16, cache_enabled=cache_enabled)
  File "/opt/anaconda3/envs/automatic1111/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Generating class images 0/1400::   0%|

System Info

Apple M3 Max 30 CPU 40 GPU, 16 inch, 48 GB of RAM.
Python version - 3.10.14
diffusers - 0.27.2
transformers - 4.30.2
torch - 2.1.0

Who can help?

@sayakpaul

@sagargulabani sagargulabani added the bug Something isn't working label Mar 27, 2024
@tolgacangoz
Contributor

tolgacangoz commented Mar 27, 2024

Hi @sagargulabani,
Isn't this issue related to Stable Diffusion web UI's sd_dreambooth_extension?
Did/Could you try diffusers' DreamBooth? Also, see the mps-related page.
But I guess autocast is not supported yet on mps. They started a PR, but unfortunately it seems to have been abandoned 😞. Nevertheless, I guess there is an ongoing PR here that may be a solution.
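A quick way to confirm this limitation on your reported torch 2.1.0 is a tiny check like the one below; it should raise the same RuntimeError you see in the log:

    import torch

    print(torch.backends.mps.is_available())  # True on Apple silicon builds of torch
    # On torch 2.1, autocast rejects the 'mps' device type at construction time.
    try:
        with torch.autocast(device_type="mps", dtype=torch.float16):
            pass
    except RuntimeError as err:
        print(err)  # User specified an unsupported autocast device_type 'mps'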

@sagargulabani
Author

Yes, that is true. The issue is related to the web UI.
I tried running DreamBooth SDXL training locally and I am running into the following error:

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'clip_sample_range', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block', 'dropout', 'attention_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1964, in <module>
    main(args)
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1167, in main
    unet_lora_config = LoraConfig(
TypeError: LoraConfig.__init__() got an unexpected keyword argument 'use_dora'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

My peft version is 0.7.0

and this is the command I run:

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"


accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

@sagargulabani
Author

I did run it by removing the use_dora flag from the script (here) @linoytsaban.

After that I ran into the following issue:

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1963, in <module>
    main(args)
  File "/Users/sagargulabani/.cache/huggingface/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1503, in main
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
    result = tuple(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
    autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
    return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

@sayakpaul
Member

You should remove --mixed_precision="fp16" when using M3. Cc: @bghira

@sayakpaul
Member

And yes #7447 should be helpful.

@sagargulabani
Author

Hi @sayakpaul,

I did remove that and ran it, but it looks like the code gets stuck:

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ -   Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ -   Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ -   Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|                                                                                                                            | 0/500 [00:00<?, ?it/s]

It's not progressing beyond this.
I am using an M3 Max with 48 GB of RAM.

Also, I had to remove the use_dora flag from here to run the script.

@sagargulabani
Author

So I figured out that it is moving, but it is extremely slow.

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
03/28/2024 20:06:39 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'rescale_betas_zero_snr', 'variance_type', 'dynamic_thresholding_ratio', 'thresholding', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
{'latents_std', 'latents_mean'} was not found in config. Values will be initialized to default values.
{'dropout', 'attention_type', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
03/28/2024 20:06:58 - INFO - __main__ - ***** Running training *****
03/28/2024 20:06:58 - INFO - __main__ -   Num examples = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num batches each epoch = 5
03/28/2024 20:06:58 - INFO - __main__ -   Num Epochs = 250
03/28/2024 20:06:58 - INFO - __main__ -   Instantaneous batch size per device = 1
03/28/2024 20:06:58 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
03/28/2024 20:06:58 - INFO - __main__ -   Gradient Accumulation steps = 4
03/28/2024 20:06:58 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|▏                                                                                       | 1/500 [10:04<83:43:30, 604.03s/it, loss=0.0871, lr=0.0001]

Any suggestions to make it faster?

@bghira
Contributor

bghira commented Mar 28, 2024

are you on 14.4? i've been using pytorch 2.2 and i get about 10 seconds per step with 1 megapixel images on an M3 Max 128G. do you observe any memory / swap pressure?

@bghira
Contributor

bghira commented Mar 28, 2024

also, in my environment, i've been running with --mixed_precision=fp16 but i'm not sure why that's erroring out for you the way it is.

the code only returns an error to the user when mixed_precision="bf16", informing them to use fp16 instead. the default is actually fp32, which seems to be in use here hence the extreme slowdown.

the goal should be to ensure that mixed_precision=fp16 works on mps.

the relevant section from the linked PR:

    # Some configurations require autocast to be disabled.
    enable_autocast = True
    if torch.backends.mps.is_available() or (
        accelerator.mixed_precision == "fp16" or accelerator.mixed_precision == "bf16"
    ):
        enable_autocast = False

disables autocast on MPS.
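in the image generation path that flag then picks between a real autocast context and a no-op, roughly like this (just a sketch of the pattern; pipeline, prompts and accelerator are assumed from the surrounding script):

    import contextlib
    import torch

    def inference_context(enable_autocast: bool, device_type: str):
        # real autocast where it is supported, a plain no-op context on mps
        if enable_autocast:
            return torch.autocast(device_type)
        return contextlib.nullcontext()

    # usage in the validation loop (pipeline/prompts assumed from the script):
    # with inference_context(enable_autocast, accelerator.device.type):
    #     images = pipeline(prompts, num_inference_steps=25).images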

wasn't sure whether the initial report included that PR or not. if it didn't, could you re-attempt with --mixed_precision=fp16?

@sagargulabani
Author

so this is the error I see when I run it with mixed precision fp16:

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision=fp16 \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
03/29/2024 09:18:34 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'rescale_betas_zero_snr', 'variance_type', 'clip_sample_range', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
{'attention_type', 'dropout', 'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1985, in <module>
    main(args)
  File "/Users/sagargulabani/dev/huggingface-transformers/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py", line 1525, in main
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1263, in prepare
    result = tuple(
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
    autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
    return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hf/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
    simple_launcher(args)
  File "/opt/anaconda3/envs/hf/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/hf/bin/python', 'train_dreambooth_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--instance_data_dir=dog', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--output_dir=lora-trained-xl', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=1024', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.

Yes, I am on macOS Sonoma 14.4, just upgraded it.

@sagargulabani
Author

When I run the code without mixed precision fp16, these are the screenshots of what I see in Activity Monitor, htop, and asitop. I see that the GPU is not being utilized much.

[Screenshots: Activity Monitor / htop / asitop during training, 2024-03-28 and 2024-03-29]

@bghira
Contributor

bghira commented Mar 29, 2024

are you running the latest main branch?

@sagargulabani
Author

Yes, I took a pull yesterday.

I also took a pull right now - 34c90db (this is the commit) -

and ran pip install -e .

and after that I am still getting the same error.

This is what my pip list command shows for diffusers:
diffusers 0.28.0.dev0 /Users/sagargulabani/dev/huggingface-transformers/diffusers

@bghira
Contributor

bghira commented Mar 31, 2024

#7530 might fix this one @sagargulabani

@sagargulabani
Author

Hi @bghira,
I checked out this commit - bghira@ad3eb80 -

and tried to run the same command above with the same script - train_dreambooth_lora_sdxl.py -
but I am still running into the same issue:

RuntimeError: User specified an unsupported autocast device_type 'mps'

@bghira
Contributor

bghira commented Apr 1, 2024

@sagargulabani i've updated that script in particular for that PR. it now uses native_amp = False in the Accelerator config.

can you please re-run with that change? i will apply it to the rest of the scripts after.
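roughly, that change boils down to the following right after the Accelerator is constructed (a sketch, with the args names assumed from the script):

    import torch
    from accelerate import Accelerator

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
    )
    # mps has no usable autocast here, so keep the requested precision for the
    # weights but skip accelerate's native AMP context entirely.
    if torch.backends.mps.is_available():
        accelerator.native_amp = False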

@akospalfi

akospalfi commented Apr 1, 2024

@bghira I've been having the same problem as @sagargulabani, and your new change of explicitly disabling native amp leads to a different type of error:

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
  File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
    simple_launcher(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

edit: script parameters (pytorch 2.2.2)

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a sks dog" \
  --class_prompt="a dog" \
  --mixed_precision=fp16 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=100 \
  --max_train_steps=800

@bghira
Contributor

bghira commented Apr 1, 2024

was there more to the traceback before that one? that's the traceback from Accelerate, but the one from the trainer is needed to know where this error originated. i believe it's in log_validations where the dtypes change. this is something i saw when also updating to pytorch 2.2 latest.

i'm really hoping we don't have to run .to() on all of the embeds.

@bghira
Contributor

bghira commented Apr 1, 2024

@sayakpaul i think i'm in a bit of a need of rescuing on this issue. do you have any ideas on how to proceed? maybe a dummycast wrapper in train utils as i mentioned last week? the dtypes have to be the same everywhere for MPS.
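for illustration, the dummycast idea would be a small helper along these lines (hypothetical name, just a sketch):

    import contextlib

    def dummy_or_real_autocast(accelerator):
        # hypothetical train_utils helper: a no-op context on mps, the usual
        # accelerator-managed autocast everywhere else
        if accelerator.device.type == "mps":
            return contextlib.nullcontext()
        return accelerator.autocast()

the training and validation code would then wrap its forward passes in dummy_or_real_autocast(accelerator) instead of calling accelerator.autocast() directly.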

@akospalfi

was there more to the traceback before that one? that's the traceback from Accelerate, but the one from the trainer is needed to know where this error originated. i believe it's in log_validations where the dtypes change. this is something i saw when also updating to pytorch 2.2 latest.

i'm really hoping we don't have to run .to() on all of the embeds.

This is the full log, I can't see anything more useful:

/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
/Users/palfia/jax-metal/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
04/01/2024 17:50:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'sample_max_value', 'thresholding'} was not found in config. Values will be initialized to default values.
04/01/2024 17:50:20 - INFO - __main__ - ***** Running training *****
04/01/2024 17:50:20 - INFO - __main__ -   Num examples = 100
04/01/2024 17:50:20 - INFO - __main__ -   Num batches each epoch = 100
04/01/2024 17:50:20 - INFO - __main__ -   Num Epochs = 8
04/01/2024 17:50:20 - INFO - __main__ -   Instantaneous batch size per device = 1
04/01/2024 17:50:20 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
04/01/2024 17:50:20 - INFO - __main__ -   Gradient Accumulation steps = 1
04/01/2024 17:50:20 - INFO - __main__ -   Total optimization steps = 800
Steps:   0%|                                                                                                                                                                                  | 0/800 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":233:0)): error: input types 'tensor<2x1280xf16>' and 'tensor<1280xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Traceback (most recent call last):
  File "/Users/palfia/jax-metal/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
    simple_launcher(args)
  File "/Users/palfia/jax-metal/lib/python3.9/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/palfia/jax-metal/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=/Users/palfia/fun/converted_dreamshaper_v8', '--instance_data_dir=/Users/palfia/fun/train_db/J/instance_images/prepared', '--class_data_dir=/Users/palfia/fun/train_db/J/class_images', '--output_dir=/Users/palfia/fun/dreambooth_models', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a sks dog', '--class_prompt=a dog', '--mixed_precision=fp16', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=2e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.

@sagargulabani
Author

Hi @bghira, I also see the same error as @akospalfi

@bghira
Contributor

bghira commented Apr 1, 2024

i'm able to reproduce this one locally, but it's not clear why it's happening. the text encoder hidden states are fp16, the noisy inputs are fp16.

i can train locally on SimpleTuner, which handles dtypes differently, but it's not clear which difference is causing this problem.
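the blunt workaround would be to pin the inputs to the unet's parameter dtype right before the forward pass, something like this (a sketch with assumed variable names, not the script's exact code):

    import torch

    def match_unet_dtype(unet, *tensors):
        # hypothetical helper: cast input/conditioning tensors to the unet's
        # parameter dtype so mps never adds fp16 activations to fp32 weights
        unet_dtype = next(unet.parameters()).dtype
        return tuple(t.to(dtype=unet_dtype) for t in tensors)

    # usage inside the training loop (variable names assumed from the script):
    # noisy_model_input, encoder_hidden_states = match_unet_dtype(
    #     unet, noisy_model_input, encoder_hidden_states
    # )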

@sagargulabani
Author

Hi @bghira @sayakpaul,
Just following up on this one to see how we could go about it.

@bghira
Contributor

bghira commented Apr 8, 2024

it's been complicated to do in a non-invasive way for the diffusers project.

for now, i've been running dreambooth via simpletuner for the last few days successfully, introducing single subjects via these config values on pytorch 2.4 nightly.


github-actions bot commented May 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label May 3, 2024
@bghira
Contributor

bghira commented May 3, 2024

not stale, just waiting on some pytorch improvements

@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label May 3, 2024
@sagargulabani
Author

Can we close this now that PyTorch supports autocast on mps?

@sayakpaul
Member

Have you verified if it runs successfully?

@sagargulabani
Author

No, I haven't verified it. Will verify and let you know.

@bghira
Contributor

bghira commented Jul 2, 2024

well, no. it's not even in a release yet :-)

@bghira
Contributor

bghira commented Jul 2, 2024

and it has now been reverted out of pytorch/main due to regressions :[


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@sayakpaul
Member

@sagargulabani does this work now?

@github-actions github-actions bot removed the stale Issues that haven't received updates label Sep 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 21, 2024
@sayakpaul
Member

Closing due to inactivity.

Labels
bug (Something isn't working) · stale (Issues that haven't received updates)

6 participants