OneTrainer with ZLUDA failing #16

Closed
Santodan opened this issue Jan 9, 2025 · 17 comments

@Santodan

Santodan commented Jan 9, 2025

I was following your steps to install OneTrainer, but I'm getting the following error when I try to run the training.

The only difference between my environment and the guide is that I have the latest HIP SDK, since I installed SD.Next and it requires v6.2+.

My system
CPU: Ryzen 3600
GPU: AMD RX6800

Python:

C:\Users\danny>python --version
Python 3.10.6

C:\Users\danny>pip list
Package    Version
---------- -------
pip        22.2.1
setuptools 63.2.0

[notice] A new release of pip available: 22.2.1 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
Clearing cache directory workspace-cache/run! You can disable this if you want to continue using the same cache.
TensorFlow installation not found - running with reduced feature set.
model.safetensors:  13%|███████▌                                                   | 62.9M/492M [00:00<00:06, 65.0MB/s]
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.17.0 at http://localhost:6006/ (Press CTRL+C to quit)
model.safetensors: 100%|████████████████████████████████████████████████████████████| 492M/492M [00:07<00:00, 65.1MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████| 2.78G/2.78G [00:43<00:00, 64.0MB/s]
diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████| 335M/335M [00:05<00:00, 65.3MB/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████| 10.3G/10.3G [02:40<00:00, 63.9MB/s]
Could not enable memory efficient attention. Make sure xformers is installed correctly and a GPU is available: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

enumerating sample paths: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 212.84it/s]
caching: 100%|█████████████████████████████████████████████████████████████████████████| 58/58 [01:54<00:00,  1.98s/it]
caching: 100%|█████████████████████████████████████████████████████████████████████████| 58/58 [00:03<00:00, 16.22it/s]
sampling:   0%|                                                                                 | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI Generated\OneTrainer\modules\trainer\GenericTrainer.py", line 249, in __sample_loop
    self.model_sampler.sample(
  File "D:\AI Generated\OneTrainer\modules\modelSampler\StableDiffusionXLSampler.py", line 634, in sample
    image = self.__sample_base(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSampler\StableDiffusionXLSampler.py", line 232, in __sample_base
    noise_pred = unet(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1209, in forward
    sample, res_samples = downsample_block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_blocks.py", line 1288, in forward
    hidden_states = attn(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 442, in forward
    hidden_states = block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 42, in forward
    return custom_forward(None, *args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 21, in custom_forward
    return orig_forward(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention.py", line 453, in forward
    attn_output = self.attn1(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 454, in forward
    return self.processor(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 1279, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 276, in memory_efficient_attention
    return _memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 395, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 418, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 217, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 281, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Error during sampling, proceeding without sampling
step:   0%|                                                                                     | 0/29 [00:03<?, ?it/s]
epoch:   0%|                                                                                     | 0/5 [02:05<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI Generated\OneTrainer\modules\ui\TrainUI.py", line 544, in __training_thread_function
    trainer.train()
  File "D:\AI Generated\OneTrainer\modules\trainer\GenericTrainer.py", line 576, in train
    model_output_data = self.model_setup.predict(self.model, batch, self.config, train_progress)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\BaseStableDiffusionXLSetup.py", line 465, in predict
    predicted_latent_noise = model.unet(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1209, in forward
    sample, res_samples = downsample_block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_blocks.py", line 1278, in forward
    hidden_states = attn(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 430, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\checkpoint.py", line 494, in checkpoint
    ret = function(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 425, in custom_forward
    return module(*inputs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 34, in forward
    return checkpoint(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\checkpoint.py", line 494, in checkpoint
    ret = function(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 21, in custom_forward
    return orig_forward(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention.py", line 453, in forward
    attn_output = self.attn1(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 454, in forward
    return self.processor(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 1279, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 276, in memory_efficient_attention
    return _memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 403, in _memory_efficient_attention
    return _fMHA.apply(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\autograd\function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 74, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 431, in _memory_efficient_attention_forward_requires_grad
    out = op.apply(inp, needs_gradient=True)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 217, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 281, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@Santodan
Author

Santodan commented Jan 9, 2025

Tried with zluda-hipblas and got the same error:

D:\AI Generated\OneTrainer>start-ui.bat
activating venv D:\AI Generated\OneTrainer\venv
Using Python "D:\AI Generated\OneTrainer\venv\Scripts\python.exe"
Failed to load ZLUDA: Could not find module 'D:\AI Generated\OneTrainer\.zluda\nvrtc64_112_0.dll' (or one of its dependencies). Try using the full path with constructor syntax.
D:\AI Generated\OneTrainer\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
Clearing cache directory workspace-cache/run! You can disable this if you want to continue using the same cache.
tokenizer/tokenizer_config.json: 100%|████████████████████████████████████████████████| 737/737 [00:00<00:00, 1.41MB/s]
tokenizer/vocab.json:   0%|                                                                | 0.00/1.06M [00:00<?, ?B/s]
TensorFlow installation not found - running with reduced feature set.
tokenizer/vocab.json: 100%|███████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 3.11MB/s]
tokenizer/merges.txt: 100%|█████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 19.5MB/s]
tokenizer/special_tokens_map.json: 100%|██████████████████████████████████████████████████████| 472/472 [00:00<?, ?B/s]
tokenizer_2/tokenizer_config.json: 100%|██████████████████████████████████████████████████████| 725/725 [00:00<?, ?B/s]
tokenizer_2/special_tokens_map.json: 100%|█████████████████████████████████████████████| 460/460 [00:00<00:00, 901kB/s]
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.17.0 at http://localhost:6006/ (Press CTRL+C to quit)
scheduler/scheduler_config.json: 100%|████████████████████████████████████████████████████████| 479/479 [00:00<?, ?B/s]
text_encoder/config.json: 100%|███████████████████████████████████████████████████████████████| 565/565 [00:00<?, ?B/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 492M/492M [00:07<00:00, 63.7MB/s]
text_encoder_2/config.json: 100%|█████████████████████████████████████████████████████| 575/575 [00:00<00:00, 1.12MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████| 2.78G/2.78G [00:43<00:00, 64.0MB/s]
vae/config.json: 100%|████████████████████████████████████████████████████████████████████████| 642/642 [00:00<?, ?B/s]
diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████| 335M/335M [00:05<00:00, 62.1MB/s]
unet/config.json: 100%|███████████████████████████████████████████████████████████████████| 1.68k/1.68k [00:00<?, ?B/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████| 10.3G/10.3G [02:40<00:00, 64.1MB/s]
Could not enable memory efficient attention. Make sure xformers is installed correctly and a GPU is available: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

enumerating sample paths: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 191.44it/s]
enumerating sample paths:   0%|                                                                  | 0/1 [00:00<?, ?it/s]
D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py:1476: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
caching: 100%|█████████████████████████████████████████████████████████████████████████| 58/58 [00:49<00:00,  1.16it/s]
caching: 100%|█████████████████████████████████████████████████████████████████████████| 58/58 [00:05<00:00, 10.91it/s]
sampling:   0%|                                                                                 | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI Generated\OneTrainer\modules\trainer\GenericTrainer.py", line 249, in __sample_loop
    self.model_sampler.sample(
  File "D:\AI Generated\OneTrainer\modules\modelSampler\StableDiffusionXLSampler.py", line 634, in sample
    image = self.__sample_base(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSampler\StableDiffusionXLSampler.py", line 232, in __sample_base
    noise_pred = unet(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1209, in forward
    sample, res_samples = downsample_block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_blocks.py", line 1288, in forward
    hidden_states = attn(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 442, in forward
    hidden_states = block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 42, in forward
    return custom_forward(None, *args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 21, in custom_forward
    return orig_forward(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention.py", line 453, in forward
    attn_output = self.attn1(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 454, in forward
    return self.processor(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 1279, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 276, in memory_efficient_attention
    return _memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 395, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 418, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 217, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 281, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Error during sampling, proceeding without sampling
step:   0%|                                                                                     | 0/29 [00:08<?, ?it/s]
epoch:   0%|                                                                                     | 0/5 [01:06<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI Generated\OneTrainer\modules\ui\TrainUI.py", line 544, in __training_thread_function
    trainer.train()
  File "D:\AI Generated\OneTrainer\modules\trainer\GenericTrainer.py", line 576, in train
    model_output_data = self.model_setup.predict(self.model, batch, self.config, train_progress)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\BaseStableDiffusionXLSetup.py", line 465, in predict
    predicted_latent_noise = model.unet(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1209, in forward
    sample, res_samples = downsample_block(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\unets\unet_2d_blocks.py", line 1278, in forward
    hidden_states = attn(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 430, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\checkpoint.py", line 494, in checkpoint
    ret = function(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 425, in custom_forward
    return module(*inputs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 34, in forward
    return checkpoint(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\utils\checkpoint.py", line 494, in checkpoint
    ret = function(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\modules\modelSetup\stableDiffusion\checkpointing_util.py", line 21, in custom_forward
    return orig_forward(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention.py", line 453, in forward
    attn_output = self.attn1(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 454, in forward
    return self.processor(
  File "D:\AI Generated\OneTrainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 1279, in __call__
    hidden_states = xformers.ops.memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 276, in memory_efficient_attention
    return _memory_efficient_attention(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 403, in _memory_efficient_attention
    return _fMHA.apply(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\autograd\function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 74, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 431, in _memory_efficient_attention_forward_requires_grad
    out = op.apply(inp, needs_gradient=True)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 217, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 281, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
  File "D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@CS1o
Owner

CS1o commented Jan 10, 2025

Hey, first update your Python to 3.10.11 64-bit.
Then delete the venv folder and reinstall OneTrainer, then replace the ZLUDA files again.
Also, what did you try to train, and with which settings?

@Santodan
Author

Can I install that Python version and make OneTrainer use it? The version I have is the one recommended for SD.Next with an AMD GPU, and I didn't want to mess that up.
I tried with this base model - https://civitai.com/models/1052485?modelVersionId=1180973 - which seems to be Illustrious, and there is no option for that, so I used SD 1.5. Since it was only a test, I trained for 5 epochs.
Here is the full config.json:
config.json

@CS1o
Owner

CS1o commented Jan 11, 2025

Yep, you can safely install Python 3.10.11 64-bit over your current one, as it's only a patch upgrade.
For SD.Next and other webUIs you have to delete the venv folder and relaunch to make them work with the new Python.

For Illustrious you have to select SDXL LoRA at the top left in OneTrainer, as Illustrious models are based on SDXL.

What is important for AMD is to set the Attention to SDP instead of xformers in the Training tab, as xformers only works with Nvidia GPUs (see the sketch below).

Then it should work.
Edit: Testing it right now and it seems to work.
Reduce the Batch Size to 2 to avoid an OOM error.
It uses 19.3GB of VRAM with Batch Size 4 on a 7900XTX.
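
A minimal sketch of the difference, with illustrative shapes rather than OneTrainer's actual code (the tensor sizes and the fallback behavior noted in the comments are assumptions):

import torch
import torch.nn.functional as F

# Illustrative attention inputs: (batch, heads, tokens, head_dim), fp16 on the GPU.
q = torch.randn(1, 8, 64, 40, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 64, 40, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 64, 40, device="cuda", dtype=torch.float16)

# The "SDP" setting maps to PyTorch's built-in kernel, which runs under ZLUDA;
# the earlier log shows it continuing after a "1Torch was not compiled with
# flash attention" warning, i.e. using a non-flash fallback.
out = F.scaled_dot_product_attention(q, k, v)

# The xformers path from the tracebacks above: its CUDA kernels are
# Nvidia-only, so under ZLUDA it raises "CUDA error: named symbol not found".
# (xformers also expects a (batch, tokens, heads, head_dim) layout.)
# import xformers.ops
# out = xformers.ops.memory_efficient_attention(q, k, v)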

@Santodan
Author

Santodan commented Jan 12, 2025

Yeah, it seems that I had xformers on; I thought ZLUDA would bypass that. It works now, thanks.

BTW, what would be the recommended settings for training?

@devtobi

devtobi commented Jan 14, 2025

I also have problems installing OneTrainer with the current guide. When starting, I get the error:

Failed to load ZLUDA: Could not find module '...\OneTrainer\.zluda\nvrtc64_112_0.dll' (or one of its dependencies). Try using the full path with constructor syntax.

It looks like this is due to an old version of ZLUDA, which gets installed via the install_zluda.py script.
@CS1o Any idea what I could do in this case?

I'm running the latest ROCm 6.2 on Windows 11 with a 7900XTX. I have ZLUDA 3.8.6 installed; however, it is definitely not picked up by OneTrainer.

I tried running zluda.exe --version with the ZLUDA downloaded into OneTrainer, and that parameter is unknown, so it looks like this is indeed quite an old version of ZLUDA?
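
One quick way to see whether the startup message is fatal or merely cosmetic is to try loading the DLL the same way Python does. A sketch, assuming the install paths shown in the logs above (adjust them to your machine):

import ctypes

candidates = [
    r"D:\AI Generated\OneTrainer\.zluda\nvrtc64_112_0.dll",                            # bundled copy
    r"D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\lib\nvrtc64_112_0.dll",  # copy that torch loads
]
for dll in candidates:
    try:
        ctypes.WinDLL(dll)  # same Windows loader path that torch would take
        print(f"loads OK: {dll}")
    except OSError as exc:
        print(f"failed:   {dll} ({exc})")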

@CS1o
Owner

CS1o commented Jan 14, 2025

The OneTrainer guide needs to be updated to include the new instructions for HIP SDK 6.2.
You just need to replace the three renamed ZLUDA files in the torch folder with the ones from ZLUDA 3.8.6 (Step 5 of the ZLUDA part); a sketch of that step follows below.
Edit: Updated the guide.
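
For reference, a hedged sketch of that replacement step. The paths are assumptions based on the logs above, and the three-file mapping follows the usual ZLUDA-for-PyTorch rename convention; double-check both against the guide:

import shutil
from pathlib import Path

zluda = Path(r"C:\ZLUDA")  # assumed: where ZLUDA 3.8.6 was extracted
torch_lib = Path(r"D:\AI Generated\OneTrainer\venv\lib\site-packages\torch\lib")

# ZLUDA DLL -> the CUDA-named copy that torch actually tries to load.
renames = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}
for src, dst in renames.items():
    shutil.copy2(zluda / src, torch_lib / dst)
    print(f"{src} -> {torch_lib / dst}")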

@devtobi

devtobi commented Jan 14, 2025

Yeah, I renamed and moved those 2 files as described, but I still get the mentioned error when starting. Can you check whether your log contains that message as well? Maybe it's just a false positive and everything else still works?

@CS1o
Owner

CS1o commented Jan 15, 2025

Can confirm the same error, as the .zluda folder contains the old ZLUDA files.
But in the guide we replace the ones in torch directly, so everything works.
Edit: Updated the guide; the nvrtc64_112_0.dll now also needs to be put into the torch/lib folder.

@Santodan
Author

Yeah, that's why it was working for me: I followed the guide for SD.Next with HIP SDK 6.2, so I got the ZLUDA build for that version.

@devtobi

devtobi commented Jan 15, 2025

So you are basically saying we should ignore the ZLUDA error at startup?

Can confirm the same error, as the .zluda folder contains the old ZLUDA files. But in the guide we replace the ones in torch directly, so everything works. Edit: Updated the guide; the nvrtc64_112_0.dll now also needs to be put into the torch/lib folder.

Or would it be a solution to replace the whole .zluda folder with a more current ZLUDA version? That way we shouldn't have to replace anything in torch directly, right?

@CS1o
Owner

CS1o commented Jan 17, 2025

So you are basically saying we should ignore the ZLUDA error at startup?

Yep, it can be ignored, as we replaced the files in torch/lib.

Or would it be a solution to replace the whole .zluda folder with a more current ZLUDA version? That way we shouldn't have to replace anything in torch directly, right?

Yes, the error goes away when you replace all the ZLUDA files in the .zluda folder with the files from C:\ZLUDA.

Gonna update the guide so users don't think it's broken when launching.

@CS1o
Owner

CS1o commented Jan 17, 2025

Because OneTrainer's .zluda folder also includes the three important, already renamed .dll files, a plain copy of the ZLUDA files won't work. The rename step for these three .dlls is still needed; after that, everything can be copied into .zluda (see the sketch below).

Tested and updated the guide.
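
A sketch of that combined step, under the same path and mapping assumptions as the earlier sketch: copy the new release over the bundled folder, then redo the three renames so the renamed copies are not left at the old version.

import shutil
from pathlib import Path

zluda_src = Path(r"C:\ZLUDA")                           # assumed: new ZLUDA release
zluda_dst = Path(r"D:\AI Generated\OneTrainer\.zluda")  # OneTrainer's bundled copy

# Overwrite the old bundled files with the new release.
for f in zluda_src.iterdir():
    if f.is_file():
        shutil.copy2(f, zluda_dst / f.name)

# Redo the rename step inside .zluda (same mapping as in the torch\lib sketch).
for src, dst in {"cublas.dll": "cublas64_11.dll",
                 "cusparse.dll": "cusparse64_11.dll",
                 "nvrtc.dll": "nvrtc64_112_0.dll"}.items():
    shutil.copy2(zluda_dst / src, zluda_dst / dst)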

@CS1o CS1o closed this as completed Jan 17, 2025
@CS1o
Owner

CS1o commented Jan 23, 2025

@Santodan @devtobi
Updated the OneTrainer guide today with a new repo link from lshqqtiger. A reinstall is needed to get the latest features like Flux LoRA training.

@devtobi

devtobi commented Jan 24, 2025

Thanks, I will take a look at this over the weekend. 👍

@devtobi

devtobi commented Jan 25, 2025

@CS1o
Do we still need to replace the .dll files manually when using the lshqqtiger fork? And do you still need to run pip uninstall onnxruntime-gpu to get Auto Image Captioning working? It's a bit unclear in your guide, in my opinion.

Btw, what's the difference now between the lshqqtiger fork and the previous one?

@CS1o
Owner

CS1o commented Jan 26, 2025

@devtobi
Updated the guide.
lshqqtiger's fork has better and more complete ROCm detection and support, plus some fixes, like the onnxruntime one for image captioning, already implemented.
The ZLUDA files only need to be replaced for people with a GPU below an RX6800, but it doesn't hurt to replace them, as that makes sure they use the same ZLUDA version as their webUI.
