Error building torch on clean `docker compose --profile auto up --build` (#420)
Let's see if this works. EDIT: yeah... don't do this, that torch version is pinned for a reason.
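For context, the pin is applied at image build time; a pinned torch install generally looks like the line below (the version shown is purely illustrative, the repo's actual pin may differ):

```bash
# Illustrative only -- check the repo's Dockerfile for the real pinned version
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```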
New error now after unpinning torch:

I think this means I'm missing my CUDA drivers?
Confirmed... I didn't have my CUDA stuff configured :/ For posterity:
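For the record, the standard Ubuntu setup per NVIDIA's install guide looks roughly like the following (verify against the current docs before running):

```bash
# Add NVIDIA's apt repository for the container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```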
`nvidia-smi` works properly, and so does the hello-world NVIDIA Docker container, but I'm still getting the same error :(

Deleted and rebuilt containers and images, still no luck.
Tried sudo-ing the command, which seems to have at least gotten past the previous error. I believe the root of the problem is discussed here: NVIDIA/nvidia-container-toolkit#154
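If the root cause is the rootless-Docker cgroup problem that thread appears to describe, the commonly cited workaround is disabling cgroup handling in the toolkit config. A rough sketch (assumption: your config file has the default commented-out entry; verify against the linked issue before applying):

```bash
# Set no-cgroups = true under [nvidia-container-cli] in the toolkit config,
# then restart Docker so the change takes effect
sudo sed -i 's/^#\?no-cgroups = .*/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker
```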
Services build, but I'm getting an error when trying to run a test prompt with everything else set to defaults:

```
webui-docker-auto-1 | Running on local URL: http://0.0.0.0:7860
webui-docker-auto-1 |
webui-docker-auto-1 | To create a public link, set `share=True` in `launch()`.
webui-docker-auto-1 | Startup time: 13.9s (import gradio: 0.8s, import ldm: 0.4s, other imports: 1.2s, load scripts: 0.2s, load SD checkpoint: 10.9s, create ui: 0.1s).
webui-docker-auto-1 | Error completing request
webui-docker-auto-1 | Arguments: ('task(td9v3amy7jrkdya)', 'a delicious cheeseburger', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, '', False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
webui-docker-auto-1 | Traceback (most recent call last):
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/call_queue.py", line 56, in f
webui-docker-auto-1 | res = list(func(*args, **kwargs))
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/call_queue.py", line 37, in f
webui-docker-auto-1 | res = func(*args, **kwargs)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
webui-docker-auto-1 | processed = process_images(p)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 486, in process_images
webui-docker-auto-1 | res = process_images_inner(p)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 625, in process_images_inner
webui-docker-auto-1 | uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 570, in get_conds_with_caching
webui-docker-auto-1 | cache[1] = function(shared.sd_model, required_prompts, steps)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
webui-docker-auto-1 | conds = model.get_learned_conditioning(texts)
webui-docker-auto-1 | File "/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
webui-docker-auto-1 | c = self.cond_stage_model(c)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
webui-docker-auto-1 | return forward_call(*input, **kwargs)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
webui-docker-auto-1 | z = self.process_tokens(tokens, multipliers)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
webui-docker-auto-1 | z = self.encode_with_transformers(tokens)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
webui-docker-auto-1 | outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1201, in _call_impl
webui-docker-auto-1 | result = hook(self, input)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/lowvram.py", line 35, in send_me_to_gpu
webui-docker-auto-1 | module.to(devices.device)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
webui-docker-auto-1 | return self._apply(convert)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | [Previous line repeated 2 more times]
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
webui-docker-auto-1 | param_applied = fn(param)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
webui-docker-auto-1 | return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
webui-docker-auto-1 | RuntimeError: CUDA error: unspecified launch failure
webui-docker-auto-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
webui-docker-auto-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
The first error you got was just a timeout caused by a flaky internet connection; if you try building again it should be fixed (hopefully). Please keep pytorch pinned, otherwise you will get a lot of unexpected errors. The second error seems weird; what is the output of this command?

If you get the same error, then it is probably a problem with Docker not being able to see your GPU. Make sure you have the NVIDIA Container Toolkit installed and working.
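A quick way to check this (the CUDA image tag below is just an example):

```bash
# If this prints your GPU table, Docker can see the GPU through the toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```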
I think the issue might've been that I had nvidia-container-toolkit-base installed as well. I uninstalled both, reinstalled nvidia-container-toolkit, restarted, and I've got the test image generating successfully now. Not sure if the issue was that package or that I just needed to restart. I'm only able to get Docker to see my GPU when I run with sudo, though, which I'm not a huge fan of... Anyway, looks like the issue was me not realizing I'd skipped the prerequisites on a too-fresh Ubuntu reinstall.
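The standard fix for the sudo requirement (a general Docker post-install step, not something specific to this repo) is adding your user to the docker group:

```bash
# Log out and back in (or run `newgrp docker`) for the group change to apply
sudo usermod -aG docker $USER
```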
I've just had this issue too on Ubuntu 23.04. I fixed it by re-installing nvidia-container-toolkit! |
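The equivalent commands would be something like:

```bash
# Assumed equivalent of the reinstall described above
sudo apt-get install --reinstall nvidia-container-toolkit
sudo systemctl restart docker
```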
Has this issue been opened before?

Describe the bug

First attempt at building. The `docker compose --profile download up --build` step worked fine. Attempting to run `docker compose --profile auto up --build` resulted in the following error:

Which UI

auto

Hardware / Software