Load safetensors directly to cuda #2445
Cc: @pcuenca
Hey @Daniel-Kelvich, it should be possible since this PR: huggingface/accelerate#1028. Can you make sure to upgrade accelerate:
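(The exact command was lost here; presumably something along the lines of:)

```bash
pip install --upgrade accelerate
```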
and then you can load the model directly on GPU with safetensors:
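(The snippet was lost here as well; a minimal sketch, assuming the same checkpoint and `device_map="auto"` call that appear later in the thread:)

```python
import torch
from diffusers import StableDiffusionPipeline

# device_map="auto" lets accelerate place the weights on the GPU while loading,
# instead of materializing them on CPU first.
pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-diffusion-1.0",
    torch_dtype=torch.float16,
    device_map="auto",
)
print(pipe.device)
```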
Can you check if this works?
For me it either loads to CPU or uses … Here are the libraries and a minimal code snippet to reproduce.
Can you please provide a minimal working example of loading safetensors to GPU?
Hey @Daniel-Kelvich, can you make sure to have …? Once the weights are downloaded, the following code snippet:

```python
import torch
from diffusers import StableDiffusionPipeline
import time

# Monkey-patch torch.load so any attempt to load a pickle checkpoint fails,
# which guarantees the weights really come from safetensors files.
def _raise():
    raise RuntimeError("I don't want to use pickle")

torch.load = lambda *args, **kwargs: _raise()

t1 = time.time()
model_id = "dreamlike-art/dreamlike-diffusion-1.0"
pipe = StableDiffusionPipeline.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
print(f'{time.time()-t1} sec')
print(pipe.device)
```

currently, sadly, yields an error:
However, this is fixed in huggingface/accelerate#1151. Using this PR, the above code runs as expected.
@patrickvonplaten Hi! I still get this error, even though I have updated the deps.
Hey @Daniel-Kelvich, can you try again after running the following?

```bash
pip uninstall accelerate
pip install git+https://github.com/huggingface/accelerate.git
```
It seems to work now, but it is pretty slow. There's no point in loading a model to GPU directly.
Yeah, I'm also not 100% sure in which use cases it improves performance.
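(A quick way to check would be to time both load paths; a minimal sketch, assuming the same `dreamlike-art/dreamlike-diffusion-1.0` checkpoint used above:)

```python
import time

import torch
from diffusers import StableDiffusionPipeline

model_id = "dreamlike-art/dreamlike-diffusion-1.0"

# Path 1: load to CPU first, then move the pipeline to the GPU.
t0 = time.time()
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
print(f"cpu -> cuda: {time.time() - t0:.1f} sec")

del pipe
torch.cuda.empty_cache()

# Path 2: let accelerate place the weights directly on the GPU while loading.
t0 = time.time()
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
print(f"direct to cuda: {time.time() - t0:.1f} sec")
```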
As far as I know, there is no way right now to load a model from a safetensors file directly to CUDA; you always have to load it to CPU first. The safetensors library supports loading directly to CUDA, so it shouldn't be hard to add this functionality to diffusers pipelines.
The interface might look like this (just specify the device in the init function):
```python
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, device='cuda:0')
```
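(For reference, the direct-to-CUDA loading mentioned above already works at the safetensors level; a minimal sketch, where `"model.safetensors"` is a placeholder path:)

```python
from safetensors.torch import load_file

# Load the tensors straight onto the GPU, skipping the intermediate CPU copy.
# "model.safetensors" is a hypothetical path for illustration.
state_dict = load_file("model.safetensors", device="cuda:0")
print(next(iter(state_dict.values())).device)  # cuda:0
```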