
Load safetensors directly to cuda #2445

Closed
Daniel-Kelvich opened this issue Feb 21, 2023 · 9 comments

Comments

@Daniel-Kelvich

As far as I know, there is no way right now to load a model from a safetensors file directly to CUDA; you always have to load it to CPU first. The safetensors library supports loading directly to CUDA, so it shouldn't be hard to add this functionality to diffusers pipelines.

The interface could look like this (just specify the device in the init function):
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, device='cuda:0')

@sayakpaul
Member

Cc: @pcuenca

@patrickvonplaten
Contributor

Hey @Daniel-Kelvich,

It should be possible since this PR: huggingface/accelerate#1028

Can you make sure to upgrade accelerate:

pip install --upgrade accelerate

and then you can load the model directly on GPU with safetensors:

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

Can you check if this works?

@Daniel-Kelvich
Author

Hi @patrickvonplaten,

For me it either loads to CPU or uses torch.load().

Here are the library versions and a minimal snippet to reproduce:

accelerate==0.16.0
diffusers==0.13.1
torch==1.13.1

import torch
from diffusers import StableDiffusionPipeline
import time

def _raise():
    raise RuntimeError("I don't want to use pickle")
torch.load = lambda *args, **kwargs: _raise()

t1=time.time()
model_id = "dreamlike-art/dreamlike-diffusion-1.0"
pipe = StableDiffusionPipeline.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
print(f'{time.time()-t1} sec')

print(pipe.device)

@Daniel-Kelvich
Author

Can you please provide a minimal working example of loading safetensors to GPU?

@patrickvonplaten
Contributor

Hey @Daniel-Kelvich,

Can you make sure to have safetensors installed as well? Note that if safetensors is not installed, the weights will automatically be loaded with torch.load.

Once the weights are downloaded, the following code snippet:

import torch
from diffusers import StableDiffusionPipeline
import time

def _raise():
    raise RuntimeError("I don't want to use pickle")
torch.load = lambda *args, **kwargs: _raise()

t1=time.time()
model_id = "dreamlike-art/dreamlike-diffusion-1.0"
pipe = StableDiffusionPipeline.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
print(f'{time.time()-t1} sec')

print(pipe.device)

currently sadly yields an error:

│ /home/patrick_huggingface_co/python_bin/accelerate/utils/modeling.py:670 in load_state_dict      │
│                                                                                                  │
│   667 │   │   with safe_open(checkpoint_file, framework="pt") as f:                              │
│   668 │   │   │   metadata = f.metadata()                                                        │
│   669 │   │   │   weight_names = f.keys()                                                        │
│ ❱ 670 │   │   if metadata.get("format") not in ["pt", "tf", "flax"]:                             │
│   671 │   │   │   raise OSError(                                                                 │
│   672 │   │   │   │   f"The safetensors archive passed at {checkpoint_file} does not contain t   │
│   673 │   │   │   │   "you save your model with the `save_pretrained` method."                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'get'

However, this is fixed in: huggingface/accelerate#1151

Using this PR the above code runs as expected.

4.564812898635864 sec
cuda:0
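The `AttributeError` above comes from the safetensors file layout itself: a file starts with an 8-byte little-endian header length followed by a JSON header, and `f.metadata()` returns the optional `__metadata__` entry of that header, which is `None` when a file was saved without it. A stdlib-only sketch of that layout (the tensor name and values here are invented):

```python
import json
import struct

# Build a minimal safetensors-style file by hand: one float32 tensor of shape [2].
data = struct.pack("<2f", 1.0, 2.0)  # 8 bytes of raw tensor data
header = {
    # Files written without "__metadata__" are the ones that make
    # metadata.get("format") raise AttributeError on None.
    "__metadata__": {"format": "pt"},
    "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(data)]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Parse it back the way a loader would.
(n,) = struct.unpack("<Q", blob[:8])
parsed = json.loads(blob[8:8 + n])
metadata = parsed.get("__metadata__")  # None if the key is absent
print(metadata.get("format") if metadata else None)
```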

@Daniel-Kelvich
Author

Daniel-Kelvich commented Mar 6, 2023

@patrickvonplaten Hi! I still get this error, even though I have updated the deps.

│   667 │   │   with safe_open(checkpoint_file, framework="pt") as f:                              │
│   668 │   │   │   metadata = f.metadata()                                                        │
│   669 │   │   │   weight_names = f.keys()                                                        │
│ ❱ 670 │   │   if metadata.get("format") not in ["pt", "tf", "flax"]:                             │
│   671 │   │   │   raise OSError(                                                                 │
│   672 │   │   │   │   f"The safetensors archive passed at {checkpoint_file} does not contain t   │
│   673 │   │   │   │   "you save your model with the `save_pretrained` method."  
diffusers==0.15.0.dev0
accelerate==0.17.0.dev0
safetensors==0.3.0

@patrickvonplaten
Contributor

Hey @Daniel-Kelvich,

Can you try again after running:

pip uninstall accelerate
pip install git+https://github.com/huggingface/accelerate.git

?

@Daniel-Kelvich
Author

It seems to work now, but it is pretty slow, so there's no point in loading a model directly to GPU.

@patrickvonplaten
Contributor

Yeah, I'm also not 100% sure in which use cases it improves performance.
