Bug introduced in _load_state_dict_into_meta_model and to(), v4.49.0..v4.50.0.dev #36441

Closed
hlky opened this issue Feb 27, 2025 · 6 comments
Labels: Core: Modeling (Internals of the library; Models.)

@hlky
Member

hlky commented Feb 27, 2025

Hi 🤗

Diffusers 🧨 noticed some failing tests starting with v4.50.0.dev across several of our models that use transformers.

Test run #1, Test run #2

/opt/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
    return fn(*args, **kwargs)
src/diffusers/pipelines/pipeline_utils.py:952: in from_pretrained
    loaded_sub_model = load_sub_model(
src/diffusers/pipelines/pipeline_loading_utils.py:733: in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:268: in _wrapper
    return func(*args, **kwargs)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4406: in from_pretrained
    ) = cls._load_pretrained_model(
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4972: in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = StableDiffusionSafetyChecker(
  (vision_model): CLIPVisionModel(
    (vision_model): CLIPVisionTransformer(
      (emb...=1e-05, elementwise_affine=True)
    )
  )
  (visual_projection): Linear(in_features=32, out_features=64, bias=False)
)
state_dict = {'concept_embeds': tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1...., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]), 'special_care_embeds_weights': tensor([1., 1., 1.]), ...}
start_prefix = ''
expected_keys = ['concept_embeds', 'special_care_embeds', 'concept_embeds_weights', 'special_care_embeds_weights', 'vision_model.vision_model.embeddings.class_embedding', 'vision_model.vision_model.embeddings.patch_embedding.weight', ...]
device_map = None, offload_folder = None, offload_index = None
state_dict_folder = None, state_dict_index = None, dtype = torch.float32
hf_quantizer = None, is_safetensors = False, keep_in_fp32_modules = None
unexpected_keys = [], pretrained_model_name_or_path = None, device_mesh = None
shard_file = '/github/home/.cache/huggingface/hub/models--hf-internal-testing--tinier-stable-diffusion-pipe/snapshots/5ed5ee78ee0b294cba6632344d00bd9535ed8ad1/safety_checker/model.safetensors'

    @torch.no_grad()
    def _load_state_dict_into_meta_model(
        model: torch.nn.Module,
        state_dict: Dict[str, torch.Tensor],
        start_prefix,
        expected_keys,
        device_map=None,
        offload_folder=None,
        offload_index=None,
        state_dict_folder=None,
        state_dict_index=None,
        dtype=None,
        hf_quantizer=None,
        is_safetensors=False,
        keep_in_fp32_modules=None,
        unexpected_keys=None,  # passing `unexpected` for cleanup from quantization items
        pretrained_model_name_or_path=None,  # for flagging the user when the model contains renamed keys
        device_mesh=None,
        shard_file=None,
    ):
        """
        This is somewhat similar to `_load_state_dict_into_model`, but deals with a model that has some or all of its
        params on a `meta` device. It replaces the model params with the data from the `state_dict`, while moving the
        params back to the normal device, but only for `loaded_state_dict_keys`.
    
        `start_prefix` is used for models which insert their name into model keys, e.g. `bert` in
        `bert.pooler.dense.weight`
    
        It also initialize tensor parallelism for each module if needed.
    
        """
        tensor_device = None
        if device_map is not None and device_map.get("", None) is not None:
            tensor_device = device_map[""].index if isinstance(device_map[""], torch.device) else device_map[""]
    
        with safe_open(shard_file, framework="pt", device=tensor_device) as file_pointer:
            error_msgs = []
    
            is_quantized = hf_quantizer is not None
    
            is_torch_e4m3fn_available = hasattr(torch, "float8_e4m3fn")
    
            # we need this later to initialize tensor parallelism
            if device_mesh is not None:
                full_tp_plan = model.config.base_model_tp_plan
                for submodule in model.modules():
                    full_tp_plan.update(getattr(submodule, "_tp_plan", {}))
    
            for serialized_param_name, empty_param in state_dict.items():
                # param_name is the raw, serialized name
                # new_param_name is the model's equivalent
                module_name, _ = model.rename_key(serialized_param_name)
                if module_name not in expected_keys:
                    continue
>               layer, param_type = module_name.rsplit(".", 1)
E               ValueError: not enough values to unpack (expected 2, got 1)
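For context, the unpack only succeeds when the key contains a ".". Top-level buffers such as "concept_embeds" in the state_dict above have no parent module, so rsplit(".", 1) returns a single element. A minimal illustration of that failure mode (not the library code):

for module_name in ["vision_model.vision_model.embeddings.class_embedding", "concept_embeds"]:
    try:
        layer, param_type = module_name.rsplit(".", 1)
        print(module_name, "->", (layer, param_type))
    except ValueError as err:
        # Top-level buffers such as "concept_embeds" land here.
        print(module_name, "->", err)

A second traceback from the failing tests: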
src/diffusers/pipelines/pipeline_utils.py:481: in to
    module.to(device, dtype)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:3216: in to
    return super().to(*args, **kwargs)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1343: in to
    return self._apply(convert)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:930: in _apply
    param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

t = Parameter containing:
tensor(..., device='meta', size=(1000, 32), requires_grad=True)

    def convert(t):
        try:
            if convert_to_format is not None and t.dim() in (4, 5):
                return t.to(
                    device,
                    dtype if t.is_floating_point() or t.is_complex() else None,
                    non_blocking,
                    memory_format=convert_to_format,
                )
            return t.to(
                device,
                dtype if t.is_floating_point() or t.is_complex() else None,
                non_blocking,
            )
        except NotImplementedError as e:
            if str(e) == "Cannot copy out of meta tensor; no data!":
>               raise NotImplementedError(
                    f"{e} Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() "
                    f"when moving module from meta to a different device."
                ) from None
E               NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
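The second trace is the downstream symptom: some parameters are still on the "meta" device after loading, so the later pipeline.to(device, dtype) call cannot copy them. A minimal, self-contained sketch of that PyTorch behaviour (plain torch, not the diffusers pipeline code):

import torch
import torch.nn as nn

# Parameters created on the "meta" device have no storage, so .to() cannot copy them.
with torch.device("meta"):
    layer = nn.Linear(32, 64)

try:
    layer.to("cpu")
except NotImplementedError as err:
    print(err)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates uninitialized storage on the target device instead of copying;
# the real fix here is for from_pretrained to materialize the weights during loading.
layer = layer.to_empty(device="cpu")
print(layer.weight.device)  # cpu (values are uninitialized)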
@rinabuoy

rinabuoy commented Feb 27, 2025

from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"

model = AutoModel.from_pretrained(ckpt, device_map="auto").eval()

[image attached]

@DN6
Contributor

DN6 commented Feb 27, 2025

Error seems to be from the change introduced in this PR: #36335
cc: @SunMarc

@ArthurZucker
Collaborator

Thanks a lot for raising it so quickly! 🤗 cc @SunMarc, who will help you fix it 👀 The deepspeed branch is also broken.

ArthurZucker added the Core: Modeling label on Feb 27, 2025
@ArthurZucker
Collaborator

The first one:

>               layer, param_type = module_name.rsplit(".", 1)
E               ValueError: not enough values to unpack (expected 2, got 1)

should be quite easy to fix (just check whether there is a "." in the module name).
For the second case I have no idea, it needs checking, but IMO there's a branch that is wrong.
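A rough sketch of that guard (hypothetical helper, not the actual patch that landed in #36453):

def split_param_name(module_name):
    # Hypothetical guard: fall back to the root module when the key has no ".",
    # e.g. top-level buffers like "concept_embeds".
    if "." in module_name:
        layer, param_type = module_name.rsplit(".", 1)
    else:
        layer, param_type = "", module_name
    return layer, param_type

print(split_param_name("vision_model.vision_model.embeddings.patch_embedding.weight"))
print(split_param_name("concept_embeds"))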

@SunMarc
Member

SunMarc commented Feb 27, 2025

Let me know if this is better with #36453

@hlky
Member Author

hlky commented Mar 1, 2025

Fixed by #36453

hlky closed this as completed Mar 1, 2025