Bug introduced in _load_state_dict_into_meta_model and to(), v4.49.0..v4.50.0.dev #36441

Closed
hlky opened this issue Feb 27, 2025 · 6 comments
Labels: Core: Modeling (Internals of the library; Models.)

@hlky
Member

hlky commented Feb 27, 2025

Hi 🤗

Diffusers 🧨 noticed some failing tests starting with v4.50.0.dev across several of our models that use transformers.

Test run #1, Test run #2

/opt/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
    return fn(*args, **kwargs)
src/diffusers/pipelines/pipeline_utils.py:952: in from_pretrained
    loaded_sub_model = load_sub_model(
src/diffusers/pipelines/pipeline_loading_utils.py:733: in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:268: in _wrapper
    return func(*args, **kwargs)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4406: in from_pretrained
    ) = cls._load_pretrained_model(
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4972: in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = StableDiffusionSafetyChecker(
  (vision_model): CLIPVisionModel(
    (vision_model): CLIPVisionTransformer(
      (emb...=1e-05, elementwise_affine=True)
    )
  )
  (visual_projection): Linear(in_features=32, out_features=64, bias=False)
)
state_dict = {'concept_embeds': tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1...., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]), 'special_care_embeds_weights': tensor([1., 1., 1.]), ...}
start_prefix = ''
expected_keys = ['concept_embeds', 'special_care_embeds', 'concept_embeds_weights', 'special_care_embeds_weights', 'vision_model.vision_model.embeddings.class_embedding', 'vision_model.vision_model.embeddings.patch_embedding.weight', ...]
device_map = None, offload_folder = None, offload_index = None
state_dict_folder = None, state_dict_index = None, dtype = torch.float32
hf_quantizer = None, is_safetensors = False, keep_in_fp32_modules = None
unexpected_keys = [], pretrained_model_name_or_path = None, device_mesh = None
shard_file = '/github/home/.cache/huggingface/hub/models--hf-internal-testing--tinier-stable-diffusion-pipe/snapshots/5ed5ee78ee0b294cba6632344d00bd9535ed8ad1/safety_checker/model.safetensors'

    @torch.no_grad()
    def _load_state_dict_into_meta_model(
        model: torch.nn.Module,
        state_dict: Dict[str, torch.Tensor],
        start_prefix,
        expected_keys,
        device_map=None,
        offload_folder=None,
        offload_index=None,
        state_dict_folder=None,
        state_dict_index=None,
        dtype=None,
        hf_quantizer=None,
        is_safetensors=False,
        keep_in_fp32_modules=None,
        unexpected_keys=None,  # passing `unexpected` for cleanup from quantization items
        pretrained_model_name_or_path=None,  # for flagging the user when the model contains renamed keys
        device_mesh=None,
        shard_file=None,
    ):
        """
        This is somewhat similar to `_load_state_dict_into_model`, but deals with a model that has some or all of its
        params on a `meta` device. It replaces the model params with the data from the `state_dict`, while moving the
        params back to the normal device, but only for `loaded_state_dict_keys`.
    
        `start_prefix` is used for models which insert their name into model keys, e.g. `bert` in
        `bert.pooler.dense.weight`
    
        It also initialize tensor parallelism for each module if needed.
    
        """
        tensor_device = None
        if device_map is not None and device_map.get("", None) is not None:
            tensor_device = device_map[""].index if isinstance(device_map[""], torch.device) else device_map[""]
    
        with safe_open(shard_file, framework="pt", device=tensor_device) as file_pointer:
            error_msgs = []
    
            is_quantized = hf_quantizer is not None
    
            is_torch_e4m3fn_available = hasattr(torch, "float8_e4m3fn")
    
            # we need this later to initialize tensor parallelism
            if device_mesh is not None:
                full_tp_plan = model.config.base_model_tp_plan
                for submodule in model.modules():
                    full_tp_plan.update(getattr(submodule, "_tp_plan", {}))
    
            for serialized_param_name, empty_param in state_dict.items():
                # param_name is the raw, serialized name
                # new_param_name is the model's equivalent
                module_name, _ = model.rename_key(serialized_param_name)
                if module_name not in expected_keys:
                    continue
>               layer, param_type = module_name.rsplit(".", 1)
E               ValueError: not enough values to unpack (expected 2, got 1)
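For context, the unpack only succeeds when the key contains a ".". Top-level buffers such as "concept_embeds" in the state_dict above have no parent module, so rsplit(".", 1) returns a single element. A minimal illustration of that failure mode (not the library code):

for module_name in ["vision_model.vision_model.embeddings.class_embedding", "concept_embeds"]:
    try:
        layer, param_type = module_name.rsplit(".", 1)
        print(module_name, "->", (layer, param_type))
    except ValueError as err:
        # Top-level buffers such as "concept_embeds" land here.
        print(module_name, "->", err)

A second traceback from the failing tests: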
src/diffusers/pipelines/pipeline_utils.py:481: in to
    module.to(device, dtype)
/opt/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:3216: in to
    return super().to(*args, **kwargs)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1343: in to
    return self._apply(convert)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:903: in _apply
    module._apply(fn)
/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:930: in _apply
    param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

t = Parameter containing:
tensor(..., device='meta', size=(1000, 32), requires_grad=True)

    def convert(t):
        try:
            if convert_to_format is not None and t.dim() in (4, 5):
                return t.to(
                    device,
                    dtype if t.is_floating_point() or t.is_complex() else None,
                    non_blocking,
                    memory_format=convert_to_format,
                )
            return t.to(
                device,
                dtype if t.is_floating_point() or t.is_complex() else None,
                non_blocking,
            )
        except NotImplementedError as e:
            if str(e) == "Cannot copy out of meta tensor; no data!":
>               raise NotImplementedError(
                    f"{e} Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() "
                    f"when moving module from meta to a different device."
                ) from None
E               NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
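The second trace is the downstream symptom: some parameters are still on the "meta" device after loading, so the later pipeline.to(device, dtype) call cannot copy them. A minimal, self-contained sketch of that PyTorch behaviour (plain torch, not the diffusers pipeline code):

import torch
import torch.nn as nn

# Parameters created on the "meta" device have no storage, so .to() cannot copy them.
with torch.device("meta"):
    layer = nn.Linear(32, 64)

try:
    layer.to("cpu")
except NotImplementedError as err:
    print(err)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates uninitialized storage on the target device instead of copying;
# the real fix here is for from_pretrained to materialize the weights during loading.
layer = layer.to_empty(device="cpu")
print(layer.weight.device)  # cpu (values are uninitialized)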
@rinabuoy

rinabuoy commented Feb 27, 2025

from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"

model = AutoModel.from_pretrained(ckpt, device_map="auto").eval()

[image attached]

@DN6
Contributor

DN6 commented Feb 27, 2025

Error seems to be from the change introduced in this PR: #36335
cc: @SunMarc

@ArthurZucker
Collaborator

Thanks a lot for raising it so quickly! 🤗 cc @SunMarc, who will help you fix it 👀 The deepspeed branch is also broken.

ArthurZucker added the Core: Modeling label on Feb 27, 2025
@ArthurZucker
Collaborator

The first one:

>               layer, param_type = module_name.rsplit(".", 1)
E               ValueError: not enough values to unpack (expected 2, got 1)

should be quite easy to fix (just check whether there is a "." in the module name).
For the second case I have no idea, it needs checking, but IMO there's a branch that is wrong.
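A rough sketch of that guard (hypothetical helper, not the actual patch that landed in #36453):

def split_param_name(module_name):
    # Hypothetical guard: fall back to the root module when the key has no ".",
    # e.g. top-level buffers like "concept_embeds".
    if "." in module_name:
        layer, param_type = module_name.rsplit(".", 1)
    else:
        layer, param_type = "", module_name
    return layer, param_type

print(split_param_name("vision_model.vision_model.embeddings.patch_embedding.weight"))
print(split_param_name("concept_embeds"))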

@SunMarc
Member

SunMarc commented Feb 27, 2025

Let me know if this is better with #36453

@hlky
Member Author

hlky commented Mar 1, 2025

Fixed by #36453

hlky closed this as completed Mar 1, 2025