Recent Qwen2VL merge request (#35837) breaks compatibility with DeepSpeed #36187

Closed
ArdalanM opened this issue Feb 14, 2025 · 3 comments

Comments

@ArdalanM
Contributor

ArdalanM commented Feb 14, 2025

The recent merge request (#35837) works with Accelerate but breaks with DeepSpeed (with and without a DeepSpeed config):

  • distributed_type: MULTI_GPU (works)
  • distributed_type: DEEPSPEED (no longer works)

To be more precise, the issue lies in this section: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L200

    if position_embeddings is None:
        emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
        cos = emb.cos().float()
        sin = emb.sin().float()
    else:
        cos, sin = position_embeddings
    q, k = apply_rotary_pos_emb_flashatt(q.unsqueeze(0), k.unsqueeze(0), cos, sin)

In the else branch, cos and sin are taken from position_embeddings without being cast to float, so their dtype varies with the DeepSpeed and mixed_precision config.
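
Until this is fixed upstream, a minimal sketch of the missing cast is shown below, mirroring the .float() calls in the fallback branch; the helper name ensure_float_rope and the tensor shapes are illustrative only, not part of transformers:

    import torch

    def ensure_float_rope(position_embeddings):
        # Cast the precomputed (cos, sin) rotary embeddings to float32, mirroring
        # the .float() calls in the fallback branch above. Per the report, the
        # dtype of this tuple varies with the DeepSpeed / mixed_precision config.
        cos, sin = position_embeddings
        return cos.float(), sin.float()

    # Illustrative example: a bf16 tuple such as bf16 mixed precision can produce.
    cos = torch.randn(16, 80, dtype=torch.bfloat16)
    sin = torch.randn(16, 80, dtype=torch.bfloat16)
    cos, sin = ensure_float_rope((cos, sin))
    assert cos.dtype == sin.dtype == torch.float32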

This accelerate config works:

    compute_environment: LOCAL_MACHINE
    debug: false
    distributed_type: MULTI_GPU
    downcast_bf16: 'no'
    enable_cpu_affinity: #false
    main_training_function: main
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
    mixed_precision: bf16

This accelerate config no longer works:

    compute_environment: LOCAL_MACHINE
    debug: false
    distributed_type: DEEPSPEED
    deepspeed_config:
      zero_stage: 3
    downcast_bf16: 'no'
    enable_cpu_affinity: false
    main_training_function: main
    rdzv_backend: static
    same_network: true
    tpu_env: []
    tpu_use_cluster: false
    tpu_use_sudo: false
    use_cpu: false
@ArvinZhuang

Same issue.

@ArthurZucker
Collaborator

Nice catch!

@zucchini-nlp
Member

Resolved on main now
