RuntimeError: 'weight' must be 2-D while training Flan-T5 models with stage 3 #2746
Comments
Looks like the same error popped up in diffusers using ZeRO stage 3 :) huggingface/diffusers#1865
Don't know if this helps, but I get the same 2-D error with stage 3 in a weird way: I use the datasets map function with the method of a class that contains a SentenceTransformer model. Basically, I want to augment my dataset before training, and when used with DeepSpeed it gives the 2-D error in the sentence transformer, which has nothing to do with the model I'm actually training. Stage 2 seems to work okay.
Hello @smitanannaware, thank you for reporting. According to the Hugging Face documentation, you need to pass your DeepSpeed config file to the training arguments:

training_args = Seq2SeqTrainingArguments(
    ...
    deepspeed="ds_config.json"
)

I tried to train Flan-T5 using the code from this article.
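For concreteness, a minimal sketch of what passing a ZeRO config to the Hugging Face trainer arguments looks like; the values below are placeholders for illustration, not the reporter's actual settings:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative values only; the key point is the `deepspeed` argument,
# which takes the path to the ZeRO config JSON and lets the HF Trainer
# initialize the DeepSpeed engine for you.
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-out",
    per_device_train_batch_size=4,
    learning_rate=1e-4,
    num_train_epochs=3,
    predict_with_generate=True,
    deepspeed="ds_config.json",
)
```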
+1, getting the same error.
@djaym7 Thank you for your report! Can you give us more details? Did you pass the DeepSpeed config to your training arguments?
The config is loaded from https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/configs/ds_flan_t5_z3_config.json and passed via training_args = TrainingArguments(…)
+1, getting the same error.
@djaym7 @woodyx218
The error occurs when using PEFT with Flan-T5.
Hi @djaym7
Haven't; I'm using regular inference without DeepSpeed.
Hi @djaym7, I have tried to reproduce the problem using both DeepSpeed and PEFT (prefix tuning) but haven't seen the same error. I came across the error that you mentioned at huggingface/peft#168. I didn't see an error after making these changes. The versions of peft, transformers, and deepspeed were: …
There's no error in training; the error is in inference. Add the following after training and you'll get the error:

for batch in tqdm(data_loader):
    ...
Hi @djaym7, I added the following code after training:

device = torch.device("cuda")
loader = torch.utils.data.DataLoader(eval_dataset, batch_size=args.per_device_eval_batch_size, shuffle=False, collate_fn=data_collator)
for batch in loader:
    with torch.no_grad():
        outputs = model.generate(input_ids=batch['input_ids'].to(device),
                                 attention_mask=batch['attention_mask'].to(device),
                                 max_new_tokens=128)  # num_beams=8, early_stopping=True
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
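As a side note, one workaround that is sometimes suggested for this kind of standalone generation under ZeRO-3 is to gather the partitioned weights for the duration of generation. This is only a sketch, not a confirmed fix for this thread; it reuses `loader`, `device`, and `tokenizer` from the snippet above and assumes the full model fits in memory on each rank:

```python
import deepspeed
import torch

# Under ZeRO-3 each rank only holds a flattened shard of every parameter,
# which is why F.embedding complains that 'weight' must be 2-D when
# generate() is called on a bare (non-engine) model. Gathering the
# parameters materializes the full weights inside this context.
model.eval()
with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=None):
    for batch in loader:
        with torch.no_grad():
            outputs = model.generate(
                input_ids=batch["input_ids"].to(device),
                attention_mask=batch["attention_mask"].to(device),
                max_new_tokens=128,
            )
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Alternatively, generating through the DeepSpeed engine itself (and passing `synced_gpus=True` to `generate` when running on multiple GPUs) keeps the sharded weights under the engine's control.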
@djaym7 Can we clarify what errors you have now? I see several different errors regarding this issue.
It would be helpful if you could give us the complete code to reproduce it.
"The error you mentioned at huggingface/peft#168 is about both training and inference. Do you still have the errors?" Yes. To reproduce, add the evaluate function shared above after training the model. The error is posted above as well.
I am a bit confused. You wrote "There's no error in training, error is in inference" at #2746 (comment). Do you have an error with training or not? I wrote an example of training/generation using DeepSpeed and PEFT. I didn't fully test it, but at least it didn't throw the error. How does it differ from your code?
I think we need to make sure that we are doing the same thing for training/generation before investigating further.
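The maintainer's example script is not reproduced in this thread; for comparison, a rough sketch of the kind of training-plus-generation flow under discussion might look like the following (the model name, the dataset variables `train_ds`/`eval_ds`, and the hyperparameters are assumptions for illustration):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Sketch: train with PEFT + DeepSpeed via the Trainer, then generate through
# trainer.predict() so the DeepSpeed engine keeps managing the sharded
# weights, instead of calling generate() on a bare model after training.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
model = get_peft_model(model, LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32))

training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    deepspeed="ds_config.json",
    predict_with_generate=True,
    per_device_train_batch_size=4,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,   # assumed: already tokenized datasets
    eval_dataset=eval_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
predictions = trainer.predict(eval_ds)  # generation happens under the engine
```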
Closing because we have no additional information. |
Faced the same issue when running inference using … Solution: use …
Facing the same issue when using …
I had the same issue. Thankfully, it went away when I upgraded to DeepSpeed 0.9.5.
Facing the same issue with DeepSpeed 0.13.4. Training with PEFT (QLoRA) + DeepSpeed ZeRO Stage 3, offloading params and optimizer to CPU. Training is fine. After training, we call generate() during evaluation and get:

File "/root/code_sft/sft_main.py", line 461, in main
test_pfm = evaluate(args, test_dataloader, model, mix_precision=mix_precision, tokenizer=tokenizer,
File "/root/code_sft/sft_main.py", line 223, in evaluate
generated_ids = module.generate(input_ids=feature["input_ids"],
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1474, in generate
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1027, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/miniconda/envs/py310/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
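For context on why the traceback ends inside `F.embedding`: under ZeRO-3 the locally visible parameter tensor is a flattened (often empty) partition, while the logical shape lives in DeepSpeed's metadata. A small, hedged diagnostic sketch, assuming `model` is the ZeRO-3-partitioned HF model:

```python
# Inspect the embedding weight that F.embedding actually sees on this rank.
# Under ZeRO-3 the local tensor is typically torch.Size([0]); the original
# shape is kept in the DeepSpeed-added ds_shape attribute.
w = model.get_input_embeddings().weight
print("local shape  :", tuple(w.shape))
print("logical shape:", getattr(w, "ds_shape", "not a ZeRO-3 partitioned param"))
```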
I am facing the same issue with the DPR and GPT2 models. I am using the latest torch version to use FullyShardedDataParallel for distributed training. The training works fine (regardless of the number of devices I use).
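Since FSDP shards parameters much like ZeRO-3, the analogous pattern sometimes used there is to summon the full parameters around generation. Again only a sketch: it assumes `model` is the FSDP-wrapped module, `input_ids` stands in for an already tokenized batch, and that each rank has enough memory to materialize the full weights:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Sketch: FSDP also keeps only flattened shards on each rank, so a bare
# generate() call can hit the same kind of shape error. summon_full_params
# restores the full, unflattened weights for the duration of the context.
model.eval()
with FSDP.summon_full_params(model), torch.no_grad():
    outputs = model.module.generate(input_ids=input_ids, max_new_tokens=128)
```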
Have you solved this problem? It seems the solution mentioned doesn't work for this issue.
I am using the Hugging Face Seq2SeqTrainer for training a Flan-T5-XL model with DeepSpeed stage 3.
I am stuck on the error in the title: RuntimeError: 'weight' must be 2-D.
The code works with the ZeRO-2 config but does not work with ZeRO-3. I have tried a couple of settings but no luck.
Any help would be appreciated.