Error with SFT of LLaVA-Next #1785
Hi, sorry for the delay. Can you double-check the command? When I run it, I get:
Also, please share the versions of trl, transformers, and torch.
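(For reference, one quick way to capture the versions requested above:)

```python
# Print the library versions requested above.
import torch
import transformers
import trl

print("trl:", trl.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```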
I have double-checked the command, code, and output. Versions are as follows:
@qgallouedec any update on this? I created a new environment with the latest versions of trl and transformers but am still facing the same issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@GohioAC were you able to resolve it? I got a similar issue.
Make sure to update to the dev version of transformers:
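(The exact command wasn't captured in the thread; the usual way to install transformers from source is `pip install git+https://github.com/huggingface/transformers.git`.)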
thanks, I updated to the latest dev version of transformers but still ran into the issue. Could you shed more light on why it occurs? For context, I am trying to reproduce this notebook to finetune LLaVA-Next-Video: https://colab.research.google.com/drive/1dTdro-k7NFqRgGq5-TlGHM-6k2sYQhXp?usp=sharing#scrollTo=5da82ca2-e1db-4a7c-878f-aeed972ba9e6 @qgallouedec
Unfortunately, videos aren't supported yet.
@qgallouedec Hmmm, I believe this tutorial notebook (https://colab.research.google.com/drive/1dTdro-k7NFqRgGq5-TlGHM-6k2sYQhXp?usp=sharing#scrollTo=5da82ca2-e1db-4a7c-878f-aeed972ba9e6) on LLaVA-Next-Video finetuning was made by a member of the Hugging Face LLaVA team, Raushan Turganbay... not sure if you know. Thanks
Thanks for the info. cc @zucchini-nlp.
Thanks, I didn't realize this was specific to TRL. @qgallouedec @zucchini-nlp any help on this error for the notebook would be much appreciated! @zucchini-nlp any pointers on the library versions used would be helpful. Thanks
@TonyJiang17 hey! Yes, we had an issue with llava-next-video recently, and a fix was added in the latest patch release. Can you make sure that you have the latest version and check whether model inference works? I guess the error message above would affect both generation and training.
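(A minimal inference smoke test along the lines suggested here; the model id and the random clip are assumptions, not taken from the thread:)

```python
# Sketch of an inference check for LLaVA-Next-Video; the dummy clip stands in
# for real video frames.
import numpy as np
import torch
from transformers import LlavaNextVideoForConditionalGeneration, LlavaNextVideoProcessor

model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"  # assumed checkpoint from the notebook
processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

clip = np.random.randint(0, 255, (8, 336, 336, 3), dtype=np.uint8)  # 8 dummy frames
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=clip, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))
```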
hey @zucchini-nlp thanks for replying. I made sure I am using the latest patch release (4.44.2) and ran the notebook, but I still hit the following error when I tried to finetune the model (inference works):

RuntimeError: Input tensor at index 1 has invalid shape [2, 1402, 32064], but expected [2, 1407, 32064]

I am running with just a batch size of 2 and 8 frames, and I made sure each input_ids is padded to a max length of 256. Is there some issue with the number of tokens used per frame? I assume the 1407 came from 255 + 1152, and there should be 144 tokens per frame? Any help would be much appreciated!
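(For what it's worth, that arithmetic is consistent; a hypothetical back-of-the-envelope check, assuming the single `<video>` placeholder expands in place and 144 tokens per frame:)

```python
# Hypothetical sequence-length check: the one <video> placeholder token in the
# padded prompt is replaced by num_frames * tokens_per_frame video tokens.
pad_len = 256           # padded text length, including the <video> placeholder
num_frames = 8
tokens_per_frame = 144  # assumed pooled 12x12 grid per frame
expected = pad_len - 1 + num_frames * tokens_per_frame
print(expected)  # 1407
```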
@TonyJiang17 okay, let me check this.
@TonyJiang17 the example notebook works for me with the latest transformers; I tried it on a tiny subset of the ShareGPT4Video data. I guess you're using your own dataset for tuning. Can you share what the dataset looks like after collating, along with the whole traceback, so I can help?
Hi @zucchini-nlp, certainly, and thanks again for helping! I am actually also using a tiny subset of the ShareGPT4Video data (I only loaded the mixit portion of it). Below is more information on the dataset after collating; I really just used your code. Please let me know if you need more information. I actually first ran into this tensors-not-on-the-same-device bug. I am using an AWS SageMaker notebook instance with access to 4 A10 GPUs. The traceback is pasted below.

RuntimeError Traceback (most recent call last)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:1938, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:2279, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:3318, in Trainer.training_step(self, model, inputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:3363, in Trainer.compute_loss(self, model, inputs, return_outputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/accelerate/utils/operations.py:820, in convert_outputs_to_fp32..forward(*args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/accelerate/utils/operations.py:808, in ConvertOutputsToFp32.call(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator..decorate_autocast(*args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/peft/peft_model.py:771, in PeftModel.forward(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/accelerate/hooks.py:170, in add_hook_to_module..new_forward(module, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/models/llava_next_video/modeling_llava_next_video.py:937, in LlavaNextVideoForConditionalGeneration.forward(self, input_ids, pixel_values, pixel_values_videos, image_sizes, attention_mask, position_ids, past_key_values, inputs_embeds, vision_feature_layer, vision_feature_select_strategy, labels, use_cache, output_attentions, output_hidden_states, return_dict)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1189, in LlamaForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, cache_position)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:977, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, cache_position)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/accelerate/hooks.py:170, in add_hook_to_module..new_forward(module, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:209, in LlamaRotaryEmbedding.forward(self, x, position_ids)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
@zucchini-nlp After I removed the device_map="auto" parameter when loading the pretrained model and lowered the batch size to 2, I no longer run into the above bug. I think it's just a workaround, though, since it's presumably now using a single GPU rather than the multi-GPU setup... Regardless, after removing device_map="auto" and running the code again, I ran into the following tensor shape mismatch error, similar to the original error in this issue thread. The traceback is pasted below.

RuntimeError Traceback (most recent call last)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:1938, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:2279, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:3318, in Trainer.training_step(self, model, inputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/trainer.py:3363, in Trainer.compute_loss(self, model, inputs, return_outputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py:186, in DataParallel.forward(self, *inputs, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py:203, in DataParallel.gather(self, outputs, output_device)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/scatter_gather.py:105, in gather(outputs, target_device, dim)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/scatter_gather.py:96, in gather..gather_map(outputs)
File :9, in init(self, loss, logits, past_key_values, hidden_states, attentions, image_hidden_states)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/transformers/utils/generic.py:390, in ModelOutput.post_init(self)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/scatter_gather.py:96, in (.0)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/scatter_gather.py:90, in gather..gather_map(outputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/autograd/function.py:539, in Function.apply(cls, *args, **kwargs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:75, in Gather.forward(ctx, target_device, dim, *inputs)
File ~/SageMaker/llava-next-env/lib/python3.10/site-packages/torch/nn/parallel/comm.py:231, in gather(tensors, dim, destination, out)
RuntimeError: Input tensor at index 1 has invalid shape [2, 1402, 32064], but expected [2, 1407, 32064]
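(One way to narrow this down, sketched with hypothetical names: DataParallel's gather requires every replica to return identically shaped logits, so it's worth confirming that all samples in a batch expand to the same sequence length after collating.)

```python
# Sanity check; collate_fn and train_dataset are assumed to be the notebook's
# objects. Every sample in a collated batch should expand to the same length.
batch = collate_fn([train_dataset[0], train_dataset[1]])
for key, value in batch.items():
    print(key, tuple(value.shape))
```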
I'm trying to instruction-tune llava-next models following the llava_vsft.py example shared for llava-1.5.
The run keeps failing on an 8xH100 VM with the following error:
The full code and error stack trace are available in this gist.
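For orientation, a minimal sketch of that setup in the style of TRL's llava_vsft.py example; the model id, dataset, and collator below are illustrative, not the exact code from the gist:

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, LlavaNextForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed LLaVA-Next checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

def collate_fn(examples):
    # Render each conversation with the chat template, then batch text + images.
    texts = [processor.tokenizer.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"][0] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # assumes a pad token is set
    batch["labels"] = labels
    return batch

train_dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")

training_args = SFTConfig(
    output_dir="llava-next-sft",
    per_device_train_batch_size=2,
    dataset_text_field="",                          # dummy; the collator does the work
    remove_unused_columns=False,                    # keep the image column for the collator
    dataset_kwargs={"skip_prepare_dataset": True},  # skip SFTTrainer's text-only prep
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=train_dataset,
)
trainer.train()
```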