Bug when using StaticCache in Qwen2.5 Inference #34678
@ArthurZucker Hi, when you have time, could you please take a look at this bug? Thank you very much.
Can you try using `input_ids` instead of `inputs_embeds`? The StaticCache mechanism is designed to work with input tokens rather than with embeddings directly. By letting the model handle the embedding step internally, we avoid dimension mismatch issues during attention mask creation.
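For reference, a minimal sketch of this suggested path, assuming a Qwen2.5 instruct checkpoint (the model id here is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any Qwen2.5 causal LM should behave the same way.
model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# Passing token IDs lets generate() size the attention mask and cache
# positions itself, so the static cache path works out of the box.
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```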
I can use StaticCache with `input_ids`, but unfortunately, in my scenario, I can't provide `input_ids` to `model.generate`.
Can you explain why you can't pass `input_ids`?
Because the features I pass to `model.generate` have already gone through the embedding layer; there are no token IDs to give it.
After going down the rabbit hole, here's what I think: when we use `inputs_embeds` directly, StaticCache's attention mask creation runs into a dimension mismatch.
StaticCache should work with input embeds even without an attention mask, and from the code snippet I see that the first generation call goes through; it's continuing generation from embeds that fails. I'll check whether we can accommodate continued generation with embeds later next week. Also feel free to open a PR if you have an initial fix :)
@zucchini-nlp @ArthurZucker If nobody is currently working on it, then I can take a stab at this issue.
Yeah, feel free to work on it! 🤗
@yaswanth19 are you still working on it?
Yes @zucchini-nlp, I have started working on it and am currently paused due to personal work. I will try to raise a draft PR next week for a quick review.
@BBuf Can you use my feature branch and check whether it solves the issue? Please also check the reproducer code which I have attached 🤗
Thank you for your contribution. I have also raised a question regarding progressive generation using `inputs_embeds` and `past_key_values` (#35707). The feature you implemented is very important for solving my issue. I have tested your feature branch but encountered some problems: my original code still doesn't work, and when I run your test code with the model replaced by llama3-3b-chat, it throws an error. I'd like to know if there's any way to resolve this.
@Superbooming Please have a look at the Colab notebook, as I am able to run my branch correctly for progressive generation using embeds. I tried to reproduce the code for your use case too, and it seems to be working fine; have a look at it as well. Notebook: https://colab.research.google.com/drive/1M5ncp8nHwXuwBYmUcAsK4jaBT1nmL9GF?usp=sharing
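Conceptually, the progressive-generation pattern under discussion looks something like the sketch below: one StaticCache shared across successive `generate` calls that receive only embeddings. The checkpoint id and cache length are assumptions, and on `main` at the time of this thread the second call is exactly what errors; the feature branch is meant to make it work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed stand-in for "llama3-3b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
embed = model.get_input_embeddings()

# One StaticCache shared across rounds, so each generate call continues
# from where the previous one stopped.
cache = StaticCache(
    config=model.config,
    max_batch_size=1,   # named batch_size in some releases
    max_cache_len=512,
    device=model.device,
    dtype=model.dtype,
)

ids = tokenizer("Round one:", return_tensors="pt").input_ids.to(model.device)
out = model.generate(inputs_embeds=embed(ids), past_key_values=cache, max_new_tokens=16)

# Progressive step: embed the next chunk and reuse the same cache.
ids2 = tokenizer(" Round two:", return_tensors="pt").input_ids.to(model.device)
out2 = model.generate(inputs_embeds=embed(ids2), past_key_values=cache, max_new_tokens=16)
```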
Hey @yaswanth19, I ran your branch with …
Reproduction
When I use StaticCache to run inference with Qwen2.5, a bug occurs. In this example, I pass the tensor produced by the embedding layer to `model.generate` instead of the token IDs from the tokenizer. The reproduction script is as follows:
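A minimal sketch of the setup described above (the checkpoint id and cache length are illustrative, not the original values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "Qwen/Qwen2.5-7B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

input_ids = tokenizer("Hello, how are you?", return_tensors="pt").input_ids.to(model.device)
# Pass the tensor after the embedding layer instead of the token IDs.
inputs_embeds = model.get_input_embeddings()(input_ids)

past_key_values = StaticCache(
    config=model.config,
    max_batch_size=1,   # named batch_size in some releases
    max_cache_len=256,
    device=model.device,
    dtype=model.dtype,
)

# This call is where the dimension mismatch shows up during attention
# mask creation when only inputs_embeds is provided.
out = model.generate(
    inputs_embeds=inputs_embeds,
    past_key_values=past_key_values,
    max_new_tokens=32,
)
```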
I used the latest version of Transformers by compiling it from source. The error message is as follows:
Expected behavior
I can successfully run the above script using StaticCache.