[Question] how to run Llama-3.1-Minitron-4B-Width-Base #2820
Comments
I think you are referring to this block.
Because of L343, which defines `head_dim`.

FYI, PR #32495 has been merged now.
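For context, here is a minimal sketch of the `head_dim` handling that the PR introduces, illustrative of the pattern rather than the exact diff:

```python
def resolve_head_dim(config) -> int:
    # Prefer an explicit head_dim from the config; fall back to the classic
    # derivation. Width-pruned models need the explicit value because
    # head_dim * num_attention_heads no longer equals hidden_size.
    explicit = getattr(config, "head_dim", None)
    if explicit is not None:
        return explicit
    return config.hidden_size // config.num_attention_heads
```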
The model is a Base model, not an Instruct model; it may still have minimal conversational ability. Please consider posting a question/request on the HF model hub page.
@suhara thank you for checking, but the mlc-llm codebase has similar code here. Based on the PR you mention in https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base, the …
@huanglizhuo Your understanding is correct. The custom head_dim should be supported on the MLC side as well. Each inference engine (e.g., HF, Llama.cpp, MLC) should support the architecture. I'm not familiar with MLC at all, but do you think you can make the necessary changes? FYI, you can refer to the PR for HF.
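To make the mismatch concrete, here is a small sketch with width-pruned dimensions of the kind Minitron uses. The numbers below are illustrative; the model's config.json is authoritative:

```python
# Width-pruned models can have head_dim * num_attention_heads != hidden_size.
hidden_size = 3072          # pruned embedding width (illustrative)
num_attention_heads = 32    # inherited from the parent model (illustrative)
head_dim = 128              # inherited from the parent model (illustrative)

# The old assertion in llama_model.py rejects exactly this case:
print(head_dim * num_attention_heads)                  # 4096
print(head_dim * num_attention_heads == hidden_size)   # False -> assert fires
```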
@suhara Thank you for the confirmation. I actually checked your PR for HF, and I removed it as I mentioned here, but the chat output is nonsense. Let me read the MLC code more carefully and see if I can find what I missed.
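One direction worth checking (an assumption on my side, not a confirmed diagnosis): with a decoupled head_dim, the attention projections are no longer square. A sketch of the expected shapes, using a hypothetical helper rather than mlc-llm's actual code:

```python
import torch.nn as nn

def make_attention_projections(hidden_size: int, num_heads: int,
                               num_kv_heads: int, head_dim: int):
    # q/k/v project hidden_size -> heads * head_dim, and the output
    # projection maps num_heads * head_dim back to hidden_size. If any of
    # these shapes still assume head_dim == hidden_size // num_heads, the
    # weights load into the wrong layout and generation degrades to nonsense.
    q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
    k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)
    v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)
    o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)
    return q_proj, k_proj, v_proj, o_proj
```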
By the way, when I do the weight conversion I use …

Will try converting the weights without …

Tried with …
Hi @huanglizhuo, thank you for bringing this issue to our attention. Removal of … However, we've encountered a similar observation when running inference on the Llama-3.1-Minitron-4B-Width-Base model using Hugging Face's …
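A minimal sketch of that kind of Hugging Face check (standard transformers API; the prompt and generation settings are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Base model: use plain text completion, not a chat template.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```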
@YiyanZhai thank you for the update; then the issue is actually due to …
❓ General Questions
I am trying to run Llama-3.1-Minitron-4B-Width-Base. In the README they mention: …
After checking the PR mentioned above, I found that `head_dim` is already supported by mlc-llm, and it looks like `assert self.head_dim * self.num_attention_heads == self.hidden_size` in llama_model.py is not required? So I did the steps below:

- Removed `assert self.head_dim * self.num_attention_heads == self.hidden_size` from llama_model.py
- …
I think there must be some misunderstanding on my side. Can anyone give a hint on which direction I should check to run the Llama-3.1-Minitron-4B-Width-Base model?