phi3 : duplicate rope factors in each layer #7447
Conversation
slaren
commented
May 21, 2024
(edited)
| GPU | Model | Test | t/s master | t/s sl/phi3-fix | Speedup |
|---|---|---|---|---|---|
| RTX 3090 Ti | phi3 14B Q8_0 | pp512 | 1655.80 | 2359.53 | 1.43 |
| RTX 3090 Ti | phi3 14B Q8_0 | tg128 | 16.97 | 53.37 | 3.14 |
- phi3 : set phi-3 model type as 14B
- model loader : simplify the process for duplicating model tensors
- llama-bench : remove default pg test
llama.cpp
Outdated
```diff
-model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab});
-ml.n_created--; // artificial tensor
-ml.size_data += ggml_nbytes(model.output);
+model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, true);
```
Was the intention to make the `duplicated` argument `true`? I assume yes, because that would keep the old behavior. As written, this sets `required` to `true` and leaves `duplicated` as `false`.

(This also applies to the other places where `model.output` is initialized from the token_embd tensor.)
```diff
-model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, true);
+model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, true, true);
```
Thanks, I have replaced the boolean parameters with named flags that should make these errors easier to avoid in the future.
* phi3 : duplicate rope factors in each layer
* phi3 : set phi-3 model type as 14B
* model loader : simplify the process for duplicating model tensors
* llama-bench : remove default pg test
* replace bool parameters in llama_model_loader with named flags