
Reporting model stats #46

Closed
bcho opened this issue Mar 19, 2023 · 4 comments · Fixed by #47

@bcho
Contributor

bcho commented Mar 19, 2023

Hey, in llama.cpp there is some useful output reporting the model's prediction / evaluation stats:

https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L1086-L1095

Can we also export such information after the model run?

I have a local change which produces output like this:

```
[2023-03-19T21:59:46Z INFO  llama_cli] Model size = 4017.27 MB / num tensors = 291
[2023-03-19T21:59:46Z INFO  llama_cli] Model fully loaded!
<prompt> <predict output>
feed_prompt_duration: 1533ms, prompt_tokens: 10, predict_duration: 22656ms, predict_tokens: 86, per_token_duration: 263.442ms
```
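
For context, here is a minimal sketch (not the actual llama-rs API) of how that per-phase timing could be collected with `std::time::Instant`. The `InferenceStats` struct and its fields are hypothetical stand-ins named after the log line above, and the token counts are placeholders:

```rust
use std::time::{Duration, Instant};

// Hypothetical stats container; field names mirror the log line above,
// not the real llama-rs API.
#[derive(Debug, Default)]
struct InferenceStats {
    feed_prompt_duration: Duration,
    prompt_tokens: usize,
    predict_duration: Duration,
    predict_tokens: usize,
}

impl InferenceStats {
    // Average wall-clock time per predicted token.
    fn per_token_duration(&self) -> Duration {
        if self.predict_tokens == 0 {
            Duration::ZERO
        } else {
            self.predict_duration / self.predict_tokens as u32
        }
    }
}

fn main() {
    let mut stats = InferenceStats::default();

    // Time the prompt-feeding phase (stand-in for feeding prompt tokens).
    let start = Instant::now();
    // ... feed the prompt here ...
    stats.feed_prompt_duration = start.elapsed();
    stats.prompt_tokens = 10; // placeholder count

    // Time the prediction loop (stand-in for sampling until EOS / limit).
    let start = Instant::now();
    // ... sample tokens here ...
    stats.predict_duration = start.elapsed();
    stats.predict_tokens = 86; // placeholder count

    println!(
        "feed_prompt_duration: {}ms, prompt_tokens: {}, predict_duration: {}ms, predict_tokens: {}, per_token_duration: {:.3}ms",
        stats.feed_prompt_duration.as_millis(),
        stats.prompt_tokens,
        stats.predict_duration.as_millis(),
        stats.predict_tokens,
        stats.per_token_duration().as_secs_f64() * 1000.0,
    );
}
```

Timing the two phases separately is useful because prompt ingestion and token-by-token prediction typically have different per-token costs.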
@brysgo

brysgo commented Mar 20, 2023

It would be nice to see the stats; I'm curious why it feels a little slower than the C++ version.

@bcho
Contributor Author

bcho commented Mar 20, 2023

@brysgo oh really? From my local testing, the performance is about the same... do you have any metrics/results you can share?

@brysgo

brysgo commented Mar 20, 2023

Right now it's just qualitative, from watching the words come up. Maybe it's I/O bound?

I'm going to check out your branch and run the stats.

@brysgo

brysgo commented Mar 20, 2023

llama-rs

```
feed_prompt_duration: 2421ms, prompt_tokens: 9, predict_duration: 33204ms, predict_tokens: 116, per_token_duration: 286.241ms
```

llama.cpp

```
main: mem per token = 14368644 bytes
main:     load time =   772.37 ms
main:   sample time =    12.49 ms
main:  predict time =  1900.99 ms / 76.04 ms per token
main:    total time =  3039.16 ms
```

Model Name: MacBook Pro
Model Identifier: MacBookPro18,2
Chip: Apple M1 Max
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 64 GB
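
(For reference, per_token_duration above is just predict_duration / predict_tokens: 33204 ms / 116 tokens ≈ 286.2 ms. Against llama.cpp's reported 76.04 ms per token, that is roughly a 3.8× gap, though the two runs produced different token counts and are not directly comparable.)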
