-
The only things that would affect inference speed are model size (7B is fastest, 65B is slowest) and your CPU/RAM specs.
-
Here is my (probably inaccurate) list, put together from iffy sources (mostly this thread). I hope someone can point out the mistakes or suggest better explanations.
--temp: controls how loosely the output follows the prompt, i.e. how wild/creative the AI is
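To make the `--temp` description a bit more concrete, here is a minimal Python sketch of how a temperature value is usually applied: the raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the three example scores are made up for illustration; this is not the project's actual code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (assumed values).
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature pushes almost all of the probability onto the top candidate (more predictable output), while a high temperature flattens the distribution (more varied output). The amount of computation is the same either way, so this step by itself should not change speed.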
-
Hello there,
There are plenty of parameters, and for a lot of them I have no clue what they are used for or whether they can help me get faster answers. I would like a realistic chatbot that doesn't take 2-5 minutes to answer a single question but still doesn't sound like a robot. It would be awesome to have a place where these things are summarized for newcomers like myself, so maybe you could help me learn more or correct my mistakes:
temp: (This one doesn't seem to affect speed, since it just changes how closely the AI stays on topic)
top_k: (This is the number of probable next words used to create a pool of words to choose from)
top_p: (This is how probable a word must be to be picked)
repeat_last_n: (I have no idea)
repeat_penalty: (It seems that the higher it is, the less the AI repeats itself)
n_ctx: (I don't know exactly what this is or how much it helps with speed)
n_batch: (That's the number of characters the AI can compute in one round; it seems you should keep it as low as possible for your needs)
n_predict: (I believe that's how many characters the AI thinks ahead before talking; if it's lower than the sentence length, it may have to compute another round or talk nonsense)
n_keep: (I have no idea)
Could someone please explain what repeat_last_n, n_keep and n_ctx do? And tell me which parameters really affect the speed of the answer?
So far I'm using this, without much idea of what I'm doing and with extremely low performance:
Thank you!
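For anyone else trying to picture how top_k, top_p, repeat_last_n and repeat_penalty fit together, here is a rough Python sketch of one common way these steps are chained before a token is picked. All function names, tokens, and scores are hypothetical, the penalty rule shown (divide positive scores, multiply negative ones) is one common formulation and may differ from what a given build does, and `top_k` is assumed to be at least 1.

```python
import math

def apply_repeat_penalty(logits, recent_tokens, repeat_last_n, repeat_penalty):
    """Lower the score of tokens seen in the last `repeat_last_n` positions."""
    # Guard against repeat_last_n == 0, where a [-0:] slice would take everything.
    window = set(recent_tokens[-repeat_last_n:]) if repeat_last_n > 0 else set()
    out = dict(logits)
    for tok in window:
        if tok in out:
            score = out[tok]
            # Assumed rule: shrink positive scores, push negative ones further down.
            out[tok] = score / repeat_penalty if score > 0 else score * repeat_penalty
    return out

def top_k_top_p(logits, top_k, top_p):
    """Keep the top_k best tokens, then the smallest prefix of them whose
    cumulative probability reaches top_p. Returns renormalized probabilities."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    peak = ranked[0][1]  # subtract the max for numerical stability
    exps = [(tok, math.exp(score - peak)) for tok, score in ranked]
    total = sum(e for _, e in exps)
    kept, cumulative = [], 0.0
    for tok, e in exps:
        p = e / total
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

# Hypothetical scores and history, for illustration only.
logits = {"cat": 3.0, "dog": 2.5, "fish": 1.0, "rock": -1.0}
recent = ["the", "cat", "saw", "a", "cat"]

penalized = apply_repeat_penalty(logits, recent, repeat_last_n=4, repeat_penalty=1.3)
pool = top_k_top_p(penalized, top_k=3, top_p=0.8)
print(pool)
```

In this sketch repeat_last_n is only the size of the lookback window for the penalty, and top_k/top_p only decide which candidates stay in the pool. All of this happens after the model has already computed its scores, so these settings are cheap compared with running the model itself.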