Is your feature request related to a problem? Please describe.
I am using this library for benchmarking Question Answering tasks. For that, I want to use a technique called self-consistency, where multiple completions are generated for the same prompt with a high temperature value.
Describe the solution you'd like
Support the n parameter from the OpenAI API in this server, so that a single request can return multiple completions.
I believe the Hugging Face implementation of LLaMA already offers this feature.
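For illustration, this is roughly how I would expect to use the requested parameter through the server's OpenAI-compatible endpoint. This is a minimal sketch; the base URL and model name are placeholders, and it assumes the pre-1.0 openai Python client:

```python
import openai

# Placeholder values for a locally running server with an OpenAI-compatible API.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"

response = openai.Completion.create(
    model="llama-2-7b",            # placeholder model name
    prompt="Q: What is 17 * 24?\nA:",
    temperature=0.8,               # high temperature so the samples differ
    max_tokens=64,
    n=5,                           # requested feature: 5 completions in one call
)

# Self-consistency: collect all candidate answers from the single response.
answers = [choice["text"].strip() for choice in response["choices"]]
```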
Describe alternatives you've considered
Making multiple calls to the LLM. However, I suspect this would require considerably more processing power, since each call results in a separate full pass through the model. A sketch of this workaround is shown below.
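Using the same placeholder setup as in the sketch above, the workaround looks roughly like this; every iteration sends a separate request, so the prompt is processed again each time:

```python
import openai
from collections import Counter

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"

prompt = "Q: What is 17 * 24?\nA:"

# Workaround: one request per sample, so the prompt is re-evaluated on every call.
answers = []
for _ in range(5):
    response = openai.Completion.create(
        model="llama-2-7b",        # placeholder model name
        prompt=prompt,
        temperature=0.8,
        max_tokens=64,
    )
    answers.append(response["choices"][0]["text"].strip())

# Self-consistency: majority vote over the sampled answers.
final_answer = Counter(answers).most_common(1)[0][0]
```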
Additional context
As far as I know, generating multiple completions can be achieved using multiple beams during generation. I am not sure whether multiple beams are supported by llama.cpp, so it would help me to clear that up first.
If such a feature is supported by llama.cpp, I may be able to implement the Python part myself and create a PR for it, but I would need someone to point me in the right direction first.
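For reference, this is the Hugging Face behaviour I am referring to. With sampling enabled, generate can return several sequences for one prompt without needing beam search; the model name below is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # example model, any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Q: What is 17 * 24?\nA:", return_tensors="pt")

# With do_sample=True, num_return_sequences draws independent samples
# from the same prompt; no beam search is required for this.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
    num_return_sequences=5,
)

answers = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```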