Is your feature request related to a problem? Please describe.
I am using this library for benchmarking Question Answering tasks. For that, I want to use a technique called self-consistency, where multiple completions are generated for the same prompt with a high temperature value.
Describe the solution you'd like
Support the n parameter from the OpenAI API in this server, so that a single request can return multiple completions.
I believe the Hugging Face implementation of LLaMA already offers this feature.
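For illustration, this is roughly how I would expect to use the requested parameter through the server's OpenAI-compatible endpoint. This is a minimal sketch; the base URL and model name are placeholders, and it assumes the pre-1.0 openai Python client:

```python
import openai

# Placeholder values for a locally running server with an OpenAI-compatible API.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"

response = openai.Completion.create(
    model="llama-2-7b",            # placeholder model name
    prompt="Q: What is 17 * 24?\nA:",
    temperature=0.8,               # high temperature so the samples differ
    max_tokens=64,
    n=5,                           # requested feature: 5 completions in one call
)

# Self-consistency: collect all candidate answers from the single response.
answers = [choice["text"].strip() for choice in response["choices"]]
```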
Describe alternatives you've considered
Making multiple calls to the LLM. However, I suspect this would require considerably more processing power, since each call results in a separate full pass through the model. A sketch of this workaround is shown below.
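Using the same placeholder setup as in the sketch above, the workaround looks roughly like this; every iteration sends a separate request, so the prompt is processed again each time:

```python
import openai
from collections import Counter

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"

prompt = "Q: What is 17 * 24?\nA:"

# Workaround: one request per sample, so the prompt is re-evaluated on every call.
answers = []
for _ in range(5):
    response = openai.Completion.create(
        model="llama-2-7b",        # placeholder model name
        prompt=prompt,
        temperature=0.8,
        max_tokens=64,
    )
    answers.append(response["choices"][0]["text"].strip())

# Self-consistency: majority vote over the sampled answers.
final_answer = Counter(answers).most_common(1)[0][0]
```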
Additional context
As far as I know, generating multiple completions can be achieved using multiple beams during generation. I am not sure whether multiple beams are supported by llama.cpp, so it would help me to clear that up first.
If such a feature is supported by llama.cpp, I may be able to implement the Python part myself and create a PR for it, but I would need someone to point me in the right direction first.
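For reference, this is the Hugging Face behaviour I am referring to. With sampling enabled, generate can return several sequences for one prompt without needing beam search; the model name below is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # example model, any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Q: What is 17 * 24?\nA:", return_tensors="pt")

# With do_sample=True, num_return_sequences draws independent samples
# from the same prompt; no beam search is required for this.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
    num_return_sequences=5,
)

answers = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```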