aider: Updated the `handleCompletions` function to support constrained grammars by including the `messages` field and adjusting the endpoint to `/v1/chat/completions`.

# Aider chat conversation:

USER: https://localai.io/features/constrained_grammars/:

# ✍️ Constrained grammars

The chat endpoint accepts an additional `grammar` parameter which takes a [BNF defined grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). This allows the LLM to constrain the output to a user-defined schema, making it possible to generate `JSON`, `YAML`, and anything else that can be defined with a BNF grammar.

Note: this feature works only with models compatible with the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also [Model compatibility](/model-compatibility/)). For details on how it works, see the upstream PRs ggerganov/llama.cpp#1773 and ggerganov/llama.cpp#1887.

## Setup

Follow the setup instructions from the [LocalAI functions](/features/openai-functions/) page.
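As an illustrative sketch (not taken from the LocalAI docs), a grammar in llama.cpp's GBNF notation that constrains the model to a tiny JSON object whose `answer` field is either `yes` or `no` could look like this; the rule names are arbitrary except for `root`:

```
root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
```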
## 💡 Usage example

For example, to constrain the output to either `yes` or `no`:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Do you like apples?"}],
  "grammar": "root ::= (\"yes\" | \"no\")"
}'
```

ASSISTANT: Based on the information provided about LocalAI's constrained grammars feature, the `handleCompletions` function in the `rag-api.go` file needs to be updated to support constrained grammars when making a request to the LocalAI service. The request payload should include a `grammar` field when it is provided in the `CompletionRequest`. Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)
 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()
 	embedding, err := getEmbeddings(req.Prompt)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
-	// Generate completion using constrained grammar (if requested)
-	var completion CompletionResponse
-	if req.ConstrainedGrammar != "" {
-		payload := map[string]interface{}{
-			"model":       "gpt-4",
-			"prompt":      req.Prompt,
-			"grammar":     req.ConstrainedGrammar,
-			"max_tokens":  req.MaxTokens,
-			"temperature": req.Temperature,
-			"top_p":       req.TopP,
-		}
-		jsonPayload, err := json.Marshal(payload)
-		if err != nil {
-			http.Error(w, err.Error(), http.StatusInternalServerError)
-			return
-		}
+	// Generate completion using constrained grammar or retrieval-augmented generation (if requested)
+	var completionResp CompletionResponse
+	payload := map[string]interface{}{
+		"model":       "gpt-4",
+		"prompt":      req.Prompt,
+		"max_tokens":  req.MaxTokens,
+		"temperature": req.Temperature,
+		"top_p":       req.TopP,
+	}
+	if req.ConstrainedGrammar != "" {
+		payload["grammar"] = req.ConstrainedGrammar
+	}
+	jsonPayload, err := json.Marshal(payload)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
 	localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 	if localAI == "" {
 		localAI = defaultLocalAI
 	}
-	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+	resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
 		body, err := io.ReadAll(resp.Body)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 		return
 	}
-	var respBody struct {
-		Result CompletionResponse `json:"result"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
+	if err := json.NewDecoder(resp.Body).Decode(&completionResp); err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
-	completion = respBody.Result
+	jsonResp, err := json.Marshal(completionResp)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
-	jsonResp, err := json.Marshal(completion)
-	if err != nil {
-		http.Error(w, err.Error(), http.StatusInternalServerError)
-		return
-	}
 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff updates the `handleCompletions` function to include the `grammar` field in the request payload if it is provided in the `CompletionRequest`. It also corrects the endpoint to `/completions` as per the LocalAI documentation.

USER: UnifiedDiffNoMatch: hunk failed to apply!

cluster/apps/home/localai/app/config/rag-api.go does not contain lines that match the diff you provided!

Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The diff needs to apply cleanly to the lines in cluster/apps/home/localai/app/config/rag-api.go!

cluster/apps/home/localai/app/config/rag-api.go does not contain these 76 exact lines in a row:

```
func handleCompletions(w http.ResponseWriter, r *http.Request) {
	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
	logRequest(r)

	var req CompletionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer r.Body.Close()

	embedding, err := getEmbeddings(req.Prompt)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Generate completion using constrained grammar (if requested)
	var completion CompletionResponse
	if req.ConstrainedGrammar != "" {
		payload := map[string]interface{}{
			"model":       "gpt-4",
			"prompt":      req.Prompt,
			"grammar":     req.ConstrainedGrammar,
			"max_tokens":  req.MaxTokens,
			"temperature": req.Temperature,
			"top_p":       req.TopP,
		}
		jsonPayload, err := json.Marshal(payload)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
		if localAI == "" {
			localAI = defaultLocalAI
		}
		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			body, err := io.ReadAll(resp.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
			return
		}
		var respBody struct {
			Result CompletionResponse `json:"result"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		completion = respBody.Result
	}

	jsonResp, err := json.Marshal(completion)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	w.Write(jsonResp)
}
```

ASSISTANT: I apologize for the confusion. Let's try again with a diff that applies cleanly to the `cluster/apps/home/localai/app/config/rag-api.go` file. I will update the `handleCompletions` function to correctly handle the constrained grammars feature as described in the LocalAI documentation.
```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)
 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()
 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":  "gpt-4",
 			"prompt": req.Prompt,
-			"grammar": req.ConstrainedGrammar,
+			"grammar":     req.ConstrainedGrammar,
 			"max_tokens":  req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p":       req.TopP,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}
-		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()
 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}
 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}
 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
 	w.Header().Set
```

^C KeyboardInterrupt

USER: https://localai.io/features/constrained_grammars/:

# ✍️ Constrained grammars

The chat endpoint accepts an additional `grammar` parameter which takes a [BNF defined grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). This allows the LLM to constrain the output to a user-defined schema, making it possible to generate `JSON`, `YAML`, and anything else that can be defined with a BNF grammar.

Note: this feature works only with models compatible with the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also [Model compatibility](/model-compatibility/)). For details on how it works, see the upstream PRs ggerganov/llama.cpp#1773 and ggerganov/llama.cpp#1887.

## Setup

Follow the setup instructions from the [LocalAI functions](/features/openai-functions/) page.

## 💡 Usage example

For example, to constrain the output to either `yes` or `no`:

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Do you like apples?"}],
  "grammar": "root ::= (\"yes\" | \"no\")"
}'
```

ASSISTANT: To implement the constrained grammars feature as described in the LocalAI documentation, we need to update the `handleCompletions` function in the `rag-api.go` file to correctly handle the `grammar` parameter in the request payload. Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)
 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()
 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":  "gpt-4",
 			"prompt": req.Prompt,
+			"messages": []map[string]string{
+				{
+					"role":    "user",
+					"content": req.Prompt,
+				},
+			},
 			"max_tokens":  req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p":       req.TopP,
+			"grammar":     req.ConstrainedGrammar,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}
-		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()
 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}
 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}
 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff adds the `messages` field to the payload as required by the LocalAI API for constrained grammars and updates the endpoint to `/v1/chat/completions`. It also includes the `grammar` field when `ConstrainedGrammar` is not empty.
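For reference, the payload logic the final diff produces can be condensed into a small, self-contained sketch. This is not the code from `rag-api.go`: the `CompletionRequest` JSON tags, the standalone `main`, and the fallback endpoint URL are assumptions made for illustration (the real handler uses `defaultLocalAI` and decodes the request from the HTTP body). The conditional attach of `grammar` generalizes slightly over the diff, where the payload is only built inside the `ConstrainedGrammar` branch:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// CompletionRequest mirrors the fields the diffs above reference.
// NOTE: the JSON tags are assumptions for illustration; the real struct
// lives in rag-api.go and is not shown in this conversation.
type CompletionRequest struct {
	Prompt             string  `json:"prompt"`
	ConstrainedGrammar string  `json:"constrained_grammar"`
	MaxTokens          int     `json:"max_tokens"`
	Temperature        float64 `json:"temperature"`
	TopP               float64 `json:"top_p"`
}

// buildPayload assembles the chat-style body sent to LocalAI, attaching
// "grammar" only when the caller supplied one so the same helper also
// covers unconstrained requests.
func buildPayload(req CompletionRequest) map[string]interface{} {
	payload := map[string]interface{}{
		"model":  "gpt-4",
		"prompt": req.Prompt,
		"messages": []map[string]string{
			{"role": "user", "content": req.Prompt},
		},
		"max_tokens":  req.MaxTokens,
		"temperature": req.Temperature,
		"top_p":       req.TopP,
	}
	if req.ConstrainedGrammar != "" {
		payload["grammar"] = req.ConstrainedGrammar
	}
	return payload
}

func main() {
	req := CompletionRequest{
		Prompt:             "Do you like apples?",
		ConstrainedGrammar: `root ::= ("yes" | "no")`,
		MaxTokens:          8,
	}
	body, err := json.Marshal(buildPayload(req))
	if err != nil {
		panic(err)
	}
	endpoint := os.Getenv("LOCAL_AI_ENDPOINT")
	if endpoint == "" {
		endpoint = "http://localhost:8080" // stand-in for defaultLocalAI
	}
	resp, err := http.Post(endpoint+"/v1/chat/completions", "application/json", bytes.NewBuffer(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Posting a `messages` array with the grammar to `/v1/chat/completions` mirrors the documented curl example above, which is why the final diff moved both changes into the same payload.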