Commit 6b5f321

Committed Nov 19, 2024
Update guide to use websocket connection instead of api
1 parent: c95079a

File tree: 1 file changed (+51 -35 lines)


docs/guides/python/llama-rag.mdx (+51 -35)
@@ -1,7 +1,7 @@
 ---
 description: 'Making LLMs smarter with Dynamic Knowledge Access using Retrieval Augmented Generation'
 tags:
-  - API
+  - Realtime & Websockets
   - AI & Machine Learning
 languages:
   - python
@@ -54,7 +54,7 @@ We'll organize our project structure like so:
 +--model/
 | +-- Llama-3.2-1B-Instruct-Q4_K_M.gguf
 +--services/
-| +-- api.py
+| +-- chat.py
 +--.gitignore
 +--.python-version
 +-- build_query_engine.py
@@ -162,32 +162,55 @@ You can then run this using the following command. This should output the embeds
 uv run build_query_engine.py
 ```
 
-## Creating an API for querying our model
+## Creating a WebSocket for querying our model
 
-With our LLM ready for querying, we can create an API to handle prompts.
+With our LLM ready for querying, we can create a websocket to handle prompts.
 
-```python title:services/api.py
+```python title:services/chat.py
 import os
 
-from common.model_parameters import embed_model, llm, text_qa_template, persist_dir
+from common.model_parameters import embed_model, llm, persist_dir, text_qa_template
 
-from nitric.resources import api
-from nitric.context import HttpContext
+from nitric.resources import websocket
+from nitric.context import WebsocketContext
 from nitric.application import Nitric
 from llama_index.core import StorageContext, load_index_from_storage, Settings
 
+
 # Set global settings for llama index
 Settings.llm = llm
 Settings.embed_model = embed_model
 
-main_api = api("main")
+socket = websocket("socket")
+
+# Handle socket connections
+@socket.on("connect")
+async def on_connect(ctx):
+    print(f"socket connected with {ctx.req.connection_id}")
+    return ctx
+
+# Handle socket disconnections
+@socket.on("disconnect")
+async def on_disconnect(ctx):
+    # handle disconnections
+    print(f"socket disconnected with {ctx.req.connection_id}")
+    return ctx
+
+# Handle socket messages
+@socket.on("message")
+async def on_message(ctx: WebsocketContext):
+    # Query the model with the requested prompt
+    prompt = ctx.req.data.decode("utf-8")
+
+    response = await query_model(prompt)
 
-@main_api.post("/prompt")
-async def query_model(ctx: HttpContext):
-    # Pull the data from the request body
-    query = str(ctx.req.data)
+    # Send a response to the open connection
+    await socket.send(ctx.req.connection_id, response.encode("utf-8"))
 
-    print(f"Querying model: \"{query}\"")
+    return ctx
+
+async def query_model(prompt: str):
+    print(f"Querying model: \"{prompt}\"")
 
     # Get the model from the stored local context
     if os.path.exists(persist_dir):
@@ -196,36 +219,31 @@ async def query_model(ctx: HttpContext):
         index = load_index_from_storage(storage_context)
 
         # Get the query engine from the index, and use the prompt template for sanitisation.
-        query_engine = index.as_query_engine(streaming=False, similarity_top_k=4, text_qa_template=text_qa_template)
+        query_engine = index.as_query_engine(
+            streaming=False,
+            similarity_top_k=4,
+            text_qa_template=text_qa_template
+        )
     else:
         print("model does not exist")
-        ctx.res.success= False
-        return ctx
+        return "model does not exist"
 
     # Query the model
-    response = query_engine.query(query)
+    query_response = query_engine.query(prompt)
 
-    ctx.res.body = f"{response}"
+    print(f"Response: \n{query_response}")
 
-    print(f"Response: \n{response}")
-
-    return ctx
+    return query_response.response
 
 Nitric.run()
 ```
 
 ## Test it locally
 
-Now that you have an API defined, we can test it locally. You can do this using `nitric start` and make a request to the API either through the [Nitric Dashboard](/get-started/foundations/projects/local-development#local-dashboard) or another HTTP client like cURL.
-
-```bash
-curl -X POST http://localhost:4001/prompt -d "What is Nitric?"
-```
-
-This should produce an output similar to:
+Now that we have the WebSocket defined, we can test it locally. You can do this by running `nitric start` and connecting to the websocket through either the [Nitric Dashboard](/get-started/foundations/projects/local-development#local-dashboard) or another WebSocket client. Once connected, you can send a message with a prompt for the model. Sending a prompt like "What is Nitric?" should produce an output similar to:
 
 ```text
-Nitric is a cloud-agnostic framework designed to aid developers in building full cloud applications, including infrastructure. It is a declarative cloud framework with common resources like APIs, websockets, databases, queues, topics, buckets, and more. The framework provides tools for locally simulating a cloud environment, to allow an application to be tested locally, and it makes it possible to interact with resources at runtime. It is a lightweight and flexible framework that allows developers to structure their projects according to their preferences and needs. Nitric is not a replacement for IaC tools like Terraform but rather introduces a method of bringing developer self-service for infrastructure directly into the developer application. Nitric can be augmented through use of tools like Pulumi or Terraform and even be fully customized using such tools. The framework supports multiple programming languages, and its default deployment engines are built with Pulumi. Nitric provides tools for defining services in your project's `nitric.yaml` file, and each service can be run independently, allowing your app to scale and manage different workloads efficiently. Services are the heart of Nitric apps, they're the entrypoints to your code. They can serve as APIs, websockets, schedule handlers, subscribers and a lot more.
+Nitric is a cloud-agnostic framework designed to aid developers in building full cloud applications, including infrastructure.
 ```
 
 ## Get ready for deployment
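
As an aside on the local test step above: a minimal Python client is enough to exercise the new websocket. This is an editorial sketch, not part of the commit; it assumes the third-party `websockets` package and that the socket is served locally at `ws://localhost:4001` (check the Nitric local dashboard for the actual address).

```python
# Hedged sketch of a local test client for the new "socket" websocket.
# Assumptions: the `websockets` PyPI package is installed, and the local
# Nitric run exposes the socket at ws://localhost:4001 -- verify the real
# address in the Nitric local dashboard.
import asyncio

import websockets


async def main():
    async with websockets.connect("ws://localhost:4001") as ws:
        # on_message decodes the incoming frame as UTF-8, so the prompt is
        # sent as plain text.
        await ws.send("What is Nitric?")
        # query_model's answer is sent back over the same connection as
        # UTF-8 encoded bytes, so decode binary frames before printing.
        reply = await ws.recv()
        print(reply.decode() if isinstance(reply, bytes) else reply)


asyncio.run(main())
```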
@@ -258,6 +276,8 @@ nitric stack new dev aws
 
 Update the stack file `nitric.dev.yaml` with the appropriate AWS region and memory allocation to handle the model:
 
+<Note>WebSockets are supported across all AWS regions</Note>
+
 ```yaml title:nitric.dev.yaml
 provider: nitric/aws@1.14.0
 region: us-east-1
@@ -280,11 +300,7 @@ We can then deploy using the following command:
 nitric up
 ```
 
-Testing on AWS will be the same as we did locally, we'll just use cURL to make a request to the API URL that was outputted at the end of the deployment.
-
-```bash
-curl -x POST {your AWS endpoint URL here}/prompt -d "What is Nitric?"
-```
+To test on AWS, we'll need a WebSocket client or the AWS console. You can verify the deployment the same way as locally: connect to the websocket and send a message containing a prompt for the model.
 
 Once you're finished querying the model, you can destroy the deployment using `nitric down`.
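
For the AWS check described above, the same client sketch works once the local address is swapped for the `wss://` URL that `nitric up` prints. A variant that takes the endpoint as a command-line argument (the filename and placeholder URL are hypothetical):

```python
# Hedged sketch: exercise the deployed websocket from the command line.
# Usage (substitute the URL printed at the end of `nitric up`):
#   python test_socket.py wss://{your AWS websocket URL here}
import asyncio
import sys

import websockets


async def main(url: str):
    async with websockets.connect(url) as ws:
        await ws.send("What is Nitric?")
        reply = await ws.recv()
        # The handler replies with UTF-8 bytes, so decode binary frames.
        print(reply.decode() if isinstance(reply, bytes) else reply)


asyncio.run(main(sys.argv[1]))
```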
