A lightweight service to serve AI/LLM needs. All requests are queued as tasks and executed by Celery worker(s) with a retry strategy.
To run the ai-llm-service project, follow these steps:
-
Clone the repository:
git clone https://github.com/DalgoT4D/ai-llm-service.git
-
Navigate to the project directory:
cd ai-llm-service
-
Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate
-
Install the required dependencies:
pip install -r requirements.txt
-
Set up your .env file. Make sure you have a Redis server running.
cp .env.example .env
Update the relevant fields in .env
-
Start the Celery worker(s):
celery -A main.celery worker -n llm -Q llm --loglevel=INFO
-
Monitor your Celery tasks and queues using Flower:
celery -A main.celery flower --port=5555
The dashboard will be available at http://localhost:5555
-
Start the FastAPI server (dev server):
python3 main.py
You can test the service by sending requests to the available endpoints.
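Since the steps above require a running Redis server, a quick pre-flight check can confirm it is reachable before the worker is started. The following is a minimal sketch, not part of the project: redis_is_reachable is a hypothetical helper, and the host, port, and lack of authentication are assumptions (use the values from your own .env).

```python
import socket


def redis_is_reachable(host: str = "localhost", port: int = 6379,
                       timeout: float = 1.0) -> bool:
    """Return True if a server at host:port answers PING with +PONG.

    Assumption: Redis accepts unauthenticated connections. A server that
    requires a password replies with an error instead, and this returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts plain-text inline commands terminated by CRLF.
            conn.sendall(b"PING\r\n")
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_reachable())
```

A False result only says that this simple check failed; the server may still be running on a different host or port, or may require authentication.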
Currently the service supports OpenAI's file search but can easily be extended to other services. The request-response flow is as follows:
-
Client uploads a file (to query on) to the service.
-
Client uses the file_path from step 1 to query. Note that the client needs to provide a system_prompt or an assistant_prompt. Client can make multiple queries here.
-
Client polls for the response until the job/task reaches a terminal state.
-
Client gets the result with a session_id. Client can either continue querying the same file or close the session.
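The polling step in the flow above can be sketched as follows. This is a minimal illustration, not the service's own client: poll_until_done, fetch_status, and the "status" key are hypothetical names, and the terminal state names are an assumption based on the states Celery itself treats as finished (SUCCESS, FAILURE, REVOKED). The HTTP call is left to the caller so that no endpoint paths are guessed here; the API documentation describes the actual routes and response shape.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the service reports Celery-style state names. These three are
# the states Celery itself treats as finished.
TERMINAL_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED"})


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until its "status" is terminal.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout} seconds")
        # Wait before asking again so the service is not flooded with requests.
        sleep(interval)
```

Here fetch_status would be a small function that requests the task status from the service and returns the decoded JSON body. The sleep parameter exists only so the loop can be exercised without real delays.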
API documentation can be found at https://llm.projecttech4dev.org/docs