Here we set up local inference using Ollama and Postgres with the pgvector extension. The services are separated into three Docker containers. Inference on the LLM is done through the Ollama API on the standard port 11434. Vector storage is handled either through a middleware exposing CRUD operations or directly with psycopg in a Python script; we tried both approaches. The Postgres and Ollama services also work with the LangChain integrations.
We will work with this setup and compare it to other integrations, specifically for multimodal RAG and object detection.
These are the components required to run inference with a large language model, including multimodal operations, vector storage, and retrieval. Ollama is used for the models, and Postgres with the pgvector extension is used for the vector store and retrieval. The components are partitioned into three services:
1. ollama api: http://localhost:11434
2. vector db: http://localhost:5433
3. survey data db: http://localhost:5432

A middleware service using Flask is tested for compatibility and CRUD operations. The Ollama, vector db, and survey db operations are tested separately, and all components are tested together using the LangChain Ollama and Postgres integrations.
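As a quick check that the Ollama service is reachable on port 11434, a minimal Python sketch like the following can be used; the model name is an assumption and should match whatever entry_point.sh pulls.

```python
# Minimal sketch: call the Ollama HTTP API directly with requests.
# The model name "llama3" is an assumption; use a model pulled in entry_point.sh.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```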
Here the Dockerfiles are kept separate, although this could also be done in a single docker-compose file. The choice of models in Ollama is based on local resources. At the time of writing the local setup is:
- 32 GB RAM
- Nvidia driver 560
- 12 cores
To change the models, edit the corresponding line in entry_point.sh.
Preliminaries
- Docker and docker-compose installed
- Nvidia driver installed and video card available
- NVIDIA Container Toolkit or CUDA toolkit installed
- If Postgres is not installed on your system, you need to install psycopg[binary] on the host machine or in the virtual environment; a minimal connection check is sketched after this list
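A connectivity check with psycopg might look like the following; the database name, user, and password are placeholders and need to match the values in the compose file.

```python
# Minimal sketch: verify that psycopg can reach the survey data db on port 5432.
# dbname, user, and password are placeholders; use the values from docker-compose.yaml.
import psycopg

with psycopg.connect(
    "host=localhost port=5432 dbname=survey_data user=postgres password=postgres"
) as conn:
    print(conn.execute("SELECT version();").fetchone())
```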
Local volumes
Local volumes refer to the storage of models and data on the host machine, so that the data persists when the containers shut down. There are local volumes for the three containers:
- models/ lives in the directory from which you run the Dockerfile that contains the ollama service. It is created by Ollama and holds the blobs and manifests that are created when you run
docker run --gpus all -it --rm -p 11434:11434 -v $(pwd)/models:/app/model --name ollamacontainer ollamaservice
In the command above you are mounting the local directory models/ to the container volume /app/model. This is where the models are stored.
- /path/to/docker-volumes/vector_db is the local volume for the vector database. This needs to be created beforehand
- /path/to/docker-volumes/survey_data_db is the local volume for the survey data database. This needs to be created beforehand
These are the data sources for the tables in the different databases. To build the containers, navigate to the folder with the docker-compose file and run
docker-compose up --build
In the docker-compose.yaml file you will find references to those two volumes.
Directory structure
At the time of writing the services are separated into their own directories. Both use the same virtual environment. The test scripts, including the middleware script, are stored in the postgressservice directory.
- ollamaservice/
  - Dockerfile
  - entry_point.sh
  - models/
    - blobs/
    - manifests/
- postgressservice/
  - docker-compose.yml
  - dbmiddleware.py
  - testflaskmiddleware.py
  - testlangchain.py
  - testsurveydb.py
  - testvectordb.py
The host volumes are mounted into the Postgres containers as follows:
- /path/to/docker-volumes/postgres-data:/var/lib/postgresql/data (survey data db)
- /path/to/docker-volumes/pgvector-data:/var/lib/postgresql/data (vector db)
Running docker-compose up --build builds the containers and starts the services, which are then available at the following ports:
- survey data db: http://localhost:5432
- vector db: http://localhost:5433
- middleware: http://localhost:5000 (if you run the middleware script)
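With the services running, the direct (non-middleware) path of writing vectors with psycopg can be sketched as below. The table name, the toy 3-dimensional vectors, and the credentials are assumptions and should be adapted to the actual schema and compose file.

```python
# Minimal sketch: write and query embeddings directly with psycopg and pgvector.
# Table name, 3-dimensional toy vectors, and credentials are assumptions; real
# embeddings from an Ollama embedding model will have far more dimensions.
import psycopg

conn_info = "host=localhost port=5433 dbname=vector_db user=postgres password=postgres"

with psycopg.connect(conn_info) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id bigserial PRIMARY KEY, content text, embedding vector(3));"
    )
    conn.execute(
        "INSERT INTO items (content, embedding) VALUES (%s, %s::vector);",
        ("hello pgvector", "[0.1, 0.2, 0.3]"),
    )
    # Cosine-distance nearest-neighbour search using pgvector's <=> operator.
    rows = conn.execute(
        "SELECT content FROM items ORDER BY embedding <=> %s::vector LIMIT 5;",
        ("[0.1, 0.2, 0.3]",),
    ).fetchall()
    print(rows)
```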
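The middleware path can be sketched as a small Flask app in front of the vector db. The routes, table layout, and credentials here are illustrative assumptions, not the actual dbmiddleware.py.

```python
# Minimal sketch of a Flask middleware exposing CRUD-style routes over the vector db.
# Routes, table layout, and credentials are assumptions, not the actual dbmiddleware.py.
from flask import Flask, jsonify, request
import psycopg

app = Flask(__name__)
CONN_INFO = "host=localhost port=5433 dbname=vector_db user=postgres password=postgres"

@app.post("/items")
def create_item():
    # Expects JSON like {"content": "...", "embedding": [0.1, 0.2, 0.3]}
    data = request.get_json()
    with psycopg.connect(CONN_INFO) as conn:
        conn.execute(
            "INSERT INTO items (content, embedding) VALUES (%s, %s::vector);",
            (data["content"], str(data["embedding"])),
        )
    return jsonify({"status": "created"}), 201

@app.get("/items")
def list_items():
    with psycopg.connect(CONN_INFO) as conn:
        rows = conn.execute("SELECT id, content FROM items;").fetchall()
    return jsonify([{"id": r[0], "content": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(port=5000)
```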
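Finally, the LangChain route can be sketched roughly as follows, assuming the langchain-ollama and langchain-postgres packages; the embedding model, collection name, and connection string are placeholders.

```python
# Minimal sketch using the LangChain Ollama and Postgres (pgvector) integrations.
# Assumes the langchain-ollama and langchain-postgres packages; the embedding model,
# collection name, and connection string are placeholders.
from langchain_ollama import OllamaEmbeddings
from langchain_postgres import PGVector

embeddings = OllamaEmbeddings(
    model="nomic-embed-text",           # an embedding model pulled into Ollama
    base_url="http://localhost:11434",  # the Ollama service
)

store = PGVector(
    embeddings=embeddings,
    collection_name="survey_docs",
    connection="postgresql+psycopg://postgres:postgres@localhost:5433/vector_db",
)

store.add_texts(["pgvector stores the embeddings", "ollama serves the local models"])
print(store.similarity_search("where are the embeddings stored?", k=1))
```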