nikkkkhil/faiss-search-engine

searchengine

searchengine is a similarity search engine built around the Faiss library. Given a query vector, it finds the most similar vectors (a vector being a common, simplified mathematical representation of an image or a sound) among hundreds of millions of vectors, or billions if you have enough RAM.
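To make "most similar" concrete: with Euclidean (L2) distance, an exact search compares the query against every stored vector and keeps the k closest. The pure-Python sketch below illustrates that brute-force computation; it is not searchengine's or Faiss's code, and the function names are hypothetical. Faiss performs this kind of exact search much faster (for example with its flat L2 index, IndexFlatL2) and adds compressed and partitioned index types for collections too large to scan exhaustively.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; it ranks neighbours exactly like L2."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(database: List[Vector], query: Vector, k: int) -> List[Tuple[float, int]]:
    """Exhaustive k-nearest-neighbour search (hypothetical helper).

    Returns up to k (squared distance, vector id) pairs, closest first.
    A vector's id is simply its position in `database`.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    for dist, idx in knn(db, [0.9, 0.1], k=2):
        print(idx, round(dist, 3))
```

Each query costs O(N·d) for N stored vectors of dimension d, which is why exhaustive scanning stops being practical at the scales mentioned above.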

Why?

Many computer scientists can now use machine learning frameworks to classify images without even having to understand how those frameworks work, and obtaining the probability that an image belongs to a class is easy. In a production environment, however, regularly adding images or classes becomes a bottleneck, because the model must be retrained each time. One solution is to use a fixed model that simply produces an N-dimensional vector for each input image. These vectors can then be compressed, indexed and searched by similarity in a separate, incremental system, so new images or classes can be added without retraining.
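The incremental approach can be sketched as a nearest-neighbour classifier: a fixed model (not shown; the vectors below stand in for its output) turns each image into a vector, and adding an example, or even a whole new class, only means storing another labelled vector. This is a minimal pure-Python illustration under those assumptions; the class and its methods are hypothetical and not part of searchengine's API.

```python
from collections import Counter
from typing import List, Sequence, Tuple


class IncrementalLabelIndex:
    """Labelled vectors searched by nearest neighbours (hypothetical sketch)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._items: List[Tuple[Tuple[float, ...], str]] = []

    def add(self, vector: Sequence[float], label: str) -> None:
        """Store one embedding; a previously unseen label needs no retraining."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._items.append((tuple(vector), label))

    def classify(self, query: Sequence[float], k: int = 1) -> str:
        """Majority label among the k nearest stored vectors (squared L2)."""
        if not self._items:
            raise ValueError("index is empty")
        ranked = sorted(
            self._items,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
        )
        votes = Counter(label for _, label in ranked[:k])
        # most_common keeps first-encountered order on ties, so a tie goes
        # to the label whose nearest vector is closest to the query.
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    index = IncrementalLabelIndex(dim=2)
    index.add([0.0, 0.0], "cat")
    index.add([0.1, 0.0], "cat")
    index.add([5.0, 5.0], "dog")
    print(index.classify([0.2, 0.1]))
    index.add([10.0, 0.0], "bird")  # a brand-new class, no retraining
    print(index.classify([9.5, 0.2]))
```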

What is it?

searchengine is a similarity search engine (think Elasticsearch, but for vectors rather than text documents). It is essentially a wrapper around the Faiss library and aims to provide what such a scientific library lacks:

  • Data persistence: reliable storage of raw and compressed vectors. Faiss keeps its data in RAM and can write an index to disk on demand, but those writes are not incremental.

  • Web services: searchengine provides a gRPC server and REST API endpoints for management and monitoring.

  • Packaging: a ready-to-run Docker image.
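The idea of incremental writes can be illustrated with an append-only file: each new batch of vectors is appended and synced to disk, so persisting new data never rewrites what is already stored, and a restart reloads everything. The sketch below is a hypothetical, stdlib-only illustration with an assumed file layout (fixed-size little-endian float32 records); searchengine itself stores raw features in MongoDB and encoded vectors in Redis, as listed under Prerequisites.

```python
import os
import struct
import tempfile
from typing import Iterable, List, Sequence, Tuple


def _record_format(dim: int) -> str:
    # Assumed layout: one record = `dim` little-endian float32 values.
    return f"<{dim}f"


def append_vectors(path: str, vectors: Iterable[Sequence[float]], dim: int) -> None:
    """Append a batch of vectors without touching existing records."""
    fmt = _record_format(dim)
    records = []
    for vec in vectors:  # validate the whole batch before writing anything
        if len(vec) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
        records.append(struct.pack(fmt, *vec))
    with open(path, "ab") as f:
        f.write(b"".join(records))
        f.flush()
        os.fsync(f.fileno())  # make the new records durable before returning


def load_vectors(path: str, dim: int) -> List[Tuple[float, ...]]:
    """Read all complete records; a torn trailing record is ignored."""
    if not os.path.exists(path):
        return []
    fmt = _record_format(dim)
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % size
    return [struct.unpack_from(fmt, data, off) for off in range(0, usable, size)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "vectors.bin")  # hypothetical file name
        append_vectors(store, [[0.5, 1.0, -2.25]], dim=3)
        append_vectors(store, [[1.0, 2.0, 3.0]], dim=3)
        print(load_vectors(store, dim=3))
```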

Architecture
Figure 1. Global Architecture

Quick Start

Start the searchengine server with its dependencies (MongoDB and Redis):
docker-compose up

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing. See Deployment for notes on deploying the project to a live system.

Prerequisites

Databases

  • MongoDB 4.0+ (for storing information about databases and indices, and raw features)

  • Redis 5.0+ (for storing Faiss indices and encoded vectors, inverted lists, etc.)
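Before starting the server, you may want to confirm that both databases are reachable. The sketch below only checks whether a TCP connection can be opened; it assumes MongoDB and Redis run on localhost on their default ports (27017 and 6379) and does not verify their versions.

```python
import socket
from typing import Dict

# Assumption: both services run locally on their default ports.
DEFAULT_PORTS: Dict[str, int] = {"MongoDB": 27017, "Redis": 6379}


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "reachable" if is_reachable("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {state}")
```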

Python (3.7) dependencies

See the list in requirements.txt.

Install them with:
pip install -r requirements.txt

Installing

From source:
mkdir build
cd build
cmake ..
make
As a Docker image:
docker build -t pletessier/searchengine .

Deployment

docker-compose up -d

A default configuration file is provided inside the Docker image. There are four complementary ways to configure searchengine:

  • One or more configuration files (yaml|toml|json|ini|xml), provided with the --config-file /my/config/file/path command-line argument.

  • One or more configuration directories, provided with the --config-path /my/config/directory command-line argument. For example, given the path /run/secrets, searchengine reads every file in every subdirectory of /run/secrets and maps a key (the file's subpath) to a value (the file's content). If the file /run/secrets/db/redis/password contains the text notagoodpassword, the resulting configuration is db.redis.password=notagoodpassword.

  • Every environment variable starting with searchengine__ is parsed. For instance, searchengine__DB__REDIS__HOST=my-redis-host is equivalent to db.redis.host=my-redis-host.

  • Any additional key=value pairs passed with the --additional-config or -C argument, such as -C db.redis.host=my-redis-host.
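The key mapping described above can be sketched in a few lines. This is a hypothetical, stdlib-only illustration of how a configuration directory, environment variables and -C pairs each reduce to flat dotted keys; it is not searchengine's parser, and configuration-file parsing is left out. Several details are assumptions and are marked in the comments: environment keys are lower-cased (matching the example above), trailing newlines are stripped from file contents, and later sources override earlier ones in merge(), since the text above does not specify a precedence order.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Mapping

ENV_PREFIX = "searchengine__"


def from_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """searchengine__DB__REDIS__HOST=h  ->  {"db.redis.host": "h"}"""
    config = {}
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            # "__" separates levels; lower-casing is an assumption.
            key = name[len(ENV_PREFIX):].replace("__", ".").lower()
            config[key] = value
    return config


def from_directory(root: str) -> Dict[str, str]:
    """<root>/db/redis/password  ->  {"db.redis.password": <file content>}"""
    base = Path(root)
    config = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            key = ".".join(path.relative_to(base).parts)
            # Assumption: trailing newlines are not part of the value.
            config[key] = path.read_text().rstrip("\n")
    return config


def from_cli(pairs: Iterable[str]) -> Dict[str, str]:
    """["db.redis.host=h"] (the values given to -C)  ->  {"db.redis.host": "h"}"""
    config = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        config[key.strip()] = value
    return config


def merge(*sources: Dict[str, str]) -> Dict[str, str]:
    """Assumption: later sources override earlier ones."""
    merged: Dict[str, str] = {}
    for source in sources:
        merged.update(source)
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as secrets:
        secret = Path(secrets, "db", "redis", "password")
        secret.parent.mkdir(parents=True)
        secret.write_text("notagoodpassword\n")
        config = merge(
            from_directory(secrets),
            from_env(os.environ),
            from_cli(["db.redis.host=my-redis-host"]),
        )
        print(config.get("db.redis.password"), config.get("db.redis.host"))
```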

Tests

Note
To do: explain how to run the searchengine client code on a dataset of 1M vectors.
