Skip to content

Latest commit

 

History

History
56 lines (22 loc) · 3.28 KB

Vector search.md

File metadata and controls

56 lines (22 loc) · 3.28 KB

Vector search

Authors: Jubin Jose, Nibin Peter

It is mentioned in other part of the specification that, Aquila DB keeps document and vector indices separately. Any operation on document keys or vectors are separately done. Aquila DB internally maintains a mapping between vectors and documents to keep consistency between them.

Vector indexing

Vector indexes are long matrices having dimension dxn. d is a fixed number for a database and is computed as min(schema.maxItems, config.docs.vd). n is the number of vectors inserted to the database (and not deleted) so far. Maintaining a single long matrix is just a concept. An implementation of Aquila DB is expected to keep the index as small chunks to enable distributed scaling and data redundancy (reliability).

There is a set of methods to do chunking called bucketization. These methods try to come up with a hashing (central point) algorithm which clusters vectors based on their distance from one another. Popular examples include pre-trained k-means clustering, locality sensitive hashing, special rolling hashes etc.

Note: Official Aquila DB implementation currently uses k-means clustering to generate buckets.

Domain constraints in search

Full search

Full search allows anyone to search the entire matrix chunks without any constraints. This should be done when in need for 100% accuracy with a trade-off of increased latency. Aquila DB implementations should do an exhaustive search over all distributed instances at once.

Bucket search

Bucket searches are limited to a pre-configured / dynamically specified number of closest buckets. This improves search performance with a trade-off of reduced accuracy. Scale of accuracy drops purely depends upon the bucketization algorithm used by implementations. Please refer Notion of error section below for more information on measuring accuracy of implementations.

Content constraints in search

k-NN search

k-NN search is performed to retrieve k number of nearest vectors to the query vector from the database. Search API should take in the limit k along with the query vector and should return maximum k vectors as response. k-NN search is pre-constrained by the domain constraints applicable.

Radius search

Radius search performs k-NN search to retrieve all vectors within a specified radius r to query vector. Search API should take in the radius r along with the query vector and should return all vectors within r as response. k-NN search is pre-constrained by the domain constraints applicable.

Notion of error

Aquila DB transparency APIs should include error metric benchmark. Below are the expected results along with pseudo-code to generate them:

coming soon