Vector search

Authors: Jubin Jose, Nibin Peter

It is mentioned in other part of the specification that, Aquila DB keeps document and vector indices separately. Any operation on document keys or vectors are separately done. Aquila DB internally maintains a mapping between vectors and documents to keep consistency between them.

Vector indexing

Vector indexes are long matrices having dimension dxn. d is a fixed number for a database and is computed as min(schema.maxItems, config.docs.vd). n is the number of vectors inserted to the database (and not deleted) so far. Maintaining a single long matrix is just a concept. An implementation of Aquila DB is expected to keep the index as small chunks to enable distributed scaling and data redundancy (reliability).

There is a set of methods to do chunking called bucketization. These methods try to come up with a hashing (central point) algorithm which clusters vectors based on their distance from one another. Popular examples include pre-trained k-means clustering, locality sensitive hashing, special rolling hashes etc.

Note: Official Aquila DB implementation currently uses k-means clustering to generate buckets.

Domain constraints in search

Full search

Full search allows anyone to search the entire matrix chunks without any constraints. This should be done when in need for 100% accuracy with a trade-off of increased latency. Aquila DB implementations should do an exhaustive search over all distributed instances at once.

Bucket search

Bucket searches are limited to a pre-configured / dynamically specified number of closest buckets. This improves search performance with a trade-off of reduced accuracy. Scale of accuracy drops purely depends upon the bucketization algorithm used by implementations. Please refer Notion of error section below for more information on measuring accuracy of implementations.

Content constraints in search

k-NN search

k-NN search is performed to retrieve k number of nearest vectors to the query vector from the database. Search API should take in the limit k along with the query vector and should return maximum k vectors as response. k-NN search is pre-constrained by the domain constraints applicable.

Radius search

Radius search performs k-NN search to retrieve all vectors within a specified radius r to query vector. Search API should take in the radius r along with the query vector and should return all vectors within r as response. k-NN search is pre-constrained by the domain constraints applicable.

Notion of error

Aquila DB transparency APIs should include error metric benchmark. Below are the expected results along with pseudo-code to generate them:

coming soon

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vector search.md

Vector search.md

Vector search

Vector indexing

Domain constraints in search

Full search

Bucket search

Content constraints in search

k-NN search

Radius search

Notion of error

Files

Vector search.md

Latest commit

History

Vector search.md

File metadata and controls

Vector search

Vector indexing

Domain constraints in search

Full search

Bucket search

Content constraints in search

k-NN search

Radius search

Notion of error