CurvFaiss

Introduction

CurvFaiss is a library for efficient similarity search and clustering of dense vectors in non-Euclidean manifolds.

Built on Faiss, CurvFaiss adds a new index, IndexFlatStereographic, that supports nearest-neighbor search under the stereographic distance metric. Together with CurvLearn, it makes non-Euclidean model training and efficient inference feasible.

CurvFaiss currently supports neighbor retrieval in hyperbolic, Euclidean, and spherical spaces. The index performs exact (exhaustive) search. Thanks to parallelism at both the data level and the instruction level, indices for 100 million nodes can be built in under two hours.
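To make "exact search under a stereographic metric" concrete, here is a minimal NumPy sketch of a brute-force reference. It assumes the standard κ-stereographic distance built on Möbius addition; whether IndexFlatStereographic returns exactly this quantity (rather than, say, a monotone transform of it) is not stated in this README, so treat the function names `mobius_add`, `stereographic_distance`, and `brute_force_knn` as illustrative only, not as part of the CurvFaiss API.

```python
import numpy as np

def mobius_add(x, y, k):
    """Mobius addition in the k-stereographic model (k is the curvature)."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 - 2 * k * xy - k * y2) * x + (1 + k * x2) * y
    den = 1 - 2 * k * xy + k * k * x2 * y2
    return num / den

def stereographic_distance(x, y, k):
    """Distance for k < 0 (hyperbolic), k = 0 (Euclidean), k > 0 (spherical)."""
    norm = np.linalg.norm(mobius_add(-x, y, k), axis=-1)
    if k < 0:
        s = np.sqrt(-k)
        # Clip to keep arctanh finite for points numerically on the boundary.
        return 2.0 / s * np.arctanh(np.clip(s * norm, 0.0, 1.0 - 1e-7))
    if k > 0:
        s = np.sqrt(k)
        return 2.0 / s * np.arctan(s * norm)
    # In this convention the k -> 0 limit is twice the Euclidean distance.
    return 2.0 * norm

def brute_force_knn(database, queries, k, topk):
    """Exhaustive search: compare every query against every database vector."""
    dist = stereographic_distance(queries[:, None, :], database[None, :, :], k)
    idx = np.argsort(dist, axis=1)[:, :topk]
    return np.take_along_axis(dist, idx, axis=1), idx
```

A reference like this is only practical for small inputs, but it can serve as a sanity check against the neighbors an index returns on a sample of queries.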

For those who want to apply their own customized metric or optimize the indexing method, a hands-on tutorial is also provided.

Installation

CurvFaiss requires curvlearn and Python 3.

The preferred way to install it is via pip.

pip install curvfaiss

The sources are compiled under CentOS. On other platforms, we recommend following the tutorial to resolve the code dependencies.

Usage

A frequent problem is an unresolved runtime dependency. Adding the installed curvfaiss directory to LD_LIBRARY_PATH usually resolves it:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`python -c "from os.path import abspath,dirname,join; import curvlearn as cl; print(join(dirname((dirname(cl.__file__))),'curvfaiss'))"`
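The one-liner above packs the path computation into a single shell substitution. Unpacked, it can be sketched as follows; `curvfaiss_lib_dir` and `extend_ld_library_path` are hypothetical helper names introduced here for illustration, and the sketch assumes, as the command does, that curvfaiss is installed next to curvlearn in the same site-packages directory.

```python
from os.path import dirname, join

def curvfaiss_lib_dir(curvlearn_init_file):
    """Return the sibling 'curvfaiss' directory for a given curvlearn/__init__.py path."""
    # dirname twice: .../site-packages/curvlearn/__init__.py -> .../site-packages
    site_packages = dirname(dirname(curvlearn_init_file))
    return join(site_packages, "curvfaiss")

def extend_ld_library_path(current, lib_dir):
    """Append lib_dir to an existing LD_LIBRARY_PATH value (which may be empty)."""
    return f"{current}:{lib_dir}" if current else lib_dir

if __name__ == "__main__":
    import os
    import curvlearn as cl  # assumed to be installed, as in the command above
    lib_dir = curvfaiss_lib_dir(cl.__file__)
    # Print a line that can be pasted into the shell or a profile script.
    print("export LD_LIBRARY_PATH=" + extend_ld_library_path(os.environ.get("LD_LIBRARY_PATH", ""), lib_dir))
```

The dynamic loader typically reads LD_LIBRARY_PATH when a process starts, so the variable should be exported in the shell before launching Python rather than set from inside the running interpreter.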

Because IndexFlatStereographic inherits from IndexFlat, it is used the same way as IndexFlatL2 in Faiss, with one additional parameter, curvature.

import curvfaiss

# build index; curvature < 0, = 0, or > 0 selects the hyperbolic, Euclidean, or spherical metric, respectively
index = curvfaiss.IndexFlatStereographic(dim, curvature)
index.add(embedding)

# knn search
knn_distance, knn_index = index.search(query, topk)

print(knn_distance, knn_index)
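The snippet above leaves `dim`, `curvature`, `embedding`, `query`, and `topk` undefined. Faiss indices expect 2-D, C-contiguous float32 arrays of shape (n, dim), and in the stereographic model hyperbolic points lie inside the ball of radius 1/sqrt(-curvature). Below is a small sketch of preparing inputs accordingly; `prepare_vectors` and `inside_ball` are hypothetical helpers, and whether CurvFaiss performs such checks itself is not stated here.

```python
import numpy as np

def prepare_vectors(vectors, dim):
    """Convert to the 2-D, C-contiguous float32 layout that Faiss indices expect."""
    arr = np.ascontiguousarray(vectors, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {arr.shape}")
    return arr

def inside_ball(vectors, curvature):
    """Boolean mask of rows that are valid points for the given curvature."""
    if curvature >= 0:
        # No ball constraint is applied for the Euclidean or spherical case.
        return np.ones(len(vectors), dtype=bool)
    radius = 1.0 / np.sqrt(-curvature)
    return np.linalg.norm(vectors, axis=1) < radius
```

With these in place, `embedding = prepare_vectors(raw_embedding, dim)` and `query = prepare_vectors(raw_query, dim)` can be passed to `index.add` and `index.search` as shown above, where `raw_embedding` and `raw_query` stand for whatever arrays the trained model produces.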

See the full demo here!