Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Optimize native merge for native indices #1086

Open
jmazanec15 opened this issue Aug 31, 2023 · 4 comments
Open

[FEATURE] Optimize native merge for native indices #1086

jmazanec15 opened this issue Aug 31, 2023 · 4 comments
Labels
Features Introduces a new unit of functionality that satisfies a requirement

Comments

@jmazanec15
Copy link
Member

Currently, when building a newly merged segment for our native engines, we simply build the index structure from scratch. This can consume a significant amount of compute. In Lucene, apache/lucene#12050, we added the ability to initialize the HNSW graph in a newly created segment from the largest graph (that does not contain deletes) in the collection of to be merged segments.

We should have this option for the native indices as well. At the moment, we simply build the index from scratch. See https://github.com/opensearch-project/k-NN/blob/2.9.0.0/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java#L201-L215.

In faiss, they do have options for some indices (IVF, not HNSW/NSG) to merge from other indices. See https://github.com/facebookresearch/faiss/blob/main/faiss/Index.h#L270-L279. We should provide support in the plugin to use these for the available index types.

@jmazanec15 jmazanec15 added Features Introduces a new unit of functionality that satisfies a requirement and removed untriaged labels Aug 31, 2023
@jmazanec15 jmazanec15 changed the title Optimize native merge for native indices [FEATURE] Optimize native merge for native indices Aug 31, 2023
@vamshin vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Sep 7, 2023
@navneet1v
Copy link
Collaborator

@jmazanec15 I might have limited understanding here but did we explore this function on the index class: https://github.com/facebookresearch/faiss/blob/main/faiss/Index.h#L97-L104 So, if we can load the one of the index and then run this function with the ids of the other segment won't that work?

@jmazanec15
Copy link
Member Author

@navneet1v yes that can work! Might be a good first step as well. merge offers potential improvements, however. For example, for IVF, the centroids are the same and all the points have already been assigned. When adding points from one index to another, the points will need to again repeat the search for nearest centroids. With merge, this could be avoided - centroids have already been computed. For graph based indices, there is some potential to try to re-use some of the graph structure on merge - not implemented yet in faiss though.

@navneet1v
Copy link
Collaborator

@jmazanec15 So, what I am hearing is we can support reusing of the graphs in HNSW too. May be not the optimal but still better. can we change the description to add these details.

Also, if I am wrong we will have different strategies for type of indexes.

@luyuncheng
Copy link
Collaborator

i tried to use faiss#merge_from method , but i am wondering how to keep the id order with lucene id

@vamshin vamshin moved this from Backlog (Hot) to Next (Next Quarter) in Vector Search RoadMap Oct 5, 2023
@vamshin vamshin moved this from Next (Next Quarter) to Now(This Quarter) in Vector Search RoadMap Oct 5, 2023
@vamshin vamshin moved this from Now(This Quarter) to Next (Next Quarter) in Vector Search RoadMap Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Features Introduces a new unit of functionality that satisfies a requirement
Projects
Status: Now(This Quarter)
Development

No branches or pull requests

3 participants