-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Optimize native merge for native indices #1086
Comments
@jmazanec15 I might have limited understanding here but did we explore this function on the index class: https://github.com/facebookresearch/faiss/blob/main/faiss/Index.h#L97-L104 So, if we can load the one of the index and then run this function with the ids of the other segment won't that work? |
@navneet1v yes that can work! Might be a good first step as well. merge offers potential improvements, however. For example, for IVF, the centroids are the same and all the points have already been assigned. When adding points from one index to another, the points will need to again repeat the search for nearest centroids. With merge, this could be avoided - centroids have already been computed. For graph based indices, there is some potential to try to re-use some of the graph structure on merge - not implemented yet in faiss though. |
@jmazanec15 So, what I am hearing is we can support reusing of the graphs in HNSW too. May be not the optimal but still better. can we change the description to add these details. Also, if I am wrong we will have different strategies for type of indexes. |
i tried to use faiss#merge_from method , but i am wondering how to keep the id order with lucene id |
Currently, when building a newly merged segment for our native engines, we simply build the index structure from scratch. This can consume a significant amount of compute. In Lucene, apache/lucene#12050, we added the ability to initialize the HNSW graph in a newly created segment from the largest graph (that does not contain deletes) in the collection of to be merged segments.
We should have this option for the native indices as well. At the moment, we simply build the index from scratch. See https://github.com/opensearch-project/k-NN/blob/2.9.0.0/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java#L201-L215.
In faiss, they do have options for some indices (IVF, not HNSW/NSG) to merge from other indices. See https://github.com/facebookresearch/faiss/blob/main/faiss/Index.h#L270-L279. We should provide support in the plugin to use these for the available index types.
The text was updated successfully, but these errors were encountered: