[ENHANCEMENT] Configurable index settings for hnsw #206
### Overview

Marqo uses Hierarchical Navigable Small World (HNSW) graphs to perform approximate nearest neighbour (ANN) search. Compared to other ANN methods, HNSW has two key benefits: high recall and low search latency. HNSW requires several hyperparameters to be selected for both indexing and search. Among these, recall and latency are often in tension: hyperparameters that decrease latency also tend to decrease recall. Hyperparameter selection is therefore an engineering design choice that can be tailored to the use case. Currently, these hyperparameters are fixed within Marqo (see code reference, here). This enhancement is to allow HNSW parameters to be configured at the index level.

### Proposed Solution

The proposed solution is to extend an index's settings, which currently look like this:

```json
{
  "index_defaults": {
    "treat_urls_and_pointers_as_images": false,
    "model": "hf/all_datasets_v4_MiniLM-L6",
    "normalize_embeddings": true,
    "text_preprocessing": {
      "split_length": 2,
      "split_overlap": 0,
      "split_method": "sentence"
    },
    "image_preprocessing": {
      "patch_method": null
    }
  },
  "number_of_shards": 5
}
```

This can be augmented as follows (with defaults as specified):

```json
{
  "index_defaults": {
    "treat_urls_and_pointers_as_images": false,
    "model": "hf/all_datasets_v4_MiniLM-L6",
    "normalize_embeddings": true,
    "text_preprocessing": {
      "split_length": 2,
      "split_overlap": 0,
      "split_method": "sentence"
    },
    "image_preprocessing": {
      "patch_method": null
    },
    "ann_parameters": {
      "method": "hnsw",
      "space_type": "cosinesimil",
      "method_parameters": {
        "ef_construction": 128,
        "m": 24
      }
    }
  },
  "number_of_shards": 5
}
```

### Implementation

Two parts:
Backwards & Forwards Compatibility
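For backwards compatibility, indexes created before this change have no `ann_parameters` key, so Marqo would need to fall back to the defaults in the proposal. The following is a minimal sketch of that merge, assuming a hypothetical helper `resolve_ann_parameters` (not existing Marqo code). User-supplied values override the defaults key by key, including inside `method_parameters`, and unrecognised top-level keys are passed through rather than rejected.

```python
import copy

# Defaults copied from the proposed "ann_parameters" block.
DEFAULT_ANN_PARAMETERS = {
    "method": "hnsw",
    "space_type": "cosinesimil",
    "method_parameters": {
        "ef_construction": 128,
        "m": 24,
    },
}


def resolve_ann_parameters(index_defaults: dict) -> dict:
    """Return the effective ANN parameters for an index (hypothetical helper).

    Settings without "ann_parameters" (older indexes) get the defaults.
    Otherwise user values override defaults key by key, and the nested
    "method_parameters" dict is merged rather than replaced.
    """
    # Deep copy so callers can never mutate the module-level defaults.
    resolved = copy.deepcopy(DEFAULT_ANN_PARAMETERS)
    user = index_defaults.get("ann_parameters") or {}
    for key, value in user.items():
        if key == "method_parameters":
            resolved["method_parameters"].update(value or {})
        else:
            resolved[key] = value
    return resolved
```

For example, `resolve_ann_parameters({"ann_parameters": {"method_parameters": {"m": 16}}})` keeps `ef_construction` at 128 and `space_type` at `cosinesimil`, changing only `m`.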
Of note, the OpenSearch k-NN default settings are specified here.

We should improve readability for parameters in
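Since Marqo stores vectors in OpenSearch's k-NN plugin, the resolved parameters ultimately have to be written into a `knn_vector` field mapping. The sketch below shows one way to do that translation, using OpenSearch's method definition layout (`name`, `space_type`, `engine`, `parameters`). The helper name `build_knn_field_mapping`, the default `engine="nmslib"`, and the example dimension are assumptions for illustration, not Marqo's actual backend code.

```python
def build_knn_field_mapping(ann_parameters: dict, dimension: int,
                            engine: str = "nmslib") -> dict:
    """Translate resolved ann_parameters into an OpenSearch knn_vector mapping.

    Hypothetical helper. The "engine" default is an assumption; the engines
    and space types available depend on the OpenSearch version in use.
    """
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": ann_parameters["method"],
            "space_type": ann_parameters["space_type"],
            "engine": engine,
            # Copy so later edits to the mapping do not alter the settings.
            "parameters": dict(ann_parameters["method_parameters"]),
        },
    }


if __name__ == "__main__":
    params = {
        "method": "hnsw",
        "space_type": "cosinesimil",
        "method_parameters": {"ef_construction": 128, "m": 24},
    }
    # 384 is only an example dimension; use the model's embedding size.
    print(build_knn_field_mapping(params, dimension=384))
```

Keeping this translation in one function means the user-facing `ann_parameters` names can stay readable while the OpenSearch-specific layout is confined to the backend.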
Is your feature request related to a problem? Please describe.
Make the HNSW settings configurable per index. They are currently fixed here:
https://github.com/marqo-ai/marqo/blob/mainline/src/marqo/tensor_search/backend.py#L83-L106
Describe the solution you'd like
Have optional settings in `index_defaults` to set `m`, `ef_construction`, and the metric for the index's HNSW settings.

Describe alternatives you've considered
Additional context
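Because `m`, `ef_construction`, and the metric would now come from users, they need validating before an index is created. The following is a minimal sketch, assuming a hypothetical `validate_ann_parameters` helper and treating the requested "metric" as the `space_type` field. The allowed space types listed are those of the OpenSearch k-NN nmslib engine and are an assumption here, since other engines and versions differ; the "positive integer" check is likewise a deliberately loose placeholder for engine-specific bounds.

```python
# Space types accepted by the OpenSearch k-NN nmslib engine. Assumption: the
# supported set differs between engines and OpenSearch versions.
ALLOWED_SPACE_TYPES = {"l2", "l1", "linf", "cosinesimil", "innerproduct"}
ALLOWED_METHOD_PARAMETERS = {"m", "ef_construction"}


def validate_ann_parameters(ann_parameters: dict) -> None:
    """Raise ValueError if user-supplied HNSW settings are malformed.

    Hypothetical helper, not part of Marqo. Missing keys are allowed,
    because defaults are filled in separately.
    """
    method = ann_parameters.get("method", "hnsw")
    if method != "hnsw":
        raise ValueError(f"unsupported method: {method!r}")

    space_type = ann_parameters.get("space_type", "cosinesimil")
    if space_type not in ALLOWED_SPACE_TYPES:
        raise ValueError(f"unsupported space_type: {space_type!r}")

    method_parameters = ann_parameters.get("method_parameters") or {}
    for name, value in method_parameters.items():
        if name not in ALLOWED_METHOD_PARAMETERS:
            raise ValueError(f"unknown method parameter: {name!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
```

Rejecting unknown keys inside `method_parameters` catches typos such as `ef_constuction` early, instead of silently falling back to the default.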