Single KNN Field Optimisation #530
* made test_search pass
* Updated CUDA version to match mainline
This PR replaces the POC PR (comments from the POC PR may still be relevant): #509
    return to_be_sanitised


def add_chunks_prefix_to_filter_string_fields(filter_string: Optional[str], simple_properties: typing.Iterable) -> str:
If `filter_string` MUST NOT be None, it should be reflected in the signature: omit `Optional[]` and just leave `filter_string: str`.
The filter string can be None; in fact, that is what is passed when the user does not input a filter string. `simple_properties` is the one that cannot be None.
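For reference, a minimal sketch of how a function with this signature could treat a `None` filter string as "no filter". This is not the PR's implementation: the helper name `prefix_filter_fields`, the `__chunks` prefix value, and the regex-based matching are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

CHUNKS_PREFIX = "__chunks"  # assumed prefix value, for illustration only


def prefix_filter_fields(filter_string: Optional[str], simple_properties: Iterable[str]) -> str:
    """Hypothetical helper: prefix known field names in a filter string."""
    # None means the user supplied no filter, so there is nothing to rewrite.
    if filter_string is None:
        return ""
    prefixed = filter_string
    for field in simple_properties:
        # Match `field:` only when it is not part of a longer or already-prefixed name.
        pattern = r"(?<![\w.])" + re.escape(field) + r":"
        prefixed = re.sub(pattern, lambda m: f"{CHUNKS_PREFIX}.{m.group(0)}", prefixed)
    return prefixed
```

Keeping `Optional[str]` in the signature documents that callers may pass `None` directly, rather than having to convert it to an empty string first.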
@@ -81,6 +81,47 @@
logger = get_logger(__name__)


def _get_dimension_from_model_properties(model_properties: dict):
Note for future work: this and the other `get_model_properties()` function can be moved to their own file, for example `models/model_properties.py`.
)


def _add_knn_field(ix_settings: dict):
Note for future work: this can be moved, alongside `create_index`, into a dedicated file (like `create_index.py`).
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Optimisation
What is the current behavior? (You can also link to an open issue here)
Every field in a document has a separate OpenSearch knn vector field, and thus a separate HNSW graph (e.g. 5 fields create 5 HNSW graphs). This PR is intended to increase RPS at scale for indexes with multiple tensor fields.
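To make the difference concrete, here is a rough sketch contrasting one knn vector field per tensor field with the single shared field this PR introduces. The helper names, the `__vector_<field>` naming for the per-field case, and the HNSW method settings are assumptions for illustration. Only the `knn_vector` mapping type and the `__vector_marqo_knn_field` name come from OpenSearch and this PR respectively.

```python
from typing import Dict, Iterable

KNN_FIELD = "__vector_marqo_knn_field"  # single field name used by this PR


def knn_vector(dimension: int) -> dict:
    """OpenSearch knn_vector mapping; the method settings are illustrative."""
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {"name": "hnsw", "space_type": "cosinesimil", "engine": "lucene"},
    }


def per_field_mapping(tensor_fields: Iterable[str], dimension: int) -> Dict[str, dict]:
    # Hypothetical "before" shape: one knn field (one HNSW graph) per tensor field.
    return {f"__vector_{name}": knn_vector(dimension) for name in tensor_fields}


def single_field_mapping(dimension: int) -> Dict[str, dict]:
    # "After" shape: one shared knn field, however many tensor fields exist.
    return {KNN_FIELD: knn_vector(dimension)}


before = per_field_mapping(["title", "body", "tags", "author", "summary"], 384)
after = single_field_mapping(384)
print(len(before), len(after))
```

With five tensor fields, the first mapping defines five knn fields while the second always defines one.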
What is the new behavior (if this is a feature change)?
* Each index will now have exactly 1 OpenSearch knn vector field: `__vector_marqo_knn_field`.
* `dimensions` in `model_properties`
* `filter_string` and `searchable_attributes` will be combined into 1 filter for OpenSearch for result retrieval. This is for simpler and cleaner Lucene DSL integration.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
Yes. Indexes created with a version prior to this cannot be searched with this version and vice versa. Old indexes need to be reindexed.
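As an illustration of the combined filter described under the new behavior, the sketch below merges a user filter with a restriction to the searchable attributes into one Lucene-style query string. It is a sketch under stated assumptions, not the PR's code: the function name `build_combined_filter` and the `__chunks.__field_name` field used to record which document field a chunk came from are hypothetical.

```python
from typing import Optional, Sequence

FIELD_NAME_KEY = "__chunks.__field_name"  # assumed location of the source field name


def build_combined_filter(
    filter_string: Optional[str],
    searchable_attributes: Optional[Sequence[str]],
) -> str:
    """Hypothetical helper: merge both restrictions into a single filter string."""
    clauses = []
    if searchable_attributes:
        # Restrict hits to chunks that came from one of the requested fields.
        # Attribute names are quoted but not escaped in this simplified sketch.
        names = " OR ".join(f'"{attr}"' for attr in searchable_attributes)
        clauses.append(f"{FIELD_NAME_KEY}:({names})")
    if filter_string:
        # Parenthesise the user filter so its own operators keep their grouping.
        clauses.append(f"({filter_string})")
    return " AND ".join(clauses)
```

Because every vector now lives in the same knn field, a restriction like this has to be expressed as a filter rather than by choosing which per-field knn field to query, which is also why indexes built with the older layout cannot be searched this way.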
Have unit tests been run against this PR? (Has there also been any additional testing?)
Yes. Also manual testing on: boosting, score modifiers
Related API Test changes (link commit/PR here)
TO FOLLOW
Related Python client changes (link commit/PR here)
None
Related documentation changes (link commit/PR here)
TO FOLLOW
Please check if the PR fulfills these requirements