Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single KNN Field Optimisation #530

Merged
merged 36 commits into from
Jul 16, 2023
Merged

Single KNN Field Optimisation #530

merged 36 commits into from
Jul 16, 2023

Conversation

pandu-k
Copy link
Collaborator

@pandu-k pandu-k commented Jul 6, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    Optimisation

  • What is the current behavior? (You can also link to an open issue here)
    Every field in a document has a separate OpenSearch knn vector field, thus a different HNSW graph (eg. 5 fields create 5 HNSW graphs). Increases RPS at scale for indexes with multiple tensor fields.

  • What is the new behavior (if this is a feature change)?
    Each index will now have exactly 1 OpenSearch knn vector field: __vector_marqo_knn_field.

  1. index creation: will now fail on creation time (not vectorise time) if using a custom model with invalid/not provided dimensions in model_properties
  2. add_documents: only 1 knn vector field will be created per index
  3. search: filter_string and searchable_attributes will be combined into 1 filter for OpenSearch for result retrieval
  • filter strings also have stricter syntax. fields with special characters will not be filtered properly unless escaped with \\. This is for simpler and cleaner Lucene DSL integration.
  1. pagination: is now supported for both tensor and lexical search! (using limit and offset)
  2. boosting: will become slightly less effective, acting more as a result reranker
  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    Yes. Indexes created with a version prior to this cannot be searched with this version and vice versa. Old indexes need to be reindexed.

  • Have unit tests been run against this PR? (Has there also been any additional testing?)
    Yes. Also manual testing on: boosting, score modifiers

  • Related API Test changes (link commit/PR here)
    TO FOLLOW

  • Related Python client changes (link commit/PR here)
    None

  • Related documentation changes (link commit/PR here)
    TO FOLLOW

  • Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@pandu-k
Copy link
Collaborator Author

pandu-k commented Jul 6, 2023

This PR replaces the POC PR (comments from the POC PR may still be relevant): #509

@vicilliar vicilliar temporarily deployed to marqo-test-suite July 12, 2023 05:01 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 12, 2023 05:01 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 12, 2023 05:02 — with GitHub Actions Inactive
@vicilliar vicilliar changed the title Single knn field optimisation Single KNN Field Optimisation Jul 12, 2023
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 01:17 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 01:17 — with GitHub Actions Inactive
@vicilliar vicilliar marked this pull request as ready for review July 14, 2023 04:32
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 05:59 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 07:11 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 07:11 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 07:12 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 08:08 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 08:08 — with GitHub Actions Inactive
@vicilliar vicilliar temporarily deployed to marqo-test-suite July 14, 2023 08:08 — with GitHub Actions Inactive
return to_be_sanitised


def add_chunks_prefix_to_filter_string_fields(filter_string: Optional[str], simple_properties: typing.Iterable) -> str:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if filter_string MUST NOT be None, it should be reflected in the signature:

Omit Optional[] and just leave filter_string: str

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter string can be None, actually this is what is passed when the user does not input a filter string. simple_properties is the one that cannot be None.

@@ -81,6 +81,47 @@
logger = get_logger(__name__)


def _get_dimension_from_model_properties(model_properties: dict):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for future work: this, and the other get_model_properties() function can be moved to its file, for example: models/model_properties.py

)


def _add_knn_field(ix_settings: dict):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for future work: this can be moved, alongside create_index, into a dedicated file (like create_index.py)

@vicilliar vicilliar merged commit aca7cd6 into mainline Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants