Add support to use corpus for vector search params #459

Merged: 2 commits merged into opensearch-project:main on Feb 9, 2024

@VijayanB (Member) commented Feb 8, 2024

Description

This commit allows the vector search param source to accept a corpus as its data set, similar to the bulk operation.
Added tests to verify that a corpus can be parsed and used as the data set path, the neighbors data set path, or the query data set path.
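The resolution behavior described above, as suggested by the test names below (missing path or corpus, corpus not found in the workload, corpus with more than one file), can be sketched as follows. This is a minimal, hypothetical sketch, not the code merged in this PR: `Corpus`, `ConfigError`, `resolve_data_set_path`, the rule that an explicit path takes precedence, and the `data_set_corpus` key are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Corpus:
    # Hypothetical stand-in for a workload corpus: a name plus its document files.
    name: str
    document_paths: List[str] = field(default_factory=list)


class ConfigError(Exception):
    """Hypothetical error raised for invalid vector search parameters."""


def resolve_data_set_path(params: dict, corpora: List[Corpus],
                          path_key: str, corpus_key: str) -> str:
    """Return a data set path, either given directly or taken from a named corpus."""
    # Assumption: an explicit path takes precedence over a corpus name.
    path = params.get(path_key)
    if path:
        return path
    corpus_name = params.get(corpus_key)
    if not corpus_name:
        raise ConfigError(f"Either '{path_key}' or '{corpus_key}' must be specified")
    matches = [c for c in corpora if c.name == corpus_name]
    if not matches:
        raise ConfigError(f"Corpus '{corpus_name}' is not defined in the workload")
    files = matches[0].document_paths
    # A vector data set is read from a single file, so reject multi-file corpora.
    if len(files) != 1:
        raise ConfigError(
            f"Corpus '{corpus_name}' must contain exactly one file, found {len(files)}")
    return files[0]


if __name__ == "__main__":
    # Hypothetical workload with one single-file corpus.
    corpora = [Corpus("sift-corpus", ["/data/sift-128-euclidean.hdf5"])]
    params = {"data_set_format": "hdf5", "data_set_corpus": "sift-corpus"}
    print(resolve_data_set_path(params, corpora, "data_set_path", "data_set_corpus"))
```

The same helper could be called with different key pairs for the query, data, and neighbors data sets, which mirrors how the description says a corpus can stand in for each of those paths.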

Issues Resolved

Part of #442

Testing

  • New functionality includes testing


tests/workload/params_test.py::VectorSearchParamSourceTests::test_corpus_contains_more_than_one_files PASSED [ 95%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_corpus_not_found_in_workload PASSED [ 95%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_invalid_data_set_format PASSED [ 95%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_invalid_data_set_path PASSED [ 95%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_missing_corpus PASSED [ 95%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_missing_data_set_path_or_corpus PASSED [ 96%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_missing_params PASSED [ 96%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_partition_bigann PASSED [ 96%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_partition_hdf5 PASSED [ 96%]
tests/workload/params_test.py::VectorSearchParamSourceTests::test_partition_hdf5_corpus PASSED [ 96%]


================= 1221 passed, 5 skipped, 4 warnings in 17.16s =================

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Review threads on osbenchmark/workload/params.py (1 current, 4 outdated), all resolved.
@VijayanB force-pushed the add-corpora-data-set branch from bd16422 to e389e23 on February 8, 2024 20:42
@VijayanB requested a review from jmazanec15 on February 8, 2024 20:48
This commit allows vector search paramsource to accept corpus
as dataset similar to bulk. Added tests to verify the corpus can be parsed and used
as dataset path/neighbor's dataset path/query's dataset path.

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
@VijayanB force-pushed the add-corpora-data-set branch from e389e23 to 844168f on February 8, 2024 21:33
@VijayanB requested a review from jmazanec15 on February 8, 2024 22:19

@IanHoang (Collaborator) left a comment

LGTM. I just requested more clarity in one exception message.

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
@VijayanB force-pushed the add-corpora-data-set branch from c6767e1 to 5427c9e on February 9, 2024 19:04
@VijayanB requested a review from IanHoang on February 9, 2024 19:05

@VijayanB (Member, Author) commented Feb 9, 2024

@IanHoang Please add the backport labels. Thanks.

@IanHoang merged commit ab65e1e into opensearch-project:main on Feb 9, 2024
8 checks passed
@IanHoang added the labels 1.0 (Backport to patch version branch) and 1.X (Backport to minor version branch) on Feb 9, 2024
Labels: 1.X (Backport to minor version branch), 1.0 (Backport to patch version branch)
3 participants