Add support for a data corpora source-url field to specify a corpus file directly #461

Merged
merged 1 commit into from
Feb 15, 2024

Collaborator

@gkamat gkamat commented Feb 9, 2024

Description

Files in data corpora may have paths that don't conform to the "corpus URL / filename" convention, for instance when they are hosted in object stores. This change adds support for a source-url field that, if specified, takes precedence over the corpus URL. Existing workloads will continue to work unchanged.
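To illustrate the precedence rule described above, here is a minimal sketch. It is not the code from this PR: the helper name `resolve_download_url`, its parameters, and the example URLs are hypothetical and chosen only for illustration.

```python
from typing import Optional


def resolve_download_url(base_url: Optional[str], file_name: str,
                         source_url: Optional[str] = None) -> Optional[str]:
    """Pick the URL a corpus file should be downloaded from.

    Hypothetical helper for illustration; not the project's actual API.
    """
    # A source URL, when specified, takes precedence over the corpus URL.
    if source_url:
        return source_url
    if base_url:
        # Conventional layout: "<corpus URL>/<file name>".
        return f"{base_url.rstrip('/')}/{file_name}"
    # Neither is set: the file must already be available locally.
    return None


# Existing workloads (no source URL) keep the conventional behavior.
conventional = resolve_download_url("https://example.org/corpora", "docs.json.bz2")

# A file whose location does not follow the convention uses its own URL.
direct = resolve_download_url(
    "https://example.org/corpora",
    "docs.json.bz2",
    source_url="https://example.org/bucket/some/key/docs.json.bz2",
)
```

Because the source URL is optional and checked first, workloads that never set it resolve exactly as before.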

Testing

  • New functionality includes additional unit tests.

Ran unit tests and integ tests, and a couple of workloads in test mode.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment on lines 320 to 321
base_url="http://benchmarks.elasticsearch.org/corpora",
source_url="http://benchmarks.elasticsearch.org/corpora/unit-test/docs.json.bz2",
Collaborator

Nit: Although this is for a unit test, is there another link we could use for testing instead of one from Elastic?

Collaborator Author

That's a good point. These were cloned from other tests that use that link. I will go through all of them and replace the links in a separate commit.

…irectly.

Signed-off-by: Govind Kamat <govkamat@amazon.com>
@gkamat gkamat merged commit 822f31d into opensearch-project:main Feb 15, 2024
8 checks passed
@gkamat gkamat deleted the source-url branch June 20, 2024 02:59