Add filter function for NeuralQueryBuilder and HybridQueryBuilder and… #1206

Open
wants to merge 1 commit into
base: main

Conversation

chloewqg commented Mar 3, 2025

… modify fromXContent function in HybridQueryBuilder to support filter field.

Description

Add a filter function to NeuralQueryBuilder and HybridQueryBuilder, which allows a non-null filter to be pushed down. One exception: when a HybridQueryBuilder contains a nested HybridQueryBuilder, calling the filter function throws an UnsupportedOperationException.
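
The behaviour described above can be sketched with a simplified, self-contained model. This is not the plugin's code: `ToyQuery`, `ToyLeafQuery`, and `ToyHybridQuery` are hypothetical stand-ins for the real query builders, and treating a null filter as a no-op is an assumption about what "non-null filter" means here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query builder; not the OpenSearch QueryBuilder API.
interface ToyQuery {
    // Returns a query with the filter applied; a null filter is ignored.
    ToyQuery filter(ToyQuery filter);
}

// A leaf query (think of a neural query) that records the filters pushed into it.
final class ToyLeafQuery implements ToyQuery {
    final String name;
    final List<ToyQuery> filters = new ArrayList<>();

    ToyLeafQuery(String name) {
        this.name = name;
    }

    @Override
    public ToyQuery filter(ToyQuery filter) {
        if (filter == null) {
            return this; // nothing to push down
        }
        filters.add(filter);
        return this;
    }
}

// A compound query (think of a hybrid query) that pushes the filter to each sub-query.
final class ToyHybridQuery implements ToyQuery {
    final List<ToyQuery> queries = new ArrayList<>();

    ToyHybridQuery add(ToyQuery query) {
        queries.add(query);
        return this;
    }

    @Override
    public ToyQuery filter(ToyQuery filter) {
        if (filter == null) {
            return this;
        }
        // Reject nested hybrid queries before mutating anything, so a failure
        // never leaves the filter applied to only some of the sub-queries.
        for (ToyQuery sub : queries) {
            if (sub instanceof ToyHybridQuery) {
                throw new UnsupportedOperationException("nested hybrid query cannot take a filter");
            }
        }
        for (int i = 0; i < queries.size(); i++) {
            queries.set(i, queries.get(i).filter(filter));
        }
        return this;
    }
}
```

In this sketch the nested-hybrid check runs before any sub-query is touched; whether the actual change validates up front in the same way is not stated in the description.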

Related Issues

Resolves #1206
Related: #282, #1135

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

if (!validateFilterParams(filter)) {
    return this;
}
HybridQueryBuilder compoundQueryBuilder = new HybridQueryBuilder();
Collaborator

Why do we have to create a new query builder instead of returning this?

Member

I cannot find the validateFilterParams method. Besides that, the naming is confusing: if the return value indicates whether the object is valid, the name should say so, something like isValidFilterParams. Also, change the negation to an explicit == false.

Collaborator

The method was added in core: opensearch-project/OpenSearch#17409
Do we have to use that method? Can we just do an explicit null check here instead of relying on it?

Author

Why do we have to create a new query builder instead of returning this?

Because I saw that the queries field of HybridQueryBuilder is marked as final. https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java#L56

Since we need to reassign a list of queries to the queries variable inside the HybridQueryBuilder, I think we can only solve this by creating a new HybridQueryBuilder?

Author

Wouldn't this double the number of filters? For example, if there are two filters, wouldn't calling add(query.filter(filter)) create duplicate filters?

Author

Thanks, Martin, for the comment! Actually, the function name validateFilterParams was suggested by Daniel in the pull request for the core package, opensearch-project/OpenSearch#17409. Is it okay to leave it as it is for now?

If the method is already there, let's use it. Please use the == false syntax instead of negation; I see you've used it in other places in your PR.

Yep. Updated in the newer commit.

Author

I think the filter input check is common across all the query builders. Although the current check is a simple null check, having a function here allows for more complicated logic in the future.
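
A minimal sketch of the pattern under discussion, assuming the check is a plain null check. The `validateFilterParams` below is a local, hypothetical stand-in written for illustration; the signature of the real method in OpenSearch core is not shown in this thread, and `FilterGuardExample` is not a class from the plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder used only to illustrate the guard; strings stand in for filter queries.
final class FilterGuardExample {
    private final List<String> filters = new ArrayList<>();

    // Local stand-in for a shared validation helper. Today it is only a null check,
    // but keeping it in one method lets the rule grow without touching every caller.
    static boolean validateFilterParams(String filter) {
        return filter != null;
    }

    FilterGuardExample filter(String filter) {
        // "== false" rather than "!" follows the style requested in the review.
        if (validateFilterParams(filter) == false) {
            return this;
        }
        filters.add(filter);
        return this;
    }

    int filterCount() {
        return filters.size();
    }
}
```

The trade-off raised by the reviewers still applies: an inline `filter == null` check is shorter, while a shared helper keeps the rule in one place.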

Contributor

I think you can do something like queries.set(i, queries.get(i).filter(filter));
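
The suggestion above relies on a general Java rule: `final` on a field only prevents reassigning the reference, so the elements of a mutable list can still be replaced in place. The sketch below is generic and self-contained; `QueryHolder` and its string elements are hypothetical and only stand in for the builder and its sub-queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical holder; strings stand in for sub-queries.
final class QueryHolder {
    // final: the reference cannot be reassigned, but the list itself stays mutable.
    private final List<String> queries = new ArrayList<>();

    QueryHolder add(String query) {
        queries.add(query);
        return this;
    }

    // Replaces each element in place, in the spirit of
    // queries.set(i, queries.get(i).filter(filter)).
    QueryHolder applyToAll(UnaryOperator<String> transform) {
        for (int i = 0; i < queries.size(); i++) {
            queries.set(i, transform.apply(queries.get(i)));
        }
        return this;
    }

    // Returns a copy so callers cannot modify the internal list.
    List<String> queries() {
        return new ArrayList<>(queries);
    }
}
```

With this approach the method can return `this` rather than building a new object, which is what the first review comment asked about.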

Author

Gotcha!

Member

martin-gaievski left a comment

there is a separate issue and the whole design/RFC for hybrid query. Is this a building block for that design or this is a conflicting issue?
Ref: #282, #1135

@chloewqg
Author

chloewqg commented Mar 4, 2025

there is a separate issue and the whole design/RFC for hybrid query. Is this a building block for that design or this is a conflicting issue? Ref: #282, #1135

Hi Martin, this is a building block for that design. cc @bzhangam

@martin-gaievski
Member

there is a separate issue and the whole design/RFC for hybrid query. Is this a building block for that design or this is a conflicting issue? Ref: #282, #1135

Hi Martin, this is a building block for that design. cc @bzhangam

Got it, makes sense then. Can you please mention same in the PR description and also update the link to the feature request issue under "Related issues"? If the filtering feature requires multiple PRs/building blocks I suggest you create a high-level meta issue and host smaller PRs under that meta issue.

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.90%. Comparing base (5f25d6c) to head (cb7dff7).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1206      +/-   ##
============================================
+ Coverage     81.80%   81.90%   +0.09%     
+ Complexity     2606     1312    -1294     
============================================
  Files           190       95      -95     
  Lines          8922     4482    -4440     
  Branches       1520      765     -755     
============================================
- Hits           7299     3671    -3628     
+ Misses         1032      514     -518     
+ Partials        591      297     -294     

@chloewqg
Author

chloewqg commented Mar 4, 2025

there is a separate issue and the whole design/RFC for hybrid query. Is this a building block for that design or this is a conflicting issue? Ref: #282, #1135

Hi Martin, this is a building block for that design. cc @bzhangam

Got it, makes sense then. Can you please mention same in the PR description and also update the link to the feature request issue under "Related issues"? If the filtering feature requires multiple PRs/building blocks I suggest you create a high-level meta issue and host smaller PRs under that meta issue.

Gotcha. Thanks!

* }
*/
@SneakyThrows
public void testQueryWithBoostAndFilterApplied() {
Contributor

Why are we adding this IT? I think the filter function is already an existing feature of the neural query.

Author

Here we are testing whether adding another filter to a NeuralQuery functions as expected.

}
}*/
@SneakyThrows
private void testRangeQueryAsFilter(String indexName) {
Contributor

Why are we adding this IT to PostFilterIT? A filter is different from a post filter. I think we should either add it to HybridQueryIT or create a new HybridQueryFilterIT.

Contributor

Do you plan to raise another PR to add tests for BWC? BWC tests are under the qa folder.

Member

Why are we adding this IT to PostFilterIT? A filter is different from a post filter. I think we should either add it to HybridQueryIT or create a new HybridQueryFilterIT.

+1, better to add a new class, HybridQueryFilterIT.

Author

Do you plan to raise another PR to add tests for BWC? BWC tests are under the qa folder.

I think the current BWC tests are blocking the merge, so I will add the BWC tests here in the same PR.

Author

Why are we adding this IT to PostFilterIT? A filter is different from a post filter. I think we should either add it to HybridQueryIT or create a new HybridQueryFilterIT.

Sure. Will create a new HybridQueryFilterIT

… modify fromXContent function in HybridQueryBuilder to support filter field.

Signed-off-by: Chloe Gao <chloewq@amazon.com>
5 participants