Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpanBoostQuery.rewrite was incomplete for boost==1 factor [LUCENE-10477] #11513

Closed
asfimport opened this issue Mar 21, 2022 · 15 comments
Closed

Comments

@asfimport
Copy link

asfimport commented Mar 21, 2022

(This bug report concerns pre-9.0 code only but it's so subtle that it warrants sharing I think and maybe fixing if there was to be a 8.11.2 release in future.)

Some existing code e.g. https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/queryparser/src/java/org/apache/lucene/queryparser/xml/builders/SpanNearBuilder.java#L54 adds a SpanBoostQuery even if there is no boost or the boost factor is 1.0 i.e. technically wrapping is unnecessary.

Query rewriting should counteract this somewhat except it might not e.g. note at https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/search/spans/SpanBoostQuery.java#L81-L83 how the rewrite is a no-op i.e. this.query.rewrite is not called!

This can then manifest in strange ways e.g. during highlighting:

...
java.lang.IllegalArgumentException: Rewrite first!
	at org.apache.lucene.search.spans.SpanMultiTermQueryWrapper.createWeight(SpanMultiTermQueryWrapper.java:99)
	at org.apache.lucene.search.spans.SpanNearQuery.createWeight(SpanNearQuery.java:183)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:295)
	...

This stacktrace is not from 8.11.1 code but the general logic is that at line 293 rewrite was called (except it didn't a full rewrite because of SpanBoostQuery wrapping around the SpanNearQuery}) and so then at line 295 the IllegalArgumentException("Rewrite first!") arises: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/search/spans/SpanMultiTermQueryWrapper.java#L101


Migrated from LUCENE-10477 by Christine Poerschke (@cpoerschke), resolved May 18 2022
Linked issues:

Pull requests: apache/lucene-solr#2650, #758

@asfimport
Copy link
Author

Adrien Grand (@jpountz) (migrated from JIRA)

It should be ok given that callers are expected to call Query#rewrite until the return value is the same query as the input query? So the stack trace you saw would suggest that some callers of Query#rewrite missed that they might need to rewrite multiple times?

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

For 8.0 this should be the equivalent code area i.e. (one) rewrite call at line 309 and then line 311 weight creation: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.0.0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java#L309-L311

@asfimport
Copy link
Author

Adrien Grand (@jpountz) (migrated from JIRA)

Thanks Christine, this is where the bug lies in my opinion. It should rewrite multiple times if necessary like IndexSearcher does: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.0.0/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L665-L671

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

... callers are expected to call {{Query#rewrite}} until the return value is the same query as the input query? ...

Interesting, didn't know that. Would it be worth documenting e.g. in the javadocs? https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/Query.java#L73-L79

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

#737 is the alternative pull request.

@asfimport
Copy link
Author

asfimport commented Mar 22, 2022

ASF subversion and git services (migrated from JIRA)

Commit ca252d6 in lucene's branch refs/heads/main from Christine Poerschke
https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca252d6

#11500, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently (#737)

@asfimport
Copy link
Author

asfimport commented Mar 22, 2022

ASF subversion and git services (migrated from JIRA)

Commit e7367f3 in lucene's branch refs/heads/branch_9x from Christine Poerschke
https://gitbox.apache.org/repos/asf?p=lucene.git;h=e7367f3

#11500, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently (#737)

(cherry picked from commit ca252d6)

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

... Would it be worth documenting e.g. in the javadocs? ...

#758 opened for that.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 779c332 in lucene's branch refs/heads/main from Christine Poerschke
https://gitbox.apache.org/repos/asf?p=lucene.git;h=779c332

LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (#758)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit ffb3168 in lucene's branch refs/heads/branch_9x from Christine Poerschke
https://gitbox.apache.org/repos/asf?p=lucene.git;h=ffb3168

LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (#758)

(cherry picked from commit 779c332)

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

Thanks @jpountz for the collaboration here!

@asfimport
Copy link
Author

Adrien Grand (@jpountz) (migrated from JIRA)

@cpoerschke Should we backport it to 8.11 now that a 8.11.2 release is being considered?

@asfimport
Copy link
Author

Christine Poerschke (@cpoerschke) (migrated from JIRA)

re-opening for potential backport: apache/lucene-solr#2656

@asfimport
Copy link
Author

asfimport commented May 18, 2022

ASF subversion and git services (migrated from JIRA)

Commit ece0f43b591d28cc7d41ff57b1db6ddcf4df6f8d in lucene-solr's branch refs/heads/branch_8_11 from Christine Poerschke
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ece0f43b591

#11500, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently (#2656)

Also mention 'call multiple times' in Query.rewrite javadoc.

@asfimport
Copy link
Author

Alan Woodward (@romseygeek) (migrated from JIRA)

Bulk close for 9.2.0 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment