Expose iterator over query terms in TermInSetQuery #12280
Please let's not go down this path. It is a MultiTermQuery; if you want to change how it works behind the scenes, you "plug in" with RewriteMethod.
Thanks @rmuir. It would be ideal if we could do this through RewriteMethod, but I'm not sure how we can actually accomplish that. The problem is in the implementation of
Update: I added a sandbox query to this PR just to demonstrate the use-case for extending
 * TermsEnum#seekCeil(BytesRef)} to produce a terms iterator, which is compatible with {@code
 * BloomFilteringPostingsFormat}.
 */
public class PKTermInSetQuery extends TermInSetQuery {
This class is for demo purposes only. I'm not suggesting we merge it as part of this PR. I only want to demonstrate how a class might leverage getQueryTerms.
I still don't think it makes sense to expose this. As I said on the dev list, your problem is that you use a custom postings format and you want it to accelerate the intersection. The cleanest way to do this is to hand off the intersection to the postings format directly, rather than worry about seekCeil/seekExact and subclassing queries or exposing stuff. It should give a performance improvement using the default postings format as well (at least it did for other queries when mikemccand added it). So, IMO we should try to fix this query to use Terms.intersect() [see #12176], then override Terms.intersect for the BloomPostingsFormat to make use of the bloom filters to speed up intersection.
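To make the suggestion above concrete, here is a minimal sketch of the idea in plain Java. It is not Lucene code: `SketchTermsDictionary`, `intersect`, and the `mightContain` predicate are hypothetical names, and the predicate only stands in for a bloom filter (it may say "maybe" for absent terms but never rejects a present one). The point it illustrates is that when the terms dictionary owns the intersection, it can consult its filter internally and skip lookups, with no change to the query class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical model, not Lucene code: a sorted terms dictionary that owns the intersection. */
class SketchTermsDictionary {
  private final NavigableSet<String> indexedTerms = new TreeSet<>();
  // Stand-in for a bloom filter: may return true for absent terms, never false for present ones.
  private final Predicate<String> mightContain;
  // Counts real dictionary lookups, to show what the filter saves.
  int lookups = 0;

  SketchTermsDictionary(List<String> terms, Predicate<String> mightContain) {
    this.indexedTerms.addAll(terms);
    this.mightContain = mightContain;
  }

  /** Returns the query terms that exist in the dictionary, in sorted order. */
  List<String> intersect(NavigableSet<String> queryTerms) {
    List<String> matches = new ArrayList<>();
    for (String term : queryTerms) {
      if (!mightContain.test(term)) {
        continue; // filter says "definitely absent": skip the dictionary lookup
      }
      lookups++;
      if (indexedTerms.contains(term)) {
        matches.add(term);
      }
    }
    return matches;
  }
}
```

A caller only passes its sorted query terms to `intersect`; whether a filter is consulted is an internal detail of the dictionary, which is the division of labor proposed above.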
Got it, thanks @rmuir. I hadn't seen your dev list reply yet. This all makes sense. I'll close this out and have a look at leveraging intersect. Seems like a better path forward. Thanks!
Description
I'd like to propose we add an API to TermInSetQuery that exposes an iterator over the query terms. This is useful for extending TermInSetQuery. One concrete use-case for this is needing to change the way term intersection happens with the indexed terms dictionary to support bloom filters, as described in this email thread.
I don't think there's any harm in exposing this, but am interested in feedback of course! This abstraction decouples the current prefix-coding implementation details, so it seems clean.
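As a rough illustration of the decoupling claimed in the description, the sketch below stores a term set in a simple prefix-coded form and exposes it only through an iterator. It is plain Java rather than the PR's code: `SketchTermSet` and `termIterator` are hypothetical names, and it uses strings instead of Lucene's byte-based terms. Callers see sorted terms and never depend on how they are encoded.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

/** Hypothetical model, not Lucene code: terms stored prefix-coded, exposed as a plain iterator. */
class SketchTermSet {
  // Entry i holds the length of the prefix shared with term i-1, plus the remaining suffix.
  private final List<Integer> prefixLengths = new ArrayList<>();
  private final List<String> suffixes = new ArrayList<>();

  SketchTermSet(Collection<String> terms) {
    String previous = "";
    for (String term : new TreeSet<>(terms)) { // sort and de-duplicate
      int shared = 0;
      int max = Math.min(previous.length(), term.length());
      while (shared < max && previous.charAt(shared) == term.charAt(shared)) {
        shared++;
      }
      prefixLengths.add(shared);
      suffixes.add(term.substring(shared));
      previous = term;
    }
  }

  /** Decodes terms lazily in sorted order; callers never see the encoding. */
  Iterator<String> termIterator() {
    return new Iterator<String>() {
      private int position = 0;
      private String previous = "";

      @Override
      public boolean hasNext() {
        return position < suffixes.size();
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        // Rebuild the term from the shared prefix of the previous term and the stored suffix.
        previous = previous.substring(0, prefixLengths.get(position)) + suffixes.get(position);
        position++;
        return previous;
      }
    };
  }
}
```

A subclass or helper that wants its own intersection strategy would consume `termIterator()` and stay unaffected if the internal encoding changes.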