Expose term frequency in Painless script score context #7558
Comments
@msfroh I think you mentioned that you had an idea for an implementation. Can you add any more thoughts on your idea, please?
@macohen, we need to get through the "unknown unknowns" first. I would take a day or two to build a scrappy prototype for one function (probably …
Hi @russcam, can you speak a little more to the use case you have here? It would be great to make sure the community understands the need and can provide input and feedback.
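Being able to incorporate term frequency into a script score would allow a script such as:

```painless
// Pseudocode: _doc(field).term_freq(term) is a hypothetical accessor
// proposed by this request; Painless did not provide it.
def multiplier = params.multiplier;
// params.fields arrives from the request JSON as a List, so use size()
for (int x = 0; x < params.fields.size(); x++) {
  // score on the first listed field that exists on this document
  if (_doc(params.fields[x]) != null) {
    return multiplier * _doc(params.fields[x]).term_freq(params.term);
  }
}
// none of the listed fields exist: fall back to the default score
return params.default_value;
```

This calculates a score by multiplying the term frequency in the first field that exists in a list of fields by a multiplier, otherwise returning a default value. Scripted similarities allow a script to specify how scores are computed, but they are not flexible enough here because they cannot take per-query parameters such as the multiplier, fields, and term above.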
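To make the intended semantics of the script concrete, here is a minimal Python sketch of the same logic. It assumes a document can be modeled as a hypothetical in-memory map of field name to per-term counts; `Doc` and `first_field_tf_score` are illustrative names, not a Painless or OpenSearch API.

```python
from typing import Dict, List

# Hypothetical stand-in for one document: field name -> {term: frequency}.
# In a real index these counts come from Lucene postings, not a dict.
Doc = Dict[str, Dict[str, int]]


def first_field_tf_score(doc: Doc, fields: List[str], term: str,
                         multiplier: float, default_value: float) -> float:
    """multiplier * frequency of `term` in the first listed field present on `doc`."""
    for field in fields:
        if field in doc:
            # An existing field without the term contributes frequency 0.
            return multiplier * doc[field].get(term, 0)
    # None of the listed fields exist on this document.
    return default_value


doc = {"title": {"fox": 1}, "body": {"fox": 3, "dog": 1}}
print(first_field_tf_score(doc, ["summary", "body"], "fox", 2.0, 0.5))  # 6.0
```

Like the pseudocode, a document whose first existing field lacks the term scores 0 rather than falling through to the next field or to `default_value`.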
Hi @russcam, while exposing term frequency in script … Here's a working example on my local setup:
Response:
We should also create a doc issue for how to use …
@noCharger the term vectors and multi term vectors APIs are useful, but you need to know the IDs of the documents whose term statistics you are interested in, which you don't know ahead of time for a given search query. As such, I don't believe they are a viable interim solution.
💯 With the term vectors API, the client would be responsible for that. 😞
I haven't dug into this part of the Painless code for quite some time, so I'd have to look closer, but are you suggesting a "simple" solution, such as adding a new ScriptContext that exposes the term vector's termFreq for use in rescoring at query time?
This is definitely pseudocode 🙂. You're right, they would need to come from the …
A new …
@russcam, I would like to know whether scripted similarity could be a potential solution for your use case, because it already exposes …
Response:
@noCharger I don't believe scripted similarity is flexible enough, because it doesn't allow parameters to be included in the similarity score on a per-query basis, which is needed, e.g., for the example in #7558 (comment).
Correct, while the …
@russcam, feel free to review the linked RFC and provide your feedback.
Closing this issue since the PR has been merged and backported.
Is your feature request related to a problem? Please describe.
In our current Solr setup, we make heavy use of Solr functions to implement query-time multiplicative and additive boosting. We are in the process of migrating from Solr to Elasticsearch and porting our querying logic over. Most Solr functions can be implemented with Painless in the script score of a function score query to provide multiplicative boosting; however, a few cannot, such as
termfreq
tf
totaltermfreq
sumtotaltermfreq
payload
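For readers less familiar with these Solr functions, the sketch below computes the first four over a hypothetical toy index (a list of documents, each mapping a field to its tokens). In Solr and Lucene these values come from the index's postings and term statistics rather than from scanning tokens. `tf` assumes Lucene's ClassicSimilarity, whose factor is the square root of the raw frequency; `payload` is not modeled.

```python
import math
from typing import Dict, List

# Hypothetical toy index: a list of documents, each mapping field -> tokens.
Index = List[Dict[str, List[str]]]


def termfreq(index: Index, doc_id: int, field: str, term: str) -> int:
    """Occurrences of `term` in `field` of one document."""
    return index[doc_id].get(field, []).count(term)


def tf(index: Index, doc_id: int, field: str, term: str) -> float:
    """TF factor assuming Lucene's ClassicSimilarity: sqrt(termfreq)."""
    return math.sqrt(termfreq(index, doc_id, field, term))


def totaltermfreq(index: Index, field: str, term: str) -> int:
    """Occurrences of `term` in `field` across all documents."""
    return sum(doc.get(field, []).count(term) for doc in index)


def sumtotaltermfreq(index: Index, field: str) -> int:
    """Sum of totaltermfreq over every term in `field` (its total token count)."""
    return sum(len(doc.get(field, [])) for doc in index)


index = [{"body": ["red", "fox", "red"]}, {"body": ["red", "dog"]}]
print(termfreq(index, 0, "body", "red"))    # 2
print(tf(index, 1, "body", "red"))          # 1.0
print(totaltermfreq(index, "body", "red"))  # 3
print(sumtotaltermfreq(index, "body"))      # 5
```

Only `termfreq` and `tf` are per-document values; `totaltermfreq` and `sumtotaltermfreq` are index-wide statistics, which is why the latter two are the same for every document scored.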
The biggest pain point is the lack of termfreq, which we use as part of a popularity boost: the calculation incorporates a baseline value so that new content of unknown popularity is still factored in.
Describe the solution you'd like
I'm opening this issue to request that Painless scripting support retrieving term frequencies in a script score context.
Describe alternatives you've considered
To work around not having access to termfreq in a script score context, we have to write a custom script engine and script plugin that looks up term frequencies from the PostingsEnum. This is less than desirable because AWS OpenSearch Service does not support custom script plugins.
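The core of that workaround is the standard Lucene postings pattern: for each scored document, advance the term's postings iterator to that doc ID and read its frequency if the IDs match. A custom engine would typically obtain the iterator per segment from Lucene (for example via `LeafReader.postings(Term, int)`). The sketch below replaces it with a hypothetical in-memory `ToyPostings` class so the forward-only advance/freq logic can run on its own; it illustrates the pattern under those assumptions and is not plugin code.

```python
from typing import List, Tuple

# Mirrors Lucene's DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).
NO_MORE_DOCS = 2**31 - 1


class ToyPostings:
    """Hypothetical stand-in for a Lucene PostingsEnum over one term in one
    segment: (doc_id, freq) pairs sorted by doc id, iterated forward only."""

    def __init__(self, postings: List[Tuple[int, int]]):
        self._postings = postings
        self._pos = -1
        self._doc = -1  # unpositioned, like docID() before iteration starts

    def doc_id(self) -> int:
        return self._doc

    def advance(self, target: int) -> int:
        """Move to the first posting with doc id >= target (never backwards)."""
        i = max(self._pos, 0)
        while i < len(self._postings) and self._postings[i][0] < target:
            i += 1
        self._pos = i
        self._doc = self._postings[i][0] if i < len(self._postings) else NO_MORE_DOCS
        return self._doc

    def freq(self) -> int:
        """Term frequency in the current document."""
        return self._postings[self._pos][1]


def term_freqs(postings: ToyPostings, doc_ids: List[int]) -> List[int]:
    """Term frequency per scored doc, visiting doc ids in increasing order as a
    scorer does within a segment; documents without the term get 0."""
    out = []
    for doc in doc_ids:
        if postings.doc_id() < doc:
            postings.advance(doc)
        out.append(postings.freq() if postings.doc_id() == doc else 0)
    return out


p = ToyPostings([(1, 2), (4, 1), (7, 3)])
print(term_freqs(p, [0, 1, 2, 4, 5, 7, 9]))  # [0, 2, 0, 1, 0, 3, 0]
```

Because postings are forward-only, the sketch advances only when the current doc ID is behind the target, which is why this approach relies on documents being scored in increasing doc ID order.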