Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ToParentBlockJoinQuery's explain should depend on its ScoreMode #12204

Closed
kashkambath opened this issue Mar 13, 2023 · 11 comments
Closed

ToParentBlockJoinQuery's explain should depend on its ScoreMode #12204

kashkambath opened this issue Mar 13, 2023 · 11 comments
Assignees

Comments

@kashkambath
Copy link

Description

Currently, ToParentBlockJoinQuery's explain() returns the explanation for the highest-scoring child document, regardless of the ScoreMode. If the ScoreMode is ScoreMode.Max, this makes sense. However, for other ScoreModes (Min, Avg, Total), this output is not as helpful for the user. We should make the explain output for ToParentBlockJoinQuery depend on the ScoreMode used in the query.

@MarcusSorealheis
Copy link
Contributor

@mkhludnev do you have ideas on how we should best solve this one? I think there are many paths we could take but open to the guidance from committers watching the issue.

@mkhludnev
Copy link
Member

@MarcusSorealheis, Well.. I don't have an elegant solution in mind, beside of almost fully copying score logic into explain or some crazy hairy polimorphish. Probably, somewhat constructive approach would be: collect all matching children explains, then either put only one which score matches parent one (handling min,max) or put all children explain otherwise (handling sum,avg).

@MarcusSorealheis
Copy link
Contributor

I think both of those options make sense. I think polymorphism would be pretty elegant, but copying the score logic scares me a bit. I'll think about about these and other options. There's quite a lot of work in this area in general, so I also need to weigh that fact against completing this task.

@MarcusSorealheis
Copy link
Contributor

Are users asking for scoreMode's other than max? I'm not going to suggest they remodel their data from a repo they don't read. I'm really curious.

If a document is returned as a top doc in any scored query, the reason it was returned is almost because it was the highest scoring document. In the case of a differing score mode this query, it was the biggest contributor to the relevance scoring document of the child docs. Isn't it most meaningful?

I worry somewhat about investing even a little time into ParentBlockJoinQuery because I hope for one day where there is a new approach. However, explain is pretty important for debugging and it's on my mind. I will check back in tomorrow and possibly build a draft.

@mkhludnev
Copy link
Member

mkhludnev commented Apr 7, 2023

Are users asking for scoreMode's other than max?

Sure. Sum/total is definitely one of useful options. There might all variety of them. There might be a function query or field value in the child query. eg we can score desc parents but picking cheapest, oldest child as meaningful scoring factor. I'd say an opposite: current modes are too limited and might require some extension in future.
Thanks @MarcusSorealheis for your involvement.

@MarcusSorealheis
Copy link
Contributor

@mkhludnev and @kashkambath I put up a PR. I took a pretty basic approach. Computing the score based on score mode is pretty straightforward.

@MarcusSorealheis
Copy link
Contributor

MarcusSorealheis commented Apr 27, 2023

I made it a draft, though, because there's still a lot of work to do. After two hours, I haven't figured it out, but I am on the right path. There's a lot more to it.

@MarcusSorealheis
Copy link
Contributor

MarcusSorealheis commented May 1, 2023

@mkhludnev or @kashkambath I'm almost done with the PR but there are a few problems:

  1. The tests failed in floating ways initially because in some places score is a double, and others it is a float. Is there a place where the choice between the two is documented or a reason for the difference? I don't think I'm the one to come and start changing things without a good reason or reviewing the documentation/prior references.
  2. The tests were made to work but not truly. Always returning best score was not a functional use of score mode, as it basically ignored it. For example, I think we need to be clear on the behavior of ToParentBlockJoinQuery. For example, should we return a Exaplanation.match() if a parent matches a query, but no child of the parent doc matches the query? In this case, would the number of matches be zero, despite the fact that we returned a document?

Some details about this query seem and aligning on the behavior/requirements would be helpful before I go further. After digging in a couple times over the past week, I have some rough idea about how we could make things work, but I am curious about how they should work. I don't have strong opinions other than the fact that this work is important. Explain is very useful for debugging.

@mkhludnev
Copy link
Member

For example, should we return a Exaplanation.match() if a parent matches a query, but no child of the parent doc matches the query?
I missed the point. If no child matches ToParentBJQ.subquery, parent doc doesn't match ToPBJQ.

@MarcusSorealheis
Copy link
Contributor

@mkhludnev, @kashkambath, et. al this issue can be closed as the PR has been merged and backported to 9_x

@mkhludnev
Copy link
Member

should be at 9.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants