[DOCS] add details on highlighting #28802

mayya-sharipova · 2018-02-23T15:21:03Z

Add additional information on inner working of highlighters

cexmmaqsood · 2018-02-28T16:31:14Z

Hi, I'm not sure exactly about the process of commenting on open PRs, but I created the original ticket for this (#28681).

This PR is excellent. It gives a lot of information. I have a few additional requests for info added there if possible.

Can you please go into detail about how the order parameter works? I created a ticket for it (Highlight Field Order #26612) and a few details are unclear:
-- What are the options available for the 'order' parameter?
-- Can you confirm that order:sort works as expected (i.e., fragments get scored and the highest-scored fragments are returned first)?
Can you please provide an example of a complex query in regards to this?

The goal is to highlight only those terms that participated in generating the 'hit' on the document.
For some complex queries, this is still work in progress.

Is it recommended by ES to always highlight on the same field as searched? As you know a field can have many subfields and those subfields can be analyzed in different ways, so it is possible right now (without an error thrown) to search on subfield A, for example, but highlight on subfield B. In my experience, this has resulted in unexpected behaviour but I had no concrete evidence/documentation to support that theory.
Can you name/explain the algorithm that is used?
Plain highlighter uses a very simple algorithm to break the token stream into fragments.
It would be really interesting if the low-level information used were briefly stated.
Then this obtained low-level match information is used to score each individual fragment.

Again thanks so much for the documentation. It helps immensely and if I had this last year when I was developing highlighting for my company it would have sped things up.

mayya-sharipova · 2018-03-01T02:11:29Z

@cexmmaqsood thanks for your comments, we will try to address them in the update

Add additional information on inner working of highlighters Closes elastic#28681, elastic#28816

mayya-sharipova · 2018-03-07T21:59:03Z

@cexmmaqsood Thanks very much again for your feedback.

Addressing your comments:

Can you please go into detail about how the order parameter works?

The highlighting documentation was updated accordingly.

Can you please provide an example of a complex query in regards to this?
The goal is to highlight only those terms that participated in generating the 'hit' on the document. For some complex queries, this is still work in progress.

An example of this complex query can be found in this issue: #28626

Is it recommended by ES to always highlight on the same field as searched? As you know a field can have many subfields and those subfields can be analyzed in different ways, so it is possible right now (without an error thrown) to search on subfield A, for example, but highlight on subfield B. In my experience, this has resulted in unexpected behaviour but I had no concrete evidence/documentation to support that theory.

I am not sure what you mean by subfields here. In case of nested fields, we have inner_hits option that allows you to highlight nested docs. There is also an option to use highlight_query that could be different from a search query. Overall, it is possible to use one field for search, and another for highlighting.
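
For readers following along, here is a minimal sketch of such a request body, built as a Python dict. The field names `comment` and `comment.plain`, the helper `build_request`, and the query text are hypothetical; `highlight_query` is the highlighting option mentioned above. The sketch only shows the shape of the request and makes no claim about which field combinations produce sensible highlights.

```python
import json


def build_request(search_field, highlight_field, text):
    """Search on one field and highlight another via highlight_query.

    Field names are supplied by the caller; nothing here checks that
    they exist in any mapping.
    """
    return {
        "query": {"match": {search_field: text}},
        "highlight": {
            "fields": {
                highlight_field: {
                    # Highlight using a query against the highlighted field
                    # rather than the search query itself.
                    "highlight_query": {"match": {highlight_field: text}}
                }
            }
        },
    }


# Hypothetical field names, chosen only for illustration.
body = build_request("comment", "comment.plain", "quick brown fox")
print(json.dumps(body, indent=2))
```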

Can you name/explain the algorithm that is used? Plain highlighter uses a very simple algorithm to break the token stream into fragments.

The explanation that follows this line is the explanation of the algorithm.

It would be really interesting if the low-level information used were briefly stated.
Then this obtained low-level match information is used to score each individual fragment.

This low-level match information is presented in a simplified form in the example at the end of the highlighting documentation:
onli -> positions(34, 35) weight:1
fox -> positions(34, 35) weight:1
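
As a rough illustration of how match information like this could feed fragment scoring, the sketch below counts the distinct query terms whose positions fall inside each fragment and ranks fragments by that count. The term positions, the fragment boundaries, and the function name `score_fragments` are made up for illustration; this is a simplification and not the Lucene implementation.

```python
def score_fragments(matches, fragments):
    """Rank fragments by the number of distinct query terms they contain.

    matches:   dict mapping a term to the token positions where it occurs
    fragments: list of (start, end) token-position ranges, inclusive
    """
    scored = []
    for start, end in fragments:
        terms_in_fragment = {
            term
            for term, positions in matches.items()
            if any(start <= p <= end for p in positions)
        }
        scored.append(((start, end), len(terms_in_fragment)))
    # Highest score first; the sort is stable, so ties keep document order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical positions and fragment ranges.
matches = {"onli": [34], "fox": [2, 35]}
fragments = [(0, 19), (20, 39)]
ranked = score_fragments(matches, fragments)
print(ranked)
```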

mdcclv · 2018-03-16T15:52:43Z

This is such a useful explanation: thanks so much for writing it, and I'm very glad I stumbled upon it.

One other question raised by this section of the docs:

Fast vector highlighter

Can assign different weights to matches at different positions allowing for things like phrase matches being sorted above term matches when highlighting a Boosting Query that boosts phrase matches over term matches

How can I control this in a query? I am using a boosting query to prefer phrase matches to term matches, but my fvh highlights are coming through in document order.

mayya-sharipova · 2018-03-16T23:02:01Z

@mdcclv Thanks for the feedback!

How can I control this in a query? I am using a boosting query to prefer phrase matches to term matches, but my fvh highlights are coming through in document order.

For specific questions like this, please ask in https://discuss.elastic.co/
To reply briefly here: you should use "order": "score" to output fragments by score. It is the score that will incorporate your boost.

mdcclv · 2018-03-19T02:24:43Z

Thanks! But this new section says:
Only `unified` highlighter truly calculates the score, other highlighters with order: `sort` setting, will rank fragments by the number of query words found
does that mean that "order": "score" and "order": "sort" are both possibilities with fvh highlights?

The new documentation in this PR still seems to say that fvh highlighting only takes the "order": "sort" option, and only orders by number of query words, not by boosted values.

mayya-sharipova · 2018-03-19T20:21:16Z

@mdcclv Sorry for that. I see my mistake, I have updated the PR accordingly: f6e1990. Thanks for noticing that.

In short, order can only be score or none. With order:score, fvh will rank fragments by the number of query words found in them, but it will incorporate boost as well.
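
A minimal sketch of a request that follows this, assuming a field named `comment` (hypothetical) that is mapped with `term_vector: with_positions_offsets`, as the fast vector highlighter requires. The `bool` query with a boosted `match_phrase` clause stands in for whatever boosting setup is actually used, and the client-side check on `order` is an illustration only, not something Elasticsearch provides.

```python
import json

# The two values named in the comment above.
VALID_ORDERS = {"score", "none"}


def fvh_highlight(field, order="none"):
    """Build a highlight section for the fvh highlighter (illustrative helper)."""
    if order not in VALID_ORDERS:
        raise ValueError("order must be 'score' or 'none', got %r" % order)
    return {"fields": {field: {"type": "fvh", "order": order}}}


body = {
    "query": {
        "bool": {
            "should": [
                # Phrase matches are boosted above plain term matches.
                {"match_phrase": {"comment": {"query": "quick fox", "boost": 10.0}}},
                {"match": {"comment": "quick fox"}},
            ]
        }
    },
    "highlight": fvh_highlight("comment", order="score"),
}
print(json.dumps(body, indent=2))
```

Passing `order="sort"` to this helper raises `ValueError`, which mirrors the correction made in this thread.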

jimczi

Thanks @mayya-sharipova and sorry for the late review.
I left some comments, I think we should focus on the unified highlighter which is the default in 6.0 and maybe have a separate page to describe the highlighter internals. This page is quite big already.

jimczi · 2018-04-13T12:35:55Z

docs/reference/search/request/highlighting.asciidoc

+`minimum_should_match` etc.), parts of documents may be highlighted
+that don't correspond to query matches. The work for fixing this is
+currently in progress.
+


I think you should remove this part or say something like highlighters don't reflect the boolean logic of the query when extracting the terms to highlight.... I am not sure that we're going to "fix" it and we should not add this statement in the docs. I see pros and cons to do that and the solution that @romseygeek implemented might not be applicable in all cases (term vectors highlighting for instance).

mayya-sharipova · 2018-04-17

@jimczi Thanks, Jim. I will rephrase this note as you suggested, and remove the part about fixing. I was asked to create this note, as there were several SDH and other issues where highlighted fragments did not match a query, which confused users.

jimczi · 2018-04-13T12:39:01Z

docs/reference/search/request/highlighting.asciidoc

+order:: Sorts highlighted fragments by score when set to `score`.  By default,
+fragments will be output in the order they appear in the field (order: `none`).
+Setting this option to `score` will output the most relevant fragments first.
+Only `unified` highlighter truly calculates the score in a similar way the score


Each highlighter has its own relevancy "sauce" but I wouldn't say that the unified scoring is similar to the score of the query. It uses BM25 but that's just an internal detail, I think we should just say that each highlighter applies its own logic to compute the relevancy score and we can describe the details in the example below.

jimczi · 2018-04-13T12:41:32Z

docs/reference/search/request/highlighting.asciidoc

+A highlighter uses `pre-tags`, `post-tags` to encode highlighted terms.
+
+
+===== An example of the work of the plain highlighter


Can you describe how the unified highlighter works instead? Maybe just the re-analysis mode (plain highlighting), since it is very similar to the plain highlighter. We want to deprecate (and remove) the plain highlighter, so I'd prefer that we document something that will last longer.

jimczi · 2018-04-13T12:43:16Z

docs/reference/search/request/highlighting.asciidoc

+    {"token":"fox","start_offset":164,"end_offset":167,"position":35},
+    {"token":"world","start_offset":175,"end_offset":180,"position":38},
+    {"token":"you","start_offset":185,"end_offset":188,"position":40}
+


The unified highlighter does not index all terms but only those that can match the query. This is an issue currently in the plain highlighter since it caches all these terms in memory so we shouldn't document this and rely on the unified highlighter instead.

jimczi

I left a small comment but LGTM otherwise.
Thanks @mayya-sharipova !

jimczi · 2018-04-18T13:49:19Z

docs/reference/search/request/highlighters-internal.asciidoc

+Relevant settings:  `pre-tags`, `post-tags`.
+
+The goal is to highlight only those terms that participated in generating the 'hit' on the document.
+For some complex boolean queries, this is still work in progress.


Can you adapt the sentence to explain that highlighters don't reflect the boolean logic of a query and only extract the leaves (terms, phrases, prefix, ...)? We can change the note when we have a highlighter (or an adapted unified highlighter) that is able to handle boolean queries.

mayya-sharipova · 2018-04-18

@jimczi Thanks for the review, Jim! I will change this sentence as you suggested.
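
To make the point about leaf extraction concrete, here is a small sketch of what "only extract the leaves" can look like for a query expressed as a Python dict. The function `extract_leaves`, the set of leaf types, and the field name `body` are assumptions for illustration, and skipping `must_not` clauses is a simplifying choice of this sketch; it is not how any particular highlighter is implemented.

```python
# Leaf query types this sketch recognises (an assumption, not an exhaustive list).
LEAF_TYPES = {"term", "match", "match_phrase", "prefix"}


def extract_leaves(query):
    """Collect leaf clauses from a query dict, ignoring the boolean logic.

    Settings such as minimum_should_match are never consulted, so every
    leaf is a highlighting candidate whether or not it was needed for
    the document to match.
    """
    leaves = []
    for query_type, body in query.items():
        if query_type == "bool":
            # must_not is skipped here as a simplifying choice.
            for occur in ("must", "should", "filter"):
                clauses = body.get(occur, [])
                if isinstance(clauses, dict):
                    clauses = [clauses]
                for clause in clauses:
                    leaves.extend(extract_leaves(clause))
        elif query_type in LEAF_TYPES:
            leaves.append({query_type: body})
    return leaves


query = {
    "bool": {
        "should": [
            {"term": {"body": "fox"}},
            {"term": {"body": "dog"}},
            {"match_phrase": {"body": "quick brown"}},
        ],
        "minimum_should_match": 2,
    }
}
leaves = extract_leaves(query)
print(leaves)
```

All three leaves come back even though `minimum_should_match` is 2, which is the behaviour the note is meant to describe: a leaf can be highlighted regardless of whether it contributed to the match.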

mayya-sharipova · 2018-04-18T20:37:39Z

@elasticmachine run sample packaging tests

mayya-sharipova added >docs General docs changes v7.0.0 v6.3.0 labels Feb 23, 2018

mayya-sharipova requested a review from jimczi February 23, 2018 15:21

mayya-sharipova force-pushed the update-highlighting-docs branch from fd2fd8c to 547f9c5 Compare March 7, 2018 17:52

[DOCS] add details on highlighting

8fae9f8

Add additional information on inner working of highlighters Closes elastic#28681, elastic#28816

mayya-sharipova force-pushed the update-highlighting-docs branch from 547f9c5 to 8fae9f8 Compare March 7, 2018 21:41

[DOCS] add details on highlighting

f6e1990

Add additional information on inner working of highlighters Closes elastic#28681, elastic#28816

mayya-sharipova force-pushed the update-highlighting-docs branch from ed6c860 to f6e1990 Compare March 19, 2018 20:18

mayya-sharipova added the review label Mar 20, 2018

mayya-sharipova added 2 commits April 12, 2018 12:59

Merge branch 'master' into update-highlighting-docs

090378e

Add note about highlighted query matches

131c1ac

jimczi requested changes Apr 13, 2018

mayya-sharipova added 3 commits April 17, 2018 17:08

Update documentation on highlighting

ba73595

Add more explanation to some highlighting parameters Add a document describing how highlighters work internally.

Merge remote-tracking branch 'upstream/master' into update-highlighti…

7879fd3

…ng-docs

Updating highlighting info

ca1bc70

jimczi approved these changes Apr 18, 2018

Update highlighting docs

aaa53fc

mayya-sharipova merged commit bf6cfff into elastic:master Apr 18, 2018

mayya-sharipova deleted the update-highlighting-docs branch April 18, 2018 21:41

mayya-sharipova added a commit that referenced this pull request Apr 19, 2018

[DOCS] Update highlighting docs (#28802)

ff45b81

- add more explanation to some highlighting parameters - add a document describing how highlighters work internally

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

debadair mentioned this pull request Oct 11, 2019

Add documentation for "boundary_scanner_locale" in Highlighting docs #28816

Closed
