Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Paginated search queries should hint current position of page in individual page queries #9717

Open
gashutos opened this issue Sep 3, 2023 · 12 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v3.0.0 Issues and PRs related to version 3.0.0

Comments

@gashutos
Copy link
Contributor

gashutos commented Sep 3, 2023

Context / Current behavior

Any _search query return hits.total.value in response like mentioned in below example. Which specifies how many hits are matching actual query (sort/search_after are not part of query).

i.e, check response for below query

GET logs-*/_search
{
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"@timestamp" : "asc"}
        ],
        "search_after": ["1998-06-01"]
}

Response [partial]

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 35,
    "successful": 35,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": [
      {
        "_ind

This hits.total.value is capped at 10000 max as lucene can match at max 10000 documents in results.

If you want exact match in hits.total.value, we need to add track_total_hits=true in request parameter.
i.e.

GET logs-*/_search?track_total_hits=true
{
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"@timestamp" : "asc"}
        ],
        "search_after": ["1998-06-01"]
}

Response for above query [partial]

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 35,
    "successful": 35,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 247249096,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [

If you check, it has exact matching value 247249096 in hits.total.value.

Above calculation results on additional computational costs on _search query to track total matching hits. So for performance sensitive queries, it is recommended to provide track_total_hits=falsle.
i.e

GET logs-*/_search?track_total_hits=false
{
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"@timestamp" : "asc"}
        ],
        "search_after": ["1998-06-01"]
}

Response from above search query [partial]

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 35,
    "successful": 35,
    "skipped": 25,
    "failed": 0
  },
  "hits": {
    "max_score": null,

Default behaviour for _search queries if track_total_hits is not provided as request input, is track matching hits up to 10000 documents as mentioned above.

Issue with current behavior

So now, while duing long paginated queries involving 10s of pages, the _search response wont able to give us hint how much pagination is yet to go, because we will end up getting always same hits.total.value which matches the exact _search query but not taking search_after into the account.

Proposal

Modify hits.total.value logic to honor search_after paramter, so that it always can hint user about current pagination position and how many more such paginated requests are required.

@gashutos gashutos added enhancement Enhancement or improvement to existing feature or request untriaged Search Search query, autocomplete ...etc labels Sep 3, 2023
@gashutos
Copy link
Contributor Author

gashutos commented Sep 3, 2023

@reta @dblock @backslasht
Any thoughts on changing this behavior ?

@dblock
Copy link
Member

dblock commented Sep 5, 2023

Is this just a bug? When search_after is specified I'd expect to see the actual number of results (as per search_after).

@msfroh
Copy link
Collaborator

msfroh commented Sep 5, 2023

@gashutos, can you please clarify the proposed behavior? What exactly should change in the search response?

Are you suggesting that using search_after should implicitly set track_total_hits to true (to return the true number of total hits)?

@gashutos
Copy link
Contributor Author

gashutos commented Sep 5, 2023

@dblock @msfroh

Is this just a bug? When search_after is specified I'd expect to see the actual number of results (as per search_after).

Nopes, its not a bug, that's what expected behavior today is to not consider search_after parameter to count matching documents. search_after is used for pagination where 1000s of hits we are trying to get with paginated search queries. But all paginated search queries will return same hits.total.value.
My proposal is to change this behaviour, when search queries moves to next page, hits.total.value should decrease so that it can give hint to user where they are in pagination.

Let me know if this explains, else I can take an example..

@gashutos
Copy link
Contributor Author

gashutos commented Sep 5, 2023

Are you suggesting that using search_after should implicitly set track_total_hits to true (to return the true number of total hits)?

@msfroh I am not thinking to make track_total_hits by default true for search_after queries. By default it tracks only 10000 documents. So even for search_after query, if the document match count goes above 10000, we will just display 10000 for each paginated search responses. But once it comes <=10000 documents considering search_after values, it should start displaying lesser number of hit.
Yeah, I should clarify this case in my proposal.

@dblock
Copy link
Member

dblock commented Sep 5, 2023

@gashutos Now I'm even more confused ;)

  1. Ignore the current implementation. What do you think hits.total.value (total hits) should be conceptually (with and without search_after)?
  2. Would adding remaining.hits be a viable alternative?

@gashutos
Copy link
Contributor Author

gashutos commented Sep 5, 2023

@gashutos Now I'm even more confused ;)

  1. Ignore the current implementation. What do you think hits.total.value (total hits) should be conceptually (with and without search_after)?

I think hits.total.value should honor the search_after value :)
For example, if field has 1 to 1000 int unique values (1000 documents), then asc order sort query on that field reutrns 1 to 5 documents (with size=5) with hits.total.value=1000. To retrieve next page, my next search query will be with search_after=5 so it can return me hits from '6' to '10'. And that search query should return hits.total.value=995. (Instead today it returns always 1000).

  1. Would adding remaining.hits be a viable alternative?

Yeah, I like this idea too. But we will end up achieving same value if we do otherwise as mentined above ?
This is affecting our query performance too. Like we had to disable search_after based skipping like here #9683 to just count matching hits with query.

@dblock
Copy link
Member

dblock commented Sep 5, 2023

I think hits.total.value should honor the search_after value :)

I am with you. I think it's a bug that it doesn't. Let's see what others think? Maybe @msfroh ?

@gashutos
Copy link
Contributor Author

gashutos commented Sep 5, 2023

I think it's a bug that it doesn't. Let's see what others think?

@dblock Yeah, It should be bug, but it is behavior expected by users today. Like #9013, users are expecting search_after queires to return same hits.total.value in all paginated search response.

@dblock
Copy link
Member

dblock commented Sep 5, 2023

Thanks for that context, I forgot about it :( Do you understand why #9013 had these expectations other than backwards compatibility? I am in support of changing it for 3.0, if we can review/improve hits.total.value in this and other cases.

@msfroh
Copy link
Collaborator

msfroh commented Sep 5, 2023

So even for search_after query, if the document match count goes above 10000, we will just display 10000 for each paginated search responses. But once it comes <=10000 documents considering search_after values, it should start displaying lesser number of hit.

But the response isn't saying that total hits is 10000, it's saying that the total hits is >= 10000

It's been a while since I looked into the search_after implementation, but I think it essentially reruns the query but doesn't start collecting hits until it encounters a document with the "after" hit (but it may still be counting those hits -- which still contributes to the total hit count, capped at 10k by default).

Would it make sense to start the 10k total hit count collection once the search_after value is encountered (in 3.0)? If there are fewer than 10k hits after the search_after value, the exact number will be returned. Otherwise, it could still return >= 10000 (unless track_total_hits is true).

Functionally, this would be equivalent to turning the search_after into a filter on the original query, essentially saying "AND only match docs after X".

@gashutos
Copy link
Contributor Author

gashutos commented Sep 5, 2023

Would it make sense to start the 10k total hit count collection once the search_after value is encountered (in 3.0)? If there are fewer than 10k hits after the search_after value, the exact number will be returned. Otherwise, it could still return >= 10000 (unless track_total_hits is true).

Functionally, this would be equivalent to turning the search_after into a filter on the original query, essentially saying "AND only match docs after X".

@msfroh yes, this is the proposal. To make 'search_after' a filter into query itself to count hits.total.value.
Yeah, since this is the behavior change, I am too +1 to make this change in 3.0

This will improve our query performance too when pagination is running over expensive queries, currently it is unnecessarily querying segments/shards where no matching documents for given 'search_after' value but plenty of matching documents for actual query.

@msfroh msfroh added v3.0.0 Issues and PRs related to version 3.0.0 and removed untriaged labels Sep 6, 2023
@getsaurabh02 getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v3.0.0 Issues and PRs related to version 3.0.0
Projects
Status: Later (6 months plus)
Development

No branches or pull requests

3 participants