From 10b470ef90d4214edbd7107d10048b2337423f39 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 5 Feb 2024 18:25:35 -0500 Subject: [PATCH 01/14] Concurrent segment search GA and API changes for 2.12 Signed-off-by: Fanit Kolchina --- _api-reference/index-apis/stats.md | 76 +++++----- _api-reference/nodes-apis/nodes-stats.md | 9 +- _api-reference/profile.md | 141 ++++++++----------- _search-plugins/concurrent-segment-search.md | 79 ----------- 4 files changed, 101 insertions(+), 204 deletions(-) diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md index 7d515bcdcf..a812059fca 100644 --- a/_api-reference/index-apis/stats.md +++ b/_api-reference/index-apis/stats.md @@ -77,6 +77,41 @@ GET /testindex/_stats ``` {% include copy-curl.html %} +#### Example request: Comma-separated list of indexes + +```json +GET /testindex1,testindex2/_stats +``` +{% include copy-curl.html %} + +#### Example request: Wildcard expression + +```json +GET /testindex*/_stats +``` +{% include copy-curl.html %} + +#### Example request: Specific stats + +```json +GET /testindex/_stats/refresh,flush +``` +{% include copy-curl.html %} + +#### Example request: Expand wildcards + +```json +GET /testindex*/_stats?expand_wildcards=open,hidden +``` +{% include copy-curl.html %} + +#### Example request: Shard-level statistics + +```json +GET /testindex/_stats?level=shards +``` +{% include copy-curl.html %} + #### Example response By default, the returned statistics are aggregated in the `primaries` and `total` aggregations. The `primaries` aggregation contains statistics for the primary shards. The `total` aggregation contains statistics for both primary and replica shards. 
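The `primaries`/`total` split described above can be read programmatically. The following is a minimal illustrative sketch; the numbers are invented, and only the `_all.primaries`/`_all.total` layout is taken from the API:

```python
# Illustrative sketch: reading the "primaries" and "total" aggregations from an
# Index Stats API response body. The values below are made up for demonstration.
stats_response = {
    "_all": {
        "primaries": {"docs": {"count": 1000, "deleted": 4}},
        "total": {"docs": {"count": 2000, "deleted": 8}},
    }
}

primaries = stats_response["_all"]["primaries"]["docs"]["count"]
total = stats_response["_all"]["total"]["docs"]["count"]

# "total" covers primary and replica shards, so the difference is the number of
# documents held by replicas (here, one replica per primary).
replica_docs = total - primaries
print(replica_docs)  # 1000
```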
The following is an example Index Stats API response: @@ -773,46 +808,9 @@ By default, the returned statistics are aggregated in the `primaries` and `total ``` -#### Example request: Comma-separated list of indexes - -```json -GET /testindex1,testindex2/_stats -``` -{% include copy-curl.html %} - -#### Example request: Wildcard expression - -```json -GET /testindex*/_stats -``` -{% include copy-curl.html %} - -#### Example request: Specific stats - -```json -GET /testindex/_stats/refresh,flush -``` -{% include copy-curl.html %} - -#### Example request: Expand wildcards - -```json -GET /testindex*/_stats?expand_wildcards=open,hidden -``` -{% include copy-curl.html %} - -#### Example request: Shard-level statistics - -```json -GET /testindex/_stats?level=shards -``` -{% include copy-curl.html %} - -## Concurrent segment search - -Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. If you [enable the experimental concurrent segment search feature flag]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search#enabling-the-feature-flag), the Index Stats API response will contain several additional fields with statistics about slices (units of work executed by a thread). These fields will be provided whether or not the cluster and index settings for concurrent segment search are enabled. For more information about slices, see [Concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search#searching-segments-concurrently). +## Response fields -The following table provides information about the added response fields. +The following table provides information about the response fields. 
|Response field | Description | |:--- |:--- | diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index bfef620f32..c81f8fa4a9 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -831,6 +831,10 @@ get.missing_total | Integer | The number of failed get operations. get.missing_time_in_millis | Integer | The total time for all failed get operations, in milliseconds. get.current | Integer | The number of get operations that are currently running. search | Object | Statistics about the search operations for the node. +search.concurrent_avg_slice_count |The average slice count of all search requests. This is computed as the total slice count divided by the total number of concurrent search requests. +search.concurrent_query_total |The total number of query operations that use concurrent segment search. +search.concurrent_query_time_in_millis |The total amount of time taken by all query operations that use concurrent segment search, in milliseconds. +search.concurrent_query_current |The number of currently running query operations that use concurrent segment search. search.open_contexts | Integer | The number of open search contexts. search.query_total | Integer | The total number of shard query operations. search.query_time_in_millis | Integer | The total amount of time for all shard query operations, in milliseconds. @@ -1259,11 +1263,6 @@ Field | Field type | Description admission_control.global_cpu_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node CPU usage limit was breached. In this case, additional search requests are rejected until the system recovers. admission_control.global_cpu_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node CPU usage limit was breached. 
In this case, additional indexing requests are rejected until the system recovers. - -## Concurrent segment search - -Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. If you [enable the experimental concurrent segment search feature flag]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search#enabling-the-feature-flag), the Nodes Stats API response will contain several additional fields with statistics about slices (units of work executed by a thread). For the descriptions of those fields, see [Index Stats API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/stats#concurrent-segment-search). - ## Required permissions If you use the Security plugin, make sure you have the appropriate permissions: `cluster:monitor/nodes/stats`. diff --git a/_api-reference/profile.md b/_api-reference/profile.md index faa197735f..c865237423 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -18,6 +18,14 @@ The Profile API provides timing information about the execution of individual co The Profile API is a resource-consuming operation that adds overhead to search operations. {: .warning} +## Concurrent segment search + +Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. The Profile API response contains several additional fields with statistics about _slices_. + +A slice is the unit of work that can be executed by a thread. Each query can be partitioned into multiple slices, with each slice containing one or more segments. All the slices can be executed either in parallel or in some order depending on the available threads in the pool. 
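The execution model in the paragraph above can be pictured with an ordinary thread pool. In this sketch, Python's `ThreadPoolExecutor` stands in for the Lucene `index_searcher` pool, and a "slice" merely sums the document counts of its segments; all names and numbers are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Each slice is a unit of work; here a "slice" just sums the doc counts of the
# segments assigned to it. With max_workers=2, three slices cannot all run at
# the same time, so some run in order -- mirroring "in parallel or in some
# order depending on the available threads in the pool."
slices = [[10, 20], [30], [40, 50]]  # doc counts per segment, grouped by slice

def search_slice(segment_doc_counts):
    return sum(segment_doc_counts)

with ThreadPoolExecutor(max_workers=2) as pool:
    per_slice_hits = list(pool.map(search_slice, slices))

# A final reduce merges the per-slice results into the shard-level result.
print(per_slice_hits, sum(per_slice_hits))  # [30, 30, 90] 150
```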
+
+In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices.
+
 #### Example request
 
 To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint:
@@ -236,7 +244,10 @@ Field | Data type | Description
 :--- | :--- | :---
 `type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch).
 `description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type.
-`time_in_nanos` | Long | The amount of time the query took to execute, in nanoseconds. In a parent query, the time is inclusive of the execution times of all the child queries.
+`time_in_nanos` | Long | The total elapsed time for this query, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds.
+`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds.
+`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds.
 [`breakdown`](#the-breakdown-object) | Object | Contains timing statistics about low-level Lucene execution.
 `children` | Array of objects | If a query has subqueries (children), this field contains information about the subqueries.
@@ -255,7 +266,15 @@ Field | Description
 `shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method.
`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method. `set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method. -`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. +`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `` obtained by adding the number of method invocations for all slices. +`` | For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. +`max_` | The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level. +`min_` | The minimum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level. +`avg_` | The average amount of time taken by any slice to run a query method. 
Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level.
+`max__count` | The maximum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level.
+`min__count` | The minimum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level.
+`avg__count` | The average number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level.
 
 ### The `collector` array
 
@@ -265,8 +284,13 @@ Field | Description
 :--- | :---
 `name` | The collector name. In the [example response](#example-response), the `collector` is a single `SimpleTopScoreDocCollector`---the default scoring and sorting collector.
 `reason` | Contains a description of the collector. For possible field values, see [Collector reasons](#collector-reasons).
-`time_in_nanos` | A wall-clock time, including timing for all children.
+`time_in_nanos` | The total elapsed time for this collector, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
 `children` | If a collector has subcollectors (children), this field contains information about the subcollectors.
+`max_slice_time_in_nanos` |The maximum amount of time taken by any slice, in nanoseconds.
+`min_slice_time_in_nanos` |The minimum amount of time taken by any slice, in nanoseconds.
+`avg_slice_time_in_nanos` |The average amount of time taken by any slice, in nanoseconds.
+`slice_count` |The total slice count for this query. +`reduce_time_in_nanos` |The amount of time taken to reduce results for all slice collectors, in nanoseconds. Collector times are calculated, combined, and normalized independently, so they are independent of query times. {: .note} @@ -730,42 +754,7 @@ The response contains profiling information: ``` -### Response fields - -The `aggregations` array contains aggregation objects with the following fields. - -Field | Data type | Description -:--- | :--- | :--- -`type` | String | The aggregator type. In the [non-global aggregation example response](#example-response-non-global-aggregation), the aggregator type is `AvgAggregator`. [Global aggregation example response](#example-request-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child. -`description` | String | Contains a Lucene explanation of the aggregation. Helps differentiate aggregations with the same type. -`time_in_nanos` | Long | The amount of time taken to execute the aggregation, in nanoseconds. In a parent aggregation, the time is inclusive of the execution times of all the child aggregations. -[`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution. -`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations. -`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution. - -### The `breakdown` object - -The `breakdown` object represents the timing statistics about low-level Lucene execution, broken down by method. Each field in the `breakdown` object represents an internal Lucene method executed within the aggregation. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object is comprised of the following fields. All fields contain integer values. 
- -Field | Description -:--- | :--- -`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. -`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. -`collect`| Contains the time spent collecting the documents into buckets. -`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. -`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. -`reduce`| Contains the time spent in the `reduce` phase. -`_count` | Contains the number of invocations of a ``. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method. - -## Concurrent segment search - -Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. If you enable the experimental concurrent segment search feature flag, the Profile API response will contain several additional fields with statistics about _slices_. - -A slice is the unit of work that can be executed by a thread. Each query can be partitioned into multiple slices, with each slice containing one or more segments. All the slices can be executed either in parallel or in some order depending on the available threads in the pool. - -In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices. 
- -#### Example response +#### Example response: Concurrent segment search The following is an example response for a concurrent search with three segment slices: @@ -979,51 +968,41 @@ The following is an example response for a concurrent search with three segment ``` -### Modified or added response fields - -The following sections contain definitions of all modified or added response fields for concurrent segment search. +### Response fields -#### The `query` array +The `aggregations` array contains aggregation objects with the following fields. -|Field |Description | -|:--- |:--- | -|`time_in_nanos` | The total elapsed time for this query, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). | -|`max_slice_time_in_nanos` | The maximum amount of time taken by any slice to run a query, in nanoseconds. | -|`min_slice_time_in_nanos` | The minimum amount of time taken by any slice to run a query, in nanoseconds. | -|`avg_slice_time_in_nanos` | The average amount of time taken by any slice to run a query, in nanoseconds. | -|`breakdown.` | For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. | -|`breakdown.max_` | The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level. | -|`breakdown.min_` | The minimum amount of time taken by any slice to run a query method. 
Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level. | -|`breakdown.avg_` | The average amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level. | -|`breakdown._count` | For concurrent segment search, this field contains the total number of invocations of a `` obtained by adding the number of method invocations for all slices. | -|`breakdown.max__count` | The maximum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level. | -|`breakdown.min__count` | The minimum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level. | -|`breakdown.avg__count` | The average number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level. | +Field | Data type | Description +:--- | :--- | :--- +`type` | String | The aggregator type. In the [non-global aggregation example response](#example-response-non-global-aggregation), the aggregator type is `AvgAggregator`. [Global aggregation example response](#example-request-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child. +`description` | String | Contains a Lucene explanation of the aggregation. Helps differentiate aggregations with the same type. +`time_in_nanos` | Long | The total elapsed time for this aggregation, in nanoseconds. 
For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). +[`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution. +`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations. +`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution. +`max_slice_time_in_nanos` |The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. +`min_slice_time_in_nanos` |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. +`avg_slice_time_in_nanos` |The average amount of time taken by any slice to run an aggregation, in nanoseconds. +`` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices. +`max_` |The maximum amount of time taken by any slice to run an aggregation method. +`min_`|The minimum amount of time taken by any slice to run an aggregation method. +`avg_` |The average amount of time taken by any slice to run an aggregation method. +`_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. +`max__count` |The maximum number of invocations of a `` on any slice. +`min__count` |The minimum number of invocations of a `` on any slice. +`avg__count` |The average number of invocations of a `` on any slice. 
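The slice-level fields above all derive from per-slice execution windows. The following sketch shows the arithmetic using invented nanosecond values; the profiler's real bookkeeping is more involved:

```python
# Illustrative sketch: how slice-level timing fields relate to per-slice
# (start, end) execution windows, in nanoseconds. Values are made up.
slice_windows = [(100, 700), (120, 900), (150, 650)]

durations = [end - start for start, end in slice_windows]

slice_count = len(slice_windows)                # reported as slice_count
max_slice_time = max(durations)                 # max_slice_time_in_nanos
min_slice_time = min(durations)                 # min_slice_time_in_nanos
avg_slice_time = sum(durations) // slice_count  # avg_slice_time_in_nanos

# time_in_nanos: the difference between the last completed slice execution end
# time and the first slice execution start time.
time_in_nanos = max(end for _, end in slice_windows) - min(
    start for start, _ in slice_windows
)

print(slice_count, max_slice_time, min_slice_time, avg_slice_time, time_in_nanos)
# 3 780 500 626 800
```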
-#### The `collector` array -|Field |Description | -|:--- |:--- | -|`time_in_nanos` |The total elapsed time for this collector, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). | -|`max_slice_time_in_nanos` |The maximum amount of time taken by any slice, in nanoseconds. | -|`min_slice_time_in_nanos` |The minimum amount of time taken by any slice, in nanoseconds. | -|`avg_slice_time_in_nanos` |The average amount of time taken by any slice, in nanoseconds. | -|`slice_count` |The total slice count for this query. | -|`reduce_time_in_nanos` |The amount of time taken to reduce results for all slice collectors, in nanoseconds. | +### The `breakdown` object -#### The `aggregations` array +The `breakdown` object represents the timing statistics about low-level Lucene execution, broken down by method. Each field in the `breakdown` object represents an internal Lucene method executed within the aggregation. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object is comprised of the following fields. All fields contain integer values. -|Field |Description | -|:--- |:--- | -|`time_in_nanos` |The total elapsed time for this aggregation, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). | -|`max_slice_time_in_nanos` |The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. | -|`min_slice_time_in_nanos` |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. | -|`avg_slice_time_in_nanos` |The average amount of time taken by any slice to run an aggregation, in nanoseconds. 
| -|`` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices. | -|`max_` |The maximum amount of time taken by any slice to run an aggregation method. | -|`min_`|The minimum amount of time taken by any slice to run an aggregation method. | -|`avg_` |The average amount of time taken by any slice to run an aggregation method. | -|`_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. | -|`max__count` |The maximum number of invocations of a `` on any slice. | -|`min__count` |The minimum number of invocations of a `` on any slice. | -|`avg__count` |The average number of invocations of a `` on any slice. | +Field | Description +:--- | :--- +`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. +`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. +`collect`| Contains the time spent collecting the documents into buckets. +`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. +`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. +`reduce`| Contains the time spent in the `reduce` phase. +`_count` | Contains the number of invocations of a ``. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method. 
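As a quick way to use the table above, the sketch below post-processes a `breakdown` object to separate timings from `_count` fields and pick out the most expensive phase. Field names follow the table; the values are invented:

```python
# Illustrative sketch: given an aggregation `breakdown` object (field names from
# the table above; timing values are made up), split timings from `_count`
# entries and find the most expensive phase.
breakdown = {
    "initialize": 1000, "initialize_count": 1,
    "build_leaf_collector": 5000, "build_leaf_collector_count": 3,
    "collect": 42000, "collect_count": 3,
    "post_collection": 800, "post_collection_count": 1,
    "build_aggregation": 6000, "build_aggregation_count": 1,
    "reduce": 0, "reduce_count": 0,
}

timings = {k: v for k, v in breakdown.items() if not k.endswith("_count")}
slowest = max(timings, key=timings.get)
print(slowest)  # collect
```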
diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 8ece3493f1..82244c4ab7 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -7,9 +7,6 @@ nav_order: 53 # Concurrent segment search -This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/2587) or the [project board](https://github.com/orgs/opensearch-project/projects/117/views/1). -{: .warning} - Use concurrent segment search to search segments in parallel during the query phase. Cases in which concurrent segment search improves search latency include the following: - When sending long-running requests, for example, requests that contain aggregations or large ranges @@ -23,71 +20,6 @@ In OpenSearch, each search request follows the scatter-gather protocol. The coor Without concurrent segment search, Lucene executes a request sequentially across all segments on each shard during the query phase. The query phase then collects the top hits for the search request. With concurrent segment search, each shard-level request will search the segments in parallel during the query phase. For each shard, the segments are divided into multiple _slices_. Each slice is the unit of work that can be executed in parallel on a separate thread, so the slice count determines the maximum degree of parallelism for a shard-level request. Once all the slices complete their work, Lucene performs a reduce operation on the slices, merging them and creating the final result for this shard-level request. Slices are executed using a new `index_searcher` thread pool, which is different from the `search` thread pool that handles shard-level requests. 
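The slicing model in the paragraph above can be sketched in a few lines. This shows the concept only; it is not Lucene's actual slice-assignment heuristic:

```python
# Illustrative sketch: distribute a shard's segments across a fixed number of
# slices round-robin. Each slice is the unit of work that can run on its own
# thread, so len(result) bounds the parallelism for the shard-level request.
def partition_into_slices(segments, slice_count):
    slices = [[] for _ in range(slice_count)]
    for i, segment in enumerate(segments):
        slices[i % slice_count].append(segment)
    # Drop empty slices so that every returned slice holds at least one segment.
    return [s for s in slices if s]

print(partition_into_slices(["seg0", "seg1", "seg2", "seg3", "seg4"], 2))
# [['seg0', 'seg2', 'seg4'], ['seg1', 'seg3']]
```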
-## Enabling the feature flag - -There are several methods for enabling concurrent segment search, depending on the installation type. - -### Enable in opensearch.yml - -If you are running an OpenSearch cluster and want to enable concurrent segment search in the config file, add the following line to `opensearch.yml`: - -```yaml -opensearch.experimental.feature.concurrent_segment_search.enabled: true -``` -{% include copy.html %} - -### Enable with Docker containers - -If you’re running Docker, add the following line to `docker-compose.yml` under the `opensearch-node` > `environment` section: - -```bash -OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.concurrent_segment_search.enabled=true" -``` -{% include copy.html %} - -### Enable on a node using a tarball installation - -To enable concurrent segment search on a tarball installation, provide the new JVM parameter either in `config/jvm.options` or `OPENSEARCH_JAVA_OPTS`. - -#### OPTION 1: Modify jvm.options - -Add the following lines to `config/jvm.options` before starting the `opensearch` process to enable the feature and its dependency: - -```bash --Dopensearch.experimental.feature.concurrent_segment_search.enabled=true -``` -{% include copy.html %} - -Then run OpenSearch: - -```bash -./bin/opensearch -``` -{% include copy.html %} - -#### OPTION 2: Enable with an environment variable - -As an alternative to directly modifying `config/jvm.options`, you can define the properties by using an environment variable. This can be done using a single command when you start OpenSearch or by defining the variable with `export`. 
- -To add these flags inline when starting OpenSearch, run the following command: - -```bash -OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.concurrent_segment_search.enabled=true" ./opensearch-{{site.opensearch_version}}/bin/opensearch -``` -{% include copy.html %} - -If you want to define the environment variable separately prior to running OpenSearch, run the following commands: - -```bash -export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.concurrent_segment_search.enabled=true" -``` -{% include copy.html %} - -```bash -./bin/opensearch -``` -{% include copy.html %} - ## Disabling concurrent search at the index or cluster level After you enable the experimental feature flag, all search requests will use concurrent segment search during the query phase. To disable concurrent segment search for all indexes, set the following dynamic cluster setting: @@ -143,17 +75,6 @@ The [`terminate_after` search parameter]({{site.url}}{{site.baseurl}}/api-refere Typically, queries are used with smaller `terminate_after` values and thus complete quickly because the search is performed on a reduced dataset. Therefore, concurrent search may not further improve performance in this case. Moreover, when `terminate_after` is used with other search request parameters, such as `track_total_hits` or `size`, it adds complexity and changes the expected query behavior. Falling back to a non-concurrent path for search requests that include `terminate_after` ensures consistent results between concurrent and non-concurrent requests. 
-## API changes - -If you enable the concurrent segment search feature flag, the following Stats API responses will contain several additional fields with statistics about slices: - -- [Index Stats]({{site.url}}{{site.baseurl}}/api-reference/index-apis/stats/) -- [Nodes Stats]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/) - -For descriptions of the added fields, see [Index Stats API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/stats#concurrent-segment-search). - -Additionally, some [Profile API]({{site.url}}{{site.baseurl}}/api-reference/profile/) response fields will be modified and others added. For more information, see the [concurrent segment search section of the Profile API]({{site.url}}{{site.baseurl}}/api-reference/profile#concurrent-segment-search). - ## Limitations Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields do not support the concurrent search model. Thus, if a search request contains a parent aggregation, the aggregation will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level. From 36fb3fb3be991bc9a230064cf64d2d5e16df70fb Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 5 Feb 2024 18:28:27 -0500 Subject: [PATCH 02/14] Add types Signed-off-by: Fanit Kolchina --- _api-reference/nodes-apis/nodes-stats.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index c81f8fa4a9..8a6fb277cf 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -831,10 +831,10 @@ get.missing_total | Integer | The number of failed get operations. get.missing_time_in_millis | Integer | The total time for all failed get operations, in milliseconds. get.current | Integer | The number of get operations that are currently running. 
 search | Object | Statistics about the search operations for the node.
-search.concurrent_avg_slice_count |The average slice count of all search requests. This is computed as the total slice count divided by the total number of concurrent search requests.
-search.concurrent_query_total |The total number of query operations that use concurrent segment search.
-search.concurrent_query_time_in_millis |The total amount of time taken by all query operations that use concurrent segment search, in milliseconds.
-search.concurrent_query_current |The number of currently running query operations that use concurrent segment search.
+search.concurrent_avg_slice_count | Integer | The average slice count of all search requests. This is computed as the total slice count divided by the total number of concurrent search requests.
+search.concurrent_query_total | Integer | The total number of query operations that use concurrent segment search.
+search.concurrent_query_time_in_millis | Integer | The total amount of time taken by all query operations that use concurrent segment search, in milliseconds.
+search.concurrent_query_current | Integer | The number of currently running query operations that use concurrent segment search.
 search.open_contexts | Integer | The number of open search contexts.
 search.query_total | Integer | The total number of shard query operations.
 search.query_time_in_millis | Integer | The total amount of time for all shard query operations, in milliseconds.
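As a hypothetical illustration of how `concurrent_avg_slice_count` relates to the other counters, a node that has completed 4 concurrent query operations using 8 slices in total would report an average slice count of 8 / 4 = 2. The corresponding excerpt of a Nodes Stats response might look like the following (all values are invented for illustration):

```json
"search": {
  "concurrent_avg_slice_count": 2,
  "concurrent_query_total": 4,
  "concurrent_query_time_in_millis": 120,
  "concurrent_query_current": 0
}
```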
From 14b8ee64589f0c2b4fd84abc8162b76481678e73 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 5 Feb 2024 19:06:08 -0500 Subject: [PATCH 03/14] Changed wording to enable concur seg search Signed-off-by: Fanit Kolchina --- _search-plugins/concurrent-segment-search.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 82244c4ab7..0fa51a1de0 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -20,26 +20,26 @@ In OpenSearch, each search request follows the scatter-gather protocol. The coor Without concurrent segment search, Lucene executes a request sequentially across all segments on each shard during the query phase. The query phase then collects the top hits for the search request. With concurrent segment search, each shard-level request will search the segments in parallel during the query phase. For each shard, the segments are divided into multiple _slices_. Each slice is the unit of work that can be executed in parallel on a separate thread, so the slice count determines the maximum degree of parallelism for a shard-level request. Once all the slices complete their work, Lucene performs a reduce operation on the slices, merging them and creating the final result for this shard-level request. Slices are executed using a new `index_searcher` thread pool, which is different from the `search` thread pool that handles shard-level requests. -## Disabling concurrent search at the index or cluster level +## Enabling concurrent search at the index or cluster level -After you enable the experimental feature flag, all search requests will use concurrent segment search during the query phase. To disable concurrent segment search for all indexes, set the following dynamic cluster setting: +By default, concurrent segment search is disabled on the cluster. 
To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting: ```json PUT _cluster/settings { "persistent":{ - "search.concurrent_segment_search.enabled": false + "search.concurrent_segment_search.enabled": true } } ``` {% include copy-curl.html %} -To disable concurrent segment search for a particular index, specify the index name in the endpoint: +To enable concurrent segment search for a particular index, specify the index name in the endpoint: ```json PUT /_settings { - "index.search.concurrent_segment_search.enabled": false + "index.search.concurrent_segment_search.enabled": enable } ``` {% include copy-curl.html %} From 3de97f8d44fb9683384694e787e72e57208a6a16 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 6 Feb 2024 15:26:56 -0500 Subject: [PATCH 04/14] Tech review comments Signed-off-by: Fanit Kolchina --- _search-plugins/concurrent-segment-search.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 0fa51a1de0..611723f0da 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -22,7 +22,15 @@ Without concurrent segment search, Lucene executes a request sequentially across ## Enabling concurrent search at the index or cluster level -By default, concurrent segment search is disabled on the cluster. To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting: +By default, concurrent segment search is disabled on the cluster. You can enable concurrent segment search at two levels: + +- Cluster level +- Index level + +The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent search will be disabled for that index. 
+{: .note} + +To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting: ```json PUT _cluster/settings From d99b61fdec818e3731afc471df7c1f002d4515b3 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 6 Feb 2024 15:28:00 -0500 Subject: [PATCH 05/14] Typo Signed-off-by: Fanit Kolchina --- _search-plugins/concurrent-segment-search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 611723f0da..3ee1444160 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -47,7 +47,7 @@ To enable concurrent segment search for a particular index, specify the index na ```json PUT /_settings { - "index.search.concurrent_segment_search.enabled": enable + "index.search.concurrent_segment_search.enabled": true } ``` {% include copy-curl.html %} From 4fcae371c48384a9996b916c9534b9a9067f4c99 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Wed, 7 Feb 2024 13:42:30 -0500 Subject: [PATCH 06/14] Tech review comments Signed-off-by: Fanit Kolchina --- _api-reference/profile.md | 473 +++++++++++++++++++------------------- 1 file changed, 232 insertions(+), 241 deletions(-) diff --git a/_api-reference/profile.md b/_api-reference/profile.md index c865237423..89e8acaf97 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -20,7 +20,7 @@ The Profile API is a resource-consuming operation that adds overhead to search o ## Concurrent segment search -Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. The Profile API response contains several additional fields with statistics about _slices_. 
+Starting in OpenSearch 2.12, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. The Profile API response contains several additional fields with statistics about _slices_. A slice is the unit of work that can be executed by a thread. Each query can be partitioned into multiple slices, with each slice containing one or more segments. All the slices can be executed either in parallel or in some order depending on the available threads in the pool. @@ -221,6 +221,220 @@ The response contains profiling information: ``` +#### Example response: Concurrent segment search + +The following is an example response for a concurrent search with three segment slices: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 10, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 5, + "relation": "eq" + }, + "max_score": 1.0, + "hits": [ + ... + ] + }, + "aggregations": { + ... + }, + "profile": { + "shards": [ + { + "id": "[9Y7lbpaWRhyr5Y-41Zl48g][idx][0]", + "inbound_network_time_in_millis": 0, + "outbound_network_time_in_millis": 0, + "searches": [ + { + "query": [ + { + "type": "MatchAllDocsQuery", + "description": "*:*", + "time_in_nanos": 868000, + "max_slice_time_in_nanos": 19376, + "min_slice_time_in_nanos": 12250, + "avg_slice_time_in_nanos": 16847, + "breakdown": { + "max_match": 0, + "set_min_competitive_score_count": 0, + "match_count": 0, + "avg_score_count": 1, + "shallow_advance_count": 0, + "next_doc": 29708, + "min_build_scorer": 3125, + "score_count": 5, + "compute_max_score_count": 0, + "advance": 0, + "min_set_min_competitive_score": 0, + "min_advance": 0, + "score": 29250, + "avg_set_min_competitive_score_count": 0, + "min_match_count": 0, + "avg_score": 333, + "max_next_doc_count": 3, + "max_compute_max_score_count": 0, + "avg_shallow_advance": 0, + "max_shallow_advance_count": 0, + "set_min_competitive_score": 0, + "min_build_scorer_count": 2, + "next_doc_count": 8, + "min_match": 0, + "avg_next_doc": 888, + "compute_max_score": 0, + "min_set_min_competitive_score_count": 0, + "max_build_scorer": 5791, + "avg_match_count": 0, + "avg_advance": 0, + "build_scorer_count": 6, + "avg_build_scorer_count": 2, + "min_next_doc_count": 2, + "min_shallow_advance_count": 0, + "max_score_count": 2, + "avg_match": 0, + "avg_compute_max_score": 0, + "max_advance": 0, + "avg_shallow_advance_count": 0, + "avg_set_min_competitive_score": 0, + "avg_compute_max_score_count": 0, + "avg_build_scorer": 4027, + "max_set_min_competitive_score_count": 0, + "advance_count": 0, + "max_build_scorer_count": 2, + "shallow_advance": 0, + 
"min_compute_max_score": 0, + "max_match_count": 0, + "create_weight_count": 1, + "build_scorer": 32459, + "max_set_min_competitive_score": 0, + "max_compute_max_score": 0, + "min_shallow_advance": 0, + "match": 0, + "max_shallow_advance": 0, + "avg_advance_count": 0, + "min_next_doc": 708, + "max_advance_count": 0, + "min_score": 291, + "max_next_doc": 999, + "create_weight": 1834, + "avg_next_doc_count": 2, + "max_score": 376, + "min_compute_max_score_count": 0, + "min_score_count": 1, + "min_advance_count": 0 + } + } + ], + "rewrite_time": 8126, + "collector": [ + { + "name": "QueryCollectorManager", + "reason": "search_multi", + "time_in_nanos": 564708, + "reduce_time_in_nanos": 1251042, + "max_slice_time_in_nanos": 121959, + "min_slice_time_in_nanos": 28958, + "avg_slice_time_in_nanos": 83208, + "slice_count": 3, + "children": [ + { + "name": "SimpleTopDocsCollectorManager", + "reason": "search_top_hits", + "time_in_nanos": 500459, + "reduce_time_in_nanos": 840125, + "max_slice_time_in_nanos": 22168, + "min_slice_time_in_nanos": 5792, + "avg_slice_time_in_nanos": 12084, + "slice_count": 3 + }, + { + "name": "NonGlobalAggCollectorManager: [histo]", + "reason": "aggregation", + "time_in_nanos": 552167, + "reduce_time_in_nanos": 311292, + "max_slice_time_in_nanos": 95333, + "min_slice_time_in_nanos": 18416, + "avg_slice_time_in_nanos": 66249, + "slice_count": 3 + } + ] + } + ] + } + ], + "aggregations": [ + { + "type": "NumericHistogramAggregator", + "description": "histo", + "time_in_nanos": 2847834, + "max_slice_time_in_nanos": 117374, + "min_slice_time_in_nanos": 20624, + "avg_slice_time_in_nanos": 75597, + "breakdown": { + "min_build_leaf_collector": 9500, + "build_aggregation_count": 3, + "post_collection": 3209, + "max_collect_count": 2, + "initialize_count": 3, + "reduce_count": 0, + "avg_collect": 17055, + "max_build_aggregation": 26000, + "avg_collect_count": 1, + "max_build_leaf_collector": 64833, + "min_build_leaf_collector_count": 1, + 
"build_aggregation": 41125, + "min_initialize": 583, + "max_reduce": 0, + "build_leaf_collector_count": 3, + "avg_reduce": 0, + "min_collect_count": 1, + "avg_build_leaf_collector_count": 1, + "avg_build_leaf_collector": 45000, + "max_collect": 24625, + "reduce": 0, + "avg_build_aggregation": 12013, + "min_post_collection": 292, + "max_initialize": 1333, + "max_post_collection": 750, + "collect_count": 5, + "avg_post_collection": 541, + "avg_initialize": 986, + "post_collection_count": 3, + "build_leaf_collector": 86833, + "min_collect": 6250, + "min_build_aggregation": 3541, + "initialize": 2786791, + "max_build_leaf_collector_count": 1, + "min_reduce": 0, + "collect": 29834 + }, + "debug": { + "total_buckets": 1 + } + } + ] + } + ] + } +} +``` +
+ ## Response fields The response includes the following fields. @@ -245,9 +459,9 @@ Field | Data type | Description `type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch). `description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type. `time_in_nanos` | Long | The total elapsed time for this query, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). -|`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds. -|`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds. -|`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds. +`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds. +`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds. +`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds. [`breakdown`](#the-breakdown-object) | Object | Contains timing statistics about low-level Lucene execution. `children` | Array of objects | If a query has subqueries (children), this field contains information about the subqueries. @@ -266,15 +480,7 @@ Field | Description `shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method. `compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method. `set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method. 
-`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `` obtained by adding the number of method invocations for all slices. -`` | For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. -`max_` | The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level. -`min_` | The minimum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level. -`avg_` | The average amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level. -`_count` -`max__count` | The maximum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level. -`min__count` | The minimum number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level. 
-`avg__count` | The average number of invocations of a `` on any slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level. +`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `` obtained by adding the number of method invocations for all slices. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. ### The `collector` array @@ -286,11 +492,11 @@ Field | Description `reason` | Contains a description of the collector. For possible field values, see [Collector reasons](#collector-reasons). `time_in_nanos` | The total elapsed time for this collector, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). `children` | If a collector has subcollectors (children), this field contains information about the subcollectors. -`max_slice_time_in_nanos` |The maximum amount of time taken by any slice, in nanoseconds. -`min_slice_time_in_nanos` |The minimum amount of time taken by any slice, in nanoseconds. -`avg_slice_time_in_nanos` |The average amount of time taken by any slice, in nanoseconds. -`slice_count` |The total slice count for this query. 
-`reduce_time_in_nanos` |The amount of time taken to reduce results for all slice collectors, in nanoseconds. +`max_slice_time_in_nanos` |The maximum amount of time taken by any slice, in nanoseconds. This field is included only if you enable concurrent segment search. +`min_slice_time_in_nanos` |The minimum amount of time taken by any slice, in nanoseconds. This field is included only if you enable concurrent segment search. +`avg_slice_time_in_nanos` |The average amount of time taken by any slice, in nanoseconds. This field is included only if you enable concurrent segment search. +`slice_count` |The total slice count for this query. This field is included only if you enable concurrent segment search. +`reduce_time_in_nanos` |The amount of time taken to reduce results for all slice collectors, in nanoseconds. This field is included only if you enable concurrent segment search. Collector times are calculated, combined, and normalized independently, so they are independent of query times. {: .note} @@ -754,220 +960,6 @@ The response contains profiling information: ``` -#### Example response: Concurrent segment search - -The following is an example response for a concurrent search with three segment slices: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 10, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 5, - "relation": "eq" - }, - "max_score": 1.0, - "hits": [ - ... - ] - }, - "aggregations": { - ... - }, - "profile": { - "shards": [ - { - "id": "[9Y7lbpaWRhyr5Y-41Zl48g][idx][0]", - "inbound_network_time_in_millis": 0, - "outbound_network_time_in_millis": 0, - "searches": [ - { - "query": [ - { - "type": "MatchAllDocsQuery", - "description": "*:*", - "time_in_nanos": 868000, - "max_slice_time_in_nanos": 19376, - "min_slice_time_in_nanos": 12250, - "avg_slice_time_in_nanos": 16847, - "breakdown": { - "max_match": 0, - "set_min_competitive_score_count": 0, - "match_count": 0, - "avg_score_count": 1, - "shallow_advance_count": 0, - "next_doc": 29708, - "min_build_scorer": 3125, - "score_count": 5, - "compute_max_score_count": 0, - "advance": 0, - "min_set_min_competitive_score": 0, - "min_advance": 0, - "score": 29250, - "avg_set_min_competitive_score_count": 0, - "min_match_count": 0, - "avg_score": 333, - "max_next_doc_count": 3, - "max_compute_max_score_count": 0, - "avg_shallow_advance": 0, - "max_shallow_advance_count": 0, - "set_min_competitive_score": 0, - "min_build_scorer_count": 2, - "next_doc_count": 8, - "min_match": 0, - "avg_next_doc": 888, - "compute_max_score": 0, - "min_set_min_competitive_score_count": 0, - "max_build_scorer": 5791, - "avg_match_count": 0, - "avg_advance": 0, - "build_scorer_count": 6, - "avg_build_scorer_count": 2, - "min_next_doc_count": 2, - "min_shallow_advance_count": 0, - "max_score_count": 2, - "avg_match": 0, - "avg_compute_max_score": 0, - "max_advance": 0, - "avg_shallow_advance_count": 0, - "avg_set_min_competitive_score": 0, - "avg_compute_max_score_count": 0, - "avg_build_scorer": 4027, - "max_set_min_competitive_score_count": 0, - "advance_count": 0, - "max_build_scorer_count": 2, - "shallow_advance": 0, - 
"min_compute_max_score": 0, - "max_match_count": 0, - "create_weight_count": 1, - "build_scorer": 32459, - "max_set_min_competitive_score": 0, - "max_compute_max_score": 0, - "min_shallow_advance": 0, - "match": 0, - "max_shallow_advance": 0, - "avg_advance_count": 0, - "min_next_doc": 708, - "max_advance_count": 0, - "min_score": 291, - "max_next_doc": 999, - "create_weight": 1834, - "avg_next_doc_count": 2, - "max_score": 376, - "min_compute_max_score_count": 0, - "min_score_count": 1, - "min_advance_count": 0 - } - } - ], - "rewrite_time": 8126, - "collector": [ - { - "name": "QueryCollectorManager", - "reason": "search_multi", - "time_in_nanos": 564708, - "reduce_time_in_nanos": 1251042, - "max_slice_time_in_nanos": 121959, - "min_slice_time_in_nanos": 28958, - "avg_slice_time_in_nanos": 83208, - "slice_count": 3, - "children": [ - { - "name": "SimpleTopDocsCollectorManager", - "reason": "search_top_hits", - "time_in_nanos": 500459, - "reduce_time_in_nanos": 840125, - "max_slice_time_in_nanos": 22168, - "min_slice_time_in_nanos": 5792, - "avg_slice_time_in_nanos": 12084, - "slice_count": 3 - }, - { - "name": "NonGlobalAggCollectorManager: [histo]", - "reason": "aggregation", - "time_in_nanos": 552167, - "reduce_time_in_nanos": 311292, - "max_slice_time_in_nanos": 95333, - "min_slice_time_in_nanos": 18416, - "avg_slice_time_in_nanos": 66249, - "slice_count": 3 - } - ] - } - ] - } - ], - "aggregations": [ - { - "type": "NumericHistogramAggregator", - "description": "histo", - "time_in_nanos": 2847834, - "max_slice_time_in_nanos": 117374, - "min_slice_time_in_nanos": 20624, - "avg_slice_time_in_nanos": 75597, - "breakdown": { - "min_build_leaf_collector": 9500, - "build_aggregation_count": 3, - "post_collection": 3209, - "max_collect_count": 2, - "initialize_count": 3, - "reduce_count": 0, - "avg_collect": 17055, - "max_build_aggregation": 26000, - "avg_collect_count": 1, - "max_build_leaf_collector": 64833, - "min_build_leaf_collector_count": 1, - 
"build_aggregation": 41125, - "min_initialize": 583, - "max_reduce": 0, - "build_leaf_collector_count": 3, - "avg_reduce": 0, - "min_collect_count": 1, - "avg_build_leaf_collector_count": 1, - "avg_build_leaf_collector": 45000, - "max_collect": 24625, - "reduce": 0, - "avg_build_aggregation": 12013, - "min_post_collection": 292, - "max_initialize": 1333, - "max_post_collection": 750, - "collect_count": 5, - "avg_post_collection": 541, - "avg_initialize": 986, - "post_collection_count": 3, - "build_leaf_collector": 86833, - "min_collect": 6250, - "min_build_aggregation": 3541, - "initialize": 2786791, - "max_build_leaf_collector_count": 1, - "min_reduce": 0, - "collect": 29834 - }, - "debug": { - "total_buckets": 1 - } - } - ] - } - ] - } -} -``` -
- ### Response fields The `aggregations` array contains aggregation objects with the following fields. @@ -983,15 +975,6 @@ Field | Data type | Description `max_slice_time_in_nanos` |The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. `min_slice_time_in_nanos` |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. `avg_slice_time_in_nanos` |The average amount of time taken by any slice to run an aggregation, in nanoseconds. -`` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices. -`max_` |The maximum amount of time taken by any slice to run an aggregation method. -`min_`|The minimum amount of time taken by any slice to run an aggregation method. -`avg_` |The average amount of time taken by any slice to run an aggregation method. -`_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. -`max__count` |The maximum number of invocations of a `` on any slice. -`min__count` |The minimum number of invocations of a `` on any slice. -`avg__count` |The average number of invocations of a `` on any slice. - ### The `breakdown` object @@ -1006,3 +989,11 @@ Field | Description `build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. `reduce`| Contains the time spent in the `reduce` phase. `_count` | Contains the number of invocations of a ``. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method. 
+`` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices. +`max_` |The maximum amount of time taken by any slice to run an aggregation method. +`min_`|The minimum amount of time taken by any slice to run an aggregation method. +`avg_` |The average amount of time taken by any slice to run an aggregation method. +`_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. +`max__count` |The maximum number of invocations of a `` on any slice. +`min__count` |The minimum number of invocations of a `` on any slice. +`avg__count` |The average number of invocations of a `` on any slice. From 312be73e74bebbc3184cf18d7e781a7385b597ac Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Wed, 7 Feb 2024 16:01:56 -0500 Subject: [PATCH 07/14] More tech review comments Signed-off-by: Fanit Kolchina --- _api-reference/profile.md | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/_api-reference/profile.md b/_api-reference/profile.md index 89e8acaf97..8ca37b058c 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -459,9 +459,9 @@ Field | Data type | Description `type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch). `description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type. `time_in_nanos` | Long | The total elapsed time for this query, in nanoseconds. 
For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). -`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds. -`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds. -`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds. +`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds. This field is included only if you enable concurrent segment search. +`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds. This field is included only if you enable concurrent segment search. +`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds. This field is included only if you enable concurrent segment search. [`breakdown`](#the-breakdown-object) | Object | Contains timing statistics about low-level Lucene execution. `children` | Array of objects | If a query has subqueries (children), this field contains information about the subqueries. @@ -480,7 +480,13 @@ Field | Description `shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method. `compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method. `set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method. -`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. 
You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `` obtained by adding the number of method invocations for all slices. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. +`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. +`max_` | The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search. +`min_` | The minimum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search. +`avg_` | The average amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search. +`max__count` | The maximum number of invocations of a `` on any slice. 
Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search.
+`min_<method>_count` | The minimum number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search.
+`avg_<method>_count` | The average number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level. This field is included only if you enable concurrent segment search.

### The `collector` array

@@ -972,9 +978,9 @@ Field | Data type | Description
[`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution.
`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations.
`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution.
-`max_slice_time_in_nanos` |The maximum amount of time taken by any slice to run an aggregation, in nanoseconds.
-`min_slice_time_in_nanos` |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds.
-`avg_slice_time_in_nanos` |The average amount of time taken by any slice to run an aggregation, in nanoseconds.
+`max_slice_time_in_nanos` |Long | The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search.
+`min_slice_time_in_nanos` |Long |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search.
+`avg_slice_time_in_nanos` |Long |The average amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search.

### The `breakdown` object

@@ -990,10 +996,10 @@ Field | Description
`reduce`| Contains the time spent in the `reduce` phase.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method.
`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
-`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method.
-`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method.
-`avg_<method>` |The average amount of time taken by any slice to run an aggregation method.
-`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices.
-`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice.
-`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice.
-`avg_<method>_count` |The average number of invocations of a `<method>` on any slice.
+`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
+`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
+`avg_<method>` |The average amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
+`<method>_count` |The total method count across all slices.
For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. This field is included only if you enable concurrent segment search.
+`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.
+`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.
+`avg_<method>_count` |The average number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.

From ca02659d785c7d258075dba8a3209cbeca3cab3c Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Wed, 7 Feb 2024 16:12:47 -0500
Subject: [PATCH 08/14] Added link to response fields

Signed-off-by: Fanit Kolchina
---
 _api-reference/index-apis/stats.md | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md
index a812059fca..1297aa10df 100644
--- a/_api-reference/index-apis/stats.md
+++ b/_api-reference/index-apis/stats.md
@@ -810,11 +810,4 @@ By default, the returned statistics are aggregated in the `primaries` and `total

 ## Response fields

-The following table provides information about the response fields.
-
-|Response field | Description |
-|:--- |:--- |
-|`search.concurrent_avg_slice_count` |The average slice count of all search requests. This is computed as the total slice count divided by the total number of concurrent search requests. |
-|`search.concurrent_query_total` |The total number of query operations that use concurrent segment search. |
-|`search.concurrent_query_time_in_millis` |The total amount of time taken by all query operations that use concurrent segment search, in milliseconds. |
-|`search.concurrent_query_current` |The number of currently running query operations that use concurrent segment search.
| +For information about response fields, see [Nodes Stats API response fields]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#indices) From 9a383c9b588682640b9b4a7785770bb01ec58fc6 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 8 Feb 2024 14:12:28 -0500 Subject: [PATCH 09/14] More tech review comments Signed-off-by: Fanit Kolchina --- _api-reference/profile.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/_api-reference/profile.md b/_api-reference/profile.md index 8ca37b058c..c9a0058d25 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -26,7 +26,7 @@ A slice is the unit of work that can be executed by a thread. Each query can be In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices. -#### Example request +#### Example request: Non-concurrent search To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint: @@ -988,14 +988,13 @@ The `breakdown` object represents the timing statistics about low-level Lucene e Field | Description :--- | :--- -`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. -`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. -`collect`| Contains the time spent collecting the documents into buckets. -`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. -`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. 
+`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`reduce`| Contains the time spent in the `reduce` phase.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method.
-`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`avg_<method>` |The average amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.

From ff372eb3031e55a5612ca571c4debd178f1f4004 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Thu, 8 Feb 2024 14:36:07 -0500
Subject: [PATCH 10/14] Add tech review comments

Signed-off-by: Fanit Kolchina
---
 _api-reference/profile.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/_api-reference/profile.md b/_api-reference/profile.md
index c9a0058d25..bc7687c97b 100644
--- a/_api-reference/profile.md
+++ b/_api-reference/profile.md
@@ -988,17 +988,17 @@ The `breakdown` object represents the timing statistics about low-level Lucene e

Field | Description
:--- | :---
-`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
-`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context.
For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). -`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). -`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). -`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). -`reduce`| Contains the time spent in the `reduce` phase. +`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search, the `initialize` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). +`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. 
For concurrent segment search, the `build_leaf_collector` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search, the `collect` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search, the `post_collection` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. For concurrent segment search, the `build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
+`reduce`| Contains the time spent in the `reduce` phase. For concurrent segment search, the `reduce` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method.
`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`avg_<method>` |The average amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
-`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. This field is included only if you enable concurrent segment search.
+`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices.
`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.
`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.
`avg_<method>_count` |The average number of invocations of a `<method>` on any slice. This field is included only if you enable concurrent segment search.
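The slice-level aggregation breakdown fields documented in this patch appear in the `profile.aggregations` section of a search response when concurrent segment search is enabled. As an illustration only (the index name `testindex` and the `genre` keyword field are placeholders, not part of the patches above), a minimal request that produces such a profile:

```json
GET /testindex/_search?size=0
{
  "profile": true,
  "aggs": {
    "genres": {
      "terms": { "field": "genre" }
    }
  }
}
```
{% include copy-curl.html %}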
From 19bac1ade73ee33eba12b21371122e26249b7be9 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 9 Feb 2024 07:47:17 -0500 Subject: [PATCH 11/14] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _api-reference/index-apis/stats.md | 2 +- _api-reference/profile.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md index 1297aa10df..0658709f54 100644 --- a/_api-reference/index-apis/stats.md +++ b/_api-reference/index-apis/stats.md @@ -810,4 +810,4 @@ By default, the returned statistics are aggregated in the `primaries` and `total ## Response fields -For information about response fields, see [Nodes Stats API response fields]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#indices) +For information about response fields, see [Nodes Stats API response fields]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/#indices). diff --git a/_api-reference/profile.md b/_api-reference/profile.md index bc7687c97b..20f7fae003 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -978,9 +978,9 @@ Field | Data type | Description [`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution. `children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations. `debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution. -`max_slice_time_in_nanos` |Long | The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. -`min_slice_time_in_nanos` |Long |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. 
This field is included only if you enable concurrent segment search. -`avg_slice_time_in_nanos` |Long |The average amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. +`max_slice_time_in_nanos` |Long | The maximum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. +`min_slice_time_in_nanos` |Long |The minimum amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. +`avg_slice_time_in_nanos` |Long |The average amount of time taken by any slice to run an aggregation, in nanoseconds. This field is included only if you enable concurrent segment search. ### The `breakdown` object @@ -989,7 +989,7 @@ The `breakdown` object represents the timing statistics about low-level Lucene e Field | Description :--- | :--- `initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search, the `initialize` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). -`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search, the `build_leaf_collector` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). +`build_leaf_collector`| Contains the time spent running the aggregation's `getLeafCollector()` method, which creates a new collector to collect the given context. 
For concurrent segment search, the `build_leaf_collector` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). `collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search, the `collect` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). `post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search, the `post_collection` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). `build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. For concurrent segment search, the `build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). From c7805db62bd371042dd14261c945eb17926c1d16 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 9 Feb 2024 08:25:46 -0500 Subject: [PATCH 12/14] Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/concurrent-segment-search.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 3ee1444160..ff809dd04a 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -20,14 +20,14 @@ In OpenSearch, each search request follows the scatter-gather protocol. 
The coor Without concurrent segment search, Lucene executes a request sequentially across all segments on each shard during the query phase. The query phase then collects the top hits for the search request. With concurrent segment search, each shard-level request will search the segments in parallel during the query phase. For each shard, the segments are divided into multiple _slices_. Each slice is the unit of work that can be executed in parallel on a separate thread, so the slice count determines the maximum degree of parallelism for a shard-level request. Once all the slices complete their work, Lucene performs a reduce operation on the slices, merging them and creating the final result for this shard-level request. Slices are executed using a new `index_searcher` thread pool, which is different from the `search` thread pool that handles shard-level requests. -## Enabling concurrent search at the index or cluster level +## Enabling concurrent segment search at the index or cluster level By default, concurrent segment search is disabled on the cluster. You can enable concurrent segment search at two levels: - Cluster level - Index level -The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent search will be disabled for that index. +The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent segment search will be disabled for that index. 
{: .note} To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting: From cf5227785229a2028539689e33e067ec7e44144c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 9 Feb 2024 08:27:09 -0500 Subject: [PATCH 13/14] Update _api-reference/profile.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _api-reference/profile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/profile.md b/_api-reference/profile.md index 20f7fae003..5ff3b21f5e 100644 --- a/_api-reference/profile.md +++ b/_api-reference/profile.md @@ -223,7 +223,7 @@ The response contains profiling information: #### Example response: Concurrent segment search -The following is an example response for a concurrent search with three segment slices: +The following is an example response for a concurrent segment search with three segment slices:
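For reference, the dynamic cluster setting referred to above ("set the following dynamic cluster setting") is `search.concurrent_segment_search.enabled` in the 2.12 GA. A sketch of the corresponding update call, assuming that setting name:

```json
PUT _cluster/settings
{
  "persistent": {
    "search.concurrent_segment_search.enabled": true
  }
}
```
{% include copy-curl.html %}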
From 870a3457e7478455fa8d0bac1261f2df30f41061 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Mon, 12 Feb 2024 09:37:37 -0500 Subject: [PATCH 14/14] Resolve merge conflicts Signed-off-by: Fanit Kolchina --- _search-plugins/concurrent-segment-search.md | 32 ++------------------ 1 file changed, 2 insertions(+), 30 deletions(-) diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md index 283450c171..aac10c4378 100644 --- a/_search-plugins/concurrent-segment-search.md +++ b/_search-plugins/concurrent-segment-search.md @@ -77,43 +77,15 @@ The `search.concurrent.max_slice_count` setting can take the following valid val - `0`: Use the default Lucene mechanism. - Positive integer: Use the max target slice count mechanism. Usually, a value between 2 and 8 should be sufficient. -## API changes - -If you enable the concurrent segment search feature flag, the following Stats API responses will contain several additional fields with statistics about slices: - -- [Index Stats]({{site.url}}{{site.baseurl}}/api-reference/index-apis/stats/) -- [Nodes Stats]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/) - -For descriptions of the added fields, see [Index Stats API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/stats#concurrent-segment-search). - -Additionally, some [Profile API]({{site.url}}{{site.baseurl}}/api-reference/profile/) response fields will be modified and others added. For more information, see the [concurrent segment search section of the Profile API]({{site.url}}{{site.baseurl}}/api-reference/profile#concurrent-segment-search). - -## Limitations - -The following aggregations do not support the concurrent search model. If a search request contains one of these aggregations, the request will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level or index level. 
-- Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields. See [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9316) for more information. -- `sampler` and `diversified_sampler` aggregations. See [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/110750) for more information. - -## Other considerations - -The following sections provide additional considerations for concurrent segment search. - - ### The `terminate_after` search parameter The [`terminate_after` search parameter]({{site.url}}{{site.baseurl}}/api-reference/search/#url-parameters) is used to terminate a search request once a specified number of documents has been collected. If you include the `terminate_after` parameter in a request, concurrent segment search is disabled and the request is run in a non-concurrent manner. Typically, queries are used with smaller `terminate_after` values and thus complete quickly because the search is performed on a reduced dataset. Therefore, concurrent search may not further improve performance in this case. Moreover, when `terminate_after` is used with other search request parameters, such as `track_total_hits` or `size`, it adds complexity and changes the expected query behavior. Falling back to a non-concurrent path for search requests that include `terminate_after` ensures consistent results between concurrent and non-concurrent requests. -### Sorting - -Depending on the data layout of the segments, the sort optimization feature can prune entire segments based on the min and max values as well as previously collected values. If the top values are present in the first few segments and all other segments are pruned, query latency may increase when sorting with concurrent segment search. Conversely, if the last few segments contain the top values, then latency may improve with concurrent segment search. 
- -### Terms aggregations - -Non-concurrent search calculates the document count error and returns it in the `doc_count_error_upper_bound` response parameter. During concurrent segment search, the `shard_size` parameter is applied at the segment slice level. Because of this, concurrent search may introduce an additional document count error. +## Limitations -For more information about how `shard_size` can affect both `doc_count_error_upper_bound` and collected buckets, see [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/11680#issuecomment-1885882985). +Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields do not support the concurrent search model. Thus, if a search request contains a parent aggregation, the aggregation will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level. ## Developer information: AggregatorFactory changes
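The `search.concurrent.max_slice_count` setting referenced in this patch's hunk context is likewise a dynamic cluster setting. A sketch of capping each shard-level request at four slices (the value `4` is an illustrative choice from the suggested range of 2 to 8):

```json
PUT _cluster/settings
{
  "persistent": {
    "search.concurrent.max_slice_count": 4
  }
}
```
{% include copy-curl.html %}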