diff --git a/.github/workflows/encoding-check.yml b/.github/workflows/encoding-check.yml new file mode 100644 index 0000000000..ade95e5f37 --- /dev/null +++ b/.github/workflows/encoding-check.yml @@ -0,0 +1,27 @@ +name: Encoding Checker + +on: [pull_request] + +jobs: + encoding-checker: + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v3 + - name: Check for files that do not follow UTF-8 encoding + run: | + set +e + IFS=$(echo -en "\n\b") + COUNTER=0 + for i in `find . -type f \( -name "*.txt" -o -name "*.md" -o -name "*.markdown" -o -name "*.html" \) | grep -vE "^./.git"`; + do + grep -axv '.*' "$i" + if [ "$?" -eq 0 ]; then + echo -e "######################\n$i\n######################" + COUNTER=$(( COUNTER + 1 )) + fi + done + if [ "$COUNTER" != 0 ]; then + echo "Found files that do not follow UTF-8 encoding, exit 1" + exit 1 + fi diff --git a/_about/version-history.md b/_about/version-history.md index 25e345568f..9a57c1ef35 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -9,6 +9,7 @@ permalink: /version-history/ OpenSearch version | Release highlights | Release date :--- | :--- | :--- +[2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024 [2.12.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.12.0.md) | Makes concurrent segment search and conversational search generally available. 
Provides an experimental OpenSearch Assistant Toolkit, including agents and tools, workflow automation, and OpenSearch Assistant for OpenSearch Dashboards UI. Adds a new match-only text field, query insights to monitor top N queries, and k-NN search on nested fields. For a full list of release highlights, see the Release Notes. | 20 February 2024 [2.11.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.11.1.md) | Includes maintenance changes and bug fixes for cross-cluster replication, alerting, observability, OpenSearch Dashboards, index management, machine learning, security, and security analytics. For a full list of release highlights, see the Release Notes. | 30 November 2023 [2.11.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.11.0.md) | Adds multimodal and sparse neural search capability and the ability to take shallow snapshots that refer to data stored in remote-backed storage. Makes the search comparison tool generally available. Includes a simplified workflow to create threat detectors in Security Analytics and improved security in OpenSearch Dashboards. Experimental features include a new framework and toolset for distributed tracing and updates to conversational search. For a full list of release highlights, see the Release Notes. | 16 October 2023 diff --git a/_config.yml b/_config.yml index 57dbf8c641..d9c0ee823f 100644 --- a/_config.yml +++ b/_config.yml @@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. 
http://example.com permalink: /:path/ -opensearch_version: '2.12.0' -opensearch_dashboards_version: '2.12.0' -opensearch_major_minor_version: '2.12' -lucene_version: '9_9_2' +opensearch_version: '2.13.0' +opensearch_dashboards_version: '2.13.0' +opensearch_major_minor_version: '2.13' +lucene_version: '9_10_0' # Build settings markdown: kramdown diff --git a/_data-prepper/pipelines/configuration/buffers/kafka.md b/_data-prepper/pipelines/configuration/buffers/kafka.md index f641874a91..87600601b4 100644 --- a/_data-prepper/pipelines/configuration/buffers/kafka.md +++ b/_data-prepper/pipelines/configuration/buffers/kafka.md @@ -128,6 +128,7 @@ Option | Required | Type | Description #### producer_properties Use the following configuration options to configure a Kafka producer. + Option | Required | Type | Description :--- | :--- | :--- | :--- `max_request_size` | No | Integer | The maximum size of the request that the producer sends to Kafka. Default is 1 MB. diff --git a/_data-prepper/pipelines/configuration/processors/parse-xml.md b/_data-prepper/pipelines/configuration/processors/parse-xml.md new file mode 100644 index 0000000000..861705da2b --- /dev/null +++ b/_data-prepper/pipelines/configuration/processors/parse-xml.md @@ -0,0 +1,55 @@ +--- +layout: default +title: parse_xml +parent: Processors +grand_parent: Pipelines +nav_order: 83 +--- + +# parse_xml + +The `parse_xml` processor parses XML data for an event. + +## Configuration + +You can configure the `parse_xml` processor with the following options. + +| Option | Required | Type | Description | +| :--- | :--- | :--- | :--- | +| `source` | No | String | Specifies which `event` field to parse. | +| `destination` | No | String | The destination field of the parsed XML. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only string because these are not valid `event` fields. | +| `pointer` | No | String | A JSON pointer to the field to be parsed. 
The value is null by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event` object. If the key that is pointed to already exists in the `event` object and the `destination` is the root, then the pointer uses the entire path of the key. | +| `parse_when` | No | String | Specifies under what conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `tags_on_failure` | No | String | A list of strings that specify the tags to be set if the processor fails or an unknown exception occurs while parsing. | + +## Usage + +The following examples show how to use the `parse_xml` processor in your pipeline. + +### Example: Minimum configuration + +The following example shows the minimum configuration for the `parse_xml` processor: + +```yaml +parse-xml-pipeline: + source: + stdin: + processor: + - parse_xml: + source: "my_xml" + sink: + - stdout: +``` +{% include copy.html %} + +When the input event contains the following data: + +``` +{ "my_xml": "<Person><name>John Doe</name><age>30</age></Person>" } +``` + +The processor parses the event into the following output: + +``` +{ "name": "John Doe", "age": "30" } +``` \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/opensearch.md b/_data-prepper/pipelines/configuration/sinks/opensearch.md index 628515a985..7b8e99339f 100644 --- a/_data-prepper/pipelines/configuration/sinks/opensearch.md +++ b/_data-prepper/pipelines/configuration/sinks/opensearch.md @@ -50,7 +50,6 @@ pipeline: The following table describes options you can configure for the `opensearch` sink. 
- Option | Required | Type | Description :--- | :--- |:---| :--- `hosts` | Yes | List | A list of OpenSearch hosts to write to, such as `["https://localhost:9200", "https://remote-cluster:9200"]`. @@ -89,9 +88,9 @@ Option | Required | Type | Description `normalize_index` | No | Boolean | If true, then the OpenSearch sink will try to create dynamic index names. Index names with format options specified in `${})` are valid according to the [index naming restrictions]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-naming-restrictions). Any invalid characters will be removed. Default value is `false`. `routing` | No | String | A string used as a hash for generating the `shard_id` for a document when it is stored in OpenSearch. Each incoming record is searched. When present, the string is used as the routing field for the document. When not present, the default routing mechanism (`document_id`) is used by OpenSearch when storing the document. Supports formatting with fields in events and [Data Prepper expressions]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), such as `${/my_field}-test-${getMetadata(\"some_metadata_key\")}`. `document_root_key` | No | String | The key in the event that will be used as the root in the document. The default is the root of the event. If the key does not exist, then the entire event is written as the document. If `document_root_key` is of a basic value type, such as a string or integer, then the document will have a structure of `{"data": }`. -`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`. -`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. 
For more information, see [Serverless options](#serverless-options). - +`serverless` | No | Boolean | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`. +`serverless_options` | No | Object | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options). + ## aws @@ -101,8 +100,8 @@ Option | Required | Type | Description `sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). `sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin. `sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS. -`serverless` | No | Boolean | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`. -`serverless_options` | No | Object | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options). 
+`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`. +`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options). ## actions diff --git a/_data/versions.json b/_data/versions.json index 1dd727f3e1..03e43e6d4a 100644 --- a/_data/versions.json +++ b/_data/versions.json @@ -1,10 +1,11 @@ { - "current": "2.12", + "current": "2.13", "all": [ - "2.12", + "2.13", "1.3" ], "archived": [ + "2.12", "2.11", "2.10", "2.9", @@ -21,7 +22,7 @@ "1.1", "1.0" ], - "latest": "2.12" + "latest": "2.13" } diff --git a/_query-dsl/term/exists.md b/_query-dsl/term/exists.md index a62dda981b..1d52744c91 100644 --- a/_query-dsl/term/exists.md +++ b/_query-dsl/term/exists.md @@ -146,4 +146,8 @@ The response contains the matching document: ## Parameters -The query accepts the name of the field (`<field>`) as a top-level parameter. \ No newline at end of file +The query accepts the name of the field (`<field>`) as a top-level parameter. + +Parameter | Data type | Description +:--- | :--- | :--- +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. diff --git a/_query-dsl/term/fuzzy.md b/_query-dsl/term/fuzzy.md index f5a9773aeb..9afa85ea93 100644 --- a/_query-dsl/term/fuzzy.md +++ b/_query-dsl/term/fuzzy.md @@ -67,7 +67,7 @@ GET _search "fuzzy": { "<field>": { "value": "sample", - ... + ... } } } @@ -80,11 +80,12 @@ The `<field>` accepts the following parameters. 
All parameters except `value` ar Parameter | Data type | Description :--- | :--- | :--- `value` | String | The term to search for in the field specified in `<field>`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) needed to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. `max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`. `prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`. `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. -`transpositions` | Boolean | Specifies whether to allow transpositions of two adjacent characters (`ab` to `ba`) as edits. Default is `true`. +`transpositions` | Boolean | Specifies whether to allow transpositions of two adjacent characters (`ab` to `ba`) as edits. Default is `true`. Specifying a large value in `max_expansions` can lead to poor performance, especially if `prefix_length` is set to `0`, because of the large number of variations of the word that OpenSearch tries to match. 
{: .warning} diff --git a/_query-dsl/term/ids.md b/_query-dsl/term/ids.md index a1a098f586..0c3b5393fb 100644 --- a/_query-dsl/term/ids.md +++ b/_query-dsl/term/ids.md @@ -32,3 +32,4 @@ The query accepts the following parameter. Parameter | Data type | Description :--- | :--- | :--- `values` | Array of strings | The document IDs to search for. Required. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. diff --git a/_query-dsl/term/prefix.md b/_query-dsl/term/prefix.md index 14f208f3c5..eda5307d14 100644 --- a/_query-dsl/term/prefix.md +++ b/_query-dsl/term/prefix.md @@ -50,7 +50,7 @@ GET _search "prefix": { "<field>": { "value": "sample", - ... + ... } } } @@ -63,8 +63,9 @@ The `<field>` accepts the following parameters. All parameters except `value` ar Parameter | Data type | Description :--- | :--- | :--- `value` | String | The term to search for in the field specified in `<field>`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, prefix queries are not run. 
If `index_prefixes` is enabled, the `search.allow_expensive_queries` setting is ignored and an optimized query is built and run. -{: .important} \ No newline at end of file +{: .important} diff --git a/_query-dsl/term/range.md b/_query-dsl/term/range.md index 076ab5ad15..8a8f53c480 100644 --- a/_query-dsl/term/range.md +++ b/_query-dsl/term/range.md @@ -90,7 +90,7 @@ OpenSearch populates missing date components with the following values: - `SECOND_OF_MINUTE`: `59` - `NANO_OF_SECOND`: `999_999_999` -If the year is missing, it is not populated. +If the year is missing, it is not populated. For example, consider the following request that specifies only the year in the start date: @@ -131,7 +131,7 @@ GET products/_search ``` {% include copy-curl.html %} -In the preceding example, `2019/01/01` is the anchor date (the starting point) for the date math. After the two pipe characters (`||`), you are specifying a mathematical expression relative to the anchor date. In this example, you are subtracting 1 year (`-1y`) and 1 day (`-1d`). +In the preceding example, `2019/01/01` is the anchor date (the starting point) for the date math. After the two pipe characters (`||`), you are specifying a mathematical expression relative to the anchor date. In this example, you are subtracting 1 year (`-1y`) and 1 day (`-1d`). You can also round off dates by adding a forward slash to the date or time unit. @@ -175,8 +175,8 @@ GET /products/_search "query": { "range": { "created": { - "time_zone": "-04:00", - "gte": "2022-04-17T06:00:00" + "time_zone": "-04:00", + "gte": "2022-04-17T06:00:00" } } } @@ -184,7 +184,7 @@ GET /products/_search ``` {% include copy-curl.html %} -The `gte` parameter in the preceding query is converted to `2022-04-17T10:00:00 UTC`, which is the UTC equivalent of `2022-04-17T06:00:00-04:00`. +The `gte` parameter in the preceding query is converted to `2022-04-17T10:00:00 UTC`, which is the UTC equivalent of `2022-04-17T06:00:00-04:00`. 
The `time_zone` parameter does not affect the `now` value because `now` always corresponds to the current system time in UTC. {: .note} @@ -200,7 +200,7 @@ GET _search "range": { "<field>": { "gt": 10, - ... + ... } } } @@ -215,7 +215,7 @@ Parameter | Data type | Description :--- | :--- | :--- `format` | String | A [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#formats) for dates in this query. Default is the field's mapped format. `relation` | String | Indicates how the range query matches values for [`range`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) fields. Valid values are:<br> - `INTERSECTS` (default): Matches documents whose `range` field value intersects the range provided in the query.<br> - `CONTAINS`: Matches documents whose `range` field value contains the entire range provided in the query.<br> - `WITHIN`: Matches documents whose `range` field value is entirely within the range provided in the query. -`boost` | Floating-point | Boosts the query by the given multiplier. Useful for searches that contain more than one query. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `time_zone` | String | The time zone used to convert [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) values to UTC in the query. Valid values are ISO 8601 [UTC offsets](https://en.wikipedia.org/wiki/List_of_UTC_offsets) and [IANA time zone IDs](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). For more information, see [Time zone](#time-zone). If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, range queries on [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) and [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) fields are not run. diff --git a/_query-dsl/term/regexp.md b/_query-dsl/term/regexp.md index 31bc6460aa..65d6953516 100644 --- a/_query-dsl/term/regexp.md +++ b/_query-dsl/term/regexp.md @@ -43,7 +43,7 @@ GET _search "regexp": { "<field>": { "value": "[Ss]ample", - ... + ... } } } @@ -56,6 +56,7 @@ The `<field>` accepts the following parameters. All parameters except `value` ar Parameter | Data type | Description :--- | :--- | :--- `value` | String | The regular expression used for matching terms in the field specified in `<field>`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. 
Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the regular expression value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). `flags` | String | Enables optional operators for Lucene’s regular expression engine. `max_determinized_states` | Integer | Lucene converts a regular expression to an automaton with a number of determinized states. This parameter specifies the maximum number of automaton states the query requires. Use this parameter to prevent high resource consumption. To run complex regular expressions, you may need to increase the value of this parameter. Default is 10,000. @@ -63,4 +64,3 @@ Parameter | Data type | Description If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, `regexp` queries are not run. {: .important} - diff --git a/_query-dsl/term/term.md b/_query-dsl/term/term.md index 20694fb455..c1c296b9a0 100644 --- a/_query-dsl/term/term.md +++ b/_query-dsl/term/term.md @@ -82,7 +82,7 @@ GET _search "term": { "<field>": { "value": "sample", - ... + ... } } } @@ -95,5 +95,5 @@ The `<field>` accepts the following parameters. All parameters except `value` ar Parameter | Data type | Description :--- | :--- | :--- `value` | String | The term to search for in the field specified in `<field>`. A document is returned in the results only if its field value exactly matches the term, with the correct spacing and capitalization. -`boost` | Floating-point | Boosts the query by the given multiplier. Useful for searches that contain more than one query. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`. -`case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). 
\ No newline at end of file +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. +`case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). diff --git a/_query-dsl/term/terms-set.md b/_query-dsl/term/terms-set.md index 452cae66b2..ea0251ddff 100644 --- a/_query-dsl/term/terms-set.md +++ b/_query-dsl/term/terms-set.md @@ -153,7 +153,7 @@ GET _search "terms_set": { "<field>": { "terms": [ "term1", "term2" ], - ... + ... } } } @@ -167,4 +167,5 @@ Parameter | Data type | Description :--- | :--- | :--- `terms` | Array of strings | The array of terms to search for in the field specified in `<field>`. A document is returned in the results only if the required number of terms matches the document's field values exactly, with the correct spacing and capitalization. `minimum_should_match_field` | String | The name of the [numeric]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/) field that specifies the number of matching terms required in order to return a document in the results. -`minimum_should_match_script` | String | A script that returns the number of matching terms required in order to return a document in the results. \ No newline at end of file +`minimum_should_match_script` | String | A script that returns the number of matching terms required in order to return a document in the results. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. 
diff --git a/_query-dsl/term/terms.md b/_query-dsl/term/terms.md index 5e5524e6a3..fd15126255 100644 --- a/_query-dsl/term/terms.md +++ b/_query-dsl/term/terms.md @@ -39,7 +39,7 @@ The query accepts the following parameters. All parameters are optional. Parameter | Data type | Description :--- | :--- | :--- `<field>` | String | The field in which to search. A document is returned in the results only if its field value exactly matches at least one term, with the correct spacing and capitalization. -`boost` | Floating-point | Boosts the query by the given multiplier. Useful for searches that contain more than one query. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. ## Terms lookup @@ -249,4 +249,5 @@ Parameter | Data type | Description `index` | String | The name of the index in which to fetch field values. Required. `id` | String | The document ID of the document from which to fetch field values. Required. `path` | String | The name of the field from which to fetch field values. Specify nested fields using dot path notation. Required. -`routing` | String | Custom routing value of the document from which to fetch field values. Optional. Required if a custom routing value was provided when the document was indexed. \ No newline at end of file +`routing` | String | Custom routing value of the document from which to fetch field values. Optional. Required if a custom routing value was provided when the document was indexed. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. 
diff --git a/_query-dsl/term/wildcard.md b/_query-dsl/term/wildcard.md index 897ab6ed0f..0652581941 100644 --- a/_query-dsl/term/wildcard.md +++ b/_query-dsl/term/wildcard.md @@ -61,7 +61,7 @@ The `<field>` accepts the following parameters. All parameters except `value` ar Parameter | Data type | Description :--- | :--- | :--- `value` | String | The wildcard pattern used for matching terms in the field specified in `<field>`. -`boost` | Floating-point | Boosts the query by the given multiplier. Useful for searches that contain more than one query. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`. +`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0. `case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping). `rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`. diff --git a/_search-plugins/async/index.md b/_search-plugins/async/index.md index a663e7faf9..099279ba91 100644 --- a/_search-plugins/async/index.md +++ b/_search-plugins/async/index.md @@ -31,6 +31,7 @@ Options | Description | Default value | Required `wait_for_completion_timeout` | The amount of time that you plan to wait for the results. You can see whatever results you get within this time just like in a normal search. You can poll the remaining results based on an ID. The maximum value is 300 seconds. | 1 second | No `keep_on_completion` | Whether you want to save the results in the cluster after the search is complete. 
You can examine the stored results at a later time. | `false` | No `keep_alive` | The amount of time that the result is saved in the cluster. For example, `2d` means that the results are stored in the cluster for 48 hours. The saved search results are deleted after this period or if the search is canceled. Note that this includes the query execution time. If the query overruns this time, the process cancels this query automatically. | 12 hours | No +`index` | The name of the index to be searched. Can be an individual name, a comma-separated list of indexes, or a wildcard expression of index names. | All indexes in the cluster | No #### Example request diff --git a/_security/access-control/users-roles.md b/_security/access-control/users-roles.md index ae7670bc29..687796d0c4 100644 --- a/_security/access-control/users-roles.md +++ b/_security/access-control/users-roles.md @@ -29,7 +29,7 @@ OpenSearch Dashboards provides a user-friendly interface for managing roles. Rol ### Editing the `roles.yml` file If you want more granular control of your security configuration, you can edit roles and their associated permissions in the `roles.yml` file. This method provides direct access to the underlying configuration and can be version controlled for use in collaborative development environments. -For more information about creating roles, see the [Create roles][https://opensearch.org/docs/latest/security/access-control/users-roles/#create-roles) documentation. +For more information about creating roles, see the [Create roles](https://opensearch.org/docs/latest/security/access-control/users-roles/#create-roles) documentation. Unless you need to create new [reserved or hidden users]({{site.url}}{{site.baseurl}}/security/access-control/api/#reserved-and-hidden-resources), we **highly** recommend using OpenSearch Dashboards or the REST API to create new users, roles, and role mappings. The `.yml` files are for initial setup, not ongoing use. 
{: .warning } diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index 6a4486d966..ed2fae5cc4 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -96,3 +96,5 @@ The following are known limitations of the searchable snapshots feature: - Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected. - Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred. - Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications. +- For better search performance, consider [force merging]({{site.url}}{{site.baseurl}}/api-reference/index-apis/force-merge/) indexes into a smaller number of segments before taking a snapshot. For the best performance, at the cost of using compute resources prior to snapshotting, force merge your index into one segment. +- We recommend configuring a maximum ratio of remote data to local disk cache size using the `cluster.filecache.remote_data_ratio` setting. A ratio of 5 is a good starting point for most workloads to ensure good query performance. If the ratio is too large, then there may not be sufficient disk space to handle the search workload. See issue [#11676](https://github.com/opensearch-project/OpenSearch/issues/11676) for a known bug related to this scenario. 
\ No newline at end of file diff --git a/release-notes/opensearch-documentation-release-notes-2.13.0.md b/release-notes/opensearch-documentation-release-notes-2.13.0.md new file mode 100644 index 0000000000..38f7678cde --- /dev/null +++ b/release-notes/opensearch-documentation-release-notes-2.13.0.md @@ -0,0 +1,35 @@ +# OpenSearch Documentation Website 2.13.0 Release Notes + +The OpenSearch 2.13.0 documentation includes the following additions and updates. + +## New documentation for 2.13.0 + +- Add example to text chunking processor documentation [#6794](https://github.com/opensearch-project/documentation-website/pull/6794) +- Add documentation for default use cases [#6767](https://github.com/opensearch-project/documentation-website/pull/6767) +- Add documentation for IO Based AdmissionController Stats [#6755](https://github.com/opensearch-project/documentation-website/pull/6755) +- Add the supported metric types [#6754](https://github.com/opensearch-project/documentation-website/pull/6754) +- Add guardrails for remote model [#6750](https://github.com/opensearch-project/documentation-website/pull/6750) +- Add qa model and new settings in ml-commons [#6749](https://github.com/opensearch-project/documentation-website/pull/6749) +- Update documentation for automatic remote model deployment [#6748](https://github.com/opensearch-project/documentation-website/pull/6748) +- Add client_config parameter documentation [#6746](https://github.com/opensearch-project/documentation-website/pull/6746) +- Remove experimental feature labels and flags for OS Assistant [#6745](https://github.com/opensearch-project/documentation-website/pull/6745) +- Remove experimental feature warning for Flow Framework plugin docs [#6741](https://github.com/opensearch-project/documentation-website/pull/6741) +- Add documentation for new workflow steps [#6740](https://github.com/opensearch-project/documentation-website/pull/6740) +- Add documentation for optional param for get workflow step API 
[#6736](https://github.com/opensearch-project/documentation-website/pull/6736) +- Update plugins.md with semver range support specification [#6733](https://github.com/opensearch-project/documentation-website/pull/6733) +- Remove feature flag requirement for fuzzy filter settings [#6731](https://github.com/opensearch-project/documentation-website/pull/6731) +- Update doc for decoupling of remote cluster state with remote backed data storage [#6730](https://github.com/opensearch-project/documentation-website/pull/6730) +- Add static setting for checkPendingFlushUpdate functionality of lucene index writer [#6728](https://github.com/opensearch-project/documentation-website/pull/6728) +- Add documentation for retry settings for Remote reindex [#6726](https://github.com/opensearch-project/documentation-website/pull/6726) +- Add Default Model Id for Neural Sparse Search Query in neural_query_enricher [#6725](https://github.com/opensearch-project/documentation-website/pull/6725) +- Add post_filter is supported in hybrid search [#6724](https://github.com/opensearch-project/documentation-website/pull/6724) +- Update deb/rpm autorestart service after upgrade documentation [#6720](https://github.com/opensearch-project/documentation-website/pull/6720) +- Add documentation page for Vega Visualizations [#6711](https://github.com/opensearch-project/documentation-website/pull/6711) +- Add documentation for text chunking processor [#6707](https://github.com/opensearch-project/documentation-website/pull/6707) +- Update documentation to support InnerProduct with k-NN Lucene Engine [#6703](https://github.com/opensearch-project/documentation-website/pull/6703) +- Add documentation for kuromoji_completion filter [#6699](https://github.com/opensearch-project/documentation-website/pull/6699) +- Add note about not passing 0 vector for cosine sim in k-NN [#6698](https://github.com/opensearch-project/documentation-website/pull/6698) +- Update the multiple data source documentation [Multiple 
Data Source][2.13.0] [#6689](https://github.com/opensearch-project/documentation-website/pull/6689) +- Add documentation on how to configure the XContent codepoint limit (YAML) [#6666](https://github.com/opensearch-project/documentation-website/pull/6666) +- Add documentation for the primary_only parameter of the force-merge API [#6664](https://github.com/opensearch-project/documentation-website/pull/6664) +- Add aggregations to Search-Hybrid search section [#6661](https://github.com/opensearch-project/documentation-website/pull/6661)