From 176452ba1cc18827175bdbe5b24f7ad1dc9307ac Mon Sep 17 00:00:00 2001
From: Ian Menendez <61611304+IanMenendez@users.noreply.github.com>
Date: Wed, 1 May 2024 20:19:33 -0300
Subject: [PATCH] feat: fix overlap rate param (#7045)

Signed-off-by: Ian Menendez
Signed-off-by: Sagar Upadhyaya
---
 _ingest-pipelines/processors/text-chunking.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/_ingest-pipelines/processors/text-chunking.md b/_ingest-pipelines/processors/text-chunking.md
index d11c380bde..97229d2aaa 100644
--- a/_ingest-pipelines/processors/text-chunking.md
+++ b/_ingest-pipelines/processors/text-chunking.md
@@ -47,11 +47,11 @@ The following table lists the required and optional parameters for the `text_chu
 The following table lists the optional parameters for the `fixed_token_length` algorithm.
 
 | Parameter | Data type | Required/Optional | Description |
-|:---|:---|:---|:---|
-| `token_limit` | Integer | Optional | The token limit for chunking algorithms. Valid values are integers of at least `1`. Default is `384`. |
-| `tokenizer` | String | Optional | The [word tokenizer]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/index/#word-tokenizers) name. Default is `standard`. |
-| `overlap_rate` | String | Optional | The degree of overlap in the token algorithm. Valid values are floats between `0` and `0.5`, inclusive. Default is `0`. |
-| `max_chunk_limit` | Integer | Optional | The chunk limit for chunking algorithms. Default is 100. To disable this parameter, set it to `-1`. |
+|:---|:----------|:---|:---|
+| `token_limit` | Integer | Optional | The token limit for chunking algorithms. Valid values are integers of at least `1`. Default is `384`. |
+| `tokenizer` | String | Optional | The [word tokenizer]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/index/#word-tokenizers) name. Default is `standard`. |
+| `overlap_rate` | Float | Optional | The degree of overlap in the token algorithm. Valid values are floats between `0` and `0.5`, inclusive. Default is `0`. |
+| `max_chunk_limit` | Integer | Optional | The chunk limit for chunking algorithms. Default is 100. To disable this parameter, set it to `-1`. |
 
 The default value of `token_limit` is `384` so that output passages don't exceed the token limit constraint of the downstream text embedding models. For [OpenSearch-supported pretrained models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#supported-pretrained-models), like `msmarco-distilbert-base-tas-b` and `opensearch-neural-sparse-encoding-v1`, the input token limit is `512`. The `standard` tokenizer tokenizes text into words. According to [OpenAI](https://platform.openai.com/docs/introduction), 1 token equals approximately 0.75 words of English text. The default token limit is calculated as 512 * 0.75 = 384.
 {: .note}
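
For context, the following is a minimal sketch of a `text_chunking` ingest pipeline that passes `overlap_rate` as a float, the data type this patch corrects in the table above. The pipeline name and the field names `passage_text` and `passage_chunked` are illustrative assumptions and are not taken from this patch.

```json
PUT _ingest/pipeline/text-chunking-pipeline
{
  "description": "Example pipeline that chunks passage_text into passage_chunked",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384,
            "tokenizer": "standard",
            "overlap_rate": 0.2
          }
        },
        "field_map": {
          "passage_text": "passage_chunked"
        }
      }
    }
  ]
}
```

With `overlap_rate` set to `0.2`, consecutive chunks overlap by roughly 20% of `token_limit` tokens, and the table above caps the value at `0.5`.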