[DOC] Add kv processor content gap doc #5781

Merged
merged 17 commits into from
Dec 12, 2023
Changes from 4 commits
118 changes: 118 additions & 0 deletions _ingest-pipelines/processors/kv.md
@@ -0,0 +1,118 @@
---
layout: default
title: KV
parent: Ingest processors
nav_order: 200
redirect_from:
- /api-reference/ingest-apis/processors/lowercase/
---

# KV processor

The `kv` processor automatically extracts specific event fields or messages that are in `key=value` format. The resulting structure groups your data by key and value, which is helpful for analyzing, visualizing, and using data in areas such as user behavior analytics, performance optimization, or security investigations.

## Example
The following is the syntax for the `kv` processor:

```json
{
  "kv": {
    "field": "message",
    "field_split": " ",
    "value_split": " "
  }
}
```
{% include copy-curl.html %}

#### Configuration parameters

The following table lists the required and optional parameters for the `kv` processor.

| Name | Required/Optional | Description |

Check failure on line 32 in _ingest-pipelines/processors/kv.md

GitHub Actions / vale

[vale] _ingest-pipelines/processors/kv.md#L32

[OpenSearch.TableHeadings] 'Required/Optional' is a table heading and should be in sentence case.
Raw output
{"message": "[OpenSearch.TableHeadings] 'Required/Optional' is a table heading and should be in sentence case.", "location": {"path": "_ingest-pipelines/processors/kv.md", "range": {"start": {"line": 32, "column": 11}}}, "severity": "ERROR"}
Contributor

For consistency with the other processors, should the headings be: Parameter | Required | Description |

(Although, sometimes "Required/Optional" is the heading. See https://opensearch.org/docs/latest/ingest-pipelines/processors/date-index-name/ )

Contributor Author

We are updating the tables to now read Required/Optional. The first round originally said Required, but those need updating. I'll sync them this week. Good comment. Thanks!

|---|---|---|
`field` | Required | The name of the field that contains the data to be parsed. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |
Collaborator

Suggested change
`field` | Required | The name of the field that contains the data to be parsed. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |
`field` | Required | The name of the field containing the data to be parsed. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |

`field_split` | Required | The regex pattern for key-value pair splitting. |
`value_split` | Required | The regex pattern for splitting the key from the value within a key-value pair, for example, equal sign `=` or colon `:`. |
`exclude_keys` | Optional | The keys to exclude from the document. Default is `null`. |
`include_keys` | Optional | The keys for filtering and inserting. Default is to include all keys. |
`prefix` | Optional | The prefix to add to the extracted keys. Default is `null`. |
`strip_brackets` | Optional | If set to `true`, strips brackets `()`, `<>`, or `[]` and quotes `'` or `"` from extracted values. Default is `false`. |
Collaborator

Can we put parentheses around the examples?: ((), <>, or []) and quotes (' or ")

`trim_key` | Optional | String of characters to trim from extracted keys. |
`trim_value` | Optional | String of characters to trim from extracted values. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running this processor. |
Collaborator

Suggested change
`if` | Optional | A condition for running this processor. |
`if` | Optional | A condition for running the processor. |

`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
`on_failure` | Optional | A list of processors to run if the processor fails. |
`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
Collaborator

Suggested change
`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`. |

`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
Collaborator

Either "in order to distinguish" or "when distinguishing".

`target_field` | Optional | The name of the field in which to insert the extracted keys. Default is `null`. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |

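As a rough illustration of how the parameters in the table interact, the following Python sketch mimics the documented behavior of `field_split`, `value_split`, `prefix`, `include_keys`, and `exclude_keys`. It is not the processor's implementation: the function name `kv_split` is hypothetical, and skipping fragments that lack a separator is a simplifying assumption.

```python
import re


def kv_split(message, field_split, value_split, prefix="",
             include_keys=None, exclude_keys=None):
    """Illustrative only: approximates the documented parameter semantics."""
    result = {}
    # field_split is a regex that separates one key-value pair from the next.
    for pair in re.split(field_split, message):
        # value_split is a regex that separates the key from its value.
        parts = re.split(value_split, pair, maxsplit=1)
        if len(parts) != 2:
            continue  # assumption: ignore fragments without a separator
        key, value = parts
        if include_keys is not None and key not in include_keys:
            continue
        if exclude_keys is not None and key in exclude_keys:
            continue
        result[prefix + key] = value
    return result


parsed = kv_split("user=alice role=admin", field_split=" ", value_split="=")
print(parsed)
```

The sample message `user=alice role=admin` is invented for the sketch; passing `prefix` or `exclude_keys` shows how keys are renamed or dropped before being added to the result.
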
## Using the processor

Follow these steps to use the processor in a pipeline.

**Step 1: Create a pipeline.**
Collaborator

Suggested change
**Step 1: Create a pipeline.**
**Step 1: Create a pipeline**


The following query creates a pipeline, named `kv-pipeline`, that uses the `kv` processor to extract the `message` field of a document:
Collaborator

Suggested change
The following query creates a pipeline, named `kv-pipeline`, that uses the `kv` processor to extract the `message` field of a document:
The following query creates a pipeline, named `kv-pipeline`, that uses the `kv` processor to extract the `message` field of a document:


```json
PUT _ingest/pipeline/kv-pipeline
{
  "description" : "Pipeline that extracts user profile data",
  "processors" : [
    {
      "kv" : {
        "field" : "message",
        "field_split": "",
        "value_split": "="
      }
    }
  ]
}
```
{% include copy-curl.html %}

**Step 2 (Optional): Test the pipeline.**
Collaborator

Suggested change
**Step 2 (Optional): Test the pipeline.**
**Step 2 (Optional): Test the pipeline**


It is recommended that you test your pipeline before you ingest documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/kv-pipeline/_simulate

<insert example from SME>

```
{% include copy-curl.html %}

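For orientation, a `_simulate` request body wraps each test document in a `docs` array under `_source`. The following Python sketch builds such a body and prints it as JSON. The `message` value is a hypothetical sample, not the SME-provided example, and it assumes the pipeline reads from the `message` field as configured in Step 1.

```python
import json

# Hypothetical sample document; "message" matches the pipeline's "field" setting.
simulate_body = {
    "docs": [
        {"_source": {"message": "user=alice role=admin"}}
    ]
}

# Print the JSON that would be sent as the request body.
print(json.dumps(simulate_body, indent=2))
```
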
#### Response
Collaborator

Confirm heading level.


The following example response confirms that the pipeline is working as expected:

```json

<insert response example from SME>

```

**Step 3: Ingest a document.**
Collaborator

Suggested change
**Step 3: Ingest a document.**
**Step 3: Ingest a document**


The following query ingests a document into an index named `testindex1`:

```json
PUT testindex1/_doc/1?pipeline=kv-pipeline
<insert example from SME>
```
{% include copy-curl.html %}

**Step 4 (Optional): Retrieve the document.**
Collaborator

Suggested change
**Step 4 (Optional): Retrieve the document.**
**Step 4 (Optional): Retrieve the document**


To retrieve the document, run the following query:

```json
GET testindex1/_doc/1
```
{% include copy-curl.html %}
2 changes: 1 addition & 1 deletion _ingest-pipelines/processors/lowercase.md
@@ -31,7 +31,7 @@ The following table lists the required and optional parameters for the `lowercas

| Name | Required | Description |
|---|---|---|
`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. |
`field` | Required | The name of the field that contains the data to be converted. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running this processor. |
`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |