-
Notifications
You must be signed in to change notification settings - Fork 515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOC] Add new doc for grok processor #5019
Merged
Merged
Changes from 15 commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
fae8001
Add new doc for grok processor
vagimeli 632dc96
Merge branch 'main' into grok-processor
vagimeli c9cbd62
Merge branch 'main' into grok-processor
vagimeli f028ff0
Writing
vagimeli f23b3c5
Merge branch 'main' into grok-processor
vagimeli 56cfb78
Writing
vagimeli 3103ac7
Merge branch 'main' into grok-processor
vagimeli e0bcd0e
Merge branch 'main' into grok-processor
vagimeli 8059c24
Merge branch 'main' into grok-processor
vagimeli 8d84939
Add custom patterns
vagimeli 589eaf4
Merge branch 'main' into grok-processor
vagimeli 898ff52
Update _api-reference/ingest-apis/processors/grok.md
vagimeli 431b45d
Update _api-reference/ingest-apis/processors/grok.md
vagimeli 2b620fb
Merge branch 'main' into grok-processor
vagimeli bf81892
Address tech feedback
vagimeli ce08abe
Address tech feedback
vagimeli 0ddb2db
Merge branch 'main' into grok-processor
vagimeli d7b7af8
Merge branch 'main' into grok-processor
vagimeli 10acca3
Merge branch 'main' into grok-processor
vagimeli 9318025
Merge branch 'main' into grok-processor
vagimeli efc2346
Revise content
vagimeli d591f6d
Revise intro section based on doc review
vagimeli 1791c59
Revise intro section based on doc review
vagimeli f9c2ee0
Revise intro section based on doc review
vagimeli 3eace4a
Merge branch 'main' into grok-processor
vagimeli 549ee7d
Merge branch 'main' into grok-processor
vagimeli 601677d
Change example and intro sentences to examples
kolchfa-aws 08f1d71
Link fix
kolchfa-aws 615f3d3
Address doc review feedback
vagimeli File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
--- | ||
layout: default | ||
title: Grok | ||
parent: Ingest processors | ||
grand_parent: Ingest pipelines | ||
nav_order: 140 | ||
--- | ||
|
||
# Grok | ||
|
||
The `grok` processor is used to parse and extract structured data from unstructured data. It is useful in log analytics and data processing pipelines where data is often in a raw and unformatted state. The `grok` processor uses a combination of pattern matching and regular expressions to identify and extract information from the input text. The processor supports a range of predefined patterns for common data types such as timestamps, IP addresses, and usernames. The `grok` processor can perform transformations on extracted data, such as converting a timestamp to a proper date field. | ||
|
||
The following is the syntax for the `grok` processor: | ||
|
||
```json | ||
{ | ||
"grok": { | ||
"field": "your_message", | ||
"patterns": ["your_patterns"] | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Grok patterns | ||
|
||
The Grok processor is based on the [`java-grok`](https://mvnrepository.com/artifact/io.krakens/java-grok) library and supports all compatible patterns. The `java-grok` library is built using the [`java.util.regex`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/package-summary.html) regular expression library. | ||
|
||
You can add custom patterns to your pipelines using the `patterns_definitions` parameter. When debugging custom patterns, the [Grok Debugger](https://grokdebugger.com/) can help you test and debug grok patterns before using them in your [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). | ||
|
||
## Configuration parameters | ||
|
||
To configure the `grok` processor, you have various options that allow you to define patterns, match specific keys, and control the processor's behavior. The following table lists the required and optional parameters for the `grok` processor. | ||
|
||
Parameter | Required | Description | | ||
|-----------|-----------|-----------| | ||
`field` | Required | The name of the field to which the data should be parsed. | | ||
vagimeli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
`patterns` | Required | A list of grok expressions used to match and extract named captures. The first expression in the list that matches is returned. | | ||
vagimeli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
`pattern_definitions` | Optional | A dictionary of pattern names and pattern tuples (a pair of a pattern name, which is a string that identifies the pattern, and a pattern, which is the string that specifies the pattern itself) is used to define custom patterns for the current processor. If a pattern matches an existing name, it overrides the pre-existing definition. | | ||
vagimeli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
`trace_match` | Optional | When the parameter is set to `true`, the processor adds a field named `_grok_match_index` to the processed document. This field contains the index of the pattern within the `patterns` array that successfully matched the document. This information can be useful for debugging and understanding which pattern was applied to the document. Default is `false`. | | ||
`description` | Optional | A brief description of the processor. | | ||
`if` | Optional | A condition for running this processor. | | ||
`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | | ||
`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | | ||
`on_failure` | Optional | A list of processors to run if the processor fails. | | ||
`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | | ||
|
||
## Using the processor | ||
|
||
Follow these steps to use the processor in a pipeline. | ||
|
||
**Step 1: Create a pipeline.** | ||
|
||
The following query creates a pipeline, named `log_line`. It extracts fields from the `message` field of the document using the specified pattern. In this case, it extracts the `clientip`, `timestamp`, and `response_status` fields: | ||
|
||
```json | ||
PUT _ingest/pipeline/log_line | ||
{ | ||
"description": "Extract fields from a log line", | ||
"processors": [ | ||
{ | ||
"grok": { | ||
"field": "message", | ||
"patterns": ["%{IPORHOST:clientip} %{HTTPDATE:timestamp} %{NUMBER:response_status:int}"] | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
**Step 2 (Optional): Test the pipeline.** | ||
|
||
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/alert-icon.png" class="inline-icon" alt="alert icon"/>{:/} **NOTE**<br>It is recommended that you test your pipeline before you ingest documents. | ||
{: .note} | ||
|
||
To test the pipeline, run the following query: | ||
|
||
```json | ||
POST _ingest/pipeline/log_line/_simulate | ||
{ | ||
"docs": [ | ||
{ | ||
"_source": { | ||
"message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200" | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Response | ||
|
||
The following response confirms that the pipeline is working as expected: | ||
|
||
```json | ||
{ | ||
"docs": [ | ||
{ | ||
"doc": { | ||
"_index": "_index", | ||
"_id": "_id", | ||
"_source": { | ||
"message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200", | ||
"response_status": 200, | ||
"clientip": "198.126.12", | ||
"timestamp": "10/Oct/2000:13:55:36 -0700" | ||
}, | ||
"_ingest": { | ||
"timestamp": "2023-09-13T21:41:52.064540505Z" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
|
||
**Step 3: Ingest a document.** | ||
|
||
The following query ingests a document into an index named `testindex1`: | ||
|
||
```json | ||
PUT testindex1/_doc/1?pipeline=log_line | ||
{ | ||
"message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200" | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
**Step 4 (Optional): Retrieve the document.** | ||
|
||
To retrieve the document, run the following query: | ||
|
||
```json | ||
GET testindex1/_doc/1 | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Tailoring grok with custom patterns | ||
|
||
Custom grok patterns can be used in a pipeline to extract structured data from log messages that do not match the built-in grok patterns. This can be useful for parsing log messages from custom applications or for parsing log messages that have been modified in some way. Custom patterns adhere to a straightforward structure: each pattern has a unique name and the corresponding regular expression that defines its matching behavior.These custom patterns can be incorporated into the `grok` processor using the `pattern_definitons` parameter. This parameter accepts a dictionary where the keys represent the pattern names and the values represent the corresponding regular expressions. | ||
|
||
The following is an example of how to include a custom pattern in your configuration. In this example, `MY_CUSTOM_PATTERN` is defined and subsequently used in the `patterns` list, which tells grok to look for this pattern in the log message. The pattern is a regular expression that matches any sequence of alphanumeric characters and captures the matched characters into the `my_field`. | ||
|
||
```json | ||
{ | ||
"processors":[ | ||
{ | ||
"grok":{ | ||
"field":"message", | ||
"patterns":[ | ||
"%{MY_CUSTOM_PATTERN:my_field}" | ||
], | ||
"pattern_definitions":{ | ||
"MY_CUSTOM_PATTERN":"([a-zA-Z0-9]+)" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Tracing which patterns matched | ||
vagimeli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
To trace which patterns matched and populated the fields, you can use the `trace_match` parameter. The following is an example of how to include this parameter in your configuration: | ||
|
||
```json | ||
PUT _ingest/pipeline/log_line | ||
{ | ||
"description": "Extract fields from a log line", | ||
"processors": [ | ||
{ | ||
"grok": { | ||
"field": "message", | ||
"patterns": ["%{IPORHOST:clientip} %{HTTPDATE:timestamp} %{NUMBER:response_status:int}"], | ||
"trace_match": true | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
#### Response | ||
|
||
The following response shows the output of the same pipeline used in step 1, but with `trace_match` set to true: | ||
|
||
```json | ||
{ | ||
"docs": [ | ||
{ | ||
"doc": { | ||
"_index": "_index", | ||
"_id": "_id", | ||
"_source": { | ||
"message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200", | ||
"response_status": 200, | ||
"clientip": "198.126.12", | ||
"timestamp": "10/Oct/2000:13:55:36 -0700" | ||
}, | ||
"_ingest": { | ||
"_grok_match_index": "0", | ||
"timestamp": "2023-10-23T19:18:37.14624097Z" | ||
} | ||
} | ||
} | ||
] | ||
} | ||
``` |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Grok Constructor available with OpenSearch? Or is this an Elastic feature only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which Grok Constructor are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted