[Automatic Import] Fix Structured log flow to handle different types of structured syslogs (elastic#212611)

## Release note
Fix structured log flow to handle multiple types of structured logs.

## Summary
The structured log flow has issues where the KV header validation fails for some types of logs. This PR fixes the flow to match a variety of structured syslog messages.

A variety of log formats were tested:

```
[2025-01-03T07:48:58.989821Z] [DEBUG] AuthService - EventID=361a5289eaf8e42b4c195b9b | Message="Session expired" | UserID=2882 | Duration=376ms
[2025-01-29T17:34:18.989830Z] [ERROR] InventoryService - EventID=acbb20d3c955edf718e691d9 | Message="Item restocked" | UserID=9656 | Duration=421ms
[2025-01-11T21:51:54.989839Z] [ERROR] APIGateway - EventID=9c273d43b946020d5fdbe36c | Message="Response sent" | UserID=1468 | Duration=409ms
[2025-01-20T08:40:22.989848Z] [WARN] PaymentService - EventID=ae8c1425079119b848fa451cb7a | Message="3D Secure required" | UserID=9353 | Duration=270ms
```

```
2021-10-22 22:11:32,131 DEBUG [org.keycloak.events] (default task-3) type=CODE_TO_TOKEN, realmId=test, clientId=security-admin-console, userId=ce637d23--4fca-9088-1aea1d053e19, ipAddress=10.1.2.1, token_id=561459c0-75f1-46d4-986d, grant_type=authorization_code, refresh_token_type=Refresh, scope=openid, refresh_token_id=07434488-ca99-412a-c2e47c93d6d1, code_id=bae6e56e-368f-4809-48cfb6279f5e, client_auth_method=client-secret
2021-10-22 22:12:09,871 DEBUG [org.keycloak.events] (default task-3) operationType=CREATE, realmId=test, clientId=7bcaf1cb-820a-40f1-75ced03ef03b, userId=ce637d23-b89c-4fca-1aea1d053e19, ipAddress=10.1.2.6, resourceType=USER, resourcePath=users/07972d16-b173-803d-90f211080f40
```

```
[18/Feb/2025:22:39:18 +0000] CONNECT conn=730729 from=10.2.2.9:56518 to=10.2.1.14:4389 protocol=LDAP
[18/Feb/2025:22:39:16 +0000] CONNECT conn=207223 from=10.2.1.24:55730 to=10.1.3.7:4389 protocol=LDAP
```

```
<134>1 1647479580.487048774 MX84_2 airmarshal_events type=rogue_ssid_detected ssid='' bssid='AA:17:C8:D8:51' src='AA:17:C8:D8:51' dst='FF:FF:FF:FF:FF' wired_mac='AC:17:C7:D8:51' vlan_id='0' channel='6' rssi='35' fc_type='0' fc_subtype='8'
<134>1 1647479604.334549372 MX84_5 airmarshal_events type=rogue_ssid_detected ssid='' bssid='92:17:C7:D8:51' src='92:17:C8:D8:51' dst='6A:3A:3E:85:F6' wired_mac='AC:17:C7:D8:51' vlan_id='0' channel='6' rssi='23' fc_type='0' fc_subtype='5'
```
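The custom-format samples above all follow the same shape: a free-form header (timestamp, level, source) followed by a structured key-value body. As a minimal TypeScript sketch (not code from this PR — the regex and names are illustrative), the separation the flow asks the LLM to produce looks like this:

```typescript
// Illustrative only: split a custom-format syslog line into a header and a
// structured body, the same separation the KV header flow generates a grok
// pattern for.
const sample =
  '[18/Feb/2025:22:39:18 +0000] CONNECT conn=730729 from=10.2.2.9:56518 to=10.2.1.14:4389 protocol=LDAP';

// Header: bracketed timestamp plus one action word; everything after is body.
const headerRegex = /^(\[[^\]]+\] \w+) (.*)$/;
const match = sample.match(headerRegex);
const header = match?.[1] ?? '';
const structuredBody = match?.[2] ?? '';

// The structured body then splits into key-value pairs on spaces and '='.
const pairs: Record<string, string> = {};
for (const kv of structuredBody.split(' ')) {
  const i = kv.indexOf('=');
  pairs[kv.slice(0, i)] = kv.slice(i + 1);
}
```

In the actual pipeline, the generated grok pattern plays the role of `headerRegex` and the Elasticsearch KV processor performs the key-value split.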

### Checklist
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
bhapas authored Feb 27, 2025
1 parent 92867c6 commit f579f2d
Showing 6 changed files with 54 additions and 14 deletions.
@@ -13,6 +13,26 @@ export const KV_EXAMPLE_ANSWER = {
ignore_missing: true,
};

export const KV_HEADER_EXAMPLE_LOGS = [
{
example:
'[18/Feb/2025:22:39:16 +0000] CONNECT conn=20597223 from=10.1.1.1:1234 to=10.2.3.4:4389 protocol=LDAP',
header: '[18/Feb/2025:22:39:16 +0000] CONNECT',
structuredBody: 'conn=20597223 from=10.1.1.1:1234 to=10.2.3.4:4389 protocol=LDAP',
grok_pattern:
'\\[%{HTTPDATE:`{packageName}.{dataStreamName}.`timestamp}\\] %{WORD:`{packageName}.{dataStreamName}.`action}\\s%{GREEDYDATA:message}',
},
{
example:
'2021-10-22 22:12:09,871 DEBUG [org.keycloak.events] (default task-3) operationType=CREATE, realmId=test, clientId=abcdefgh userId=sdfsf-b89c-4fca-9088-sdfsfsf, ipAddress=10.1.1.1, resourceType=USER, resourcePath=users/07972d16-b173-4c99-803d-90f211080f40',
header: '2021-10-22 22:12:09,871 DEBUG [org.keycloak.events] (default task-3)',
structuredBody:
'operationType=CREATE, realmId=test, clientId=7bcaf1cb-820a-40f1-91dd-75ced03ef03b, userId=ce637d23-b89c-4fca-9088-1aea1d053e19, ipAddress=10.1.1.1, resourceType=USER, resourcePath=users/07972d16-b173-4c99-803d-90f211080f40',
grok_pattern:
'%{TIMESTAMP_ISO8601:`{packageName}.{dataStreamName}.`timestamp} %{LOGLEVEL:`{packageName}.{dataStreamName}.`loglevel} \\[%{DATA:`{packageName}.{dataStreamName}.`logsource}\\] \\(%{DATA:`{packageName}.{dataStreamName}.`task}\\)\\s%{GREEDYDATA:message}',
},
];

export const KV_HEADER_EXAMPLE_ANSWER = {
rfc: 'RFC2454',
regex:
@@ -9,7 +9,7 @@ import { JsonOutputParser } from '@langchain/core/output_parsers';
import type { KVState } from '../../types';
import type { HandleKVNodeParams } from './types';
import { KV_HEADER_PROMPT } from './prompts';
import { KV_HEADER_EXAMPLE_ANSWER } from './constants';
import { KV_HEADER_EXAMPLE_ANSWER, KV_HEADER_EXAMPLE_LOGS } from './constants';

export async function handleHeader({
state,
@@ -23,6 +23,7 @@ export async function handleHeader({
samples: state.logSamples,
packageName: state.packageName,
dataStreamName: state.dataStreamName,
example_logs: KV_HEADER_EXAMPLE_LOGS,
ex_answer: JSON.stringify(KV_HEADER_EXAMPLE_ANSWER, null, 2),
});

@@ -56,8 +56,8 @@ describe('Testing kv header', () => {
field: 'message',
field_split: '',
target_field: 'testPackage.testDatastream',
trim_key: '',
trim_value: '',
trim_key: null,
trim_value: null,
value_split: '',
},
},
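A small sketch (an assumption, not code from this PR) of why these expectations moved from `''` to `null`: when the template placeholder is quoted, a missing trim option renders as an empty YAML string, whereas the unquoted form renders a blank value that YAML reads as null, letting the processor omit the option entirely.

```typescript
// Illustrative only: a real render goes through Nunjucks; a string replace is
// enough to show the quoted vs. unquoted difference for a null trim option.
const render = (tpl: string, value: string | null): string =>
  tpl.replace('{{ trim_key }}', value ?? '');

// Quoted placeholder: null becomes an empty-string YAML value.
const quoted = render("trim_key: '{{ trim_key }}'", null);
// Unquoted placeholder: null leaves the value blank, which YAML parses as null.
const unquoted = render('trim_key: {{ trim_key }}', null);
```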
@@ -29,8 +29,8 @@ export const KV_MAIN_PROMPT = ChatPromptTemplate.fromMessages([
4. The \`value_split\` is the delimiter regex pattern to use for splitting the key from the value within a key-value pair (e.g., ':' or '=').
5. The \`field_split\` is the regex pattern to use for splitting key-value pairs in the log. Make sure the regex pattern breaks the log into key-value pairs.
6. Ensure that the KV processor can handle different scenarios, such as: optional or missing fields in the logs, varying delimiters between keys and values (e.g., = or :), and complex log structures (e.g., nested key-value pairs or key-value pairs within strings, whitespace, URLs, IPv4, IPv6, and MAC addresses).
7. Use \`trim_key\` for string of characters to trim from extracted keys.
8. Use \`trim_value\` for string of characters to trim from extracted values.
7. Use \`trim_key\` for string of characters to trim from extracted keys. Make sure to escape single quotes like \`\\'\`.
8. Use \`trim_value\` for string of characters to trim from extracted values. Make sure to escape single quotes like \`\\'\`.
You ALWAYS follow these guidelines when writing your response:
<guidelines>
@@ -68,23 +68,34 @@ export const KV_HEADER_PROMPT = ChatPromptTemplate.fromMessages([
],
[
'human',
`Looking at the multiple syslog samples provided in the context, your task is to separate the "header" and the "message body" from this log. Our goal is to identify which RFC they belong to. Then create a regex pattern that can separate the header and the structured body.
You then have to create a grok pattern using the regex pattern.
You are given a log entry in a structured format.
`Here are a series of syslog samples in a structured log format, and your task is to create a regex and a grok pattern that will correctly parse only the header part of these logs. The pattern must pay close attention to the following points:
Follow these steps to identify the header pattern:
1. Identify if the log samples fall under RFC5424 or RFC3164. If not, return 'Custom Format'.
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.
2. If the log samples fall under RFC3164 or RFC5424, then parse the header and structured body according to the RFC definition.
3. If the log samples are in a custom format, pay special attention to special characters like brackets, colons, or any punctuation marks in the syslog header, and ensure they are properly escaped.
4. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Do not parse the message part in the regex. Just the header part should be in regex and grok_pattern.
- Timestamp Handling: Pay close attention to the timestamp format, ensuring that it is handled correctly with respect to any variations in date or time formatting. The timestamp should be extracted accurately, and make sure the pattern accounts for any variations in timezone representation, like time zone offsets or 'UTC' markers.
Also look for special characters around the timestamp in Custom Format, like a timestamp enclosed in [] or <> or (). Match these characters in the grok pattern with appropriate escaping.
- Special Characters: Ensure that all special characters, like brackets, colons, or any punctuation marks in the syslog header, are properly escaped. Be particularly cautious with characters that could interfere with the regex engine, such as periods (.), asterisks (*), or square brackets ([]), and ensure they are treated correctly in the pattern.
- Strict Parsing of the Header: The regex and grok pattern should strictly focus on parsing only the header part of the syslog sample. Do not include any logic for parsing the structured message body. The message body should be captured using the GREEDYDATA field in the grok pattern, and any non-header content should be left out of the main pattern.
- Pattern Efficiency: Ensure that both the regex and the grok pattern are as efficient as possible while still accurately capturing the header components. Avoid overly complex or overly broad patterns that could capture unintended data.
- Make sure to map the remaining message body to \'message\' in grok pattern.
- If there are special characters between the header and message body, like a space character, make sure to include that character in the header grok pattern.
- Make sure to add \`{packageName}.{dataStreamName}\` as a prefix to each field in the pattern. Refer to example response.
- Do not respond with anything except the processor as a JSON object enclosed with 3 backticks (\`), see example response above. Use strict JSON response format.
</guidelines>
Some of the example samples look like this:
<example_logs>
\`\`\`json
{example_logs}
\`\`\`
</example_logs>
You are required to provide the output in the following example response format:
<example_response>
@@ -120,6 +131,7 @@ Follow these steps to fix the errors in the header pattern:
2. The log samples contain the header and structured body. The header may contain any or all of priority, timestamp, loglevel, hostname, ipAddress, messageId or any free-form text or non key-value information etc.,
3. The message body may start with a description, followed by structured key-value pairs.
4. Make sure the regex and grok pattern matches all the header information. Only the structured message body should be under GREEDYDATA in grok pattern.
You ALWAYS follow these guidelines when writing your response:
<guidelines>
- Do not parse the message part in the regex. Just the header part should be in regex and grok_pattern.
@@ -2,6 +2,6 @@
field: message
field_split: '{{ kvInput.field_split }}'
value_split: '{{ kvInput.value_split }}'
trim_key: '{{ kvInput.trim_key }}'
trim_value: '{{ kvInput.trim_value }}'
trim_key: {{ kvInput.trim_key }}
trim_value: {{ kvInput.trim_value }}
target_field: '{{ packageName }}.{{ dataStreamName }}'
@@ -69,6 +69,13 @@ export function createKVProcessor(kvInput: KVProcessor, state: KVState): ESProce
autoescape: false,
});
const template = env.getTemplate('kv.yml.njk');
if (kvInput.trim_key) {
kvInput.trim_key = kvInput.trim_key.replace(/['"]/g, '\\$&');
}

if (kvInput.trim_value) {
kvInput.trim_value = kvInput.trim_value.replace(/['"]/g, '\\$&');
}
const renderedTemplate = template.render({
kvInput,
packageName: state.packageName,
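The escaping step added above can be exercised in isolation. A minimal sketch (the helper name is mine, not the PR's) of what the `replace(/['"]/g, '\\$&')` calls do before the Nunjucks template renders the trim options:

```typescript
// Backslash-escape single and double quotes so trim_key / trim_value survive
// being rendered into the quoted values of the KV processor YAML template.
const escapeQuotes = (value: string): string => value.replace(/['"]/g, '\\$&');

escapeQuotes("don't"); // "don't" becomes "don\'t"
```

`$&` in the replacement string stands for the matched character, so each quote is kept and simply prefixed with a backslash.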