Add json_array_parser parser and assign_keys transformer (#30644)
**Description:**
We run otel-collector as infrastructure and receive many types of data
from a client. The client always sends some form of JSON, and in one use
case the payload is a simple headerless JSON array (jarray). We therefore
need a way to parse it and match headers to each field: something similar
to what csv_parser does, but that also supports the value types available
in JSON, including nested objects.

**Link to tracking Issue:** #30321

**Testing:**
* Unit tests
All the tests found in csv_parser were copied and adjusted, adding test
scenarios for different types (numbers, booleans, null) as well as a
test for parsing a nested object as part of the jarray.
* End-to-end tests
Generated traffic against a running otel collector that uses the
parser and verified that the data in the end table is as expected.

**Documentation:**
* [json_array_parser.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/beacea489ff4ae61c0bac4f477c04748944c9054/pkg/stanza/docs/operators/json_array_parser.md)
* [assign_keys.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/beacea489ff4ae61c0bac4f477c04748944c9054/pkg/stanza/docs/operators/assign_keys.md)

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
RoeiDimi and djaglowski authored Jan 23, 2024
1 parent c89b81a commit aee7b70
Showing 55 changed files with 1,229 additions and 0 deletions.
18 changes: 18 additions & 0 deletions .chloggen/add_jarray_parser_and_assign_keys_transformer.yaml
@@ -0,0 +1,18 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add a json array parser operator and an assign keys transformer.

# One or more tracking issues related to the change
issues: [30321]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  The JSON array parser operator can be used to parse a JSON array string input into a list of objects.
  The assign keys transformer can be used to assign keys from the configuration to an input list.
1 change: 1 addition & 0 deletions cmd/configschema/go.mod
@@ -602,6 +602,7 @@ require (
github.com/tinylib/msgp v1.1.9 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/vincent-petithory/dataurl v1.0.0 // indirect
github.com/vishvananda/netlink v1.2.1-beta.2 // indirect
github.com/vishvananda/netns v0.0.0-20210104183010-2eb08e3e575f // indirect
2 changes: 2 additions & 0 deletions cmd/configschema/go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions cmd/otelcontribcol/go.mod
@@ -635,6 +635,7 @@ require (
github.com/tinylib/msgp v1.1.9 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/vincent-petithory/dataurl v1.0.0 // indirect
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852 // indirect
github.com/vishvananda/netns v0.0.0-20200728191858-db3c7e526aae // indirect
2 changes: 2 additions & 0 deletions cmd/otelcontribcol/go.sum

1 change: 1 addition & 0 deletions cmd/oteltestbedcol/go.mod
@@ -211,6 +211,7 @@ require (
github.com/tinylib/msgp v1.1.9 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/vultr/govultr/v2 v2.17.2 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
go.etcd.io/bbolt v1.3.8 // indirect
2 changes: 2 additions & 0 deletions cmd/oteltestbedcol/go.sum

1 change: 1 addition & 0 deletions exporter/datadogexporter/go.mod
@@ -223,6 +223,7 @@ require (
github.com/tinylib/msgp v1.1.9 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/vultr/govultr/v2 v2.17.2 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
github.com/zorkian/go-datadog-api v2.30.0+incompatible // indirect
2 changes: 2 additions & 0 deletions exporter/datadogexporter/go.sum
1 change: 1 addition & 0 deletions exporter/datadogexporter/integrationtest/go.sum

1 change: 1 addition & 0 deletions extension/healthcheckextension/go.mod
@@ -91,6 +91,7 @@ require (
github.com/spf13/pflag v1.0.5 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/yusufpapurcu/wmi v1.2.3 // indirect
go.opentelemetry.io/collector v0.92.1-0.20240118172122-8131d31601b8 // indirect
go.opentelemetry.io/collector/config/configauth v0.92.1-0.20240118172122-8131d31601b8 // indirect
2 changes: 2 additions & 0 deletions extension/healthcheckextension/go.sum

1 change: 1 addition & 0 deletions go.mod
@@ -605,6 +605,7 @@ require (
github.com/tinylib/msgp v1.1.9 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/valyala/fastjson v1.6.4 // indirect
github.com/vincent-petithory/dataurl v1.0.0 // indirect
github.com/vishvananda/netlink v1.2.1-beta.2 // indirect
github.com/vishvananda/netns v0.0.0-20210104183010-2eb08e3e575f // indirect
2 changes: 2 additions & 0 deletions go.sum

2 changes: 2 additions & 0 deletions pkg/stanza/adapter/register.go
@@ -8,6 +8,7 @@ import (
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/output/stdout"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/csv"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/json"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/jsonarray"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/keyvalue"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/scope"
@@ -17,6 +18,7 @@ import (
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/trace"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/uri"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/add"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/assignkeys"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/copy"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/filter"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/flatten"
2 changes: 2 additions & 0 deletions pkg/stanza/docs/operators/README.md
@@ -18,6 +18,7 @@ Inputs:
Parsers:
- [csv_parser](./csv_parser.md)
- [json_parser](./json_parser.md)
- [json_array_parser](./json_array_parser.md)
- [regex_parser](./regex_parser.md)
- [scope_name_parser](./scope_name_parser.md)
- [syslog_parser](./syslog_parser.md)
@@ -43,3 +44,4 @@ General purpose:
- [retain](./retain.md)
- [router](./router.md)
- [unquote](./unquote.md)
- [assign_keys](./assign_keys.md)
99 changes: 99 additions & 0 deletions pkg/stanza/docs/operators/assign_keys.md
@@ -0,0 +1,99 @@
## `assign_keys` operator

The `assign_keys` operator assigns keys from the configuration to an input list. The output is a map containing the resulting key-value pairs.

### Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| `id` | `assign_keys` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `field` | required | The [field](../types/field.md) to assign keys to. |
| `keys` | required | The list of strings to use as keys for the input list's values. Its length must equal the length of the list in `field`; a mismatch results in an error. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |

### Example Configurations:

<hr>
Assign keys to a list in body
<br>
<br>

```yaml
- type: assign_keys
  field: body
  keys: ["foo", "bar", "charlie", "foxtrot"]
```
<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "body": [1, "debug", "Debug Message", true]
}
```

</td>
<td>

```json
{
  "body": {
    "foo": 1,
    "bar": "debug",
    "charlie": "Debug Message",
    "foxtrot": true
  }
}
```

</td>
</tr>
</table>
<hr>
Assign keys to a list in an attributes field
<br>
<br>

```yaml
- type: assign_keys
  field: attributes.input
  keys: ["foo", "bar"]
```
<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "attributes": {
    "input": [1, "debug"],
    "field2": "unchanged"
  }
}
```

</td>
<td>

```json
{
  "attributes": {
    "input": {
      "foo": 1,
      "bar": "debug"
    },
    "field2": "unchanged"
  }
}
```

</td>
</tr>
</table>
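
The pairing that `assign_keys` performs can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the operator's actual implementation: `assignKeys` is a hypothetical helper and the error message is invented for the example. It pairs each configured key with the value at the same index and returns an error when the lengths differ, matching the `keys` field description above.

```go
package main

import "fmt"

// assignKeys is a hypothetical helper (not part of pkg/stanza).
// It pairs keys[i] with values[i] and rejects a length mismatch.
func assignKeys(keys []string, values []any) (map[string]any, error) {
	if len(keys) != len(values) {
		return nil, fmt.Errorf("length mismatch: %d keys, %d values", len(keys), len(values))
	}
	out := make(map[string]any, len(keys))
	for i, k := range keys {
		out[k] = values[i]
	}
	return out, nil
}

func main() {
	// Same keys and input list as the first example configuration.
	keys := []string{"foo", "bar", "charlie", "foxtrot"}
	values := []any{1, "debug", "Debug Message", true}

	m, err := assignKeys(keys, values)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m)

	// A mismatch between keys and values is reported as an error.
	if _, err := assignKeys([]string{"foo"}, values); err != nil {
		fmt.Println("error:", err)
	}
}
```

In the real operator, what happens to the entry after such an error is governed by `on_error`; the sketch only shows the mapping step.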
129 changes: 129 additions & 0 deletions pkg/stanza/docs/operators/json_array_parser.md
@@ -0,0 +1,129 @@
## `json_array_parser` operator

The `json_array_parser` operator parses the string-type field selected by `parse_from`, which is assumed to be in JSON array format, into a list.
A JArray string (or JSON array string) is a string that represents a JSON array: an ordered list of values, each of which can be a string, number, boolean, null, object, or another array.
#### Examples:
A simple JSON array string containing only strings:
```
"[\"Foo\", \"Bar\", \"Charlie\"]"
```

JSON array after parsing:
```json
["Foo", "Bar", "Charlie"]
```

A more complex JSON array string with mixed types and no nested objects:
```
"[\"Hello\", 42, true, null]"
```

JSON array after parsing:
```json
["Hello", 42, true, null]
```

A more complex JSON array string with mixed types, including a nested object and a nested array:
```
"[\"Hello\", 42, {\"name\": \"Alice\", \"age\": 25}, [1, 2, 3], true, null]"
```

The JSON array this string represents:
```json
["Hello", 42, {"name": "Alice", "age": 25}, [1, 2, 3], true, null]
```

Note that the current parser parses every nested object or array as a string, so the actual result for this example is:
```json
["Hello", 42, "{\"name\": \"Alice\", \"age\": 25}", "[1, 2, 3]", true, null]
```

More information on json arrays can be found [here](https://json-schema.org/understanding-json-schema/reference/array)
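
The behavior described above, where scalars are decoded and nested values are kept as strings, can be sketched in Go. This is an illustration under stated assumptions, not the operator's code: `parseJSONArray` is a hypothetical helper, and it uses the standard library's `encoding/json` rather than the `fastjson` dependency this change adds. Note that `encoding/json` decodes numbers as `float64` and `null` as `nil`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// parseJSONArray is a hypothetical helper (not part of pkg/stanza).
// It decodes a JSON array string into a list; nested objects and
// arrays are returned as their raw JSON text instead of being decoded.
func parseJSONArray(input string) ([]any, error) {
	var raw []json.RawMessage
	if err := json.Unmarshal([]byte(input), &raw); err != nil {
		return nil, fmt.Errorf("input is not a JSON array: %w", err)
	}
	out := make([]any, 0, len(raw))
	for _, elem := range raw {
		trimmed := bytes.TrimSpace(elem)
		// Keep nested objects and arrays as strings.
		if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') {
			out = append(out, string(trimmed))
			continue
		}
		// Decode scalars: string, number, boolean, or null.
		var v any
		if err := json.Unmarshal(trimmed, &v); err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	input := `["Hello", 42, {"name": "Alice", "age": 25}, [1, 2, 3], true, null]`
	values, err := parseJSONArray(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print each element with its Go type to show which ones stayed strings.
	for i, v := range values {
		fmt.Printf("%d: %T %v\n", i, v, v)
	}
}
```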


### Configuration Fields

| Field | Default | Description |
|--------------------|------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | `json_array_parser` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from` | `body` | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to` | required | The [field](../types/field.md) to which the value will be parsed. Can be `body` or a nested field inside `body`, `attributes`, or `resource` (e.g. `attributes.parsed`). |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `timestamp` | `nil` | An optional [timestamp](../types/timestamp.md) block which will parse a timestamp field before passing the entry to the output operator. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |

### Embedded Operations

The `json_array_parser` can be configured to embed certain operations such as timestamp and severity parsing. For more information, see [complex parsers](../types/parsers.md#complex-parsers).

### Example Configurations

#### Parse the field `body` with a json array parser into an attributes field

Configuration:

```yaml
- type: json_array_parser
  parse_from: body
  parse_to: attributes.output
```
<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "body": "[1,\"debug\",\"Debug Message\", true]"
}
```

</td>
<td>

```json
{
  "attributes": {
    "output": [1, "debug", "Debug Message", true]
  }
}
```

</td>
</tr>
</table>

#### Parse the field `body` with a json array parser

Configuration:

```yaml
- type: json_array_parser
  parse_to: body
```
<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "body": "[1,\"debug\",\"Debug Message\", true]"
}
```

</td>
<td>

```json
{
  "body": [1, "debug", "Debug Message", true]
}
```

</td>
</tr>
</table>
1 change: 1 addition & 0 deletions pkg/stanza/go.mod
@@ -14,6 +14,7 @@ require (
github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage v0.92.0
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal v0.92.0
github.com/stretchr/testify v1.8.4
github.com/valyala/fastjson v1.6.4
go.opentelemetry.io/collector/component v0.92.1-0.20240118172122-8131d31601b8
go.opentelemetry.io/collector/config/configtls v0.92.1-0.20240118172122-8131d31601b8
go.opentelemetry.io/collector/confmap v0.92.1-0.20240118172122-8131d31601b8