Implement setting to keep null values in events #13928

Merged
ycombinator merged 22 commits into elastic:master from libbeat-emit-nulls-in-event-option
Oct 15, 2019

Conversation

ycombinator
Contributor

@ycombinator ycombinator commented Oct 4, 2019

Resolves #5522.

This PR adds a new setting to Beats' configuration, keep_null, that allows published events to contain null values. The setting defaults to false, which preserves the current behavior of Beats (i.e. the behavior prior to this setting being introduced).

Depending on the Beat, this setting can be specified at different places in the configuration.

Filebeat

Specify the setting under an input's configuration. For example:

filebeat.inputs:
- type: stdin
  enabled: true
  keep_null: true
  json.keys_under_root: false

With the above configuration, input text such as this:

{"test-null": null, "@timestamp":"2016-11-01T14:39:36.322", "type": 42,  "level":"INFO", "message":"test"}

Results in the following output event:

{"@timestamp":"2019-10-04T19:10:44.041Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.0.0"},"input":{"type":"stdin"},"ecs":{"version":"1.1.0"},"host":{"name":"Shaunaks-MacBook-Pro-Work.local"},"agent":{"hostname":"Shaunaks-MacBook-Pro-Work.local","id":"8ae6a937-ade6-482c-8f47-59e7dff17a44","version":"8.0.0","type":"filebeat","ephemeral_id":"def7a61b-ea7e-4825-97ba-87fb1257f3b7"},"log":{"offset":0,"file":{"path":""}},"json":{"test-null":null,"@timestamp":"2016-11-01T14:39:36.322","type":42,"level":"INFO","message":"test"}}

Notice, in particular, that the output event contains the json.test-null field whose value is null.
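
To make the effect of the setting concrete, here is a minimal Go sketch of the behavior described above. It is not the libbeat implementation: the helper name `dropNulls` and its `keepNull` parameter are hypothetical, and the sketch only assumes that a decoded event can be treated as a nested `map[string]interface{}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropNulls is a hypothetical helper (not part of libbeat). It returns a copy
// of event in which keys holding nil values are removed, recursing into nested
// maps. When keepNull is true, nil values are preserved instead.
func dropNulls(event map[string]interface{}, keepNull bool) map[string]interface{} {
	out := make(map[string]interface{}, len(event))
	for k, v := range event {
		switch val := v.(type) {
		case nil:
			// JSON null decodes to a nil interface value.
			if keepNull {
				out[k] = nil
			}
		case map[string]interface{}:
			out[k] = dropNulls(val, keepNull)
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	input := `{"test-null": null, "type": 42, "level": "INFO", "message": "test"}`

	var event map[string]interface{}
	if err := json.Unmarshal([]byte(input), &event); err != nil {
		panic(err)
	}

	// Show the event as it would look with the setting off and on.
	for _, keep := range []bool{false, true} {
		b, err := json.Marshal(dropNulls(event, keep))
		if err != nil {
			panic(err)
		}
		fmt.Printf("keep_null=%v: %s\n", keep, b)
	}
}
```

With `keepNull` set to false the `test-null` key is absent from the result; with it set to true the key is present with a `null` value, which matches the difference visible in the output event above.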

The setting can also be specified under Filebeat modules' input configuration.

Metricbeat

Specify the setting under a module's configuration. For example:

- module: http
  metricsets:
    - server
  host: "localhost"
  port: "8080"
  enabled: true
  keep_null: true

With the above configuration, the following HTTP request:

curl -H 'Content-Type: application/json' 'http://localhost:8080' -d '{ "foo": 17, "bar": null }'

Results in the following output event:

{"@timestamp":"2019-10-08T10:51:35.640Z","@metadata":{"beat":"metricbeat","type":"_doc","version":"8.0.0"},"host":{"hostname":"Shaunaks-MacBook-Pro-Work.local","name":"Shaunaks-MacBook-Pro-Work.local","architecture":"x86_64","os":{"kernel":"18.7.0","build":"18G103","platform":"darwin","version":"10.14.6","family":"darwin","name":"Mac OS X"},"id":"EF6274EA-462F-5316-A14A-850E7BFD8126"},"agent":{"id":"0b7c7d8b-d4b0-4d37-8704-a97231e6b460","version":"8.0.0","type":"metricbeat","ephemeral_id":"307c695c-ff2f-4ff1-812b-4e2b1eb56279","hostname":"Shaunaks-MacBook-Pro-Work.local"},"service":{"type":"http"},"http":{"server":{"foo":17,"bar":null}},"event":{"dataset":"http.server","module":"http"},"metricset":{"name":"server"},"ecs":{"version":"1.1.0"}}

Notice, in particular, that the output event contains the http.server.bar field whose value is null.

Auditbeat

Specify the setting under a module's configuration. For example:

- module: system
  datasets:
    - process
    - socket
  period: 1s
  keep_null: true

Packetbeat

Specify the setting under a protocol's configuration or the flows configuration. For example:

packetbeat.flows:
  timeout: 30s
  period: 10s
  keep_null: true
packetbeat.protocols:
- type: http
  ports: [80, 8080, 8000, 5000, 8002]
  keep_null: true

Winlogbeat

Specify the setting under an event log's configuration. For example:

winlogbeat.event_logs:
  - name: Application
    keep_null: true

Heartbeat

Specify the setting under a monitor's configuration. For example:

heartbeat.monitors:
- type: http
  schedule: '@every 5s'
  urls: ["https://myhost:443"]
  keep_null: true

Functionbeat

Specify the setting under a function's configuration. For example:

functionbeat.provider.aws.endpoint: "s3.amazonaws.com"
functionbeat.provider.aws.deploy_bucket: "functionbeat-deploy"
functionbeat.provider.aws.functions:
  - name: cloudwatch
    enabled: true
    type: cloudwatch_logs
    description: "lambda function for cloudwatch logs"
    triggers:
      - log_group_name: /aws/lambda/my-lambda-function
    keep_null: true

@ycombinator ycombinator added the enhancement, review, libbeat, needs_backport, v8.0.0, and v7.5.0 labels Oct 4, 2019
@ycombinator ycombinator requested review from ph and urso October 4, 2019 19:13
@ycombinator
Contributor Author

Reviewers:

  1. I'm not married to the name of this setting, emit_null_values. Please feel free to suggest alternatives 😄.

  2. I would love some guidance on the right configuration level/scope for this setting. As of now it exists at the top-most/global level in a Beat's configuration. Is that what we want? Should it be per-output? Per-input? Some kind of hierarchy with the setting's value at deeper levels overriding its value from higher levels? Something else?
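
For readers weighing the "hierarchy" option floated in the second question, a minimal Go sketch of how such an override could be resolved follows. It is purely illustrative: `resolveKeepNull` is a hypothetical helper, and representing "not set at this level" with a nil `*bool` is an assumption rather than anything taken from the Beats code.

```go
package main

import "fmt"

// resolveKeepNull is a hypothetical helper. Each argument is the value of the
// setting at one configuration level, ordered from outermost (global) to
// innermost (e.g. per-input); nil means "not set at this level". The innermost
// level that sets a value wins, and the default is false.
func resolveKeepNull(levels ...*bool) bool {
	result := false
	for _, level := range levels {
		if level != nil {
			result = *level
		}
	}
	return result
}

func main() {
	on, off := true, false

	fmt.Println(resolveKeepNull(nil, nil))  // nothing set anywhere: false
	fmt.Println(resolveKeepNull(&on, nil))  // global value inherited: true
	fmt.Println(resolveKeepNull(&on, &off)) // inner level overrides global: false
}
```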

@urso

urso commented Oct 7, 2019

  1. I'm in favor of shorter but descriptive names like keep_null.

  2. I'm no fan of global settings. Also, in the context of fleet, every global setting is a pain. I'd say it is enough to have the setting per input only (keeps the change simple).

@ycombinator
Contributor Author

> I'm no fan of global settings. Also, in the context of fleet, every global setting is a pain. I'd say it is enough to have the setting per input only (keeps the change simple).

@urso If we make the setting per input then that implies it will only be available in Filebeat. However, other Beats could also generate events with null values, right?

@urso

urso commented Oct 7, 2019

> If we make the setting per input then that implies it will only be available in Filebeat. However, other Beats could also generate events with null values, right?

You would need to add this setting to the metricbeat/heartbeat module setup as well. We already do this for the 'processors', 'fields', and 'tags' settings. Each beat passes these settings along when calling ConnectWith. ConnectWith then merges global and local settings into a list of event processors.
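
As a rough illustration of the pattern described in this comment (local settings handed over at connect time and combined with global ones into a processor list), here is a Go sketch built on hypothetical types. `Settings`, `Processor`, `addTags`, `dropNullValues`, and `buildProcessors` are all assumptions made for illustration; they do not mirror libbeat's actual `ConnectWith` or its configuration types.

```go
package main

import "fmt"

// Processor is a hypothetical event transformation step.
type Processor func(event map[string]interface{}) map[string]interface{}

// Settings is a hypothetical bundle of options that may be given globally
// or locally (per input, module, monitor, and so on).
type Settings struct {
	Tags     []string
	KeepNull bool
}

// addTags returns a processor that appends tags to the event's "tags" field.
func addTags(tags []string) Processor {
	return func(e map[string]interface{}) map[string]interface{} {
		existing, _ := e["tags"].([]string)
		e["tags"] = append(append([]string{}, existing...), tags...)
		return e
	}
}

// dropNullValues removes top-level keys whose value is nil.
func dropNullValues(e map[string]interface{}) map[string]interface{} {
	for k, v := range e {
		if v == nil {
			delete(e, k)
		}
	}
	return e
}

// buildProcessors merges global and local settings into one ordered list:
// global tags first, then local tags, then null removal unless the local
// settings ask to keep nulls.
func buildProcessors(global, local Settings) []Processor {
	var procs []Processor
	if len(global.Tags) > 0 {
		procs = append(procs, addTags(global.Tags))
	}
	if len(local.Tags) > 0 {
		procs = append(procs, addTags(local.Tags))
	}
	if !local.KeepNull {
		procs = append(procs, dropNullValues)
	}
	return procs
}

func main() {
	global := Settings{Tags: []string{"prod"}}
	local := Settings{Tags: []string{"web"}, KeepNull: false}

	event := map[string]interface{}{"message": "test", "bar": nil}
	for _, p := range buildProcessors(global, local) {
		event = p(event)
	}
	fmt.Println(event)
}
```

In this sketch the keep-null decision is read only from the local settings, which corresponds to the per-input scope suggested earlier in the thread.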

@ycombinator ycombinator added the in progress label and removed the review label Oct 8, 2019
@ycombinator ycombinator changed the title from "Implement setting to allow emitting of null values in events" to "Implement setting to keep null values in events" Oct 8, 2019
@ycombinator ycombinator requested a review from a team as a code owner October 8, 2019 11:52
@ycombinator ycombinator requested review from a team as code owners October 8, 2019 20:09
===== `keep_null`

If this option is set to true, fields with `null` values will be published in
the output document. By default, `keep_null` is set to `false`.

Contributor Author

Hey @dedemorton, I was wondering if you could review this doc snippet before I copy it to other Beats' docs. Thanks!

Contributor

LGTM.

@ycombinator
Contributor Author

@urso This PR is ready for review again. It has become rather large in size, primarily due to changes to the reference yml and docs files. If you want I can pull those changes out into a separate follow-up PR to make it easier to review this one. Just let me know!

@@ -3,3 +3,6 @@
enabled: true
period: 10s
hosts: ["localhost:3000"]

# Set to true to publish fields with null values in events.
#keep_null: false

Member

I'd prefer not to add this to the reference config of all modules; we don't do it with other common options like fields, service.name in metricbeat, or pipeline in filebeat.

Contributor Author

Done in de6bb5d724212d844919b514480ab5cc2d732c32.

x-pack/filebeat/input/config.go (outdated, resolved)

@@ -69,6 +69,7 @@ func NewInput(
	out, err := connector.ConnectWith(cfg, beat.ClientConfig{
		Processing: beat.ProcessingConfig{
			DynamicFields: inputContext.DynamicFields,
			KeepNull:      conf.KeepNull,

KeepNull will be set automatically by the pipelineConnector in filebeat/channel/connector.go. This makes me wonder if we need to set Processing at all.

Contributor Author

@ycombinator ycombinator Oct 11, 2019

> This makes me wonder if we need to set Processing at all.

I do think we need to set Processing here because of the DynamicFields config option. The pipelineConnector doesn't set it in its ConnectWith method.

makes sense.

@ycombinator
Contributor Author

@urso This is ready for another round of review, when you get a chance. Thanks!

@ycombinator
Contributor Author

jenkins, test this

@ycombinator ycombinator merged commit ec7a9a2 into elastic:master Oct 15, 2019
@ycombinator ycombinator added the review label and removed the in progress and needs_backport labels Oct 16, 2019
@ycombinator ycombinator deleted the libbeat-emit-nulls-in-event-option branch December 25, 2019 11:09
Successfully merging this pull request may close these issues.

null value fields are dropped from JSON decoding
4 participants