Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode: several fixes #98

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Aug 1, 2022

This changeset includes three fixes related to decoding:

  1. correctly handle escaped newlines and carriage returns in extension values
  2. avoid data-loss and corruption by identifying and intercepting invalid payloads
  3. avoid data-loss by ensuring a buffered codec flushes its internal buffer

CR/LF in Extension Values

Per CEF spec v25 (2017-09-28):

Multi-line fields can be sent by CEF by encoding the newline character
as \n or \r. Note that multiple lines are only allowed in the value part
of the extensions

While this plugin has long encoded multiline extension values and other
escape sequences, our decode has only supported escaped backslashes or
escaped equals signs, and with this change becomes compliant with this portion
of the spec. Note that due to the preexisting newline-centric normalization,
round-trip encode/decode cycle is only semantically guaranteed.

Intercepting Invalid Payloads

When encountering malformed-CEF or non-CEF payloads, this plugin now emits
helpful descriptive log messages, and prevents data-loss and corruption by
emitting an event tagged with _cefparsefailure containing the bytes it
received.

This set of changes catches 3 distinct cases of malformed payloads.

  • missing one or more of the 7 required CEF header fields; a payload that
    does not have all 7 unescaped-pipe-terminated header fields cannot be
    reliably interpreted as CEF (prevents corruption).
  • containing something OTHER than a sequence of key=value pairs in the
    extensions space (prevent data-loss; previously when extensions were
    invalid they were silently omitted)
  • containing unescaped newlines (prevents corruption; previously data after
    the first newline was injected into the currently-parsed extension field)

In catching these classes of malformed inputs, this changeset also
resolves #99 in which our failure
to detect a malformed input proactively caused an unhelpful NoMethodError
message to be logged before a _cefparsefailure-tagged event was emitted.

Buffer Flushing

When configured with delimiter, this plugin creates an internal buffer.
When that buffer is present, this codec needs to implement CEF#flush so that the buffered bytes can be consumed when this codec is being flushed/closed.

Per CEF spec v25 (2017-09-28):

> *Multi-line* fields can be sent by `CEF` by encoding the newline character
> as `\n` or `\r`. Note that multiple lines are only allowed in the value part
> of the extensions

While this plugin has long _encoded_ multiline extension values and other
escape sequences, our _decode_ has only supported escaped backslashes or
escaped equals signs, and with this change becomes compliant with this portion
of the spec. Note that due to the preexisting newline-centric normalization,
round-trip encode/decode cycle is only _semantically_ guaranteed.
@yaauie yaauie force-pushed the handle-embedded-newlines-and-carriage-returns branch from 339e6cd to c4555ee Compare August 1, 2022 19:47
@yaauie yaauie force-pushed the handle-embedded-newlines-and-carriage-returns branch from ff9bbf2 to 80d0a07 Compare October 24, 2022 19:56
When encountering malformed-CEF or non-CEF payloads, this plugin now emits
helpful descriptive log messages, and prevents data-loss and corruption by
emitting an event tagged with `_cefparsefailure` containing the bytes it
received.

This set of changes catches 3 distinct cases of malformed payloads.

  - missing one or more of the 7 required CEF header fields; a payload that
    does not have all 7 unescaped-pipe-terminated header fields cannot be
    reliably interpreted as CEF (prevents corruption).
  - containing something OTHER than a sequence of key=value pairs in the
    extensions space (prevent data-loss; previously when extensions were
    invalid they were silently omitted)
  - containing unescaped newlines (prevents corruption; previously data after
    the first newline was injected into the currently-parsed extension field)

In catching these classes of malformed inputs, this changeset also
resolves logstash-plugins#99 in which our failure
to detect a malformed input proactively caused an unhelpful `NoMethodError`
message to be logged before a `_cefparsefailure`-tagged event was emitted.
@yaauie yaauie force-pushed the handle-embedded-newlines-and-carriage-returns branch from 80d0a07 to 2f0c75e Compare October 25, 2022 21:23
@yaauie yaauie changed the title decode: multiline support in extension values decode: several fixes Oct 25, 2022
Copy link

@mashhurs mashhurs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@yaauie yaauie merged commit ba56cce into logstash-plugins:main Oct 26, 2022
@yaauie yaauie deleted the handle-embedded-newlines-and-carriage-returns branch October 26, 2022 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid to raise exception when the version header field doesn't exist
3 participants