Update documentation of filestream input with the new improvements #25303

Merged (11 commits, May 4, 2021)
62 changes: 60 additions & 2 deletions filebeat/docs/inputs/input-filestream-file-options.asciidoc
@@ -37,6 +37,30 @@ a `gz` extension:

See <<regexp-support>> for a list of supported regexp patterns.

===== `prospector.scanner.include_files`

A list of regular expressions to match the files that you want {beatname_uc} to
include. If a list of regexes is provided, only the files that are allowed by
the patterns are harvested.

By default, all files are included. This option is the counterpart of
`prospector.scanner.exclude_files`.

The following example configures {beatname_uc} to harvest only files under
`/var/log`:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  prospector.scanner.include_files: ['^/var/log/.*']
----

NOTE: When matching against absolute paths, anchor the pattern with `^`.

See <<regexp-support>> for a list of supported regexp patterns.

===== `prospector.scanner.symlinks`

The `symlinks` option allows {beatname_uc} to harvest symlinks in addition to
@@ -57,6 +81,12 @@ This is, for example, the case for Kubernetes log files.

Because this option may lead to data loss, it is disabled by default.
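
For example, a minimal sketch that enables harvesting of symlinks:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  prospector.scanner.symlinks: true
----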

===== `prospector.scanner.resend_on_touch`

If this option is enabled, a file is resent if its size has not changed
but its modification time is later than before.
It is disabled by default to avoid accidentally resending files.
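
For example, the following sketch enables resending files on touch:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  prospector.scanner.resend_on_touch: true
----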


[float]
[id="{beatname_lc}-input-{type}-scan-frequency"]
@@ -117,6 +147,35 @@ If a file that's currently being harvested falls under `ignore_older`, the
harvester will first finish reading the file and close it after
`close.on_state_change.inactive` is reached. Then, after that, the file will be ignored.

[float]
[id="{beatname_lc}-input-{type}-ignore-inactive"]
===== `ignore_inactive`

If this option is enabled, {beatname_uc} ignores every file that has not been
updated since the selected time. Possible values are `since_first_start` and
`since_last_start`. The first option ignores every file that has not been updated since
the first start of {beatname_uc}. It is useful when the Beat might be restarted
due to configuration changes or a failure. The second option tells
the Beat to read only from files that have been updated since its most recent start.
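
For example, to read only from files that have been updated since the most
recent start of the Beat:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  ignore_inactive: since_last_start
----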

The files affected by this setting fall into two categories:

* Files that were never harvested
* Files that were harvested but haven't been updated within the `ignore_inactive` period.

For files that were never seen before, the offset state is set to the end of
the file. If a state already exists, the offset is not changed. If a file is
updated again later, reading continues at the set offset position.

This setting relies on the modification time of the file to
determine if a file is ignored. If the modification time of the file is not
updated when lines are written to a file (which can happen on Windows), the
setting may cause {beatname_uc} to ignore files even though content was added
at a later time.

To remove the state of previously harvested files from the registry file, use
the `clean_inactive` configuration option.

[float]
[id="{beatname_lc}-input-{type}-close-options"]
===== `close.*`
@@ -218,7 +277,7 @@ single log event to a new file. This option is disabled by default.

[float]
[id="{beatname_lc}-input-{type}-close-timeout"]
===== `close.reader.after_interval`

WARNING: Only use this option if you understand that data loss is a potential
side effect. Another side effect is that multiline events might not be
@@ -393,4 +452,3 @@ Set the location of the marker file the following way:
----
file_identity.inode_marker.path: /logs/.filebeat-marker
----

90 changes: 90 additions & 0 deletions filebeat/docs/inputs/input-filestream-reader-options.asciidoc
@@ -141,3 +141,93 @@ The default is 16384.

The maximum number of bytes that a single log message can have. All bytes after
`message_max_bytes` are discarded and not sent. The default is 10MB (10485760).
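
For example, a sketch raising the limit to 20MB (an illustrative value):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  message_max_bytes: 20971520
----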

[float]
===== `parsers`

This option expects a list of parsers that the log line has to go through.


Available parsers:

* `multiline`
* `ndjson`

In this example, {beatname_uc} is reading multiline messages that consist of 3 lines
and are encapsulated in single-line JSON objects.
The multiline message is stored under the key `msg`.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  parsers:
    - ndjson:
        keys_under_root: true
        message_key: msg
    - multiline:
        type: count
        count_lines: 3
----

See the available parser settings in detail below.

[float]
===== `multiline`

Options that control how {beatname_uc} deals with log messages that span
multiple lines. See <<multiline-examples>> for more information about
configuring multiline options.
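
As a sketch, a pattern-based multiline parser might look like the following
(the `pattern`, `negate`, and `match` settings are described in
<<multiline-examples>>; the pattern shown here is illustrative):

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  parsers:
    - multiline:
        type: pattern
        pattern: '^\['
        negate: true
        match: after
----

This joins every line that does not start with `[` to the preceding matching line.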

[float]
===== `ndjson`

These options make it possible for {beatname_uc} to decode logs structured as
JSON messages. {beatname_uc} processes the logs line by line, so the JSON
decoding only works if there is one JSON object per message.

The decoding happens before line filtering. You can combine JSON
decoding with filtering if you set the `message_key` option. This
can be helpful in situations where the application logs are wrapped in JSON
objects, like when using Docker.

Example configuration:

[source,yaml]
----
- ndjson:
    keys_under_root: true
    add_error_key: true
    message_key: log
----

*`keys_under_root`*:: By default, the decoded JSON is placed under a "json" key
in the output document. If you enable this setting, the keys are copied to the
top level of the output document. The default is false.

*`overwrite_keys`*:: If `keys_under_root` and this setting are enabled, then the
values from the decoded JSON object overwrite the fields that {beatname_uc}
normally adds (type, source, offset, etc.) in case of conflicts.

*`expand_keys`*:: If this setting is enabled, {beatname_uc} will recursively
de-dot keys in the decoded JSON, and expand them into a hierarchical object
structure. For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`.
This setting should be enabled when the input is produced by an
https://github.com/elastic/ecs-logging[ECS logger].
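
For example, a minimal sketch enabling key expansion:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  parsers:
    - ndjson:
        expand_keys: true
----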

*`add_error_key`*:: If this setting is enabled, {beatname_uc} adds an
"error.message" and "error.type: json" key in case of JSON unmarshalling errors
or when a `message_key` is defined in the configuration but cannot be used.

*`message_key`*:: An optional configuration setting that specifies a JSON key on
which to apply the line filtering and multiline settings. If specified, the key
must be at the top level in the JSON object and the value associated with the
key must be a string, otherwise no filtering or multiline aggregation will
occur.

*`document_id`*:: Optional configuration setting that specifies the JSON key to
set the document ID. If configured, the field will be removed from the original
JSON document and stored in `@metadata._id`.


*`ignore_decoding_error`*:: An optional configuration setting that specifies if
JSON decoding errors should be logged or not. If set to true, errors will not
be logged. The default is false.
32 changes: 26 additions & 6 deletions filebeat/docs/inputs/input-filestream.asciidoc
@@ -10,10 +10,30 @@ experimental[]
++++

Use the `filestream` input to read lines from active log files. It is the
new, improved alternative to the `log` input. It comes with various improvements
to the existing input:

1. Checking of `close_*` options happens out of band. Thus, if an output is blocked,
{beatname_uc} can close the reader and avoid keeping too many files open.

2. Detailed metrics are available for all files that match the `paths` configuration
regardless of the `harvester_limit`. This way, you can keep track of all files,
even ones that are not actively read.

3. The order of `parsers` is configurable, so it is possible to parse JSON lines and then
aggregate the contents into a multiline event.

4. Some position updates and metadata changes no longer depend on the publishing pipeline.
If the pipeline is blocked, some changes are still applied to the registry.

5. Only the most recent updates are serialized to the registry. In contrast, the `log` input
has to serialize the complete registry on each ACK from the outputs. This makes registry updates
much quicker with this input.

6. The input ensures that only offset updates are written to the registry append-only log.
The `log` input writes the complete file state.

7. Stale entries can be removed from the registry, even if there is no active input.

To configure this input, specify a list of glob-based <<filestream-input-paths,`paths`>>
that must be crawled to locate and fetch the log lines.
@@ -158,10 +178,10 @@ on. If enabled, it expands a single `**` into an 8-level deep `*` pattern.
This feature is enabled by default. Set `prospector.scanner.recursive_glob` to false to
disable it.
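
For example, to disable recursive glob expansion:

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  prospector.scanner.recursive_glob: false
----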

include::../inputs/input-filestream-file-options.asciidoc[]

include::../inputs/input-filestream-reader-options.asciidoc[]

[id="{beatname_lc}-input-{type}-common-options"]
include::../inputs/input-common-options.asciidoc[]
