Update documentation of filestream input with the new improvements #25303
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -37,6 +37,30 @@ a `gz` extension: | |||||
|
||||||
See <<regexp-support>> for a list of supported regexp patterns. | ||||||
|
||||||
===== `prospector.scanner.include_files` | ||||||
|
||||||
A list of regular expressions to match the files that you want {beatname_uc} to | ||||||
include. If a list of regexes is provided, only the files that are allowed by | ||||||
the patterns are harvested. | ||||||
|
||||||
By default no files are excluded. This option is the counterpart of | ||||||
`prospector.scanner.exclude_files`. | ||||||
|
||||||
The following example configures {beatname_uc} to exclude files that | ||||||
are not under `/var/log`: | ||||||
|
||||||
["source","yaml",subs="attributes"] | ||||||
---- | ||||||
{beatname_lc}.inputs: | ||||||
- type: {type} | ||||||
  ... | ||||||
  prospector.scanner.include_files: ['^/var/log/.*'] | ||||||
---- | ||||||
|
||||||
NOTE: Patterns should start with `^` in case of absolute paths. | ||||||
|
||||||
See <<regexp-support>> for a list of supported regexp patterns. | ||||||
|
||||||
===== `prospector.scanner.symlinks` | ||||||
|
||||||
The `symlinks` option allows {beatname_uc} to harvest symlinks in addition to | ||||||
|
@@ -57,6 +81,12 @@ This is, for example, the case for Kubernetes log files. | |||||
|
||||||
Because this option may lead to data loss, it is disabled by default. | ||||||
|
||||||
===== `prospector.scanner.resend_on_touch` | ||||||
|
||||||
If this option is enabled, a file is resent if its size has not changed | ||||||
but its modification time has changed to a later time than before. | ||||||
It is disabled by default to avoid accidentally resending files. | ||||||
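
As an illustration, a minimal configuration sketch that turns this option on could look as follows. The layout mirrors the other examples in this file. Only the option name comes from the text above; `true` is assumed to be the value that enables it.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  # Assumption: a boolean switch. When enabled, a file whose size is unchanged
  # but whose modification time moved forward is sent again.
  prospector.scanner.resend_on_touch: true
----

Because the option can cause whole files to be sent again, it is probably best enabled only for inputs where duplicate events are acceptable.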
|
||||||
|
||||||
[float] | ||||||
[id="{beatname_lc}-input-{type}-scan-frequency"] | ||||||
|
@@ -117,6 +147,35 @@ If a file that's currently being harvested falls under `ignore_older`, the | |||||
harvester will first finish reading the file and close it after | ||||||
`close.on_state_change.inactive` is reached. Then, after that, the file will be ignored. | ||||||
|
||||||
[float] | ||||||
[id="{beatname_lc}-input-{type}-ignore-inactive"] | ||||||
===== `ignore_inactive` | ||||||
|
||||||
If this option is enabled, {beatname_uc} ignores every file that has not been | ||||||
updated since the selected time. Possible options are `since_first_start` and | ||||||
`since_last_start`. The first option ignores every file that has not been updated since | ||||||
the first start of {beatname_uc}. It is useful when the Beat might be restarted | ||||||
due to configuration changes or a failure. The second option tells | ||||||
the Beat to read from files that have been updated since its start. | ||||||
|
||||||
The files affected by this setting fall into two categories: | ||||||
|
||||||
* Files that were never harvested | ||||||
* Files that were harvested but weren't updated since `ignore_inactive`. | ||||||
|
||||||
For files that were never seen before, the offset state is set to the end of | ||||||
the file. If a state already exists, the offset is not changed. In case a file is | ||||||
updated again later, reading continues at the set offset position. | ||||||
|
||||||
The setting relies on the modification time of the file to | ||||||
determine if a file is ignored. If the modification time of the file is not | ||||||
updated when lines are written to a file (which can happen on Windows), the | ||||||
setting may cause {beatname_uc} to ignore files even though content was added | ||||||
at a later time. | ||||||
|
||||||
To remove the state of previously harvested files from the registry file, use | ||||||
the `clean_inactive` configuration option. | ||||||
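
A minimal sketch of how this setting might be used, assuming it is placed directly on the input like the other options in this file. The value `since_last_start` is one of the two values named above; `since_first_start` would be set the same way.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  # Ignore files that have not been updated since the most recent start.
  ignore_inactive: since_last_start
----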
|
||||||
[float] | ||||||
[id="{beatname_lc}-input-{type}-close-options"] | ||||||
===== `close.*` | ||||||
|
@@ -218,7 +277,7 @@ single log event to a new file. This option is disabled by default. | |||||
|
||||||
[float] | ||||||
[id="{beatname_lc}-input-{type}-close-timeout"] | ||||||
===== `close.reader.timeout` | ||||||
===== `close.reader.after_interval` | ||||||
|
||||||
WARNING: Only use this option if you understand that data loss is a potential | ||||||
side effect. Another side effect is that multiline events might not be | ||||||
|
@@ -393,4 +452,3 @@ Set the location of the marker file the following way: | |||||
---- | ||||||
file_identity.inode_marker.path: /logs/.filebeat-marker | ||||||
---- | ||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -141,3 +141,93 @@ The default is 16384. | |||||||||||||||
|
||||||||||||||||
The maximum number of bytes that a single log message can have. All bytes after | ||||||||||||||||
`message_max_bytes` are discarded and not sent. The default is 10MB (10485760). | ||||||||||||||||
|
||||||||||||||||
[float] | ||||||||||||||||
===== `parsers` | ||||||||||||||||
|
||||||||||||||||
This option expects a list of parsers the log line has to go through. | ||||||||||||||||
|
||||||||||||||||
Available parsers: | ||||||||||||||||
|
||||||||||||||||
* `multiline` | ||||||||||||||||
* `ndjson` | ||||||||||||||||
Review comment on lines +150 to +152: This isn't rendering correctly as-is.
Review comment: It doesn't look like this syntax was fixed. See https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-filestream.html#_parsers
|
||||||||||||||||
In this example, {beatname_uc} is reading multiline messages that consist of 3 lines | ||||||||||||||||
and are encapsulated in single-line JSON objects. | ||||||||||||||||
The multiline message is stored under the key `msg`. | ||||||||||||||||
|
||||||||||||||||
["source","yaml",subs="attributes"] | ||||||||||||||||
---- | ||||||||||||||||
{beatname_lc}.inputs: | ||||||||||||||||
- type: {type} | ||||||||||||||||
  ... | ||||||||||||||||
  parsers: | ||||||||||||||||
    - ndjson: | ||||||||||||||||
        keys_under_root: true | ||||||||||||||||
        message_key: msg | ||||||||||||||||
    - multiline: | ||||||||||||||||
        type: counter | ||||||||||||||||
        lines_count: 3 | ||||||||||||||||
---- | ||||||||||||||||
|
||||||||||||||||
See the available parser settings in detail below. | ||||||||||||||||
|
||||||||||||||||
[float] | ||||||||||||||||
===== `multiline` | ||||||||||||||||
|
||||||||||||||||
Options that control how {beatname_uc} deals with log messages that span | ||||||||||||||||
multiple lines. See <<multiline-examples>> for more information about | ||||||||||||||||
configuring multiline options. | ||||||||||||||||
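
As a hedged illustration, the sketch below shows a pattern-based `multiline` parser entry. It assumes the parser accepts the `type`, `pattern`, `negate`, and `match` options covered in <<multiline-examples>>. The pattern itself is only an example and should be adapted to the log format at hand.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: {type}
  ...
  parsers:
    - multiline:
        type: pattern
        # Example pattern: a new event starts with an opening bracket.
        pattern: '^\['
        # Lines that do NOT match the pattern are appended to the previous line.
        negate: true
        match: after
----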
|
||||||||||||||||
[float] | ||||||||||||||||
===== `ndjson` | ||||||||||||||||
|
||||||||||||||||
These options make it possible for {beatname_uc} to decode logs structured as | ||||||||||||||||
JSON messages. {beatname_uc} processes the logs line by line, so the JSON | ||||||||||||||||
decoding only works if there is one JSON object per message. | ||||||||||||||||
|
||||||||||||||||
The decoding happens before line filtering. You can combine JSON | ||||||||||||||||
decoding with filtering if you set the `message_key` option. This | ||||||||||||||||
can be helpful in situations where the application logs are wrapped in JSON | ||||||||||||||||
objects, as happens, for example, with Docker. | ||||||||||||||||
Review comment: "as with like it happens for example with Docker." I think we need to fix this one too.
|
||||||||||||||||
Example configuration: | ||||||||||||||||
|
||||||||||||||||
[source,yaml] | ||||||||||||||||
---- | ||||||||||||||||
- ndjson: | ||||||||||||||||
    keys_under_root: true | ||||||||||||||||
    add_error_key: true | ||||||||||||||||
    message_key: log | ||||||||||||||||
---- | ||||||||||||||||
|
||||||||||||||||
*`keys_under_root`*:: By default, the decoded JSON is placed under a "json" key | ||||||||||||||||
in the output document. If you enable this setting, the keys are copied to the top | ||||||||||||||||
level of the output document. The default is false. | ||||||||||||||||
|
||||||||||||||||
*`overwrite_keys`*:: If `keys_under_root` and this setting are enabled, then the | ||||||||||||||||
values from the decoded JSON object overwrite the fields that {beatname_uc} | ||||||||||||||||
normally adds (type, source, offset, etc.) in case of conflicts. | ||||||||||||||||
|
||||||||||||||||
*`expand_keys`*:: If this setting is enabled, {beatname_uc} will recursively | ||||||||||||||||
de-dot keys in the decoded JSON, and expand them into a hierarchical object | ||||||||||||||||
structure. For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`. | ||||||||||||||||
This setting should be enabled when the input is produced by an | ||||||||||||||||
https://github.com/elastic/ecs-logging[ECS logger]. | ||||||||||||||||
|
||||||||||||||||
*`add_error_key`*:: If this setting is enabled, {beatname_uc} adds an | ||||||||||||||||
"error.message" and "error.type: json" key in case of JSON unmarshalling errors | ||||||||||||||||
or when a `message_key` is defined in the configuration but cannot be used. | ||||||||||||||||
|
||||||||||||||||
*`message_key`*:: An optional configuration setting that specifies a JSON key on | ||||||||||||||||
which to apply the line filtering and multiline settings. If specified, the key | ||||||||||||||||
must be at the top level in the JSON object and the value associated with the | ||||||||||||||||
key must be a string, otherwise no filtering or multiline aggregation will | ||||||||||||||||
occur. | ||||||||||||||||
|
||||||||||||||||
*`document_id`*:: An optional configuration setting that specifies the JSON key to | ||||||||||||||||
use as the document ID. If configured, the field is removed from the original | ||||||||||||||||
JSON document and stored in `@metadata._id`. | ||||||||||||||||
|
||||||||||||||||
*`ignore_decoding_error`*:: An optional configuration setting that specifies if | ||||||||||||||||
JSON decoding errors should be logged or not. If set to true, errors will not | ||||||||||||||||
be logged. The default is false. |
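
To show how several of these settings might be combined, here is a small sketch of an `ndjson` parser entry. The key name `event_id` is hypothetical and stands for whatever field in your JSON lines carries a unique identifier. The other option names are the ones described above.

[source,yaml]
----
- ndjson:
    # Copy decoded keys to the top level and let them win on conflicts.
    keys_under_root: true
    overwrite_keys: true
    # Report decoding problems in the event itself.
    add_error_key: true
    # Hypothetical key: its value is moved to `@metadata._id`.
    document_id: event_id
----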
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -10,10 +10,30 @@ experimental[] | |||||||||
++++ | ||||||||||
|
||||||||||
Use the `filestream` input to read lines from active log files. It is the | ||||||||||
new, improved alternative to the `log` input. However, a few feature are | ||||||||||
missing from it, e.g. `multiline` or other special parsing capabilities. | ||||||||||
These missing options are probably going to be added again. We strive to | ||||||||||
achieve feature parity, if possible. | ||||||||||
new, improved alternative to the `log` input. It comes with various improvements | ||||||||||
to the existing input: | ||||||||||
|
||||||||||
1. Checking of `close.*` options happens out of band. Thus, if an output is blocked, | ||||||||||
{beatname_uc} can close the reader and avoid keeping too many files open. | ||||||||||
|
||||||||||
2. Detailed metrics are available for all files that match the `paths` configuration | ||||||||||
regardless of the `harvester_limit`. This way, you can keep track of all files, | ||||||||||
even ones that are not actively read. | ||||||||||
|
||||||||||
3. The order of `parsers` is configurable. So it is possible to parse JSON lines and then | ||||||||||
aggregate the contents into a multiline event. | ||||||||||
|
||||||||||
4. Some position updates and metadata changes no longer depend on the publishing pipeline. | ||||||||||
If the pipeline is blocked, some changes are still applied to the registry. | ||||||||||
Review comment: Not sure if
Review comment: We have one pipeline per input, so I would say the.
Review comment: Did you remove the
Review comment: See step 4 on this page: https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-filestream.html#_parsers
|
||||||||||
5. Only the most recent updates are serialized to the registry. In contrast, the `log` input | ||||||||||
has to serialize the complete registry on each ACK from the outputs. This makes the registry updates | ||||||||||
much quicker with this input. | ||||||||||
|
||||||||||
6. The input ensures that only offset updates are written to the registry's append-only log. | ||||||||||
The `log` input writes the complete file state. | ||||||||||
|
||||||||||
7. Stale entries can be removed from the registry, even if there is no active input. | ||||||||||
|
||||||||||
To configure this input, specify a list of glob-based <<filestream-input-paths,`paths`>> | ||||||||||
that must be crawled to locate and fetch the log lines. | ||||||||||
|
@@ -158,10 +178,10 @@ on. If enabled it expands a single `**` into a 8-level deep `*` pattern. | |||||||||
This feature is enabled by default. Set `prospector.scanner.recursive_glob` to false to | ||||||||||
disable it. | ||||||||||
|
||||||||||
include::../inputs/input-filestream-reader-options.asciidoc[] | ||||||||||
|
||||||||||
include::../inputs/input-filestream-file-options.asciidoc[] | ||||||||||
|
||||||||||
include::../inputs/input-filestream-reader-options.asciidoc[] | ||||||||||
|
||||||||||
[id="{beatname_lc}-input-{type}-common-options"] | ||||||||||
include::../inputs/input-common-options.asciidoc[] | ||||||||||
|
||||||||||
|