diff --git a/filebeat/docs/inputs/input-filestream-file-options.asciidoc b/filebeat/docs/inputs/input-filestream-file-options.asciidoc
index b0ced1eab5b..0cb482a8300 100644
--- a/filebeat/docs/inputs/input-filestream-file-options.asciidoc
+++ b/filebeat/docs/inputs/input-filestream-file-options.asciidoc
@@ -37,6 +37,30 @@ a `gz` extension:
 See <> for a list of supported regexp patterns.

+===== `prospector.scanner.include_files`
+
+A list of regular expressions to match the files that you want {beatname_uc} to
+include. If a list of regexes is provided, only the files that match one of the
+patterns are harvested.
+
+By default no filtering is applied and all files are included. This option is
+the counterpart of `prospector.scanner.exclude_files`.
+
+The following example configures {beatname_uc} to harvest only files that
+are under `/var/log`:
+
+["source","yaml",subs="attributes"]
+----
+{beatname_lc}.inputs:
+- type: {type}
+  ...
+  prospector.scanner.include_files: ['^/var/log/.*']
+----
+
+NOTE: When matching absolute paths, the pattern should start with `^`.
+
+See <> for a list of supported regexp patterns.
+
 ===== `prospector.scanner.symlinks`

 The `symlinks` option allows {beatname_uc} to harvest symlinks in addition to
@@ -57,6 +81,12 @@ This is, for example, the case for Kubernetes log files.

 Because this option may lead to data loss, it is disabled by default.

+===== `prospector.scanner.resend_on_touch`
+
+If this option is enabled, a file is resent if its size has not changed
+but its modification time has changed to a later time than before.
+It is disabled by default to avoid accidentally resending files.
+
 [float]
 [id="{beatname_lc}-input-{type}-scan-frequency"]
@@ -117,6 +147,35 @@ If a file that's currently being harvested falls under `ignore_older`, the
 harvester will first finish reading the file and close it after
 `close.on_state_change.inactive` is reached. Then, after that, the file will be ignored.

+[float]
+[id="{beatname_lc}-input-{type}-ignore-inactive"]
+===== `ignore_inactive`
+
+If this option is enabled, {beatname_uc} ignores every file that has not been
+updated since the selected time. Possible options are `since_first_start` and
+`since_last_start`. The first option ignores every file that has not been updated since
+the first start of {beatname_uc}. It is useful when the Beat might be restarted
+due to configuration changes or a failure. The second option tells
+the Beat to read only from files that have been updated since its last start.
+
+The files affected by this setting fall into two categories:
+
+* Files that were never harvested
+* Files that were harvested but have not been updated since the time selected by `ignore_inactive`
+
+For files that were never seen before, the offset state is set to the end of
+the file. If a state already exists, the offset is not changed. If a file is
+updated again later, reading continues at the set offset position.
+
+The setting relies on the modification time of the file to
+determine if a file is ignored. If the modification time of the file is not
+updated when lines are written to a file (which can happen on Windows), the
+setting may cause {beatname_uc} to ignore files even though content was added
+at a later time.
+
+To remove the state of previously harvested files from the registry file, use
+the `clean_inactive` configuration option.
+
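+For example, the following sketch (other settings omitted) sets
+`ignore_inactive` so that {beatname_uc} ignores all files that have not been
+updated since it was last started:
+
+["source","yaml",subs="attributes"]
+----
+{beatname_lc}.inputs:
+- type: {type}
+  ...
+  ignore_inactive: since_last_start
+----
+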
[float] [id="{beatname_lc}-input-{type}-close-timeout"] -===== `close.reader.timeout` +===== `close.reader.after_interval` WARNING: Only use this option if you understand that data loss is a potential side effect. Another side effect is that multiline events might not be @@ -393,4 +452,3 @@ Set the location of the marker file the following way: ---- file_identity.inode_marker.path: /logs/.filebeat-marker ---- - diff --git a/filebeat/docs/inputs/input-filestream-reader-options.asciidoc b/filebeat/docs/inputs/input-filestream-reader-options.asciidoc index 8b365f1ede2..9e3a124c295 100644 --- a/filebeat/docs/inputs/input-filestream-reader-options.asciidoc +++ b/filebeat/docs/inputs/input-filestream-reader-options.asciidoc @@ -141,3 +141,93 @@ The default is 16384. The maximum number of bytes that a single log message can have. All bytes after `mesage_max_bytes` are discarded and not sent. The default is 10MB (10485760). + +[float] +===== `parsers` + +This option expects a list of parsers the log line has to go through. + +Avaliable parsers: +- `multiline` +- `ndjson` + +In this example, {beatname_uc} is reading multiline messages that consist of 3 lines +and encapsulated in single-line JSON objects. +The multiline message is stored under the key `msg`. + +["source","yaml",subs="attributes"] +---- +{beatname_lc}.inputs: +- type: {type} + ... + parsers: + - ndjson: + keys_under_root: true + message_key: msg + - multiline: + type: counter + lines_count: 3 +---- + +See the available parser settings in detail below. + +[float] +===== `multiline` + +Options that control how {beatname_uc} deals with log messages that span +multiple lines. See <> for more information about +configuring multiline options. + +[float] +===== `ndjson` + +These options make it possible for {beatname_uc} to decode logs structured as +JSON messages. {beatname_uc} processes the logs line by line, so the JSON +decoding only works if there is one JSON object per message. + +The decoding happens before line filtering. You can combine JSON +decoding with filtering if you set the `message_key` option. This +can be helpful in situations where the application logs are wrapped in JSON +objects, as with like it happens for example with Docker. + +Example configuration: + +[source,yaml] +---- +- ndjson: + keys_under_root: true + add_error_key: true + message_key: log +---- + +*`keys_under_root`*:: By default, the decoded JSON is placed under a "json" key +in the output document. If you enable this setting, the keys are copied top +level in the output document. The default is false. + +*`overwrite_keys`*:: If `keys_under_root` and this setting are enabled, then the +values from the decoded JSON object overwrite the fields that {beatname_uc} +normally adds (type, source, offset, etc.) in case of conflicts. + +*`expand_keys`*:: If this setting is enabled, {beatname_uc} will recursively +de-dot keys in the decoded JSON, and expand them into a hierarchical object +structure. For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`. +This setting should be enabled when the input is produced by an +https://github.com/elastic/ecs-logging[ECS logger]. + +*`add_error_key`*:: If this setting is enabled, {beatname_uc} adds a +"error.message" and "error.type: json" key in case of JSON unmarshalling errors +or when a `message_key` is defined in the configuration but cannot be used. + +*`message_key`*:: An optional configuration setting that specifies a JSON key on +which to apply the line filtering and multiline settings. 
diff --git a/filebeat/docs/inputs/input-filestream.asciidoc b/filebeat/docs/inputs/input-filestream.asciidoc
index be121a4fd7e..219a1e50d23 100644
--- a/filebeat/docs/inputs/input-filestream.asciidoc
+++ b/filebeat/docs/inputs/input-filestream.asciidoc
@@ -10,10 +10,30 @@ experimental[]
 ++++

 Use the `filestream` input to read lines from active log files. It is the
-new, improved alternative to the `log` input. However, a few feature are
-missing from it, e.g. `multiline` or other special parsing capabilities.
-These missing options are probably going to be added again. We strive to
-achieve feature parity, if possible.
+new, improved alternative to the `log` input. It comes with various improvements
+over the existing input:
+
+1. Checking of `close_*` options happens out of band. Thus, if an output is blocked,
+{beatname_uc} can close the reader and avoid keeping too many files open.
+
+2. Detailed metrics are available for all files that match the `paths` configuration
+regardless of the `harvester_limit`. This way, you can keep track of all files,
+even ones that are not actively read.
+
+3. The order of `parsers` is configurable, so it is possible to parse JSON lines and then
+aggregate the contents into a multiline event.
+
+4. Some position updates and metadata changes no longer depend on the publishing pipeline.
+If the pipeline is blocked, some changes are still applied to the registry.
+
+5. Only the most recent updates are serialized to the registry. In contrast, the `log` input
+has to serialize the complete registry on each ACK from the outputs. This makes the registry updates
+much quicker with this input.
+
+6. The input ensures that only offset updates are written to the registry append-only log.
+The `log` input writes the complete file state.
+
+7. Stale entries can be removed from the registry, even if there is no active input.

 To configure this input, specify a list of glob-based <> that
 must be crawled to locate and fetch the log lines.
@@ -158,10 +178,10 @@ on. If enabled it expands a single `**` into a 8-level deep `*` pattern.
 This feature is enabled by default. Set `prospector.scanner.recursive_glob` to
 false to disable it.

-include::../inputs/input-filestream-reader-options.asciidoc[]
-
 include::../inputs/input-filestream-file-options.asciidoc[]

+include::../inputs/input-filestream-reader-options.asciidoc[]
+
 [id="{beatname_lc}-input-{type}-common-options"]
 include::../inputs/input-common-options.asciidoc[]