
Parser processor does not override timestamp #6402

Closed
justingab opened this issue Sep 16, 2019 · 3 comments
Labels
bug unexpected problem or unintended behavior

Comments

@justingab

Relevant telegraf.conf:

 [[inputs.docker_log]]
#   ## Docker Endpoint
#   ##   To use TCP, set endpoint = "tcp://[ip]:[port]"
#   ##   To use environment variables (ie, docker-machine), set endpoint = "ENV"
#   # endpoint = "unix:///var/run/docker.sock"
#
#   ## When true, container logs are read from the beginning; otherwise
#   ## reading begins at the end of the log.
    from_beginning = false
#
#   ## Timeout for Docker API calls.
#   # timeout = "5s"
#
#   ## Containers to include and exclude. Globs accepted.
#   ## Note that an empty array for both will include all containers
    container_name_include = ["dockercompose_ingestion_*"]
    [[processors.parser]]
      parse_fields = ["message"]
      merge = "override"
      data_format = "grok"
      #drop_original = true
      grok_patterns = [
                      '%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05.000000"} kind=%{NOTSPACE:kind:tag} message_id=%{NOTSPACE:message_id} size=%{INT:size:int} latency=%{NUMBER:latency:float} latency_event=%{NUMBER:latency_event:float} time_ingest=%{NUMBER:ingest_time:float} loop=%{NUMBER:loop_time:float} row=%{NUMBER:row_result:int}']

System info:

Telegraf 1.12.0 running inside an Alpine Docker container

Steps to reproduce:

  1. Run a Docker container that generates logs like the one below.

2019-09-16 21:19:34.647842 publisher_id=mypublisherid kind=MACHINESTATE message_id=0f42f9ccf3596b4473568977c245ff1b size=1026 latency=0.102023 latency_event=0.586842 time_ingest=0.025167 loop=0.025444 row=228157532

  2. Run Telegraf to collect the logs and store them in InfluxDB.

Expected behavior:

The logs are parsed and stored according to the timestamp in the log line, not the time the log was read from Docker.

Actual behavior:

For one, from_beginning in docker_log is ignored: it parses all logs since container creation, and it applies a timestamp of the time the log was read by Telegraf.

The processors.parser works, but it does not override the timestamp. If I switch to drop_original, this keeps only the parsed metrics with the correct timestamp, but then I lose the additional Docker tags that are relevant.

Additional info:

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Sep 16, 2019
@danielnelson danielnelson added this to the 1.12.2 milestone Sep 16, 2019
@danielnelson danielnelson self-assigned this Sep 24, 2019
@danielnelson danielnelson changed the title docker_logs with processor.parser not overriding the timestamp Parser processor does not override timestamp Sep 24, 2019
@danielnelson danielnelson modified the milestones: 1.12.2, 1.13.0 Sep 24, 2019
@danielnelson
Contributor

I was hoping to add this for 1.12.2, but it's a bit trickier than expected. In the case where no timestamp is provided in the parsed metric, we will want to avoid changing the timestamp. However, since all metrics have a timestamp, we don't have a way to determine whether the timestamp should be modified.

@sjwang90 sjwang90 removed this from the 1.13.0 milestone Oct 24, 2019
@sjwang90 sjwang90 added the discussion Topics for discussion label Oct 24, 2019
@danielnelson danielnelson removed the discussion Topics for discussion label Jun 5, 2020
@sensor-freak
Contributor

sensor-freak commented Nov 22, 2020

I just lost a few hours looking for the problem in my configuration; now I've discovered it wasn't my fault...

My file input contains measurements in arbitrary (not chronological) order, and the file has to be re-parsed in each interval. But when I try to CSV-parse the timestamp field using the merge = "override" option, all lines receive the same timestamp.

Maybe an additional parameter "merge-override-timestamp = true" could be introduced to handle the problem identified by @danielnelson?

@danielnelson danielnelson removed their assignment Sep 1, 2021
@srebhan
Member

srebhan commented Aug 2, 2023

Works with

[[processors.parser]]
  parse_fields = ["message"]
  merge = "override-with-timestamp"
  data_format = "grok"
  grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05.000000"} publisher_id=%{NOTSPACE:publisher_id:tag} kind=%{NOTSPACE:kind:tag} message_id=%{NOTSPACE:message_id} size=%{INT:size:int} latency=%{NUMBER:latency:float} latency_event=%{NUMBER:latency_event:float} time_ingest=%{NUMBER:ingest_time:float} loop=%{NUMBER:loop_time:float} row=%{NUMBER:row_result:int}']

since version 1.27.0 (see #13147).

@srebhan srebhan closed this as completed Aug 2, 2023