
No fluentbit_input_records_total in metrics after upgrade to 2.2.0 #8287

Closed

hoooli opened this issue Dec 15, 2023 · 5 comments
Labels
status: waiting-for-triage, waiting-for-release (This has been fixed/merged but it's waiting to be included in a release.)

Comments


hoooli commented Dec 15, 2023

Yesterday I upgraded Fluent Bit to version 2.2.0, and since then the metric fluentbit_input_records_total has not been working. It consistently returns 0, even though the other metrics are functioning properly.

(screenshots attached)

In Grafana/Prometheus I'm using the query:
rate(fluentbit_input_records_total[1m])

Configuration:

   [SERVICE]
       Flush         5
       Log_Level     info
       Parsers_File  parsers.conf
       Daemon        off
       HTTP_Server   On
       HTTP_Listen   0.0.0.0
       HTTP_PORT     2020

   [INPUT]
       Name              tail
       Tag               kube.*
       Path              /var/log/containers/*.log
       Exclude_Path      /var/log/containers/*test*.log
       Path_Key          path_filename
       Parser            docker
       DB                /var/log/flb_kube.db
       Mem_Buf_Limit     100MB
       Buffer_Max_Size   8MB
       Refresh_Interval  8
       Rotate_Wait       8
   [INPUT]
       Name              tail
       Tag               kube.test.*
       Path              /var/log/containers/*test*.log
       Path_Key          path_filename
       Parser            docker
       DB                /var/log/flb_kube_2.db
       Mem_Buf_Limit     100MB
       Buffer_Max_Size   8MB
       Refresh_Interval  8

  [FILTER]
       Name                kubernetes
       Match               kube.var.*
       Kube_Tag_Prefix     kube.var.log.containers.
       Merge_Log           On
       K8S-Logging.Parser  On
       K8S-Logging.Exclude On
       Annotations         Off
   [FILTER]
       Name                kubernetes
       Match               kube.test.*
       Kube_Tag_Prefix     kube.test.var.log.containers.
       Merge_Log           On
       K8S-Logging.Parser  On
       K8S-Logging.Exclude On
       Annotations         Off

   [OUTPUT]
       Name         forward
       Match        kube.var.*
       Host         fluentd_host
       Port         9880
       Retry_Limit  5
   [OUTPUT]
       Name         loki
       Match        *
       Host         loki_url
       Port         80
       Retry_Limit  1

   [PARSER]
       Name        docker
       Format      json
       Time_Key    time
       Time_Format %Y-%m-%dT%H:%M:%S.%L
       Time_Keep   On
       Decode_Field_As   escaped_utf8    log    do_next
       Decode_Field_As   json       log
       Reserve_Data On
       Preserve_Key On
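
A quick way to check whether the counter itself is stuck at zero (rather than the Prometheus scrape or the Grafana query) is to read Fluent Bit's built-in Prometheus endpoint directly. This is only a sketch, assuming the HTTP server settings above (HTTP_PORT 2020) and that curl is available on the host:

   # Dump the internal metrics in Prometheus text format and filter
   # for the per-input record counters (one series per input plugin)
   curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus | grep fluentbit_input_records_total

If the series for each input (e.g. name="tail.0") are 0 here as well, the problem is inside Fluent Bit itself rather than in the scrape or the dashboard query.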
@patrick-stephens (Contributor) commented:

This should be resolved by #8223, but it has not been released yet - credit to @braydonk.
There is a potential issue with that fix, though: #8284.

Unstable nightly builds are available if you want to test them to confirm the fix (not for production, obviously): ghcr.io/fluent/fluent-bit/unstable:master
There is also an AMD64-only build of master published on every commit: ghcr.io/fluent/fluent-bit/master
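
As a sketch only, pulling and running the nightly image locally to verify looks something like this (assuming the default config path used by the official images; adjust the mount for your setup):

   # Pull the unstable nightly build mentioned above
   docker pull ghcr.io/fluent/fluent-bit/unstable:master

   # Run it with a local config and expose the monitoring port
   docker run --rm -p 2020:2020 \
     -v "$(pwd)/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf" \
     ghcr.io/fluent/fluent-bit/unstable:master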

patrick-stephens added the waiting-for-release label (This has been fixed/merged but it's waiting to be included in a release.) on Dec 15, 2023

hoooli commented Dec 15, 2023

Thank you, I can confirm that there is an issue with the request payload in the nightly version. The metrics were working for a few seconds, though :)

[2023/12/15 12:40:42] [error] [output:loki:loki.1] cannot compose request payload
[2023/12/15 12:40:42] [error] [engine] chunk '1-1702640427.501846521.flb' cannot be retried: task_id=4, input=tail.0 > output=loki.1
[2023/12/15 12:40:42] [error] [output:loki:loki.1] cannot compose request payload
[2023/12/15 12:40:42] [error] [engine] chunk '1-1702640428.418691142.flb' cannot be retried: task_id=9, input=tail.0 > output=loki.1
[2023/12/15 12:40:42] [error] [engine] chunk '1-1702640430.874134875.flb' cannot be retried: task_id=10, input=tail.0 > output=loki.1
[2023/12/15 12:40:43] [error] [output:loki:loki.1] cannot compose request payload
[2023/12/15 12:40:43] [error] [engine] chunk '1-1702640434.394118994.flb' cannot be retried: task_id=2, input=tail.0 > output=loki.1

(screenshot attached)

@patrick-stephens (Contributor) commented:

Yeah, so it sounds like this is resolved in that regard and is now a duplicate of #8284 :)


braydonk commented Dec 15, 2023

Thanks for the test. Yeah, there was an unexpected side effect in my first fix. Hopefully someone is able to take a look at the follow-up (#8229) soon.


edsiper commented Dec 27, 2023

Closed as fixed.
