Add more metrics to in_monitor_agent #2450

ganmacs · 2019-06-11T02:42:35Z

Which issue(s) this PR fixes:

no

What this PR does / why we need it:

I added the metrics held by the output and buffer plugins to the response of the in_monitor_agent plugin, so that fluentd's internal stats can be fetched.

Docs Changes:

The output examples at https://docs.fluentd.org/input/monitor_agent#output-example need to be revised.

Release Note:

Same as the title

repeatedly · 2019-06-11T06:15:24Z

lib/fluent/plugin/output.rb

+        }
+
+        if @buffer && @buffer.respond_to?(:statistics)
+          (@buffer.statistics && @buffer.statistics['buffer'] || {}).each do |k, v|


When does @buffer.statistics return false/nil?

Looks unnecessary. Thank you!
7e0941a

repeatedly · 2019-06-11T06:16:03Z

lib/fluent/plugin/buffer.rb

+          stats['newest_timekey'] = m
+        end
+
+        { 'buffer' => stats }


What is the merit of {'buffer' => stats} rather than stats?

To be consistent with the result of Output#statistics.
I think { 'output' => stats } is better than bare stats, because in_monitor_agent.rb can then identify which component the data comes from as the number of components we collect stats from (at https://github.com/fluent/fluentd/pull/2450/files#diff-0ad62dd2012fecaa6ed2232f8918c528R382) increases.
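A minimal Ruby sketch of why the namespaced shape helps, assuming hypothetical stats payloads (the key names and the flatten_stats helper are illustrative, not fluentd's actual in_monitor_agent code): a collector can prefix every value with its top-level component key, so output and buffer stats never collide as more components are added.

```ruby
# Hypothetical payloads following the { 'output' => ... } / { 'buffer' => ... }
# convention discussed above; the inner key names are illustrative only.
OUTPUT_STATS = { 'output' => { 'emit_count' => 10, 'write_count' => 3 } }.freeze
BUFFER_STATS = { 'buffer' => { 'stage_length' => 2, 'queue_length' => 1 } }.freeze

# Flatten namespaced stats into one hash, prefixing each key with the
# component name so values from different components cannot overwrite each other.
def flatten_stats(*sources)
  sources.each_with_object({}) do |source, acc|
    source.each do |component, values|
      values.each { |key, value| acc["#{component}_#{key}"] = value }
    end
  end
end

merged = flatten_stats(OUTPUT_STATS, BUFFER_STATS)
# merged has the keys output_emit_count, output_write_count,
# buffer_stage_length and buffer_queue_length, in that order.
```

With bare stats hashes, two components reporting the same key (say, a hypothetical 'length') would silently overwrite each other during a merge; the top-level key removes that ambiguity.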

bai · 2019-06-17T18:07:10Z

👋 Semi-related question: would it make sense to add a flush_time metric as well? It would answer "how long does it take for Output to perform write/try_write?"

ganmacs · 2019-06-18T05:32:37Z

do you think it would make sense to add flush_time metric as well

Looks neat to me. I added flush_time_count and slow_flush_count in 63c006e!

bai · 2019-06-18T05:35:04Z

Fantastic, thank you.

repeatedly · 2019-06-18T05:39:11Z

flush_time_count

Is this useful? This value seems to be the total flush time. How would this value be used in monitoring?

ganmacs · 2019-06-18T05:45:20Z

I think it is useful for getting something like the average flush time over 5 minutes.
(In PromQL, the query would be something like increase(flush_time_count[5m]) / increase(write_count[5m]).)
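A minimal Ruby sketch of the same calculation done by hand, assuming two hypothetical snapshots of the cumulative counters taken five minutes apart (Snapshot and average_flush_time are illustrative names, not part of fluentd): the change in flush_time_count divided by the change in write_count gives the average flush time per write in that window.

```ruby
# Hypothetical snapshot of two cumulative counters at one point in time.
Snapshot = Struct.new(:flush_time_count, :write_count)

# Average flush time per write between two snapshots: the increase in total
# flush time divided by the increase in the number of writes.
# Returns nil when no writes happened in the window (avoids dividing by zero).
def average_flush_time(earlier, later)
  writes = later.write_count - earlier.write_count
  return nil if writes <= 0

  (later.flush_time_count - earlier.flush_time_count).to_f / writes
end

t0 = Snapshot.new(1_200, 40) # counters five minutes ago
t1 = Snapshot.new(1_950, 55) # counters now
average_flush_time(t0, t1)   # => 50.0 (same unit as flush_time_count)
```

Unlike Prometheus's increase(), this sketch does not handle counter resets (for example after a fluentd restart); it simply assumes both counters only grew between the two snapshots.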

bai · 2019-06-18T05:53:45Z

I think it is useful for getting something like the average flush time over 5 minutes.
(In PromQL, the query would be something like increase(flush_time_count[5m]) / increase(write_count[5m]).)

This is exactly what I've meant 👍 🎉

repeatedly · 2019-06-18T06:03:34Z

Okay, I understand. elapsed_time is a Float. How should overflow be handled? Reset it to 0?
Some fluentd users launch many flush threads, which makes flush_time_count increase quickly.

ganmacs · 2019-06-18T06:15:18Z

elapsed_time is Float type.

Good catch… Since flush_time_count is a counter, I think this value should increase monotonically. I think it's better to use an Integer (i.e. call #to_i on elapsed_time before adding it to flush_time_count), since most flush times are smaller than 1.

Done in 628379e.
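A short Ruby illustration of the overflow concern, independent of fluentd's code: a Float accumulator eventually stops absorbing small increments, while Ruby's arbitrary-precision Integer keeps counting. The magnitudes are chosen only to make the effect visible.

```ruby
# A Float has a 53-bit significand, so at 2**53 adding 1.0 is rounded away
# and the "counter" silently stops increasing.
float_total = 2.0**53
float_total += 1.0
float_total == 2.0**53 # => true, the increment was lost

# Ruby Integers are arbitrary precision: no fixed-width overflow, and the
# counter keeps increasing monotonically.
int_total = 2**64
int_total += 1
int_total == 18_446_744_073_709_551_617 # => true
```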

bai · 2019-06-19T06:08:03Z

@ganmacs Do we need to multiply elapsed_time by 1000 before calling to_i to get millisecond precision?
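A small Ruby sketch of the precision issue, using hypothetical sub-second flush durations: calling to_i on seconds truncates every one of them to 0, while scaling to milliseconds first keeps them. The durations are chosen to be exactly representable as Floats so the totals are exact.

```ruby
# Hypothetical flush durations in seconds; all are under one second.
durations = [0.375, 0.125, 0.5]

# Truncating whole seconds drops every sub-second flush.
seconds_total = durations.sum { |d| d.to_i }          # => 0

# Scaling to milliseconds before truncating preserves them.
millis_total = durations.sum { |d| (d * 1000).to_i } # => 1000
```

Note that to_i still truncates rather than rounds, so anything below 1 ms contributes nothing even with this scaling.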

repeatedly · 2019-06-21T10:00:50Z

lib/fluent/plugin/output.rb

@@ -1202,7 +1204,10 @@ def backup_chunk(chunk, using_secondary, delayed_commit)

      def check_slow_flush(start)
        elapsed_time = Fluent::Clock.now - start
+        # millsec precision
+        @counters_monitor.synchronize { @flush_time_count += (elapsed_time * 1000).to_i }


How about doing the (elapsed_time * 1000).to_i calculation outside of synchronize?

Fixed in 05d842e.
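A minimal Ruby sketch of the pattern suggested in the review, assuming a hypothetical FlushCounters class rather than fluentd's actual Output plugin: the millisecond conversion happens before entering Monitor#synchronize, so the lock is held only while the shared counters are updated. It uses Process.clock_gettime with CLOCK_MONOTONIC in place of Fluent::Clock.now, and the slow-flush threshold (in seconds) is an assumed constructor parameter.

```ruby
require 'monitor'

# Hypothetical holder for the two counters discussed in this PR.
class FlushCounters
  attr_reader :flush_time_count, :slow_flush_count

  def initialize(slow_flush_threshold)
    @slow_flush_threshold = slow_flush_threshold # seconds (assumed unit)
    @counters_monitor = Monitor.new
    @flush_time_count = 0 # total flush time in milliseconds (Integer)
    @slow_flush_count = 0
  end

  # start is a value previously taken from the monotonic clock.
  def check_slow_flush(start)
    record(Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
  end

  def record(elapsed_time)
    # Computed outside the lock: this touches no shared state.
    elapsed_millisec = (elapsed_time * 1000).to_i
    @counters_monitor.synchronize do
      @flush_time_count += elapsed_millisec
      @slow_flush_count += 1 if elapsed_time > @slow_flush_threshold
    end
  end
end

# Usage: several threads recording concurrently.
counters = FlushCounters.new(20.0)
threads = 4.times.map { Thread.new { 100.times { counters.record(0.5) } } }
threads.each(&:join)
counters.flush_time_count # => 200000
counters.slow_flush_count # => 0
```

Keeping only the += operations inside synchronize shortens the critical section; with many flush threads, that reduces contention on the shared monitor.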

repeatedly · 2019-07-01T02:38:04Z

Thx!

ganmacs requested a review from repeatedly June 11, 2019 02:42

repeatedly reviewed Jun 11, 2019

ganmacs added 3 commits June 18, 2019 11:23

Add stats to output and buffer plugins

06de13d

they are used for monitoring Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

Remove unused nil check

6736701

Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

Add slow_flush_count and flush_time_count

63c006e

Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

ganmacs force-pushed the add-more-metrics branch from 7e0941a to 63c006e June 18, 2019 05:31

Call to_i to avoid an overflow

5edaca8

Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

ganmacs force-pushed the add-more-metrics branch from 628379e to 8b5e34f June 21, 2019 08:01

impl fuzzy_assert since we aren't interested in the values of the metrics

11105cf

Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

ganmacs force-pushed the add-more-metrics branch from 8b5e34f to 11105cf June 21, 2019 09:00

bai mentioned this pull request Jun 21, 2019

Expose flush_time_count and slow_flush_count metrics fluent/fluent-plugin-prometheus#100

Merged

repeatedly reviewed Jun 21, 2019

Calculate elapsed time outside of lock

05d842e

Signed-off-by: Yuta Iwama <ganmacs@gmail.com>

bai approved these changes Jun 25, 2019

repeatedly merged commit 07fee9b into fluent:master Jul 1, 2019

ganmacs deleted the add-more-metrics branch July 1, 2019 02:40

repeatedly mentioned this pull request Jul 10, 2019

Improve stats method #2491

Merged

kazegusuri mentioned this pull request Aug 10, 2019

add more metrics fluent/fluent-plugin-prometheus#107

Merged
