add per_cpu option to load scraper for hostmetricsreceiver #5243
Changes from commits: 87ebca2, 592dc7f, b465a8e, 58c4bc3, 8aaf0b1, 01d7d4e, 33c2fdf
```diff
@@ -16,6 +16,7 @@ package loadscraper

 import (
 	"context"
+	"runtime"
 	"time"

 	"github.com/shirou/gopsutil/load"

@@ -63,6 +64,13 @@ func (s *scraper) scrape(_ context.Context) (pdata.MetricSlice, error) {
 		return metrics, scrapererror.NewPartialScrapeError(err, metricsLen)
 	}

+	if s.config.PerCPU {
+		divisor := float64(runtime.NumCPU())
```
**bogdandrutu** (review comment): Would it be better to expose "num_cpu" as an individual metric? People interested in this could then compute the average with the backend of their choice, or with a "processor". What do you think?

**xp-1000** (reply): Hi @bogdandrutu, this is a legitimate but difficult question ^^

**context**

We are a customer of the former SignalFx product, acquired by Splunk more than 2 years ago, I think. In my opinion this deprecation by Splunk was a bit rushed and left its customers with a migration that is difficult to achieve, given the lack of documentation for SignalFx/Splunk-specific requirements and common usages (which is not your problem!). This PR is a tiny fragment of the work I am doing to make this migration seamless and transparent.

**goal**

About the load metrics: we come from the smart agent load monitor (https://docs.signalfx.com/en/latest/integrations/agent/monitors/load.html), which provides an option to average the load metrics by CPU count. This PR aims to bring a similar feature, for parity between the SignalFx smart agent and the OpenTelemetry Collector, so that we can use the otel receiver while keeping all existing dependent resources (charts, detectors, ...) working.

**workaround**

For now, as a workaround, I still use the smart agent monitor instead of the hostmetrics load scraper:

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      #load:
      paging:
      processes:
  smartagent/load:
    type: load
    perCPU: true
```

but I would prefer to drop the smart agent monitors in favor of the otel receivers if possible.

**your suggestion**

Your suggestion obviously makes sense, and in fact I could already compute the average per CPU outside the receiver, because I already have everything I need. Indeed, the SignalFx exporter for the Otel Collector already exposes a metric for the number of CPUs, here: opentelemetry-collector-contrib/exporter/signalfxexporter/internal/translation/constants.go (line 123 in 4f91eb0), so I can basically divide the load metric from the hostmetrics receiver by it.

**the problem**

Doing it that way would force users to update all existing resources to use a more complex query to calculate the load averaged per CPU. The problem is that we have many customers across different SignalFx organizations, and we do not manage every resource ourselves; beyond the complexity of tracking, detecting, and updating resources outside our scope, this is also tricky for contractual / responsibility / permission reasons. Even for the resources we do manage properly ourselves (IaC, git, terraform, etc.) and that are our responsibility, it can be tedious to update existing resources without introducing new mistakes. For example, our detector that alerts on load metrics is very simple: https://github.com/claranet/terraform-signalfx-detectors/blob/master/modules/smart-agent_system-common/detectors-gen.tf#L80

**conclusion**

It is a mess! Honestly, I fully understand if you do not want to integrate a "useless" feature into the otel collector, especially if it is only for a specific vendor. But here is the full explanation: this seems to me the least disruptive, safest, and most straightforward way to handle my need. I also would like to argue that:

ok, not sure of the relevance of these arguments, but nothing ventured, nothing gained :)
```diff
+		avgLoadValues.Load1 = avgLoadValues.Load1 / divisor
+		avgLoadValues.Load5 = avgLoadValues.Load5 / divisor
+		avgLoadValues.Load15 = avgLoadValues.Load15 / divisor
+	}

 	metrics.EnsureCapacity(metricsLen)

 	initializeLoadMetric(metrics.AppendEmpty(), metadata.Metrics.SystemCPULoadAverage1m, now, avgLoadValues.Load1)
```
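With the change applied, the option would presumably be enabled in the collector configuration along these lines. This is a sketch assuming a `per_cpu` key derived from the `PerCPU` config field at this revision of the PR; the exact name was still being debated in review.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      load:
        per_cpu: true  # hypothetical key name at this PR revision
```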
**bogdandrutu** (review comment): This is not really "per_cpu" (which to me means broken down by CPU core); it is more or less an average per CPU, correct?

**xp-1000** (reply): This is absolutely correct. I confess I simply copied the existing option from the smart agent: https://docs.signalfx.com/en/latest/integrations/agent/monitors/load.html#configuration — do you want me to change it to something like `average_per_cpu`?