-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consul metricbeat module #8631
Consul metricbeat module #8631
Conversation
Thanks for taking this one, really looking forward to see this module happening! About Gauges, Counters and other types on their side: As we don't have a real differentiation on our side, I'm wondering if we should keep the Gauge/Counter/.. prefix, probably not? MapStr supports nesting, so while you put the metrics on it the hierarchy should be created automatically. I see you are currently creating one event per metric, why not putting all of them together in the same document? If all these metrics belong to the same logical unit (same node, same table or so) they can be put together in the same document. That has some benefits, like saving space and increasing indexing performance. From what I see here: https://www.consul.io/docs/agent/telemetry.html I think all metrics are giving you information about the agent you are querying. |
For naming also have a look at our conventions here: https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html I would not prefix with gauge or counter. I rather hope we can use something like elastic/elasticsearch#33267 in the future. |
Thanks for your comments @exekias @ruflin With the last commit, I have used {
"catalog": {
"service": {
"query": {
"labels": {
"service": "alertmanager"
},
"value": {
"labels": {
"service": "zendesk-widget-lb"
},
"value": 1
}
}
}
},
"fsm": {
"coordinate": {
"batch-update": {
"count": 2,
"labels": {},
"max": 1.622668981552124,
"mean": 1.523213505744934,
"min": 1.4237580299377441,
"rate": 0,
"stddev": 0.14065128273879726,
"sum": 3.046427011489868
}
}, I have some doubt: Each metric could have a value and a label so it leads to values like the ones you can see in the example above. Here it would be very convenient to have the metadata that @ruflin was suggesting but right now the only way is to have a nested object. The following things things are still left:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module can be a great addition to metricbeat 👍
I have some comments, but they can be caused by a lack of knowledge about consul, so I can be completely wrong 🙂
I am a bit worried with converting all fields automatically, according to consul docs there seems to be some metrics whose name depends on service name or even type of request, this could lead to an explosion in fields. I think that some of this info available in consul metric names should be moved to fields.
Also, for the same reason, I think that we could have multiple metricsets, a general one for the agent stats, another one for service specific metrics, another one for samples...
It seems that metrics can also be exported in prometheus format, I wonder if we can take advantage of this somehow, have you explored this option?
For the label / metrics organisation, I wonder if we could take some inspiration from how we do it in prometheus (@exekias ? ). Definitively something we should discuss in more detail. |
Yes, from the way this looks, it sounds like you can do something really similar to what we do with Prometheus metrics: https://github.com/elastic/beats/blob/master/metricbeat/module/prometheus/collector/collector.go#L88-L102 Where we basically gather all metrics and then combine metrics with the same labels in the same document |
I'm not sure if I'm understanding correctly. If a specific metric has 2 labels and shares only one with a second metric, where do you put it if it shares the other label with a third metric? Maybe I'll need a longer explanation 😅 |
Interestingly the output also allows to enable the prometheus format (but it has to be enabled manually it seems and is not default). So the Prometheus approach should work here. @sayden Best have a look at the prometheus module or sync up with @exekias . As far as I remember only metrics with fully identical metrics were mixed together, so the above would create 3 different events. |
b56cb92
to
0f0e6e0
Compare
0f0e6e0
to
9d8850d
Compare
@jsoriano I couldn't find any relevant about this #8631 (comment), just vague information. Feel free to open a follow up PR to add this information in a way that is clear for a user and doesn't lead him to confusions with how Consul and Metricbeat fetches / stores / reports data. |
1f0d442
to
6f75436
Compare
jenkins, test this |
After a conversation with @jsoriano we are going to leave out everything that aren't Gauges until we find out how to properly collect the Samples and Counters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think this is almost ready to go, as summary of things to at least think about before merging this:
- Report
healthy
as boolean? - Implement a
Key() string
method in the value helper so dots are not needed in the units in the mapping definition. - Remove unused code (even if it might be used in future PRs).
fields: | ||
- name: healthy | ||
type: long | ||
description: Overall health of the local server cluster. If all servers are considered healthy by Autopilot, this will be set to 1. If any are unhealthy, this will be 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could convert this to true/false, in consul all metrics are numeric, but we have booleans too 🙂
allowedDetailedValues = map[string]valueHelper{ | ||
//"consul.raft.apply": {Value: "raft.apply"}, | ||
//"consul.raft.commitTime": {Value: "raft.commit_time", Unit: ".ms"}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to leave it, as they are the values that we have removed. Will you be ok if I add a comment like "fields to be added, removed temporarly"?
Most comments addressed, pending things are not blockers for me.
@sayden Can you rebase this on master? This should fix your CI. |
61fd105
to
4da09c7
Compare
Rebased |
jenkins, test this |
Error was something unrelated about ML in Filebeat. Merging |
Completely renewed Consul Agent module.
It now generates a groups of events based on their labels so the
data.json
shows a full event with no labels (or all labels being the same). But different labeling on the metrics will generate chunks of thisdata.json
file. Fields that aren't filled aren't sent either.