Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consul metricbeat module #8631

Merged
merged 14 commits into from
Feb 13, 2019
Merged

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Oct 17, 2018

Completely renewed Consul Agent module.

It now generates a groups of events based on their labels so the data.json shows a full event with no labels (or all labels being the same). But different labeling on the metrics will generate chunks of this data.json file. Fields that aren't filled aren't sent either.

@sayden sayden requested review from ruflin and exekias October 17, 2018 12:15
@sayden sayden added the module label Oct 17, 2018
@sayden sayden added Metricbeat Metricbeat enhancement in progress Pull request is currently in progress. labels Oct 17, 2018
@exekias
Copy link
Contributor

exekias commented Oct 18, 2018

Thanks for taking this one, really looking forward to see this module happening!

About Gauges, Counters and other types on their side: As we don't have a real differentiation on our side, I'm wondering if we should keep the Gauge/Counter/.. prefix, probably not?

MapStr supports nesting, so while you put the metrics on it the hierarchy should be created automatically. I see you are currently creating one event per metric, why not putting all of them together in the same document?

If all these metrics belong to the same logical unit (same node, same table or so) they can be put together in the same document. That has some benefits, like saving space and increasing indexing performance. From what I see here: https://www.consul.io/docs/agent/telemetry.html I think all metrics are giving you information about the agent you are querying.

@sayden sayden mentioned this pull request Oct 23, 2018
12 tasks
@ruflin
Copy link
Collaborator

ruflin commented Oct 25, 2018

For naming also have a look at our conventions here: https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html

I would not prefix with gauge or counter. I rather hope we can use something like elastic/elasticsearch#33267 in the future.

@sayden
Copy link
Contributor Author

sayden commented Oct 28, 2018

Thanks for your comments @exekias @ruflin

With the last commit, I have used safemapstr and now we have a much nicer tree structure that can be sent in a single event

{
	"catalog": {
		"service": {
			"query": {
				"labels": {
					"service": "alertmanager"
				},
				"value": {
					"labels": {
						"service": "zendesk-widget-lb"
					},
					"value": 1
				}
			}
		}
	},
	"fsm": {
		"coordinate": {
			"batch-update": {
				"count": 2,
				"labels": {},
				"max": 1.622668981552124,
				"mean": 1.523213505744934,
				"min": 1.4237580299377441,
				"rate": 0,
				"stddev": 0.14065128273879726,
				"sum": 3.046427011489868
			}
		},

I have some doubt: Each metric could have a value and a label so it leads to values like the ones you can see in the example above. Here it would be very convenient to have the metadata that @ruflin was suggesting but right now the only way is to have a nested object.

The following things things are still left:

  • Some metric names are CamelCase that need to be lowercased as shown here
  • Integration Tests
  • System tests
  • Commenting + Documentation
  • Kibana Dashboard

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module can be a great addition to metricbeat 👍

I have some comments, but they can be caused by a lack of knowledge about consul, so I can be completely wrong 🙂

I am a bit worried with converting all fields automatically, according to consul docs there seems to be some metrics whose name depends on service name or even type of request, this could lead to an explosion in fields. I think that some of this info available in consul metric names should be moved to fields.

Also, for the same reason, I think that we could have multiple metricsets, a general one for the agent stats, another one for service specific metrics, another one for samples...

It seems that metrics can also be exported in prometheus format, I wonder if we can take advantage of this somehow, have you explored this option?

@ruflin
Copy link
Collaborator

ruflin commented Oct 29, 2018

For the label / metrics organisation, I wonder if we could take some inspiration from how we do it in prometheus (@exekias ? ). Definitively something we should discuss in more detail.

@exekias
Copy link
Contributor

exekias commented Oct 29, 2018

Yes, from the way this looks, it sounds like you can do something really similar to what we do with Prometheus metrics: https://github.com/elastic/beats/blob/master/metricbeat/module/prometheus/collector/collector.go#L88-L102

Where we basically gather all metrics and then combine metrics with the same labels in the same document

@sayden
Copy link
Contributor Author

sayden commented Oct 29, 2018

Where we basically gather all metrics and then combine metrics with the same labels in the same document

I'm not sure if I'm understanding correctly. If a specific metric has 2 labels and shares only one with a second metric, where do you put it if it shares the other label with a third metric? Maybe I'll need a longer explanation 😅

@ruflin
Copy link
Collaborator

ruflin commented Oct 30, 2018

Interestingly the output also allows to enable the prometheus format (but it has to be enabled manually it seems and is not default). So the Prometheus approach should work here.

@sayden Best have a look at the prometheus module or sync up with @exekias . As far as I remember only metrics with fully identical metrics were mixed together, so the above would create 3 different events.

@ruflin ruflin added Team:Integrations Label for the Integrations team and removed Team:Integrations Label for the Integrations team labels Nov 21, 2018
@sayden sayden force-pushed the feature/consul-metricbeat-module branch from b56cb92 to 0f0e6e0 Compare December 13, 2018 13:53
@sayden sayden force-pushed the feature/consul-metricbeat-module branch from 0f0e6e0 to 9d8850d Compare February 1, 2019 13:41
@sayden sayden requested review from a team as code owners February 1, 2019 13:41
@sayden sayden added review and removed in progress Pull request is currently in progress. labels Feb 1, 2019
@sayden
Copy link
Contributor Author

sayden commented Feb 7, 2019

@jsoriano I couldn't find any relevant about this #8631 (comment), just vague information. Feel free to open a follow up PR to add this information in a way that is clear for a user and doesn't lead him to confusions with how Consul and Metricbeat fetches / stores / reports data.

@sayden sayden force-pushed the feature/consul-metricbeat-module branch from 1f0d442 to 6f75436 Compare February 11, 2019 11:02
@sayden
Copy link
Contributor Author

sayden commented Feb 11, 2019

jenkins, test this

@sayden
Copy link
Contributor Author

sayden commented Feb 11, 2019

After a conversation with @jsoriano we are going to leave out everything that aren't Gauges until we find out how to properly collect the Samples and Counters

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this is almost ready to go, as summary of things to at least think about before merging this:

  • Report healthy as boolean?
  • Implement a Key() string method in the value helper so dots are not needed in the units in the mapping definition.
  • Remove unused code (even if it might be used in future PRs).

fields:
- name: healthy
type: long
description: Overall health of the local server cluster. If all servers are considered healthy by Autopilot, this will be set to 1. If any are unhealthy, this will be 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could convert this to true/false, in consul all metrics are numeric, but we have booleans too 🙂

allowedDetailedValues = map[string]valueHelper{
//"consul.raft.apply": {Value: "raft.apply"},
//"consul.raft.commitTime": {Value: "raft.commit_time", Unit: ".ms"},
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused code?

Copy link
Contributor Author

@sayden sayden Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to leave it, as they are the values that we have removed. Will you be ok if I add a comment like "fields to be added, removed temporarly"?

@jsoriano jsoriano dismissed their stale review February 11, 2019 11:38

Most comments addressed, pending things are not blockers for me.

@ruflin
Copy link
Collaborator

ruflin commented Feb 12, 2019

@sayden Can you rebase this on master? This should fix your CI.

@sayden sayden force-pushed the feature/consul-metricbeat-module branch from 61fd105 to 4da09c7 Compare February 12, 2019 14:55
@sayden
Copy link
Contributor Author

sayden commented Feb 12, 2019

Rebased

@sayden
Copy link
Contributor Author

sayden commented Feb 13, 2019

jenkins, test this

@sayden
Copy link
Contributor Author

sayden commented Feb 13, 2019

Error was something unrelated about ML in Filebeat. Merging

@sayden sayden merged commit edc3d33 into elastic:master Feb 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants