
Support metric_relabel_configs in distributor #3329

Merged

merged 6 commits into cortexproject:master on Oct 26, 2020

Conversation

roidelapluie
Contributor

@roidelapluie roidelapluie commented Oct 12, 2020

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

What this PR does:
Support metric_relabel_configs in distributor

Which issue(s) this PR fixes:
Fixes #1507

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Contributor

@csmarchbanks csmarchbanks left a comment


🥳, I am excited for this!

Do you think it is worthwhile to allow this to be specified per user rather than just globally? I imagine a common use case will be to stop very high cardinality or high churn data from a specific tenant without disrupting data for all tenants.
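For illustration, stopping such a series at ingest would use a standard Prometheus relabel rule. A minimal sketch (the metric name `some_high_cardinality_metric` is hypothetical):

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "some_high_cardinality_metric"  # hypothetical metric to drop at ingest
    action: drop
```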

CHANGELOG.md Outdated
@@ -49,6 +49,7 @@
* [CHANGE] Increased default `-<prefix>.redis.timeout` from `100ms` to `500ms`. #3301
* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-tenant` globally, or using per-tenant limit `max_queriers_per_tenant`), each tenants's requests will be handled by different set of queriers. #3113 #3257
* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
* [FEATURE] Support `metric_relabel_confis` in distributor. #3329


configs instead of confis (just a simple typo) ;)

Contributor

@pstibrany pstibrany left a comment


Can we move the configuration into Overrides? Then it can be per-user, and reloaded when changed on disk, without restarting Cortex. WDYT?

@@ -170,6 +171,9 @@ type Config struct {
// when true the distributor does not validate the label name, Cortex doesn't directly use
// this (and should never use it) but this feature is used by other projects built on top of it
SkipLabelNameValidation bool `yaml:"-"`

// Metrics relabeling
MetricRelabelConfigs []*relabel.Config `yaml:"metric_relabel_configs,omitempty" doc:"nocli|description=List of metric relabel configurations. Note that in most situations, it is more effective to use metrics relabeling directly in the Prometheus server, e.g. remote_write.write_relabel_configs."`
Contributor


We should note that this is global, for ALL tenants.

@roidelapluie
Contributor Author

I was already thinking about #3318 (comment) when I set this up. WDYT? Should we have both?

@pstibrany
Contributor

> I was already thinking about #3318 (comment) when I set this up. WDYT? Should we have both?

If by "both" you refer to global and per-tenant config, then Overrides / Limits already support that, and yes, I think it makes sense to have both. For example in cloud offering, we wouldn't use global one, but could use per-tenant config in specific cases. On the other hand, some people running Cortex internally may only need global one.
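Sketched as YAML, the two levels discussed here might look as follows. The field placement is an assumption based on how Cortex typically structures its limits config and runtime overrides file; the tenant ID `tenant-a` and the metric names are hypothetical:

```yaml
# Main Cortex config: global default applied to all tenants.
limits:
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: "noisy_metric"                # hypothetical
      action: drop

# Runtime overrides file, reloaded from disk without restarting Cortex:
overrides:
  tenant-a:
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "tenant_a_churny_metric"    # hypothetical
        action: drop
```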

@bboreham
Contributor

Thanks! Any idea what the CPU/memory overhead is?

@roidelapluie
Contributor Author

That depends on the relabeling rules.

@roidelapluie
Contributor Author

I have moved it to limits

Contributor

@jtlisi jtlisi left a comment


LGTM!

@@ -455,6 +456,11 @@ func (d *Distributor) Push(ctx context.Context, req *client.WriteRequest) (*clie
removeLabel(labelName, &ts.Labels)
}

if mrc := d.limits.MetricRelabelConfigs(userID); len(mrc) > 0 {
Contributor


question: Is there any instrumentation or debug logging that could be added that would be helpful for operators/users when working with this feature?

Contributor Author

@roidelapluie roidelapluie Oct 13, 2020


IMHO the metrics distributor_received_samples_total and distributor_samples_in_total cover that already.
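Assuming the exported names carry the usual `cortex_` prefix and a `user` label, the volume dropped by relabeling could be approximated by subtracting one rate from the other (a sketch; exact metric names and labels may vary by Cortex version):

```promql
sum by (user) (rate(cortex_distributor_samples_in_total[5m]))
  -
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))
```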

CHANGELOG.md Outdated
@@ -49,6 +49,7 @@
* [CHANGE] Increased default `-<prefix>.redis.timeout` from `100ms` to `500ms`. #3301
* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-tenant` globally, or using per-tenant limit `max_queriers_per_tenant`), each tenants's requests will be handled by different set of queriers. #3113 #3257
* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
* [FEATURE] Support `metric_relabel_configs` in distributor. #3329
Contributor


Suggested change
* [FEATURE] Support `metric_relabel_configs` in distributor. #3329
* [FEATURE] Added the support for applying a Prometheus metrics relabel config on series received by the distributor. The `metric_relabel_configs` field was added to the limits config. #3329

Contributor Author


Thanks, I updated my comment

@pstibrany
Contributor

Thank you, this looks very good! Would it be possible to add a small test showing the parsing of relabeling YAML rules into per-user limits?

@roidelapluie
Contributor Author

> Thank you, this looks very good! Would it be possible to add a small test showing the parsing of relabeling YAML rules into per-user limits?

Do we have tests like this already?

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@pstibrany
Contributor

> Thank you, this looks very good! Would it be possible to add a small test showing the parsing of relabeling YAML rules into per-user limits?
>
> Do we have tests like this already?

Until now structures we've put into overrides YAML file were pretty basic. I can only see TestLimitsLoadingFromYaml. Something similar in size but with relabel config to make sure the wiring is all correct would be good to have. WDYT?

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@roidelapluie
Contributor Author

Done :)

Contributor

@pracucci pracucci left a comment


Could we mark it experimental in docs/configuration/v1-guarantees.md, please?

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@roidelapluie
Contributor Author

done

Contributor

@pracucci pracucci left a comment


Thanks for working on this, and sorry for the super late review (I wish days had 48 hours 😢). Looks flawless to me. I just left a nit, definitely not a blocker.

@pracucci pracucci requested a review from pstibrany October 26, 2020 18:14
Contributor

@pstibrany pstibrany left a comment


Thanks a lot Julien!

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany pstibrany merged commit 2abac33 into cortexproject:master Oct 26, 2020
@bboreham
Contributor

I just used this feature in production to remove a high-cardinality metric; worked super well!

It would be even better if we had a metric to track how many samples were dropped this way: right now they disappear from cortex_distributor_received_samples_total and don't appear in cortex_discarded_samples_total.


Successfully merging this pull request may close these issues.

Consider allowing metric relabelling (or just "drop") during ingest
7 participants