Support metric_relabel_configs in distributor #3329
8f8ee23
to
b2310ad
Compare
🥳, I am excited for this!
Do you think it is worthwhile to allow this to be specified per user rather than just globally? I imagine a common use case will be to stop very high cardinality or high churn data from a specific tenant without disrupting data for all tenants.
CHANGELOG.md
Outdated
@@ -49,6 +49,7 @@
* [CHANGE] Increased default `-<prefix>.redis.timeout` from `100ms` to `500ms`. #3301
* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-tenant` globally, or using per-tenant limit `max_queriers_per_tenant`), each tenants's requests will be handled by different set of queriers. #3113 #3257
* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
* [FEATURE] Support `metric_relabel_confis` in distributor. #3329
`configs` instead of `confis` (just a simple typo) ;)
Can we move the configuration into Overrides? Then it can be per-user, and reloaded when changed on disk, without restarting Cortex. WDYT?
pkg/distributor/distributor.go
Outdated
@@ -170,6 +171,9 @@ type Config struct {
	// when true the distributor does not validate the label name, Cortex doesn't directly use
	// this (and should never use it) but this feature is used by other projects built on top of it
	SkipLabelNameValidation bool `yaml:"-"`

	// Metrics relabeling
	MetricRelabelConfigs []*relabel.Config `yaml:"metric_relabel_configs,omitempty" doc:"nocli|description=List of metric relabel configurations. Note that in most situations, it is more effective to use metrics relabeling directly in the Prometheus server, e.g. remote_write.write_relabel_configs."`
We should note that this is global, for ALL tenants.
I was already thinking about #3318 (comment) when I set this up. WDYT? Should we have both?
If by "both" you mean the global and per-tenant config, then Overrides / Limits already support that, and yes, I think it makes sense to have both. For example, in a cloud offering we wouldn't use the global one, but could use the per-tenant config in specific cases. On the other hand, some people running Cortex internally may only need the global one.
Thanks! Any idea what the CPU/memory overhead is?
That depends on the relabeling rules.
I have moved it to limits.
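To illustrate the "global plus per-tenant" lookup discussed above, here is a minimal sketch of how a limits accessor might fall back to the global defaults when a tenant has no override. The types `RelabelRule`, `Limits`, and `Overrides` below are simplified, hypothetical stand-ins written for this example; they are not the Cortex or Prometheus types, and only the accessor name `MetricRelabelConfigs(userID)` is taken from the diff in this PR.

```go
package main

import "fmt"

// RelabelRule is a hypothetical, simplified stand-in for a relabel rule.
type RelabelRule struct {
	SourceLabel string
	Regex       string
	Action      string
}

// Limits holds settings that can be set globally or per tenant (assumed shape).
type Limits struct {
	MetricRelabelConfigs []RelabelRule
}

// Overrides resolves limits for a tenant, falling back to the global defaults.
type Overrides struct {
	defaults  Limits
	perTenant map[string]Limits
}

// MetricRelabelConfigs returns the tenant's rules if the tenant has an
// override entry, and the global rules otherwise.
func (o *Overrides) MetricRelabelConfigs(userID string) []RelabelRule {
	if l, ok := o.perTenant[userID]; ok {
		return l.MetricRelabelConfigs
	}
	return o.defaults.MetricRelabelConfigs
}

func main() {
	o := &Overrides{
		defaults: Limits{}, // no global rules
		perTenant: map[string]Limits{
			"tenant-a": {MetricRelabelConfigs: []RelabelRule{
				{SourceLabel: "__name__", Regex: "noisy_.*", Action: "drop"},
			}},
		},
	}
	// tenant-a has one rule; tenant-b falls back to the (empty) defaults.
	fmt.Println(len(o.MetricRelabelConfigs("tenant-a")), len(o.MetricRelabelConfigs("tenant-b")))
}
```

With this shape, a cloud deployment can leave the defaults empty and add rules only for the tenants that need them, while a single-tenant deployment can set the defaults alone.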
LGTM!
pkg/distributor/distributor.go
Outdated
@@ -455,6 +456,11 @@ func (d *Distributor) Push(ctx context.Context, req *client.WriteRequest) (*clie
	removeLabel(labelName, &ts.Labels)
}

if mrc := d.limits.MetricRelabelConfigs(userID); len(mrc) > 0 {
question: Is there any instrumentation or debug logging that could be added that would be helpful for operators/users when working with this feature?
IMHO the metrics distributor_received_samples_total and distributor_samples_in_total cover that already.
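For readers unfamiliar with what the relabeling step in the `Push` hunk above does to incoming series, here is a minimal sketch of a `drop` rule using only the Go standard library. `Label`, `DropRule`, `MustDropRule`, `keepSeries`, and `filterSeries` are hypothetical names invented for this example; the real change uses Prometheus's `relabel.Config`, which supports many more actions. The one behavior borrowed from Prometheus is that the regex is anchored, so it has to match the whole label value.

```go
package main

import (
	"fmt"
	"regexp"
)

// Label is a name/value pair attached to a series.
type Label struct{ Name, Value string }

// DropRule is a hypothetical, much-reduced relabel rule: a series is dropped
// when the value of SourceLabel fully matches Regex.
type DropRule struct {
	SourceLabel string
	Regex       *regexp.Regexp
}

// MustDropRule anchors the pattern so it must match the whole label value.
func MustDropRule(sourceLabel, pattern string) DropRule {
	return DropRule{
		SourceLabel: sourceLabel,
		Regex:       regexp.MustCompile("^(?:" + pattern + ")$"),
	}
}

// labelValue returns the value of the named label, or "" if it is absent.
func labelValue(labels []Label, name string) string {
	for _, l := range labels {
		if l.Name == name {
			return l.Value
		}
	}
	return ""
}

// keepSeries reports whether a series survives every rule.
func keepSeries(labels []Label, rules []DropRule) bool {
	for _, r := range rules {
		if r.Regex.MatchString(labelValue(labels, r.SourceLabel)) {
			return false
		}
	}
	return true
}

// filterSeries returns the series that are not dropped by any rule.
// With no rules configured it returns the input untouched, mirroring the
// len(mrc) > 0 guard in the diff.
func filterSeries(series [][]Label, rules []DropRule) [][]Label {
	if len(rules) == 0 {
		return series
	}
	kept := make([][]Label, 0, len(series))
	for _, s := range series {
		if keepSeries(s, rules) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	rules := []DropRule{MustDropRule("__name__", "http_request_duration_seconds_bucket")}
	series := [][]Label{
		{{Name: "__name__", Value: "up"}, {Name: "job", Value: "api"}},
		{{Name: "__name__", Value: "http_request_duration_seconds_bucket"}, {Name: "le", Value: "0.5"}},
	}
	fmt.Println(len(filterSeries(series, rules))) // prints 1
}
```

Because dropped series never reach the later stages of the write path in this sketch, the cost is one regex match per rule per series, which is consistent with the earlier remark that overhead depends on the relabeling rules.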
CHANGELOG.md
Outdated
@@ -49,6 +49,7 @@
* [CHANGE] Increased default `-<prefix>.redis.timeout` from `100ms` to `500ms`. #3301
* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-tenant` globally, or using per-tenant limit `max_queriers_per_tenant`), each tenants's requests will be handled by different set of queriers. #3113 #3257
* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
* [FEATURE] Support `metric_relabel_configs` in distributor. #3329
Suggested change:
- * [FEATURE] Support `metric_relabel_configs` in distributor. #3329
+ * [FEATURE] Added the support for applying a Prometheus metrics relabel config on series received by the distributor. The `metric_relabel_configs` field was added to the limits config. #3329
Thanks, I updated my comment
Thank you, this looks very good! Would it be possible to add a small test showing the parsing of relabeling YAML rules into per-user limits?
Do we have tests like this already?
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
96095ee
to
c0ccea9
Compare
Until now, the structures we've put into the overrides YAML file were pretty basic. I can only see
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Done :)
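As a rough illustration of what such a parsing test needs to cover, the sketch below decodes per-tenant relabel rules and checks that omitted fields pick up defaults. It makes several assumptions: it uses JSON via `encoding/json` instead of YAML so that it runs with the standard library alone, the `RelabelRule` and `Limits` types are hypothetical stand-ins, the top-level `overrides` key is an assumed file layout, and the default values (`replace` action, `(.*)` regex) follow Prometheus's documented relabeling defaults. The "set defaults, then decode through a plain alias type" pattern is the part worth testing, because nested structures like these are less basic than scalar limits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RelabelRule is a hypothetical stand-in for a relabel rule.
type RelabelRule struct {
	SourceLabels []string `json:"source_labels"`
	Regex        string   `json:"regex"`
	Action       string   `json:"action"`
}

// UnmarshalJSON fills in defaults before decoding, so fields omitted in the
// input keep their default values.
func (r *RelabelRule) UnmarshalJSON(data []byte) error {
	type plain RelabelRule // alias without methods, avoids infinite recursion
	tmp := plain{Regex: "(.*)", Action: "replace"}
	if err := json.Unmarshal(data, &tmp); err != nil {
		return err
	}
	*r = RelabelRule(tmp)
	return nil
}

// Limits is the per-tenant limits section (assumed shape).
type Limits struct {
	MetricRelabelConfigs []*RelabelRule `json:"metric_relabel_configs"`
}

// parseOverrides decodes a document of the form {"overrides": {tenant: limits}}.
func parseOverrides(data []byte) (map[string]Limits, error) {
	var doc struct {
		Overrides map[string]Limits `json:"overrides"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.Overrides, nil
}

func main() {
	input := []byte(`{"overrides": {"tenant-a": {"metric_relabel_configs": [
		{"source_labels": ["__name__"], "regex": "noisy_.*", "action": "drop"},
		{"source_labels": ["pod"]}
	]}}}`)
	overrides, err := parseOverrides(input)
	if err != nil {
		panic(err)
	}
	for _, rule := range overrides["tenant-a"].MetricRelabelConfigs {
		fmt.Printf("%v %q %s\n", rule.SourceLabels, rule.Regex, rule.Action)
	}
}
```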
Could we mark it experimental in docs/configuration/v1-guarantees.md, please?
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
done
Thanks for working on this and sorry for the super late review (I would like days of 48h 😢 ). Looks flawless to me. I just left a nit, definitely not a blocker.
Thanks a lot Julien!
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
7939d3b
to
199e07f
Compare
I just used this feature in production to remove a high-cardinality metric; worked super well! Even better if we had a metric to track how many samples got dropped this way - right now they disappear from
Signed-off-by: Julien Pivotto roidelapluie@inuits.eu
What this PR does:
Support metric_relabel_configs in distributor
Which issue(s) this PR fixes:
Fixes #1507
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]