Distributor: Return 529 for ingestion rate limit when serviceOverloadErrorEnabled (#6549)

* Distributor: return also 529 for ingestion rate limit when serviceOverloadErrorEnabled

* update the document

* update doc

* Update pkg/util/validation/limits.go

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>

* update docs

---------

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
ying-jeanne and pstibrany authored Nov 14, 2023
1 parent 4c6f74f commit bb5ef32
Showing 7 changed files with 15 additions and 10 deletions.
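
The behaviour described in the commit message can be sketched as a small, self-contained Go program. This is a simplified illustration, not the code in `pkg/distributor/push.go`: the `errorCause` type, its constants, `statusServiceOverloaded`, and the fallback to 500 for unknown causes are assumptions made for the example. Only the 400/429/529 mapping and the shared branch for both rate-limit causes come from the diff.

```go
package main

import (
	"fmt"
	"net/http"
)

// errorCause is a hypothetical stand-in for the mimirpb error causes
// referenced in the diff (BAD_DATA, INGESTION_RATE_LIMITED, REQUEST_RATE_LIMITED).
type errorCause int

const (
	badData errorCause = iota
	ingestionRateLimited
	requestRateLimited
	unknownCause
)

// statusServiceOverloaded is the non-standard 529 code named in the flag
// description; the constant name here is an assumption for this sketch.
const statusServiceOverloaded = 529

// toHTTPStatus maps an error cause to an HTTP status. Both rate-limit causes
// share one branch, so the per-tenant flag decides between 429 and 529 for
// ingestion rate limits as well as request rate limits.
func toHTTPStatus(cause errorCause, serviceOverloadErrorEnabled bool) int {
	switch cause {
	case badData:
		return http.StatusBadRequest
	case ingestionRateLimited, requestRateLimited:
		if serviceOverloadErrorEnabled {
			return statusServiceOverloaded
		}
		return http.StatusTooManyRequests
	default:
		// Assumption: anything else is treated as a server-side failure.
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(toHTTPStatus(ingestionRateLimited, false))
	fmt.Println(toHTTPStatus(ingestionRateLimited, true))
	fmt.Println(toHTTPStatus(requestRateLimited, true))
}
```

Before this commit, the ingestion rate limit always produced 429 regardless of the flag; the sketch reflects the state after the change.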
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -68,6 +68,7 @@
* [ENHANCEMENT] Server: Add `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled` CLI flag to specify whether gRPC status codes should be used in `status_code` label of `cortex_ingester_client_request_duration_seconds` metric. It defaults to false, meaning that successful and erroneous gRPC status codes are represented with `2xx` and `error` respectively. #6562
* [ENHANCEMENT] Server: Add `-server.http-log-closed-connections-without-response-enabled` option to log details about connections to HTTP server that were closed before any data was sent back. This can happen if client doesn't manage to send complete HTTP headers before timeout. #6612
* [ENHANCEMENT] Query-frontend: include length of query, time since the earliest and latest points of a query in "query stats" logs. Time parameters (start/end/time) are always formatted as RFC3339 now. #6473
+ * [BUGFIX] Distributor: return server overload error in the event of exceeding the ingestion rate limit. #6549
* [BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
* [BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
* [BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
2 changes: 1 addition & 1 deletion cmd/mimir/config-descriptor.json
@@ -3218,7 +3218,7 @@
"kind": "field",
"name": "service_overload_status_code_on_rate_limit_enabled",
"required": false,
- "desc": "If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used.",
+ "desc": "If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used. Enabling -distributor.retry-after-header.enabled before utilizing this option is strongly recommended as it helps prevent premature request retries by the client.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "distributor.service-overload-status-code-on-rate-limit-enabled",
2 changes: 1 addition & 1 deletion cmd/mimir/help-all.txt.tmpl
@@ -1212,7 +1212,7 @@ Usage of ./cmd/mimir/mimir:
-distributor.ring.store string
Backend storage to use for the ring. Supported values are: consul, etcd, inmemory, memberlist, multi. (default "memberlist")
-distributor.service-overload-status-code-on-rate-limit-enabled
- [experimental] If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used.
+ [experimental] If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used. Enabling -distributor.retry-after-header.enabled before utilizing this option is strongly recommended as it helps prevent premature request retries by the client.
-distributor.write-requests-buffer-pooling-enabled
[experimental] Enable pooling of buffers used for marshaling write requests.
-enable-go-runtime-metrics
@@ -2919,7 +2919,10 @@ The `limits` block configures default and per-tenant limits imposed by component
# (experimental) If enabled, rate limit errors will be reported to the client
# with HTTP status code 529 (Service is overloaded). If disabled, status code
- # 429 (Too Many Requests) is used.
+ # 429 (Too Many Requests) is used. Enabling
+ # -distributor.retry-after-header.enabled before utilizing this option is
+ # strongly recommended as it helps prevent premature request retries by the
+ # client.
# CLI flag: -distributor.service-overload-status-code-on-rate-limit-enabled
[service_overload_status_code_on_rate_limit_enabled: <boolean> | default = false]
7 changes: 1 addition & 6 deletions pkg/distributor/push.go
@@ -160,12 +160,7 @@ func toHTTPStatus(ctx context.Context, pushErr error, limits *validation.Overrid
switch distributorErr.errorCause() {
case mimirpb.BAD_DATA:
return http.StatusBadRequest
- case mimirpb.INGESTION_RATE_LIMITED:
- // Return a 429 here to tell the client it is going too fast.
- // Client may discard the data or slow down and re-send.
- // Prometheus v2.26 added a remote-write option 'retry_on_http_429'.
- return http.StatusTooManyRequests
- case mimirpb.REQUEST_RATE_LIMITED:
+ case mimirpb.INGESTION_RATE_LIMITED, mimirpb.REQUEST_RATE_LIMITED:
serviceOverloadErrorEnabled := false
userID, err := tenant.TenantID(ctx)
if err == nil {
6 changes: 6 additions & 0 deletions pkg/distributor/push_test.go
@@ -913,6 +913,12 @@ func TestHandler_ToHTTPStatus(t *testing.T) {
expectedHTTPStatus: http.StatusTooManyRequests,
expectedErrorMsg: ingestionRateLimitedErr.Error(),
},
+ "an ingestionRateLimitedError with serviceOverloadErrorEnabled gets translated into an HTTP 529": {
+ err: ingestionRateLimitedErr,
+ serviceOverloadErrorEnabled: true,
+ expectedHTTPStatus: StatusServiceOverloaded,
+ expectedErrorMsg: ingestionRateLimitedErr.Error(),
+ },
"a DoNotLogError of an ingestionRateLimitedError gets translated into an HTTP 429": {
err: middleware.DoNotLogError{Err: ingestionRateLimitedErr},
expectedHTTPStatus: http.StatusTooManyRequests,
2 changes: 1 addition & 1 deletion pkg/util/validation/limits.go
@@ -205,7 +205,7 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
_ = l.CreationGracePeriod.Set("10m")
f.Var(&l.CreationGracePeriod, CreationGracePeriodFlag, "Controls how far into the future incoming samples and exemplars are accepted compared to the wall clock. Any sample or exemplar will be rejected if its timestamp is greater than '(now + grace_period)'. This configuration is enforced in the distributor, ingester and query-frontend (to avoid querying too far into the future).")
f.BoolVar(&l.EnforceMetadataMetricName, "validation.enforce-metadata-metric-name", true, "Enforce every metadata has a metric name.")
- f.BoolVar(&l.ServiceOverloadStatusCodeOnRateLimitEnabled, "distributor.service-overload-status-code-on-rate-limit-enabled", false, "If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used.")
+ f.BoolVar(&l.ServiceOverloadStatusCodeOnRateLimitEnabled, "distributor.service-overload-status-code-on-rate-limit-enabled", false, "If enabled, rate limit errors will be reported to the client with HTTP status code 529 (Service is overloaded). If disabled, status code 429 (Too Many Requests) is used. Enabling -distributor.retry-after-header.enabled before utilizing this option is strongly recommended as it helps prevent premature request retries by the client.")

f.IntVar(&l.MaxGlobalSeriesPerUser, MaxSeriesPerUserFlag, 150000, "The maximum number of in-memory series per tenant, across the cluster before replication. 0 to disable.")
f.IntVar(&l.MaxGlobalSeriesPerMetric, MaxSeriesPerMetricFlag, 0, "The maximum number of in-memory series per metric name, across the cluster before replication. 0 to disable.")