add feature to configure promhttp error handling #411

Open · wants to merge 1 commit into base: master

16 changes: 9 additions & 7 deletions README.md
@@ -78,22 +78,23 @@ If you are still using the legacy [Access scopes][access-scopes], the `https://w

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `google.project-ids` | No | GCloud SDK auto-discovery | Repeatable flag of Google Project IDs |
| `google.projects.filter` | No | | GCloud projects filter expression. See more [here](https://cloud.google.com/sdk/gcloud/reference/projects/list). |
| `monitoring.metrics-ingest-delay` | No | | Offsets metric collection by a delay appropriate for each metric type, e.g. because BigQuery metrics are slow to appear |
| `monitoring.drop-delegated-projects` | No | No | Drop metrics from attached projects and fetch `project_id` only. |
| `monitoring.metrics-prefixes` | Yes | | Repeatable flag of Google Stackdriver Monitoring Metric Type prefixes (see [example][metrics-prefix-example] and [available metrics][metrics-list]) |
| `monitoring.metrics-interval` | No | `5m` | Metric's timestamp interval to request from the Google Stackdriver Monitoring Metrics API. Only the most recent data point is used |
| `monitoring.metrics-offset` | No | `0s` | Offset (into the past) for the metric's timestamp interval to request from the Google Stackdriver Monitoring Metrics API, to handle latency in published metrics |
| `monitoring.filters` | No | | Additional filters to be sent on the Monitoring API call. Add multiple filters by providing this parameter multiple times. See [monitoring.filters](#using-filters) for more info. |
| `monitoring.aggregate-deltas` | No | | If enabled, treats all DELTA metrics as an in-memory counter instead of a gauge. Be sure to read [what to know about aggregating DELTA metrics](#what-to-know-about-aggregating-delta-metrics) |
| `monitoring.aggregate-deltas-ttl` | No | `30m` | How long a delta metric should continue to be exported and stored after GCP stops producing it. Read [slow moving metrics](#slow-moving-metrics) to understand the problem this attempts to solve |
| `monitoring.descriptor-cache-ttl` | No | `0s` | How long the metric descriptors for a prefix should be cached |
| `promhttp.error-handling` | No | `httpErrorOnError` | Defines how errors are handled by promhttp.Handler while serving metrics. The possible values `httpErrorOnError`, `continueOnError`, and `panicOnError` map to the [available options][promhttp-error-handling-opts] |
| `stackdriver.max-retries` | No | `0` | Max number of retries that should be attempted on 503 errors from Stackdriver. |
| `stackdriver.http-timeout` | No | `10s` | How long stackdriver_exporter should wait for a result from the Stackdriver API. |
| `stackdriver.max-backoff` | No | | Max time between each request in an exponential backoff scenario. |
| `stackdriver.backoff-jitter` | No | `1s` | The amount of jitter to introduce in an exponential backoff scenario. |
| `stackdriver.retry-statuses` | No | `503` | The HTTP statuses that should trigger a retry. |
| `web.config.file` | No | | [EXPERIMENTAL] Path to a configuration file that can enable TLS or authentication. |
| `web.listen-address` | No | `:9255` | Address to listen on for web interface and telemetry. Repeatable for multiple addresses. |
| `web.systemd-socket` | No | | Use systemd socket activation listeners instead of port listeners (Linux only). |
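
For example (an illustrative invocation, not taken from this PR), the new `promhttp.error-handling` flag is passed like any other flag above, e.g. `stackdriver_exporter --monitoring.metrics-prefixes=compute.googleapis.com/instance --promhttp.error-handling=continueOnError`, which tells promhttp to keep serving whatever metrics could still be gathered instead of failing the whole scrape when a collector returns an error.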
@@ -247,4 +248,5 @@ Apache License 2.0, see [LICENSE][license].
[monitored-resources]: https://cloud.google.com/monitoring/api/resources
[prometheus]: https://prometheus.io/
[prometheus-boshrelease]: https://github.com/cloudfoundry-community/prometheus-boshrelease
[promhttp-error-handling-opts]: https://github.com/prometheus/client_golang/blob/main/prometheus/promhttp/http.go#L323
[stackdriver]: https://cloud.google.com/monitoring/
19 changes: 18 additions & 1 deletion stackdriver_exporter.go
@@ -137,6 +137,10 @@ var (
	monitoringDescriptorCacheOnlyGoogle = kingpin.Flag(
		"monitoring.descriptor-cache-only-google", "Only cache descriptors for *.googleapis.com metrics",
	).Default("true").Bool()

	promHttpErrorHandling = kingpin.Flag(
		"promhttp.error-handling", "Defines how errors are handled by promhttp.Handler while serving metrics",
	).Default("httpErrorOnError").Enum("httpErrorOnError", "continueOnError", "panicOnError")
)

func init() {
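
For readers unfamiliar with kingpin's `Enum`, here is a minimal standalone sketch of how the new flag is validated; the import path is an assumption (use whatever the exporter's go.mod pins), and only the flag definition mirrors the diff above. Invalid values are rejected at parse time, so the handler only ever sees one of the three listed strings.

```go
package main

import (
	"fmt"

	"github.com/alecthomas/kingpin/v2" // assumed import path; match the exporter's go.mod
)

// Same shape as the flag added in the diff: Enum() restricts the accepted
// values and returns a *string that always holds one of the listed options.
var promHttpErrorHandling = kingpin.Flag(
	"promhttp.error-handling", "Defines how errors are handled by promhttp.Handler while serving metrics",
).Default("httpErrorOnError").Enum("httpErrorOnError", "continueOnError", "panicOnError")

func main() {
	// e.g. --promhttp.error-handling=continueOnError; any other value makes Parse() print an error and exit.
	kingpin.Parse()
	fmt.Println("promhttp error handling:", *promHttpErrorHandling)
}
```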
@@ -277,7 +281,10 @@ func (h *handler) innerHandler(filters map[string]bool) http.Handler {
			registry,
		}
	}
	opts := promhttp.HandlerOpts{ErrorLog: slog.NewLogLogger(h.logger.Handler(), slog.LevelError)}
	opts := promhttp.HandlerOpts{
		ErrorLog:      slog.NewLogLogger(h.logger.Handler(), slog.LevelError),
		ErrorHandling: getPromHttpErrorHandlingOpt(*promHttpErrorHandling),
	}
	// Delegate http serving to Prometheus client library, which will call collector.Collect.
	return promhttp.HandlerFor(gatherers, opts)
}
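
For context, a sketch of what the three `promhttp.HandlerErrorHandling` values mean when a collector fails mid-scrape, based on the client_golang documentation; the server wiring below is illustrative, not the exporter's actual setup.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	opts := promhttp.HandlerOpts{
		// promhttp.HTTPErrorOnError: answer the scrape with an HTTP error on the first
		//   collection error (the exporter's previous, hard-coded behaviour).
		// promhttp.ContinueOnError:  serve whatever metrics could still be gathered;
		//   errors go to ErrorLog if one is configured.
		// promhttp.PanicOnError:     panic on the first error (for crash-only setups).
		ErrorHandling: promhttp.ContinueOnError,
	}
	http.Handle("/metrics", promhttp.HandlerFor(prometheus.DefaultGatherer, opts))
	log.Fatal(http.ListenAndServe(":9255", nil)) // 9255 is the exporter's default port
}
```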
@@ -464,3 +471,13 @@ func parseMetricExtraFilters() []collectors.MetricFilter {
	}
	return extraFilters
}

func getPromHttpErrorHandlingOpt(flagOpt string) promhttp.HandlerErrorHandling {
	if flagOpt == "continueOnError" {
		return promhttp.ContinueOnError
	}
	if flagOpt == "panicOnError" {
		return promhttp.PanicOnError
	}
	return promhttp.HTTPErrorOnError
}
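
A hypothetical table-driven test for the mapping above (not part of the PR; the test file and its imports are assumed), mainly to document that unknown strings fall back to `HTTPErrorOnError` even though the kingpin `Enum` should already prevent them:

```go
package main

import (
	"testing"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func TestGetPromHttpErrorHandlingOpt(t *testing.T) {
	cases := map[string]promhttp.HandlerErrorHandling{
		"continueOnError":  promhttp.ContinueOnError,
		"panicOnError":     promhttp.PanicOnError,
		"httpErrorOnError": promhttp.HTTPErrorOnError,
		"not-a-real-value": promhttp.HTTPErrorOnError, // anything unexpected falls back to the default
	}
	for in, want := range cases {
		if got := getPromHttpErrorHandlingOpt(in); got != want {
			t.Errorf("getPromHttpErrorHandlingOpt(%q) = %v, want %v", in, got, want)
		}
	}
}
```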