Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout option for webhook notifier. #4137

Merged
merged 2 commits into from
Dec 2, 2024

Conversation

stevesg
Copy link
Contributor

@stevesg stevesg commented Nov 27, 2024

We have witnessed cases where POST requests hang, because the receiver has become temporarily unresponsive, leading to failed notification attempts. If the request times out before the dispatcher timeout, because of some aspect of how the HTTP client is configured, then the request is retried, but this does not happen reliably.

To cover all eventualities, it makes sense to have a shorter per-request timeout. I think this timeout needs to be opt-in from the user perspective, as it requires prior knowledge of how fast the webhook should respond, and that it could lead to duplicate notifications.

@stevesg stevesg force-pushed the webhook-timeout branch 3 times, most recently from 86432b0 to 1c13650 Compare November 28, 2024 11:25
Signed-off-by: Steve Simpson <steve@grafana.com>
Copy link
Contributor

@grobinson-grafana grobinson-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One nit and one question 😄

resp, err := notify.PostJSON(ctx, n.client, url, &buf)
if err != nil {
if errors.Is(err, context.DeadlineExceeded) && ctx.Err() != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for ctx.Err() to be nil when the returned error is context.DeadlineExceeded? I don't think so? Isn't the ctx.Err() check redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, it's redundant, but I'm going to do the check on Err instead, in case the call returns something else.

I also tweaked the error message a bit, because I think actually maintaining the original error might be valuable, in case some extra information was added there.

@@ -535,6 +535,9 @@ type WebhookConfig struct {
// Alerts exceeding this threshold will be truncated. Setting this to 0
// allows an unlimited number of alerts.
MaxAlerts uint64 `yaml:"max_alerts" json:"max_alerts"`

// Timeout is the maximum time allowed to invoke the webhook.
Timeout *time.Duration `yaml:"timeout" json:"timeout"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking change this from *time.Duration to just time.Duration. In this case, a zero value will mean infinite timeout. We do the same for MaxAlerts on Line 535 where instead of using a *uint64 we use the zero value to mean no limit.

Signed-off-by: Steve Simpson <steve@grafana.com>
Copy link
Member

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thank you very much for your contribution.

@gotjosh gotjosh merged commit 0f65e8f into prometheus:main Dec 2, 2024
11 checks passed
SuperQ added a commit that referenced this pull request Dec 19, 2024
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166

Signed-off-by: SuperQ <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Dec 19, 2024
SuperQ added a commit that referenced this pull request Jan 15, 2025
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add full width to adaptive card for msteamsv2 #4135
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [ENHANCEMENT] Update config to allow showing secret values when marshaled #4158
* [ENHANCEMENT] Enable templating for Jira project and issue_type #4159
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166
* [BUGFIX] Fix UTF-8 not allowed in Equal field for inhibition rules #4177

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ added a commit that referenced this pull request Jan 15, 2025
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add full width to adaptive card for msteamsv2 #4135
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [ENHANCEMENT] Update config to allow showing secret values when marshaled #4158
* [ENHANCEMENT] Enable templating for Jira project and issue_type #4159
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166
* [BUGFIX] Fix UTF-8 not allowed in Equal field for inhibition rules #4177

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ added a commit that referenced this pull request Jan 15, 2025
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add full width to adaptive card for msteamsv2 #4135
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [ENHANCEMENT] Update config to allow showing secret values when marshaled #4158
* [ENHANCEMENT] Enable templating for Jira project and issue_type #4159
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166
* [BUGFIX] Fix UTF-8 not allowed in Equal field for inhibition rules #4177

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ added a commit that referenced this pull request Jan 15, 2025
* [CHANGE] Templating errors in the SNS integration now return an error. #3531 #3879
* [CHANGE] Adopt log/slog, drop go-kit/log #4089
* [FEATURE] Add a new Microsoft Teams integration based on Flows #4024
* [FEATURE] Add a new Rocket.Chat integration #3600
* [FEATURE] Add a new Jira integration #3590 #3931
* [FEATURE] Add support for `GOMEMLIMIT`, enable it via the feature flag `--enable-feature=auto-gomemlimit`. #3895
* [FEATURE] Add support for `GOMAXPROCS`, enable it via the feature flag `--enable-feature=auto-gomaxprocs`. #3837
* [FEATURE] Add support for limits of silences including the maximum number of active and pending silences, and the maximum size per silence (in bytes). You can use the flags `--silences.max-silences` and `--silences.max-silence-size-bytes` to set them accordingly #3852 #3862 #3866 #3885 #3886 #3877
* [FEATURE] Muted alerts now show whether they are suppressed or not in both the `/api/v2/alerts` endpoint and the Alertmanager UI. #3793 #3797 #3792
* [ENHANCEMENT] Add support for `content`, `username` and `avatar_url` in the Discord integration. `content` and `username` also support templating. #4007
* [ENHANCEMENT] Only invalidate the silences cache if a new silence is created or an existing silence replaced - should improve latency on both `GET api/v2/alerts` and `POST api/v2/alerts` API endpoint. #3961
* [ENHANCEMENT] Add image source label to Dockerfile. To get changelogs shown when using Renovate #4062
* [ENHANCEMENT] Build using go 1.23 #4071
* [ENHANCEMENT] Support setting a global SMTP TLS configuration. #3732
* [ENHANCEMENT] The setting `room_id` in the WebEx integration can now be templated to allow for dynamic room IDs. #3801
* [ENHANCEMENT] Enable setting `message_thread_id` for the Telegram integration. #3638
* [ENHANCEMENT] Support the `since` and `humanizeDuration` functions to templates. This means users can now format time to more human-readable text. #3863
* [ENHANCEMENT] Support the `date` and `tz` functions to templates. This means users can now format time in a specified format and also change the timezone to their specific locale. #3812
* [ENHANCEMENT] Latency metrics now support native histograms. #3737
* [ENHANCEMENT] Add full width to adaptive card for msteamsv2 #4135
* [ENHANCEMENT] Add timeout option for webhook notifier. #4137
* [ENHANCEMENT] Update config to allow showing secret values when marshaled #4158
* [ENHANCEMENT] Enable templating for Jira project and issue_type #4159
* [BUGFIX] Fix the SMTP integration not correctly closing an SMTP submission, which may lead to unsuccessful dispatches being marked as successful. #4006
* [BUGFIX]  The `ParseMode` option is now set explicitly in the Telegram integration. If we don't HTML tags had not been parsed by default. #4027
* [BUGFIX] Fix a memory leak that was caused by updates silences continuously. #3930
* [BUGFIX] Fix hiding secret URLs when the URL is incorrect. #3887
* [BUGFIX] Fix a race condition in the alerts - it was more of a hypothetical race condition that could have occurred in the alert reception pipeline. #3648
* [BUGFIX] Fix a race condition in the alert delivery pipeline that would cause a firing alert that was delivered earlier to be deleted from the aggregation group when instead it should have been delivered again. #3826
* [BUGFIX] Fix version in APIv1 deprecation notice. #3815
* [BUGFIX] Fix crash errors when using `url_file` in the Webhook integration. #3800
* [BUGFIX] fix `Route.ID()` returns conflicting IDs. #3803
* [BUGFIX] Fix deadlock on the alerts memory store. #3715
* [BUGFIX] Fix `amtool template render` when using the default values. #3725
* [BUGFIX] Fix `webhook_url_file` for both the Discord and Microsoft Teams integrations. #3728 #3745
* [BUGFIX] Fix wechat api link #4084
* [BUGFIX] Fix build info metric #4166
* [BUGFIX] Fix UTF-8 not allowed in Equal field for inhibition rules #4177

Signed-off-by: SuperQ <superq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants