Not all receivers present in /api/v2/alerts/groups responses #1959

Closed
prymitive opened this issue Jul 11, 2019 · 7 comments · Fixed by #1964

Comments

@prymitive
Contributor

prymitive commented Jul 11, 2019

What did you do?

Query /api/v2/alerts/groups continuously.

What did you expect to see?

Based on my configuration, alerts should go to all receivers, so groups should be generated for each receiver as well. Alerts are tagged with all configured receivers, but groups for some receivers are missing.

What did you see instead? Under which circumstances?

With the configuration below:

Configuration
global:
  resolve_timeout: 30s
route:
  group_by: ["alertname"]
  group_wait: 5s
  group_interval: 10s
  repeat_interval: 999h
  receiver: "default"
  routes:
    - receiver: "default"
      group_by: []
      match_re:
        alertname: .*
      continue: true
    - receiver: "by-cluster-service"
      group_by: ["alertname", "cluster", "service"]
      match_re:
        alertname: .*
      continue: true
    - receiver: "by-name"
      group_by: [alertname]
      match_re:
        alertname: .*
      continue: true
    - receiver: "by-cluster"
      group_by: [cluster]
      match_re:
        alertname: .*
      continue: true

inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    # Apply inhibition if the alertname and cluster are the same in both
    equal: ["alertname", "cluster"]

receivers:
  - name: "default"
  - name: "by-cluster-service"
  - name: "by-name"
  - name: "by-cluster"

When I query /api/v2/alerts/groups I get alert groups for either the default or the by-name receiver, but never both. Responses keep switching between one and the other.
See the captured responses below, trimmed down to just the Receiver field from the group objects and the alert objects.
1.txt
2.txt
One shows the default receiver's groups present, the other the by-name receiver's groups.
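
A minimal sketch of how the flipping can be observed by polling the endpoint. The Alertmanager address (localhost:9093) and the decoded response shape (each group carrying a receiver object with a name field) are assumptions for illustration, not taken from the attachments:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// alertGroup keeps only the fields needed here; the real v2 response
// carries more (alerts, etc.).
type alertGroup struct {
	Labels   map[string]string `json:"labels"`
	Receiver struct {
		Name string `json:"name"`
	} `json:"receiver"`
}

func main() {
	for {
		resp, err := http.Get("http://localhost:9093/api/v2/alerts/groups")
		if err != nil {
			fmt.Println("request failed:", err)
			time.Sleep(2 * time.Second)
			continue
		}

		var groups []alertGroup
		if err := json.NewDecoder(resp.Body).Decode(&groups); err != nil {
			fmt.Println("decode failed:", err)
		}
		resp.Body.Close()

		// Collect the distinct receivers present in this response.
		receivers := map[string]bool{}
		for _, g := range groups {
			receivers[g.Receiver.Name] = true
		}
		fmt.Println(time.Now().Format(time.RFC3339), receivers)

		time.Sleep(2 * time.Second)
	}
}

With the configuration above, all four receivers should appear in every printed set; instead, as described, either default or by-name is missing at any given time.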

Environment

  • System information:

Linux 5.1.16-300.fc30.x86_64 x86_64

  • Alertmanager version:
Version Information
Branch: HEAD
BuildDate: 20190708-14:31:49
BuildUser: root@868685ed3ed0
GoVersion: go1.12.6
Revision: 1ace0f76b7101cccc149d7298022df36039858ca
Version: 0.18.0

The same behaviour is observed with 0.17.0.

  • Prometheus version:

N/A

  • Alertmanager configuration file:
Same as the configuration shown above.
  • Logs:
    I didn't find any logs that would show API response information relevant to this issue.
@prymitive
Contributor Author

prymitive commented Jul 11, 2019

I see that this check https://github.com/prometheus/alertmanager/blob/master/dispatch/dispatch.go#L170-L171 skips some receivers because alert groups with 2 different receivers have the same fingerprint:

[...]
[2b351b33b2751c5e] ag {default} {alertname="Always On Alert"} seen=>false
[2b351b33b2751c5e] ag {by-name} {alertname="Always On Alert"} seen=>true
[...]

and that seems to be caused by the fact that it uses labels (and nothing else) to calculate the fingerprint.
So if we have 2 receivers with the same group_by, their groups will end up with the same fingerprint, and that will skip the groups tagged with one receiver or the other (I guess depending on the ordering of groups while iterating).

Are this seen map and the following checks really needed?
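
To illustrate the collision, here is a simplified, self-contained sketch; the types and the fingerprint function below are stand-ins, not the actual dispatch.go code. Deduplicating on the label fingerprint alone drops one of two groups that differ only by receiver, while keying by receiver plus fingerprint keeps both:

package main

import (
	"fmt"
	"sort"
	"strings"
)

type alertGroup struct {
	receiver string
	labels   map[string]string
}

// fingerprint is built from the group labels only, mirroring the behaviour
// described above: the receiver is not part of the key.
func fingerprint(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%q,", k, labels[k])
	}
	return b.String()
}

func main() {
	groups := []alertGroup{
		{receiver: "default", labels: map[string]string{"alertname": "Always On Alert"}},
		{receiver: "by-name", labels: map[string]string{"alertname": "Always On Alert"}},
	}

	// Dedup on labels only: the second group is reported as already seen
	// and would be skipped, matching the behaviour reported here.
	seen := map[string]bool{}
	for _, ag := range groups {
		fp := fingerprint(ag.labels)
		fmt.Printf("ag {%s} %s seen=>%v\n", ag.receiver, fp, seen[fp])
		seen[fp] = true
	}

	// Keying by (receiver, label fingerprint) instead keeps both groups.
	seenPerReceiver := map[string]bool{}
	for _, ag := range groups {
		key := ag.receiver + "/" + fingerprint(ag.labels)
		if !seenPerReceiver[key] {
			seenPerReceiver[key] = true
			fmt.Printf("kept group for receiver %q\n", ag.receiver)
		}
	}
}

The first loop reproduces the seen=>false / seen=>true pattern from the debug output above; the second shows one possible disambiguation, without implying that this is the approach taken in #1964.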

@simonpasquier
Member

I'll have a look at it tomorrow, but it feels related to #1875.

@prymitive
Contributor Author

Anything I can do to help here?

@simonpasquier
Member

Sorry for the lag, I've just submitted #1964. It would be nice if you could have a look. It is just a quick fix as it doesn't address the UI issue (ref #1875).

@prymitive
Contributor Author

No worries. I think that just dropping that whole seen map check is the best way forward; I was confused about why we need to dedup anything at that level, hence my earlier question.
Even if we can get duplicates there, it feels to me that this isn't the best place to deal with them; ideally they should be deduped earlier.

@simonpasquier
Member

The main reason for the seen map (I assume) was that the previous code (e.g. the v1 API) had it in the first place, but it used a completely different structure for the response...

@prymitive
Contributor Author

Makes sense, thanks
