allow user to request consistent ordering / sorting of grouped alerts #1178

Open
tzz opened this issue Jan 2, 2018 · 16 comments

@tzz
Contributor

tzz commented Jan 2, 2018

Alertmanager currently groups alerts without a consistent ordering, which makes large groups hard to follow.

The user should be able to specify a sort order on some label or annotation, which would give consistent and locally meaningful sorting. This could be done with a Go sort function in the template, or perhaps with an Alertmanager configuration entry.

@brian-brazil
Contributor

I propose that at a minimum the Alerts in the template structure should come with some ordering on it, like we have done elsewhere.

@stuartnelson3
Contributor

Are you referring to the /api/v1/alerts/groups endpoint? We're currently sorting by internal identifier in /api/v1/alerts (https://github.com/prometheus/alertmanager/blob/master/api/api.go#L405-L407).

@brian-brazil
Contributor

This is about notification templates.

@tzz
Contributor Author

tzz commented Jan 31, 2018

I don't mind writing some code if it will help move this forward. For me it would be enough to have a function or Alertmanager option that simply sorts alerts lexicographically by label or annotation.

@brian-brazil
Contributor

I think we should start with a consistent ordering on the Alerts we provide to notification templates. That way users get something okay without having to do extra work.

@tzz
Contributor Author

tzz commented Jan 31, 2018

For a default ordering, how about lexicographical by (alert.annotations.summary, alert.creation_time)? That's the simplest thing I can see that could be generally useful. Or maybe even look for a special sort_key annotation that users can define?

@brian-brazil
Contributor

You can't presume that any particular annotation exists, nor that creation times are stable. I'd suggest working entirely off alert labels. This should not be configurable at this level.

@tzz
Contributor Author

tzz commented Jan 31, 2018

OK, what alert labels would make for a sensible default sort key? (job, instance)?

@brian-brazil
Contributor

You'll need to use all of them, otherwise it won't be consistent. Moving job and instance to the front is probably wise.
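
To illustrate the point about using every label, here is a minimal sketch of such a comparison. It is not Alertmanager's code: the `Labels` type, the `priority` list, and the `less` and `orderedNames` helpers are all names assumed for this example. It compares `job` and `instance` first, then every remaining label in name order, so two alerts only tie when their label sets are identical.

```go
package main

import (
	"fmt"
	"sort"
)

// Labels is a hypothetical stand-in for an alert's label set.
type Labels map[string]string

// priority lists the label names compared first (assumed for this sketch).
var priority = []string{"job", "instance"}

// orderedNames returns every label name found in a or b:
// the priority names first, then the rest in lexicographic order.
func orderedNames(a, b Labels) []string {
	seen := map[string]bool{}
	names := append([]string{}, priority...)
	for _, n := range priority {
		seen[n] = true
	}
	var rest []string
	for _, ls := range []Labels{a, b} {
		for n := range ls {
			if !seen[n] {
				seen[n] = true
				rest = append(rest, n)
			}
		}
	}
	sort.Strings(rest)
	return append(names, rest...)
}

// less reports whether a sorts before b. A missing label compares as "".
func less(a, b Labels) bool {
	for _, n := range orderedNames(a, b) {
		if a[n] != b[n] {
			return a[n] < b[n]
		}
	}
	return false // identical label sets
}

func main() {
	alerts := []Labels{
		{"alertname": "DiskFull", "job": "node", "instance": "b:9100"},
		{"alertname": "HighLoad", "job": "node", "instance": "a:9100"},
		{"alertname": "Down", "job": "api", "instance": "z:8080"},
		{"alertname": "DiskFull", "job": "node", "instance": "a:9100"},
	}
	sort.Slice(alerts, func(i, j int) bool { return less(alerts[i], alerts[j]) })
	for _, a := range alerts {
		fmt.Println(a["job"], a["instance"], a["alertname"])
	}
}
```

Comparing only `job` and `instance` would leave the two `node` / `a:9100` alerts in an arbitrary relative order; the pass over the remaining labels is what makes the result repeatable.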

@tzz
Contributor Author

tzz commented Jan 31, 2018

Looking at dispatch/dispatch.go, there seem to be two ways to go: either modify aggrGroup to always have a sorted list of Alert structs instead of a map, or sort the alertsSlice when alerts are flushed. The alertsSlice sort seems easier to implement.

Does that make sense or am I misunderstanding your intention or the code?
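
As a rough sketch of the second option (sorting the slice at flush time): the `alert` and `group` types below are simplified, hypothetical stand-ins rather than the real structs in dispatch/dispatch.go, and `sortKey` is an assumed helper. Go map iteration order is unspecified, so the slice is sorted after it is copied out of the map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, simplified alert carrying only labels.
type alert struct {
	Labels map[string]string
}

// group is a hypothetical aggregation group storing alerts in a map.
type group struct {
	alerts map[string]*alert // keyed by some unique identifier
}

// sortKey builds a canonical string from all labels, sorted by name,
// so that equal label sets always produce the same key.
func sortKey(a *alert) string {
	names := make([]string, 0, len(a.Labels))
	for n := range a.Labels {
		names = append(names, n)
	}
	sort.Strings(names)
	var sb strings.Builder
	for _, n := range names {
		sb.WriteString(n)
		sb.WriteByte('=')
		sb.WriteString(a.Labels[n])
		sb.WriteByte(0) // separator between label pairs
	}
	return sb.String()
}

// flush copies the map into a slice and sorts it, so callers see the
// same order on every flush regardless of map iteration order.
func (g *group) flush() []*alert {
	out := make([]*alert, 0, len(g.alerts))
	for _, a := range g.alerts {
		out = append(out, a)
	}
	sort.Slice(out, func(i, j int) bool { return sortKey(out[i]) < sortKey(out[j]) })
	return out
}

func main() {
	g := &group{alerts: map[string]*alert{
		"1": {Labels: map[string]string{"alertname": "B", "job": "node"}},
		"2": {Labels: map[string]string{"alertname": "A", "job": "node"}},
		"3": {Labels: map[string]string{"alertname": "C", "job": "api"}},
	}}
	for _, a := range g.flush() {
		fmt.Println(a.Labels["alertname"], a.Labels["job"])
	}
}
```

The canonical key only guarantees a deterministic order; any other total order over the labels could be swapped into the `sort.Slice` call without touching the rest of the flush path.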

@brian-brazil
Contributor

Yes, the slice would be the one to sort somewhere along that codepath.

@tzz
Contributor Author

tzz commented Feb 7, 2018

The proposed change is in #1234 but if Alertmanager will allow arbitrary sorting from the user in the future, then the LabelSet.Before() method should probably be extended to take a list of label names, in which case the change becomes trivial. I didn't propose that API extension because I don't know if it's right, and wanted to keep the scope as small as possible.
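
For illustration only, the kind of extension meant here might look like the sketch below. `BeforeBy` is a hypothetical method name, not an existing API, and `LabelSet` is redefined locally as a plain map so the example is self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// LabelSet is a simplified local stand-in for the type discussed above.
type LabelSet map[string]string

// BeforeBy is hypothetical: it compares only the named labels, in the
// order given. A missing label compares as "". Ties on every listed
// name return false, leaving the tie-break to the caller.
func (ls LabelSet) BeforeBy(o LabelSet, names []string) bool {
	for _, n := range names {
		if ls[n] != o[n] {
			return ls[n] < o[n]
		}
	}
	return false
}

func main() {
	alerts := []LabelSet{
		{"alertname": "HighLoad", "severity": "warning"},
		{"alertname": "Down", "severity": "critical"},
		{"alertname": "DiskFull", "severity": "warning"},
	}
	order := []string{"severity", "alertname"} // user-chosen sort labels
	// SliceStable keeps the existing order for alerts that tie on all names.
	sort.SliceStable(alerts, func(i, j int) bool {
		return alerts[i].BeforeBy(alerts[j], order)
	})
	for _, a := range alerts {
		fmt.Println(a["severity"], a["alertname"])
	}
}
```

A real version would probably fall back to a full label comparison when all listed names tie, since a stable sort only preserves whatever order the input already had.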

@tzz
Contributor Author

tzz commented Mar 28, 2018

@brian-brazil the request was for the user to be able to request a specific sort order for alerts and I thought #1234 was just the first step. Does closing this request mean it won't be done or that it will be implemented elsewhere?

@brian-brazil brian-brazil reopened this Mar 28, 2018
@brian-brazil
Contributor

Ah, I'd missed that. I'm personally hoping we can avoid having to implement that.

@tzz tzz changed the title from "allow user to request consistent ordering of grouped alerts" to "allow user to request consistent ordering / sorting of grouped alerts" Apr 7, 2018
tzz added a commit to tzz/alertmanager that referenced this issue Apr 23, 2018
tzz added a commit to tzz/alertmanager that referenced this issue Apr 23, 2018
tzz added a commit to tzz/alertmanager that referenced this issue Apr 23, 2018
simonpasquier pushed a commit to simonpasquier/alertmanager that referenced this issue May 2, 2018
stuartnelson3 pushed a commit that referenced this issue Jun 14, 2018
* Sort dispatched alerts by job+instance in the correct order (#1178)

Signed-off-by: Ted Zlatanov <tzz@lifelogs.com>

* dispatch: add unit test for alerts sorting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
hh pushed a commit to ii/alertmanager that referenced this issue Nov 30, 2018
* Convert to Go modules

* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack

* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.

Signed-off-by: Ben Kochie <superq@gmail.com>
hh pushed a commit to ii/alertmanager that referenced this issue Nov 30, 2018
* collector/diskstats: don't fail if there are extra stats, just ignore… (prometheus#1125)

* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
Signed-off-by: Ben Kochie <superq@gmail.com>

* collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (prometheus#1123)

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes prometheus#1122

Signed-off-by: Paul Gier <pgier@redhat.com>
Signed-off-by: Ben Kochie <superq@gmail.com>

* Handle 'Unknown' as measurement value. (prometheus#1113)

We use the output-compatible perccli and storcli.py does not handle 'Unknown' as a result:
```
sg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know, the perccli should not return 'Unknown' but this error breaks all other useful measurements because the prom file is not parsable. My if condition fixes this.

Signed-off-by: Andreas Wirooks <4233401+nudgegoonies@users.noreply.github.com>
Signed-off-by: Ben Kochie <superq@gmail.com>

* circleci: switch to 2.1 config

Signed-off-by: Ben Kochie <superq@gmail.com>

* Convert to Go modules (prometheus#1178)

* Convert to Go modules

* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack

* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add fallback for missing /proc/1/mounts (prometheus#1172)

* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fallback to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Release v0.17.0 (prometheus#1168)

* Update CHANGELOG
* Update VERSION

Signed-off-by: Ben Kochie <superq@gmail.com>
hh pushed a commit to ii/alertmanager that referenced this issue Dec 3, 2018
@aclowkey

I'm curious about how the API returns its results. Is there any sorting logic?

@OlafKocanda

I have a suggestion:
Couldn't you just order by the sequence of the group_by list?

e.g.
group_by: ['severity', 'alertname']
leads to grouping first by severity and then by alertname

group_by: ['alertname','severity']
groups first by alertname and then by severity

6 participants