Add v1beta3 spec docs
Signed-off-by: Sunny <darkowlzz@protonmail.com>
darkowlzz committed Sep 14, 2023
1 parent 43e29a2 commit 9562888
Showing 5 changed files with 1,831 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/spec/README.md
@@ -3,6 +3,7 @@
## API Specification

* [v1](v1/README.md)
* [v1beta3](v1beta3/README.md)
* [v1beta2](v1beta2/README.md)
* [v1beta1](v1beta1/README.md)
* [v1alpha1](v1alpha1/README.md)
13 changes: 13 additions & 0 deletions docs/spec/v1beta3/README.md
@@ -0,0 +1,13 @@
# notification.toolkit.fluxcd.io/v1beta3

This is the v1beta3 API specification for defining event handling.

## Specification

* [Alerts](alerts.md)
* [Events](events.md)
* [Providers](providers.md)

## Go Client

* [github.com/fluxcd/pkg/runtime/events](https://pkg.go.dev/github.com/fluxcd/pkg/runtime/events)
250 changes: 250 additions & 0 deletions docs/spec/v1beta3/alerts.md
@@ -0,0 +1,250 @@
# Alerts

<!-- menuweight:10 -->

The `Alert` API defines how events are filtered by severity and involved object, and what provider to use for dispatching.

## Example

The following is an example of how to send alerts to Slack when Flux fails to reconcile the `flux-system` namespace.

```yaml
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack-bot
  namespace: flux-system
spec:
  type: slack
  channel: general
  address: https://slack.com/api/chat.postMessage
  secretRef:
    name: slack-bot-token
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: slack
  namespace: flux-system
spec:
  summary: "Cluster addons impacted in us-east-2"
  providerRef:
    name: slack-bot
  eventSeverity: error
  eventSources:
    - kind: GitRepository
      name: '*'
    - kind: Kustomization
      name: '*'
```
In the above example:

- A Provider named `slack-bot` is created, indicated by the
  `Provider.metadata.name` field.
- An Alert named `slack` is created, indicated by the
  `Alert.metadata.name` field.
- The Alert references the `slack-bot` provider, indicated by the
  `Alert.spec.providerRef` field.
- The notification-controller starts listening for events sent for
  all GitRepositories and Kustomizations in the `flux-system` namespace.
- When an event with severity `error` is received, the controller posts
  a message to the Slack channel set in the Provider's `.spec.channel`,
  containing the `summary` text and the reconciliation error.

You can run this example by saving the manifests into `slack-alerts.yaml`.

1. First create a secret with the Slack bot token:

```sh
kubectl -n flux-system create secret generic slack-bot-token --from-literal=token=xoxb-YOUR-TOKEN
```

2. Apply the resources on the cluster:

```sh
kubectl -n flux-system apply --server-side -f slack-alerts.yaml
```

## Writing an Alert spec

As with all other Kubernetes config, an Alert needs `apiVersion`,
`kind`, and `metadata` fields. The name of an Alert object must be a
valid [DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).

An Alert also needs a
[`.spec` section](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).

### Summary

`.spec.summary` is an optional field to specify a short description of the
impact and affected cluster.

The summary must not exceed 255 characters.

### Provider reference

`.spec.providerRef.name` is a required field to specify a name reference to a
[Provider](providers.md) in the same namespace as the Alert.

### Event sources

`.spec.eventSources` is a required field to specify a list of references to
Flux objects for which events are forwarded to the alert provider API.

To select events issued by Flux objects, each entry in the `.spec.eventSources` list
must contain the following fields:

- `kind` is the Flux Custom Resource Kind such as GitRepository, HelmRelease, Kustomization, etc.
- `name` is the Flux Custom Resource `.metadata.name`, or it can be set to the `*` wildcard.
- `namespace` is the Flux Custom Resource `.metadata.namespace`.
When not specified, the Alert `.metadata.namespace` is used instead.
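The selection rules above can be sketched in Go. This is a minimal illustration, not the controller's implementation: the `EventSource` and `InvolvedObject` types and the `matchesSource` function are hypothetical names introduced here, and label matching is left out.

```go
package main

import "fmt"

// EventSource mirrors the three selector fields described above.
// Hypothetical type for illustration only.
type EventSource struct {
	Kind      string
	Name      string // object name, or "*" to select every object of Kind
	Namespace string // optional; empty means "use the Alert's namespace"
}

// InvolvedObject identifies the Flux object that issued an event.
// Hypothetical type for illustration only.
type InvolvedObject struct {
	Kind      string
	Name      string
	Namespace string
}

// matchesSource reports whether obj is selected by src for an Alert
// that lives in alertNamespace.
func matchesSource(src EventSource, alertNamespace string, obj InvolvedObject) bool {
	ns := src.Namespace
	if ns == "" {
		ns = alertNamespace // fall back to the Alert's own namespace
	}
	if src.Kind != obj.Kind || ns != obj.Namespace {
		return false
	}
	return src.Name == "*" || src.Name == obj.Name
}

func main() {
	src := EventSource{Kind: "GitRepository", Name: "*"}
	obj := InvolvedObject{Kind: "GitRepository", Name: "webapp", Namespace: "flux-system"}

	fmt.Println(matchesSource(src, "flux-system", obj)) // true
	fmt.Println(matchesSource(src, "apps", obj))        // false: namespace defaults to "apps"
}
```

An entry without `namespace` therefore only selects objects that share the Alert's namespace.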

#### Select objects by name

To select events issued by a single Flux object, set the `kind`, `name` and `namespace`:

```yaml
eventSources:
  - kind: GitRepository
    name: webapp
    namespace: apps
```

#### Select all objects in a namespace

The `*` wildcard can be used to select events issued by all Flux objects of a particular `kind` in a `namespace`:

```yaml
eventSources:
  - kind: HelmRelease
    name: '*'
    namespace: apps
```

#### Select objects by label

To select events issued by all Flux objects of a particular `kind` with specific `labels`:

```yaml
eventSources:
  - kind: HelmRelease
    name: '*'
    namespace: apps
    matchLabels:
      team: app-dev
```

#### Disable cross-namespace selectors

**Note:** On multi-tenant clusters, platform admins can disable cross-namespace references by
starting the controller with the `--no-cross-namespace-refs=true` flag.
When this flag is set, alerts can only refer to event sources in the same namespace as the alert object,
preventing tenants from subscribing to another tenant's events.

### Event metadata

`.spec.eventMetadata` is an optional field for adding metadata to events dispatched by
the controller. This can be used to enhance the context of the event. If a key is
already present on the original event as generated by the emitter, it is not
overridden: the original value is preserved and an info-level log message is
printed.

#### Example

Add metadata fields to successful `HelmRelease` events:

```yaml
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: <name>
spec:
  eventSources:
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
  eventMetadata:
    app.kubernetes.io/env: "production"
    app.kubernetes.io/cluster: "my-cluster"
    app.kubernetes.io/region: "us-east-1"
```
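The rule that keys from `.spec.eventMetadata` never replace keys already set by the emitter can be sketched as a non-overriding map merge. The `mergeMetadata` function below is a hypothetical helper written for this illustration; the controller's actual code may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeMetadata returns a copy of original extended with the entries
// of extra. Keys already present in original keep their value; those
// keys are returned in skipped so the caller can log them.
func mergeMetadata(original, extra map[string]string) (merged map[string]string, skipped []string) {
	merged = make(map[string]string, len(original)+len(extra))
	for k, v := range original {
		merged[k] = v
	}
	for k, v := range extra {
		if _, exists := merged[k]; exists {
			skipped = append(skipped, k) // original value wins
			continue
		}
		merged[k] = v
	}
	sort.Strings(skipped) // map iteration order is random; sort for stable output
	return merged, skipped
}

func main() {
	fromEmitter := map[string]string{"app.kubernetes.io/env": "staging"}
	fromAlert := map[string]string{
		"app.kubernetes.io/env":     "production",
		"app.kubernetes.io/cluster": "my-cluster",
	}

	merged, skipped := mergeMetadata(fromEmitter, fromAlert)
	fmt.Println(merged["app.kubernetes.io/env"])     // staging
	fmt.Println(merged["app.kubernetes.io/cluster"]) // my-cluster
	fmt.Println(skipped)                             // [app.kubernetes.io/env]
}
```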

### Event severity

`.spec.eventSeverity` is an optional field to filter events based on severity. When not specified, or
when the value is set to `info`, all events are forwarded to the alert provider API, including errors.
To receive alerts only on errors, set the field value to `error`.
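As a rough sketch of this filter, the check below forwards every event unless the Alert asks for `error` only. The function name `severityAllows` is hypothetical, and treating any value other than `error` like `info` is an assumption made here, since only `info` and `error` are described above.

```go
package main

import "fmt"

// severityAllows reports whether an event with eventSeverity passes an
// Alert whose .spec.eventSeverity is alertSeverity.
// Assumption: any value other than "error" behaves like "info".
func severityAllows(alertSeverity, eventSeverity string) bool {
	if alertSeverity == "error" {
		return eventSeverity == "error"
	}
	// Unset or "info": forward everything, including errors.
	return true
}

func main() {
	fmt.Println(severityAllows("", "info"))       // true
	fmt.Println(severityAllows("info", "error"))  // true
	fmt.Println(severityAllows("error", "info"))  // false
	fmt.Println(severityAllows("error", "error")) // true
}
```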

### Event exclusion

`.spec.exclusionList` is an optional field to specify a list of regular expressions that filter
events based on message content. An event is excluded if its message matches at least
one of the expressions in the list.

#### Example

Skip alerting if the message matches a [Go regex](https://golang.org/pkg/regexp/syntax)
from the exclusion list:

```yaml
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: <name>
spec:
  eventSources:
    - kind: GitRepository
      name: '*'
  exclusionList:
    - "waiting.*socket"
```

The above definition will not send alerts for transient Git clone errors like:

```text
unable to clone 'ssh://git@ssh.dev.azure.com/v3/...', error: SSH could not read data: Error waiting on socket
```
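The pattern matches even though it covers only the tail of the message, because Go regular expressions are unanchored unless `^` or `$` is used. The snippet below checks this with the standard `regexp` package; `matchesPattern` is a hypothetical helper for this illustration, not a controller function.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPattern reports whether msg contains a match for the Go
// regular expression pattern. An invalid pattern yields an error.
func matchesPattern(pattern, msg string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.MatchString(msg), nil
}

func main() {
	msg := "unable to clone 'ssh://git@ssh.dev.azure.com/v3/...', " +
		"error: SSH could not read data: Error waiting on socket"

	// Unanchored: matches anywhere inside the message.
	ok, _ := matchesPattern("waiting.*socket", msg)
	fmt.Println(ok) // true

	// Anchored at both ends: the whole message would have to match.
	ok, _ = matchesPattern("^waiting.*socket$", msg)
	fmt.Println(ok) // false
}
```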

### Event inclusion

`.spec.inclusionList` is an optional field to specify a list of regular expressions that filter
events based on message content. An event is sent if its message matches at least one
of the expressions in the list, and discarded otherwise. If the message matches an
expression in the inclusion list but also matches an expression in the exclusion
list, the event is still discarded (exclusion takes precedence over inclusion).

#### Example

Alert if the message matches a [Go regex](https://golang.org/pkg/regexp/syntax)
from the inclusion list:

```yaml
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: <name>
spec:
  eventSources:
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
  exclusionList:
    - ".*uninstall.*"
    - ".*test.*"
```

The above definition will send alerts for successful Helm installs, upgrades and rollbacks,
but not uninstalls and tests.
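The precedence between the two lists can be sketched in Go as follows. The `matchesAny` and `shouldDispatch` functions are hypothetical names for this illustration, the sample messages are invented, and treating an empty inclusion list as "include everything" is an assumption based on the field being optional.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAny reports whether msg matches at least one of the patterns.
func matchesAny(patterns []string, msg string) (bool, error) {
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return false, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		if re.MatchString(msg) {
			return true, nil
		}
	}
	return false, nil
}

// shouldDispatch applies the inclusion list first, then lets the
// exclusion list veto the event.
// Assumption: an empty inclusion list includes every message.
func shouldDispatch(inclusion, exclusion []string, msg string) (bool, error) {
	if len(inclusion) > 0 {
		included, err := matchesAny(inclusion, msg)
		if err != nil || !included {
			return false, err
		}
	}
	excluded, err := matchesAny(exclusion, msg)
	if err != nil {
		return false, err
	}
	return !excluded, nil
}

func main() {
	inclusion := []string{".*succeeded.*"}
	exclusion := []string{".*uninstall.*", ".*test.*"}

	// Invented sample messages, not real controller output.
	for _, msg := range []string{
		"Helm install succeeded",   // dispatched
		"Helm uninstall succeeded", // excluded
		"Helm test succeeded",      // excluded
		"Helm upgrade failed",      // not included
	} {
		ok, err := shouldDispatch(inclusion, exclusion, msg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %v\n", msg, ok)
	}
}
```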

### Suspend

`.spec.suspend` is an optional field to suspend alerting.
When set to `true`, the controller will stop processing events.
When the field is set to `false` or removed, it will resume.
64 changes: 64 additions & 0 deletions docs/spec/v1beta3/events.md
@@ -0,0 +1,64 @@
# Events

<!-- menuweight:20 -->

The `Event` API defines the structure of the events issued by Flux controllers.

Flux controllers use the [fluxcd/pkg/runtime/events](https://github.com/fluxcd/pkg/tree/main/runtime/events)
package to push events to the notification-controller API.

## Example

The following is an example of an event sent by kustomize-controller to report a reconciliation error.

```json
{
  "involvedObject": {
    "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
    "kind": "Kustomization",
    "name": "webapp",
    "namespace": "apps",
    "uid": "7d0cdc51-ddcf-4743-b223-83ca5c699632"
  },
  "metadata": {
    "kustomize.toolkit.fluxcd.io/revision": "main/731f7eaddfb6af01cb2173e18f0f75b0ba780ef1"
  },
  "severity": "error",
  "reason": "ValidationFailed",
  "message": "service/apps/webapp validation error: spec.type: Unsupported value: Ingress",
  "reportingController": "kustomize-controller",
  "timestamp": "2022-10-28T07:26:19Z"
}
```

In the above example:

- An event is issued by kustomize-controller for a specific object, indicated in the
`involvedObject` field.
- The notification-controller receives the event and finds the [alerts](alerts.md)
that match the `involvedObject` and `severity` values.
- For all matching alerts, the controller posts the `message` and the source revision
extracted from `metadata` to the alert provider API.

## Event structure

The Go type that defines the event structure can be found in the
[fluxcd/pkg/apis/event/v1beta1](https://github.com/fluxcd/pkg/blob/main/apis/event/v1beta1/event.go)
package.
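For readers who want to inspect an event payload without importing that package, the sketch below decodes the JSON fields shown in the example using only the standard library. The `event` and `involvedObject` structs are local, hypothetical stand-ins that mirror the example payload; they are not the types from `fluxcd/pkg`, which remain the authoritative definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// involvedObject mirrors the "involvedObject" JSON object in the example.
type involvedObject struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	Namespace  string `json:"namespace"`
	UID        string `json:"uid"`
}

// event mirrors the top-level fields of the example payload.
type event struct {
	InvolvedObject      involvedObject    `json:"involvedObject"`
	Metadata            map[string]string `json:"metadata"`
	Severity            string            `json:"severity"`
	Reason              string            `json:"reason"`
	Message             string            `json:"message"`
	ReportingController string            `json:"reportingController"`
	Timestamp           time.Time         `json:"timestamp"` // RFC 3339
}

// decodeEvent parses a JSON event payload.
func decodeEvent(payload []byte) (event, error) {
	var e event
	err := json.Unmarshal(payload, &e)
	return e, err
}

func main() {
	payload := []byte(`{
	  "involvedObject": {"kind": "Kustomization", "name": "webapp", "namespace": "apps"},
	  "severity": "error",
	  "reason": "ValidationFailed",
	  "message": "service/apps/webapp validation error: spec.type: Unsupported value: Ingress",
	  "reportingController": "kustomize-controller",
	  "timestamp": "2022-10-28T07:26:19Z"
	}`)

	e, err := decodeEvent(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s %s/%s: %s\n", e.Severity, e.InvolvedObject.Namespace, e.InvolvedObject.Name, e.Reason)
}
```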

## Rate limiting

Events received by notification-controller are subject to rate limiting to reduce the
number of duplicate alerts sent to external systems like Slack, Sentry, etc.

Events are rate limited based on `involvedObject.name`, `involvedObject.namespace`,
`involvedObject.kind`, `message`, and `metadata`.
The interval of the rate limit is set by default to `5m` but can be configured
with the `--rate-limit-interval` controller flag.

The event server exposes HTTP request metrics to track the number of rate-limited events.
The following PromQL query returns the rate at which requests are rate limited:

```
rate(gotk_event_http_request_duration_seconds_count{code="429"}[30s])
```