Alerting plugin - experimental cross cluster monitor support documentation (#6350)

* Added documentation for supporting cluster metrics monitors that can monitor remote clusters. Added documentation for configuring query and bucket monitors through the UI that can query remote indexes. These are experimental for v2.12.

Signed-off-by: AWSHurneyt <hurneyt@amazon.com>

* Update _observing-your-data/alerting/per-query-bucket-monitors.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: AWSHurneyt <hurneyt@amazon.com>

* Update _observing-your-data/alerting/per-cluster-metrics-monitors.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: AWSHurneyt <hurneyt@amazon.com>

* Update _observing-your-data/alerting/per-cluster-metrics-monitors.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: AWSHurneyt <hurneyt@amazon.com>

* Update _observing-your-data/alerting/per-cluster-metrics-monitors.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/alerting/per-cluster-metrics-monitors.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/alerting/per-query-bucket-monitors.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update per-cluster-metrics-monitors.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: AWSHurneyt <hurneyt@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
3 people authored Feb 6, 2024
1 parent 47013f9 commit 729f492
Showing 5 changed files with 15 additions and 7 deletions.
17 changes: 10 additions & 7 deletions _observing-your-data/alerting/per-cluster-metrics-monitors.md
@@ -9,7 +9,7 @@ has_children: false

# Per cluster metrics monitors

Per cluster metrics monitors are a type of alert monitor that collects and analyzes metrics from a single cluster, providing insights into the cluster's performance and health. You can set alerts to monitor certain conditions, such as when:
_Per cluster metrics monitors_ are a type of alert monitor that collects and analyzes metrics from a single cluster, providing insights into the cluster's performance and health. You can set alerts to monitor certain conditions, such as when:

- Cluster health reaches yellow or red status.
- Cluster-level metrics---for example, CPU usage and JVM memory usage---reach specified thresholds.
@@ -51,7 +51,7 @@ Trigger conditions use responses from the following API endpoints. Most APIs tha

If you want to hide fields from the API response and not expose them for alerting, reconfigure the [supported_json_payloads.json](https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/resources/org/opensearch/alerting/settings/supported_json_payloads.json) file inside the Alerting plugin. The file functions as an allow list for the API fields you want to use in an alert. By default, all APIs and their parameters can be used for monitors and trigger conditions.

However, you can modify the file so that cluster metric monitors can only be created for APIs referenced. Furthermore, only fields referenced in the supported files can create trigger conditions. This `supported_json_payloads.json` allows for a cluster metrics monitor to be created for the `_cluster/stats` API, and triggers conditions for the `indices.shards.total` and `indices.shards.index.shards.min` fields.
However, you can modify the file so that cluster metrics monitors can only be created for APIs referenced. Furthermore, only fields referenced in the supported files can create trigger conditions. This `supported_json_payloads.json` allows for a cluster metrics monitor to be created for the `_cluster/stats` API, and triggers conditions for the `indices.shards.total` and `indices.shards.index.shards.min` fields.

```json
"/_cluster/stats": {
@@ -68,7 +68,9 @@ Painless scripts define triggers for cluster metrics monitors, similar to per qu

The cluster metrics monitor supports up to **ten** triggers.

In the following example, a JSON object creates a trigger that sends an alert when the cluster health is yellow. `script` points the `source` to the Painless script `ctx.results[0].status == \"yellow\`.
In the following example, the monitor is configured to call the Cluster Health API for two clusters, `cluster-1` and `cluster-2`. The trigger condition will create an alert when either of the clusters' `status` is not `green`.

The `script` parameter points the `source` to the Painless script `for (cluster in ctx.results[0].keySet()) if (ctx.results[0][cluster].status != \"green\") return true`. See [Trigger variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/#trigger-variables) for more `painless ctx` variable options.

```json
{
@@ -88,7 +90,8 @@ In the following example, a JSON object creates a trigger that sends an alert wh
"api_type": "CLUSTER_HEALTH",
"path": "_cluster/health/",
"path_params": "",
"url": "http://localhost:9200/_cluster/health/"
"url": "http://localhost:9200/_cluster/health/",
"cluster": ["cluster-1", "cluster-2"]
}
}
],
@@ -100,7 +103,7 @@ In the following example, a JSON object creates a trigger that sends an alert wh
"severity": "1",
"condition": {
"script": {
"source": "ctx.results[0].status == \"yellow\"",
"source": "for (cluster in ctx.results[0].keySet()) if (ctx.results[0][cluster].status != \"green\") return true",
"lang": "painless"
}
},
@@ -110,14 +113,14 @@ In the following example, a JSON object creates a trigger that sends an alert wh
]
}
```
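
The multi-cluster trigger condition in this example loops over the cluster names in `ctx.results[0]` and returns true as soon as one cluster reports a status other than `green`. As a rough illustration only, the same logic can be sketched in Python. The response shape (a mapping from cluster name to that cluster's Cluster Health response) is inferred from the Painless script, and `any_cluster_not_green` and `sample_results` are hypothetical names, not part of the Alerting plugin.

```python
from typing import Any, Dict, List


def any_cluster_not_green(results: List[Dict[str, Dict[str, Any]]]) -> bool:
    """Mirror the Painless condition: True if any cluster's status is not green."""
    # Assumption: results[0] maps each cluster name to its Cluster Health response.
    for cluster in results[0].keys():
        if results[0][cluster].get("status") != "green":
            return True
    # The Painless script has no explicit fallthrough value; this sketch uses False.
    return False


# Hypothetical sample data for illustration, not real plugin output.
sample_results = [
    {
        "cluster-1": {"status": "green"},
        "cluster-2": {"status": "yellow"},
    }
]

print(any_cluster_not_green(sample_results))
```

Keying the results by cluster name is what lets a single trigger cover every monitored cluster, so the condition does not need to be duplicated per cluster.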
The dashboards interface supports the selection of clusters to be monitored and the desired API. A view of the interface is shown in the following image.

See [Trigger variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/#trigger-variables) for more `painless ctx` variable options.
<img src="{{site.url}}{{site.baseurl}}/images/alerting/cross-cluster-cluster-metrics-monitors.png" alt="Cluster metrics monitor" width="700"/>

### Limitations

Per cluster metrics monitors have the following limitations:

- You cannot create monitors for remote clusters.
- The OpenSearch cluster must be in a state where an index's conditions can be monitored and actions can be executed against the index.
- Removing resource permissions from a user will not prevent that user’s preexisting monitors for that resource from executing.
- Users with permissions to create monitors are not blocked from creating monitors for resources for which they do not have permissions; however, those monitors will not run.
4 changes: 4 additions & 0 deletions _observing-your-data/alerting/per-query-bucket-monitors.md
@@ -13,6 +13,10 @@ Per query monitors are a type of alert monitor that can be used to identify and

Per bucket monitors are a type of alert monitor that can be used to identify and alert on specific buckets of data that are created by a query against an OpenSearch index.

Both monitor types support querying remote indexes using the same `cluster-name:index-name` pattern used by [cross-cluster search](https://opensearch.org/docs/latest/security/access-control/cross-cluster-search/) or by using OpenSearch Dashboards 2.12 or later.

<img src="{{site.url}}{{site.baseurl}}/images/alerting/cross-cluster-per-query-per-bucket-monitors.png" alt="Cluster metrics monitor" width="700"/>
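
To make the `cluster-name:index-name` pattern concrete, the following Python sketch splits a monitor's index expressions into a cluster part and an index part. It is a minimal illustration that assumes a name without a colon refers to a local index. `split_remote_index` and the sample index names are hypothetical and not part of any OpenSearch client.

```python
from typing import List, Optional, Tuple


def split_remote_index(pattern: str) -> Tuple[Optional[str], str]:
    """Split 'cluster-name:index-name' into (cluster, index).

    The cluster is None when the pattern has no cluster prefix,
    which this sketch treats as a local index.
    """
    cluster, sep, index = pattern.partition(":")
    if not sep:
        return None, pattern
    return cluster, index


# Hypothetical index expressions a monitor might be configured with.
monitor_indexes: List[str] = ["local-logs", "cluster-2:remote-logs"]

for name in monitor_indexes:
    print(split_remote_index(name))
```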

## Creating a per query or per bucket monitor

To create a per query monitor, follow these steps:
1 change: 1 addition & 0 deletions _observing-your-data/alerting/settings.md
@@ -54,6 +54,7 @@ Setting | Default | Description
`plugins.alerting.alert_history_retention_period` | 60d | The amount of time to keep history indexes before automatically deleting them.
`plugins.alerting.destination.allow_list` | ["chime", "slack", "custom_webhook", "email", "test_action"] | The list of allowed destinations. If you don't want to allow users to a certain type of destination, you can remove it from this list, but we recommend leaving this setting as-is.
`plugins.alerting.filter_by_backend_roles` | "false" | Restricts access to monitors by backend role. See [Alerting security]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/security/).
`plugins.alerting.remote_monitoring_enabled` | "false" | Toggles whether cluster metrics monitors support executing against remote clusters.
`plugins.scheduled_jobs.sweeper.period` | 5m | The alerting feature uses its "job sweeper" component to periodically check for new or updated jobs. This setting is the rate at which the sweeper checks to see if any jobs (monitors) have changed and need to be rescheduled.
`plugins.scheduled_jobs.sweeper.page_size` | 100 | The page size for the sweeper. You shouldn't need to change this value.
`plugins.scheduled_jobs.sweeper.backoff_millis` | 50ms | The amount of time the sweeper waits between retries---increases exponentially after each failed retry.
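
As a hedged illustration of turning on `plugins.alerting.remote_monitoring_enabled`, the following Python sketch builds a request body for the Cluster Settings API. It assumes the setting can be updated dynamically and that a `persistent` scope is wanted, neither of which is stated in the table above. `remote_monitoring_settings` is a hypothetical helper, and the sketch only prints the request rather than sending it.

```python
import json
from typing import Any, Dict


def remote_monitoring_settings(enabled: bool) -> Dict[str, Any]:
    """Build a Cluster Settings API body that toggles remote cluster monitoring."""
    # Assumption: the setting is dynamic and applied at the persistent scope.
    return {
        "persistent": {
            "plugins.alerting.remote_monitoring_enabled": enabled,
        }
    }


body = remote_monitoring_settings(True)

# Print the request in the console style used by the documentation.
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```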