Merge pull request #4360 from esl/instrumentation/documentation_fixes
Instrumentation documentation fixes
chrzaszcz authored Aug 21, 2024
2 parents 229b40d + 9a8a509 commit 2bf2b51
Showing 12 changed files with 233 additions and 207 deletions.
22 changes: 21 additions & 1 deletion doc/configuration/instrumentation.md
@@ -4,7 +4,7 @@ They are mainly used for the purpose of metrics.

Instrumentation events are acted upon by handlers. Available instrumentation handlers are:

* `prometheus` - exposes a metrics endpoint for [Prometheus](https://prometheus.io/).
* `prometheus` - collects metrics for [Prometheus](https://prometheus.io/). The endpoint used to access them has to be configured in the [listener section](../listeners/listen-http.md#handler-types-prometheus-mongoose_prometheus_handler).
* `exometer` - starts [Exometer](https://github.com/esl/exometer_core), a metrics server capable of exporting metrics using reporters. Currently available is a [Graphite](https://graphiteapp.org/) reporter.
* `log` - logs instrumentation events to disk.
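
As a rough illustration of the `prometheus` item above, the sketch below enables the handler and exposes the collected metrics over HTTP. The port number and the `/metrics` path are arbitrary choices for the example, and the listener layout is an assumption based on the linked listener documentation, so check it against your MongooseIM version.

```toml
# Enable the Prometheus instrumentation handler.
[instrumentation.prometheus]

# Assumed listener layout: an HTTP listener serving the metrics endpoint.
[[listen.http]]
  port = 9090  # arbitrary example port

  [[listen.http.handlers.mongoose_prometheus_handler]]
    host = "_"         # match any host name
    path = "/metrics"  # example path for Prometheus to scrape
```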

@@ -29,6 +29,10 @@ General options for the Exometer reporter:
* **Default:** `false`
* **Example:** `all_metrics_are_global = true`

When enabled, all per-host-type metrics are merged into their global equivalents.
Use this option if you have exceptionally many [host types](../configuration/general.md#generalhost_types) or [static hosts (static domains)](../configuration/general.md#generalhosts).
It is recommended when the number of host types or static domains is in the hundreds, as it significantly reduces CPU usage and (especially) the memory footprint in such setups.
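
A minimal sketch of turning the option on, assuming it sits directly under the `instrumentation.exometer` section (the section path is inferred from the surrounding option names and is not stated in this hunk):

```toml
[instrumentation.exometer]
  # Merge all per-host-type metrics into their global equivalents.
  all_metrics_are_global = true
```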

## Exometer reporter options

Multiple reporters can be configured.
@@ -85,6 +89,22 @@ A prefix to prepend all metric names with before they are sent to the graphite s
Specifies the name of an environment variable from which an additional prefix will be taken.
If both `prefix` and `env_prefix` are defined, the prefix taken from the environment variable is placed before `prefix` and separated from it with a dot.
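
To make the ordering concrete, here is a hedged sketch of a Graphite reporter that uses both options. The section path, the host, the port and the `MIM_NODE` variable name are illustrative assumptions, not values taken from this change.

```toml
[[instrumentation.exometer.report.graphite]]
  host = "127.0.0.1"       # example Graphite server address
  port = 2003              # example port
  prefix = "mongooseim"
  env_prefix = "MIM_NODE"  # hypothetical environment variable name

# If MIM_NODE is set to "node1", the combined prefix would be
# "node1.mongooseim", following the rule described above.
```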

## Log handler options

### `instrumentation.log.level`
* **Syntax:** string, one of `"none"`, `"emergency"`, `"alert"`, `"critical"`, `"error"`, `"warning"`, `"notice"`, `"info"`, `"debug"`, `"all"`.
* **Default:** `"debug"`
* **Example:** `level = "error"`

Base severity level at which instrumentation events are logged.
Note that some events override this option and are logged at a different level (for example, a lower one for events meant only for debugging purposes).

!!! note

In order for instrumentation events to appear in logs, the [`general.loglevel` option](../configuration/general.md#generalloglevel) has to be set to the same or lower level.
However, this may make the logs overly verbose, as most of the events important for a MongooseIM operator are logged anyway with appropriate severity levels.
The main purpose of this option is debugging, and it is not recommended for production systems, hence the default `"debug"` value.
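
For example, to see instrumentation events logged at the `"info"` level, both options would need to agree. The fragment below is only a sketch: it assumes the handler is configured in an `[instrumentation.log]` section, as the option name suggests, and it omits the other options a real `[general]` section needs.

```toml
[general]
  # Has to be the same level as instrumentation.log.level, or a lower one.
  loglevel = "info"

[instrumentation.log]
  # Base severity for instrumentation events.
  level = "info"
```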

## Example Prometheus configuration

This configuration enables `prometheus`, and `log` handlers:
2 changes: 1 addition & 1 deletion doc/modules/mod_global_distrib.md
@@ -99,7 +99,7 @@ Global distribution modules expose several per-datacenter metrics that can be us
| `mod_global_distrib_mapping_cache_misses_count` | counter | Number of fetches of session table entries that hit the database. |
| `mod_global_distrib_delivered_with_ttl_value` | histogram | A histogram of packets' TTL values recorded when the global routing layer decides to route them locally (but not due to TTL = 0). |
| `mod_global_distrib_stop_ttl_zero_count` | counter | A number of packets that weren't processed by global routing due to TTL=0. |
| `mod_global_distrib_bounce_queue_size` | counter | A number of messages enqueued for rerouting (the value of this metric is individual per MongooseIM node!). |
| `mod_global_distrib_bounce_queue_size` | counter | A number of messages enqueued for rerouting (the value of this metric is individual per MongooseIM node!). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |

=== "Exometer"

10 changes: 5 additions & 5 deletions doc/modules/mod_last.md
@@ -46,15 +46,15 @@ Backend in the action name can be either `rdbms` or `mnesia`.
| `mod_last_Backend_count` | counter | `set_last_info` | A timestamp is stored in the database. |
| `mod_last_Backend_time` | histogram | `set_last_info` | Time spent storing a timestamp in the database. |
| `mod_last_Backend_count` | counter | `session_cleanup` | A session is cleaned up from the database. |
| `mod_last_Backend_time` | histogram | `session_cleanup` | Time spent cleaning up a from the database. |
| `mod_last_Backend_time` | histogram | `session_cleanup` | Time spent cleaning up a session from the database. |

=== "Exometer"

| Backend action | Type | Description (when it gets incremented) |
| -------------- | ---- | -------------------------------------- |
| `[HostType, mod_last_Backend, get_last, count]` | counter | A timestamp is fetched from the database. |
| `[HostType, mod_last_Backend, get_last, count]` | spiral | A timestamp is fetched from the database. |
| `[HostType, mod_last_Backend, get_last, time]` | histogram | Time spent fetching a timestamp from the database. |
| `[HostType, mod_last_Backend, set_last_info, count]` | counter | A timestamp is stored in the database. |
| `[HostType, mod_last_Backend, set_last_info, count]` | spiral | A timestamp is stored in the database. |
| `[HostType, mod_last_Backend, set_last_info, time]` | histogram | Time spent storing a timestamp in the database. |
| `[HostType, mod_last_Backend, session_cleanup, count]` | counter | A session is cleaned up from the database. |
| `[HostType, mod_last_Backend, session_cleanup, time]` | histogram | Time spent cleaning up a from the database. |
| `[HostType, mod_last_Backend, session_cleanup, count]` | spiral | A session is cleaned up from the database. |
| `[HostType, mod_last_Backend, session_cleanup, time]` | histogram | Time spent cleaning up a session from the database. |
8 changes: 4 additions & 4 deletions doc/modules/mod_muc.md
@@ -498,8 +498,8 @@ Since Exometer doesn't support labels, the host types, or word `global`, are par
| `mod_muc_deep_hibernations_count` | counter | A room process is stopped (applies only to persistent rooms). |
| `mod_muc_process_recreations_count` | counter | A room process is recreated from a persisted state. |
| `mod_muc_hibernations_count` | counter | A room process becomes hibernated (garbage collected and put in wait state). |
| `mod_muc_rooms_hibernated` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". |
| `mod_muc_rooms_online` | gauge | How many rooms have running processes (includes rooms in a hibernated state). |
| `mod_muc_rooms_hibernated` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
| `mod_muc_rooms_online` | gauge | How many rooms have running processes (includes rooms in a hibernated state). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |

=== "Exometer"

@@ -508,5 +508,5 @@ Since Exometer doesn't support labels, the host types, or word `global`, are par
| `[HostType, mod_muc_deep_hibernations, count]` | spiral | A room process is stopped (applies only to persistent rooms). |
| `[HostType, mod_muc_process_recreations, count]` | spiral | A room process is recreated from a persisted state. |
| `[HostType, mod_muc_hibernations, count]` | spiral | A room process becomes hibernated (garbage collected and put in wait state). |
| `[HostType, mod_muc_rooms, hibernated]` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". |
| `[HostType, mod_muc_rooms, online]` | gauge | How many rooms have running processes (includes rooms in a hibernated state). |
| `[HostType, mod_muc_rooms, hibernated]` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
| `[HostType, mod_muc_rooms, online]` | gauge | How many rooms have running processes (includes rooms in a hibernated state). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
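
The probe interval referenced in the tables above is a single global setting. A minimal sketch of changing it, assuming the value is expressed in seconds (check the linked option description for the actual unit and default):

```toml
[instrumentation]
  # Assumed to be in seconds: refresh periodically probed metrics once a minute.
  probe_interval = 60
```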