Merge pull request #4360 from esl/instrumentation/documentation_fixes

Instrumentation documentation fixes
esl · Aug 21, 2024 · 2bf2b51
2 parents 229b40d + 9a8a509
commit 2bf2b51

Showing 12 changed files with 233 additions and 207 deletions.
diff --git a/doc/configuration/instrumentation.md b/doc/configuration/instrumentation.md
@@ -4,7 +4,7 @@ They are mainly used for the purpose of metrics.
 
 Instrumentation events are acted upon by handlers. Available instrumentation handlers are:
 
-* `prometheus` - exposes a metrics endpoint for [Prometheus](https://prometheus.io/).
+* `prometheus` - collects metrics for [Prometheus](https://prometheus.io/). The endpoint for accessing them has to be configured in the [listener section](../listeners/listen-http.md#handler-types-prometheus-mongoose_prometheus_handler).
 * `exometer` - starts [Exometer](https://github.com/esl/exometer_core), a metrics server capable of exporting metrics using reporters. Currently available is a [Graphite](https://graphiteapp.org/) reporter.
 * `log` - logs instrumentation events to disk.
 
@@ -29,6 +29,10 @@ General options for the Exometer reporter:
 * **Default:** `false`
 * **Example:** `all_metrics_are_global = true`
 
+When enabled, all per host type metrics are merged into global equivalents.
+The option should be used if you have exceptionally many [host types](../configuration/general.md#generalhost_types) or [static hosts (static domains)](../configuration/general.md#generalhosts).
+It is recommended when the number of host types or static domains is in the hundreds, as it significantly reduces CPU usage and (especially) memory footprint in those setups.
+
 ## Exometer reporter options
 
 Multiple reporters can be configured.
@@ -85,6 +89,22 @@ A prefix to prepend all metric names with before they are sent to the graphite s
 Specifies an environmental variable name from which an additional prefix will be taken.
 In case both `prefix` and `env_prefix` are defined, it will be placed before the `prefix` and separated with a dot.
 
+## Log handler options
+
+### `instrumentation.log.level`
+* **Syntax:** string, one of `"none"`, `"emergency"`, `"alert"`, `"critical"`, `"error"`, `"warning"`, `"notice"`, `"info"`, `"debug"`, `"all"`.
+* **Default:** `"debug"`
+* **Example:** `level = "error"`
+
+Base severity level at which all the events will be logged.
+Note that some events override this option and are logged at a different level (for example, a lower one for events meant only for debugging purposes).
+
+!!! note
+
+    In order for instrumentation events to appear in logs, the [`general.loglevel` option](../configuration/general.md#generalloglevel) has to be set to the same or lower level.
+    However, this may make the logs overly verbose, as most of the events important for a MongooseIM operator are logged anyway with appropriate severity levels.
+    The main purpose of this option is debugging, and it is not recommended for production systems, hence the default `"debug"` value.
+
 ## Example Prometheus configuration
 
 This configuration enables `prometheus`, and `log` handlers:

diff --git a/doc/modules/mod_global_distrib.md b/doc/modules/mod_global_distrib.md
@@ -99,7 +99,7 @@ Global distribution modules expose several per-datacenter metrics that can be us
     | `mod_global_distrib_mapping_cache_misses_count` | counter | Number of fetches of session table entries that hit the database. |
     | `mod_global_distrib_delivered_with_ttl_value` | histogram | A histogram of packets' TTL values recorded when the global routing layer decides to route them locally (but not due to TTL = 0). |
     | `mod_global_distrib_stop_ttl_zero_count` | counter | A number of packets that weren't processed by global routing due to TTL=0. |
-    | `mod_global_distrib_bounce_queue_size` | counter | A number of messages enqueued for rerouting (the value of this metric is individual per MongooseIM node!). |
+    | `mod_global_distrib_bounce_queue_size` | counter | A number of messages enqueued for rerouting (the value of this metric is individual per MongooseIM node!). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
 
 === "Exometer"
 

diff --git a/doc/modules/mod_last.md b/doc/modules/mod_last.md
@@ -46,15 +46,15 @@ Backend in the action name can be either `rdbms` or `mnesia`.
     | `mod_last_Backend_count` | counter | `set_last_info` |  A timestamp is stored in the database. |
     | `mod_last_Backend_time` | histogram | `set_last_info` | Time spent storing a timestamp in the database. |
     | `mod_last_Backend_count` | counter | `session_cleanup` | A session is cleaned up from the database. |
-    | `mod_last_Backend_time` | histogram | `session_cleanup` | Time spent cleaning up a from the database. |
+    | `mod_last_Backend_time` | histogram | `session_cleanup` | Time spent cleaning up a session from the database. |
 
 === "Exometer"
 
     | Backend action | Type | Description (when it gets incremented) |
     | -------------- | ---- | -------------------------------------- |
-    | `[HostType, mod_last_Backend, get_last, count]` | counter | A timestamp is fetched from the database. |
+    | `[HostType, mod_last_Backend, get_last, count]` | spiral | A timestamp is fetched from the database. |
     | `[HostType, mod_last_Backend, get_last, time]` | histogram | Time spent fetching a timestamp from the database. |
-    | `[HostType, mod_last_Backend, set_last_info, count]` | counter |  A timestamp is stored in the database. |
+    | `[HostType, mod_last_Backend, set_last_info, count]` | spiral |  A timestamp is stored in the database. |
     | `[HostType, mod_last_Backend, set_last_info, time]` | histogram | Time spent storing a timestamp in the database. |
-    | `[HostType, mod_last_Backend, session_cleanup, count]` | counter | A session is cleaned up from the database. |
-    | `[HostType, mod_last_Backend, session_cleanup, time]` | histogram | Time spent cleaning up a from the database. |
+    | `[HostType, mod_last_Backend, session_cleanup, count]` | spiral | A session is cleaned up from the database. |
+    | `[HostType, mod_last_Backend, session_cleanup, time]` | histogram | Time spent cleaning up a session from the database. |
diff --git a/doc/modules/mod_muc.md b/doc/modules/mod_muc.md
@@ -498,8 +498,8 @@ Since Exometer doesn't support labels, the host types, or word `global`, are par
   | `mod_muc_deep_hibernations_count` | counter | A room process is stopped (applies only to persistent rooms). |
   | `mod_muc_process_recreations_count` | counter | A room process is recreated from a persisted state. |
   | `mod_muc_hibernations_count` | counter | A room process becomes hibernated (garbage collected and put in wait state). |
-  | `mod_muc_rooms_hibernated` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". |
-  | `mod_muc_rooms_online` | gauge | How many rooms have running processes (includes rooms in a hibernated state). |
+  | `mod_muc_rooms_hibernated` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
+  | `mod_muc_rooms_online` | gauge | How many rooms have running processes (includes rooms in a hibernated state). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
 
 === "Exometer"
 
@@ -508,5 +508,5 @@ Since Exometer doesn't support labels, the host types, or word `global`, are par
   | `[HostType, mod_muc_deep_hibernations, count]` | spiral | A room process is stopped (applies only to persistent rooms). |
   | `[HostType, mod_muc_process_recreations, count]` | spiral | A room process is recreated from a persisted state. |
   | `[HostType, mod_muc_hibernations, count]` | spiral | A room process becomes hibernated (garbage collected and put in wait state). |
-  | `[HostType, mod_muc_rooms, hibernated]` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". |
-  | `[HostType, mod_muc_rooms, online]` | gauge | How many rooms have running processes (includes rooms in a hibernated state). |
+  | `[HostType, mod_muc_rooms, hibernated]` | gauge | How many rooms are in hibernated state. Does not include rooms in "deep hibernation". This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
+  | `[HostType, mod_muc_rooms, online]` | gauge | How many rooms have running processes (includes rooms in a hibernated state). This metric is updated periodically, every [`instrumentation.probe_interval`](../configuration/instrumentation.md#instrumentationprobe_interval). |
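
For orientation, the options documented in this change could be combined in a configuration file roughly as follows. This is a minimal sketch and is not part of the pull request: the table layout is inferred from the option paths `instrumentation.probe_interval` and `instrumentation.log.level`, and the values (including the unit of `probe_interval`) are illustrative assumptions.

```toml
# Hypothetical fragment; section names are inferred from the option paths above.
[instrumentation]
  # Assumed value: how often probe-based metrics such as
  # mod_muc_rooms_online are refreshed.
  probe_interval = 15

# Collects metrics for Prometheus; the HTTP endpoint itself is
# configured separately in the listener section.
[instrumentation.prometheus]

[instrumentation.log]
  # Base severity for logged instrumentation events. They only appear
  # if general.loglevel is set to the same or a lower level.
  level = "error"
```

With `level = "error"`, events would be logged at the `error` level unless an individual event overrides it, so they remain visible under a typical `general.loglevel` without raising overall log verbosity.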