From 46c6b6eda469e0f61116cd9c27b714b142919178 Mon Sep 17 00:00:00 2001
From: Mikhail Uvarov <arcusfelis@gmail.com>
Date: Mon, 25 Sep 2023 19:18:52 +0200
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=96=20Document=20CETS=20as=20an=20alte?=
 =?UTF-8?q?rnative=20to=20Mnesia?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a tutorial to the User's Guide on how to set up MIM with RDBMS + CETS.
Update all docs relying on Mnesia configuration and commands like running_db_nodes to also show the CETS alternatives.
Use content tabs in the documentation.
---
 .../database-backends-configuration.md        |  27 ++-
 doc/configuration/general.md                  |   3 +-
 ...uster-configuration-and-node-management.md | 206 +++++++++++++-----
 .../MongooseIM-metrics.md                     |  21 ++
 doc/tutorials/CETS-configure.md               |  36 +++
 mkdocs.yml                                    |   1 +
 priv/graphql/schemas/admin/cets.gql           |   2 +-
 7 files changed, 226 insertions(+), 70 deletions(-)
 create mode 100644 doc/tutorials/CETS-configure.md

diff --git a/doc/configuration/database-backends-configuration.md b/doc/configuration/database-backends-configuration.md
index 7a2c81617bd..3659126b7ac 100644
--- a/doc/configuration/database-backends-configuration.md
+++ b/doc/configuration/database-backends-configuration.md
@@ -21,14 +21,16 @@ Subsequent sections go into more depth on each database: what they are suitable
 
 Transient data:
 
-* Mnesia - we highly recommend Mnesia (a highly available and distributed database) over Redis for storing **transient** data.
- Being an Erlang-based database, it's the default persistence option for most modules in MongooseIM.
-  
-    !!! Warning
-        We **strongly recommend** keeping **persistent** data in an external DB (RDBMS) for production.
-        Mnesia is not suitable for the volumes of **persistent** data which some modules may require.
-        Sooner or later a migration will be needed which may be painful.
-        It is possible to store all data in Mnesia, but only for testing purposes, not for any serious deployments.
+* CETS - a library that synchronises records in ETS tables between nodes.
+  A new option for sharing live data across the MongooseIM cluster.
+  We recommend using this backend for transient data.
+  This backend requires a configured RDBMS database, because an external database is used to discover nodes in the cluster.
+  See the CETS config example in the [tutorials](../tutorials/CETS-configure.md).
+
+* Mnesia - a built-in Erlang database.
+  Mnesia is fine for a cluster of fixed size with reliable networking between nodes and with rarely restarted nodes.
+  There are some issues when nodes restart or join the cluster, so we recommend using CETS instead.
+  Mnesia is still the default backend for modules for config compatibility reasons.
 
 * Redis - A fantastic choice for storing live data.
  It's highly scalable and it can be easily shared by multiple MongooseIM nodes.
@@ -38,6 +40,12 @@ Transient data:
 
 Persistent Data:
 
+!!! Warning
+    We **strongly recommend** keeping **persistent** data in an external DB (RDBMS) for production.
+    Mnesia is not suitable for the volumes of **persistent** data which some modules may require.
+    Sooner or later a migration will be needed which may be painful.
+    It is possible to store all data in Mnesia, but only for testing purposes, not for any serious deployments.
+
 * RDBMS - MongooseIM has a strong backend support for relational databases.
  Reliable and battle proven, they are a great choice for regular MongooseIM use cases and features like `privacy lists`, `vcards`, `roster`, `private storage`, `last activity` and `message archive`.
  Never loose your data.
@@ -47,12 +55,13 @@ Persistent Data:
 
 * ElasticSearch - Only for MAM (Message Archive Management).
 
+* Mnesia - some backends support Mnesia to store data, but it is not recommended.
+
 
 User Data:
 
 * LDAP -  Used for: users, shared rosters, vCards
 
-
 ## RDBMS
 
 ### MySQL
diff --git a/doc/configuration/general.md b/doc/configuration/general.md
index cd712fb3aff..6e60fa9b83d 100644
--- a/doc/configuration/general.md
+++ b/doc/configuration/general.md
@@ -147,7 +147,8 @@ These options can be used to configure the way MongooseIM manages user sessions.
 * **Example:** `sm_backend = "redis"`
 
 Backend for storing user session data. All nodes in a cluster must have access to a complete session database.
-Mnesia is sufficient in most cases, use Redis only in large deployments when you notice issues with the mnesia backend. Requires a redis pool with the `default` tag defined in the `outgoing_pools` section.
+CETS is a new backend; it requires a configured RDBMS to work properly.
+Mnesia is a legacy backend, sufficient in most cases. Use Redis only in large deployments when you notice issues with the Mnesia backend. Redis requires a pool with the `default` tag defined in the `outgoing_pools` section.
 See the section about [redis connection setup](./outgoing-connections.md#redis-specific-options) for more information.
 
 !!! Warning
diff --git a/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md b/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md
index 89bae04b589..2d139de94c4 100644
--- a/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md
+++ b/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md
@@ -64,98 +64,186 @@ Checklist:
 - the same cookie across all nodes (`vm.args` `-setcookie` parameter)
 - each node should be able to ping other nodes using its sname
    (ex. `net_adm:ping('mongoose@localhost')`)
+- an RDBMS backend is configured, so that CETS can discover nodes
 
 ### Initial node
 
-There is no action required on the initial node.
+=== "CETS"
 
-Just start MongooseIM using `mongooseim start` or `mongooseim live`.
+    Clustering is automatic. There is no difference between nodes.
+
+=== "Mnesia"
+
+    There is no action required on the initial node.
+
+    Just start MongooseIM using `mongooseim start` or `mongooseim live`.
 
 ### New node - joining cluster
 
+=== "CETS"
 
-```bash
-mongooseimctl start
-mongooseimctl started #waits until MongooseIM starts
-mongooseimctl join_cluster ClusterMember
-```
+    Clustering is automatic.
 
-`ClusterMember` is the name of a running node set in `vm.args` file, for example `mongooseim@localhost`.
-This node has to be part of the cluster we'd like to join.
+=== "Mnesia"
 
-First, MongooseIM will display a warning and a question if the operation should proceed:
+    ```bash
+    mongooseimctl start
+    mongooseimctl started #waits until MongooseIM starts
+    mongooseimctl join_cluster ClusterMember
+    ```
 
-```text
-Warning. This will drop all current connections and will discard all persistent data from Mnesia. Do you want to continue? (yes/no)
-```
+    `ClusterMember` is the name of a running node set in `vm.args` file, for example `mongooseim@localhost`.
+    This node has to be part of the cluster we'd like to join.
 
-If you type `yes` MongooseIM will start joining the cluster.
-Successful output may look like the following:
+    First, MongooseIM will display a warning and a question if the operation should proceed:
 
-```text
-You have successfully joined the node mongooseim2@localhost to the cluster with node member mongooseim@localhost
-```
+    ```text
+    Warning. This will drop all current connections and will discard all persistent data from Mnesia. Do you want to continue? (yes/no)
+    ```
 
-In order to skip the question you can add option `-f` which will perform the action
-without displaying the warning and waiting for the confirmation.
+    If you type `yes` MongooseIM will start joining the cluster.
+    Successful output may look like the following:
+
+    ```text
+    You have successfully joined the node mongooseim2@localhost to the cluster with node member mongooseim@localhost
+    ```
+
+    In order to skip the question you can add option `-f` which will perform the action
+    without displaying the warning and waiting for the confirmation.
 
 ### Leaving cluster
 
-To leave a running node from the cluster, call:
+=== "CETS"
 
-```bash
-mongooseimctl leave_cluster
-```
+    Stopping the node is enough to leave the cluster.
+    If you want to prevent the node from joining the cluster again, specify a different `cluster_name`
+    option in the CETS backend configuration. Using a different Erlang cookie is a good idea too.
 
-It only makes sense to use it if the node is the part of a cluster, e.g `join_cluster` was called from that node before.
+=== "Mnesia"
 
-Similarly to `join_cluster` a warning and a question will be displayed unless the option `-f` is added to the command.
+    To make a running node leave the cluster, call:
 
-The successful output from the above command may look like the following:
+    ```bash
+    mongooseimctl leave_cluster
+    ```
 
-```text
-The node mongooseim2@localhost has successfully left the cluster
-```
+    It only makes sense to use it if the node is part of a cluster, e.g. `join_cluster` was called from that node before.
 
-### Removing a node from the cluster
+    Similarly to `join_cluster` a warning and a question will be displayed unless the option `-f` is added to the command.
 
-To remove another node from the cluster, call the following command from one of the cluster members:
+    The successful output from the above command may look like the following:
 
-```bash
-mongooseimctl remove_from_cluster RemoteNodeName
-```
+    ```text
+    The node mongooseim2@localhost has successfully left the cluster
+    ```
 
-where `RemoteNodeName` is a name of the node that we'd like to remove from our cluster.
-This command could be useful when the node is dead and not responding and we'd like to remove it remotely.
-The successful output from the above command may look like the following:
+### Removing a node from the cluster
 
-```text
-The node mongooseim2@localhost has been removed from the cluster
-```
+=== "CETS"
 
-### Cluster status
+    A stopped node is automatically removed from the node discovery table in the RDBMS database after some time.
+    This is needed so that other nodes do not try to connect to the stopped node.
 
-You can use the following commands on any of the running nodes to examine the cluster
-or to see if a newly added node is properly clustered:
+=== "Mnesia"
 
-```bash
-mongooseimctl mnesia info | grep "running db nodes"
-```
+    To remove another node from the cluster, call the following command from one of the cluster members:
 
-This command shows all running nodes.
-A healthy cluster should contain all nodes here.
-For example:
-```bash
-running db nodes = [mongooseim@node1, mongooseim@node2]
-```
-To see stopped or misbehaving nodes following command can be useful:
+    ```bash
+    mongooseimctl remove_from_cluster RemoteNodeName
+    ```
 
-```bash
-mongooseimctl mnesia info | grep "stopped db nodes"
-```
+    where `RemoteNodeName` is a name of the node that we'd like to remove from our cluster.
+    This command could be useful when the node is dead and not responding and we'd like to remove it remotely.
+    The successful output from the above command may look like the following:
+
+    ```text
+    The node mongooseim2@localhost has been removed from the cluster
+    ```
+
+### Cluster status
 
-This command shows which nodes are considered stopped.
-This does not necessarily indicate that they are down but might be a symptom of a communication problem.
+=== "CETS"
+
+    Run the command:
+
+    ```bash
+    mongooseimctl cets systemInfo
+    ```
+
+    `joinedNodes` should contain a list of properly joined nodes:
+
+    ```json
+    "joinedNodes" : [
+      "mongooseim@node1",
+      "mongooseim@node2"
+    ]
+    ```
+
+    It should generally be equal to the list of `discoveredNodes`.
+
+    If the lists differ, there may be configuration or networking issues.
+    You can check `unavailableNodes`, `remoteNodesWithUnknownTables`,
+    `remoteNodesWithMissingTables` lists for more information (generally, these lists should be empty).
+
+    You can read the description for other fields of `systemInfo` in the GraphQL schema file.
+
+    For a properly configured 2-node cluster, the metrics would show something like this:
+
+    ```bash
+    mongooseimctl metric getMetrics --name '["global", "cets", "system"]'
+    {
+      "data" : {
+        "metric" : {
+          "getMetrics" : [
+            {
+              "unavailable_nodes" : 0,
+              "type" : "cets_system",
+              "remote_unknown_tables" : 0,
+              "remote_nodes_without_disco" : 0,
+              "remote_nodes_with_unknown_tables" : 0,
+              "remote_nodes_with_missing_tables" : 0,
+              "remote_missing_tables" : 0,
+              "name" : [
+                "global",
+                "cets",
+                "system"
+              ],
+              "joined_nodes" : 2,
+              "discovery_works" : 1,
+              "discovered_nodes" : 2,
+              "conflict_tables" : 0,
+              "conflict_nodes" : 0,
+              "available_nodes" : 2
+            }
+          ]
+        }
+      }
+    }
+    ```
+
+=== "Mnesia"
+
+    You can use the following commands on any of the running nodes to examine the cluster
+    or to see if a newly added node is properly clustered:
+
+    ```bash
+    mongooseimctl mnesia info | grep "running db nodes"
+    ```
+
+    This command shows all running nodes.
+    A healthy cluster should contain all nodes here.
+    For example:
+    ```bash
+    running db nodes = [mongooseim@node1, mongooseim@node2]
+    ```
+    To see stopped or misbehaving nodes, the following command can be useful:
+
+    ```bash
+    mongooseimctl mnesia info | grep "stopped db nodes"
+    ```
+
+    This command shows which nodes are considered stopped.
+    This does not necessarily indicate that they are down but might be a symptom of a communication problem.
 
 ## Load Balancing
 
diff --git a/doc/operation-and-maintenance/MongooseIM-metrics.md b/doc/operation-and-maintenance/MongooseIM-metrics.md
index 639246c4371..e2b6159ac5b 100644
--- a/doc/operation-and-maintenance/MongooseIM-metrics.md
+++ b/doc/operation-and-maintenance/MongooseIM-metrics.md
@@ -179,6 +179,27 @@ Metrics specific to an extension, e.g. Message Archive Management, are described
 | `[global, data, dist]` | proplist | Network stats for an Erlang distributed communication. A proplist with values: `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`, `connections`. |
 | `[global, data, rdbms, PoolName]` | proplist | For every RDBMS pool defined, an instance of this metric is available. It is a proplist with values `workers`, `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`. |
 
+### CETS system metrics
+
+| Metric name | Type | Description |
+| ----------- | ---- | ----------- |
+| `[global, cets, system]` | proplist | A proplist with a list of stats. Description is below. |
+
+| Stat Name | Description |
+| ----------- | ----------- |
+| `available_nodes` | Available nodes (nodes that are connected to us and have the CETS disco process started). |
+| `unavailable_nodes` | Unavailable nodes (nodes that do not respond to our pings). |
+| `joined_nodes` | Joined nodes (nodes that have our local tables running). |
+| `discovered_nodes` | Discovered nodes (nodes that are extracted from the discovery backend). |
+| `remote_nodes_without_disco` | Nodes with stopped CETS discovery. |
+| `remote_nodes_with_unknown_tables` | Nodes that have more tables registered than the local node. |
+| `remote_unknown_tables` | Unknown remote tables. |
+| `remote_nodes_with_missing_tables` | Nodes that are available, but do not host some of our local tables. |
+| `remote_missing_tables` | Local tables that are missing on some of the available remote nodes. |
+| `conflict_nodes` | Nodes that replicate at least one of our local tables to a different list of nodes. |
+| `conflict_tables` | Tables that have conflicting replication destinations. |
+| `discovery_works` | Returns 1 if the last discovery attempt was successful (otherwise returns 0). |
+
 ### VM metrics
 
 | Metric name | Type | Description |
diff --git a/doc/tutorials/CETS-configure.md b/doc/tutorials/CETS-configure.md
new file mode 100644
index 00000000000..bfeffa66358
--- /dev/null
+++ b/doc/tutorials/CETS-configure.md
@@ -0,0 +1,36 @@
+## CETS Config Example
+
+[CETS](https://github.com/esl/cets/) is a library that replicates in-memory data
+across the MongooseIM cluster. It can be used to store a list of online XMPP sessions, a list
+of outgoing S2S connections, stream management session IDs, and a list of online MUC rooms.
+
+If you want to use CETS instead of Mnesia, ensure that these options are set:
+
+```toml
+[general]
+  sm_backend = "cets"
+  component_backend = "cets"
+  s2s_backend = "cets"
+
+[internal_databases.cets]
+
+# The list of modules that use CETS
+# You should enable only modules that you use
+[modules.mod_stream_management]
+  backend = "cets"
+
+[modules.mod_bosh]
+  backend = "cets"
+
+[modules.mod_muc]
+  online_backend = "cets"
+
+[modules.mod_jingle_sip]
+  backend = "cets"
+```
+
+Ensure that an RDBMS pool is configured in `outgoing_pools`, so that CETS can get the list of MongooseIM nodes that use the same
+relational database and cluster them together.
+
+A preferred way to install MongooseIM is with [Helm Charts](https://github.com/esl/MongooseHelm/) on Kubernetes. The chart allows you
+to set `volatileDatabase` to `cets`, and the values are applied using Helm's templates.
diff --git a/mkdocs.yml b/mkdocs.yml
index 35a43c3e7b8..65204b15412 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -53,6 +53,7 @@ nav:
   - Tutorials:
       - 'How to Build MongooseIM from source code': 'tutorials/How-to-build.md'
       - 'How to build and run MongooseIM docker image': 'tutorials/Docker-build.md'
+      - 'How to configure MongooseIM to use CETS instead of Mnesia': 'tutorials/CETS-configure.md'
       - 'How to Set up Push Notifications': 'tutorials/push-notifications/Push-notifications.md'
       - 'How to Set up Push Notifications on the client side': 'tutorials/push-notifications/Push-notifications-client-side.md'
       - 'How to Set up MongoosePush': 'tutorials/push-notifications/MongoosePush-setup.md'
diff --git a/priv/graphql/schemas/admin/cets.gql b/priv/graphql/schemas/admin/cets.gql
index c2fea7bb91b..8f4f3b86acd 100644
--- a/priv/graphql/schemas/admin/cets.gql
+++ b/priv/graphql/schemas/admin/cets.gql
@@ -26,7 +26,7 @@ type CETSSystemInfo {
   unavailableNodes: [String]
   "Joined nodes (nodes that have our local tables running)"
   joinedNodes: [String]
-  "Discovered nodes (nodes that are extracted from the discovery backend)."
+  "Discovered nodes (nodes that are extracted from the discovery backend)"
   discoveredNodes: [String]
   "Nodes with stopped CETS discovery"
   remoteNodesWithoutDisco: [String]