Merge pull request #64 from jrasell/gh-47

docs: add high availability feature documentation.
jrasell · Oct 9, 2019 · 35f1a4b
2 parents d0eecb2 + fcfb18f

Showing 4 changed files with 65 additions and 1 deletion.
diff --git a/docs/api/README.md b/docs/api/README.md
@@ -4,6 +4,16 @@ The Sherpa HTTP API gives you full access to a Sherpa server via HTTP. Every asp
 
 All API routes are prefixed with /v1/, which is the current API version.
 
+## Leadership
+
+When calling the API of the Sherpa cluster leader, calls work as expected. When calling a non-leader server, calls to the policy and scale endpoints result in a redirect response containing the advertised address of the leader. The system and UI endpoints always return information about the targeted Sherpa server.
+
+Example redirect response:
+```
+< HTTP/1.1 307 Temporary Redirect
+< Location: http://127.0.0.1:9000/v1/policies
+```
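As a rough illustration of how a client might handle the redirect above, the following Python sketch picks the URL to retry against. The function name `next_request_url` and the standby address `http://127.0.0.1:8001` are hypothetical and not part of Sherpa; only the 307 status and the `Location` header come from the example response.

```python
from urllib.parse import urljoin


def next_request_url(request_url, status, headers):
    """Return the URL to retry against, or None when no redirect was issued.

    `headers` is a plain dict of response headers (an assumption made to keep
    the sketch independent of any particular HTTP client library).
    """
    if status != 307:
        return None
    # HTTP header names are case-insensitive, so match "Location" loosely.
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if location is None:
        raise ValueError("307 response without a Location header")
    # urljoin returns an absolute Location unchanged and resolves a relative one.
    return urljoin(request_url, location)


# Hypothetical standby at port 8001 redirecting to the leader from the example.
target = next_request_url(
    "http://127.0.0.1:8001/v1/policies",
    307,
    {"Location": "http://127.0.0.1:9000/v1/policies"},
)
```

Because a 307 preserves the request method and body, a client retrying a policy or scale call would resend the same request to the returned URL.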
+
 ## HTTP Status Codes
 The following HTTP status codes are used throughout the API. Sherpa tries to adhere to these whenever possible.
 

diff --git a/docs/api/system.md b/docs/api/system.md
@@ -1,5 +1,31 @@
 # System API
 
+## Get Server Leader
+
+This endpoint can be used to identify the current cluster leader and the storage HA capability.
+
+| Method | Path                | Produces                 |
+| :----- | :------------------ | :----------------------- |
+| `GET`  | `/v1/system/leader` | `200 application/binary` |
+
+### Sample Request
+
+```
+$ curl \
+    http://127.0.0.1:8000/v1/system/leader
+```
+
+### Sample Response
+
+```json
+{
+  "IsSelf": true,
+  "HAEnabled": true,
+  "LeaderAddress": "127.0.0.1:8000",
+  "LeaderClusterAddress": "http://127.0.0.1:8000"
+}
+```
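A minimal sketch of how a client could decode this response in Python. The field names are taken from the sample response above; the `LeaderInfo` type and `parse_leader_response` helper are hypothetical names introduced only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderInfo:
    is_self: bool
    ha_enabled: bool
    leader_address: str
    leader_cluster_address: str


def parse_leader_response(body: str) -> LeaderInfo:
    """Decode the JSON body returned by GET /v1/system/leader."""
    data = json.loads(body)
    return LeaderInfo(
        is_self=data["IsSelf"],
        ha_enabled=data["HAEnabled"],
        leader_address=data["LeaderAddress"],
        leader_cluster_address=data["LeaderClusterAddress"],
    )


# The sample response from the documentation, used as input.
sample = """
{
  "IsSelf": true,
  "HAEnabled": true,
  "LeaderAddress": "127.0.0.1:8000",
  "LeaderClusterAddress": "http://127.0.0.1:8000"
}
"""
info = parse_leader_response(sample)
```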
+
 ## Get Server Health
 
 This endpoint can be used to query the Sherpa server health status.

diff --git a/docs/configuration/README.md b/docs/configuration/README.md
@@ -7,8 +7,10 @@ The Sherpa server can be configured by supplying either CLI flags or using envir
 * `--autoscaler-enabled` (bool: false) - Enable the internal autoscaling engine.
 * `--autoscaler-evaluation-interval` (int: 60) - The time period in seconds between autoscaling evaluation runs.
 * `--autoscaler-num-threads` (int: 3) - Specifies the number of parallel autoscaler threads to run.
-* `--bind-addr` (string: "127.0.0.1) - The HTTP server address to bind to.
+* `--bind-addr` (string: "127.0.0.1") - The HTTP server address to bind to.
 * `--bind-port` (uint16: 8000) - The HTTP server port to bind to.
+* `--cluster-advertise-addr` (string: "http://127.0.0.1:8000") - The Sherpa server advertise address used for NAT traversal on HTTP redirects.
+* `--cluster-name` (string: "") - Specifies the identifier for the Sherpa cluster.
 * `--log-format` (string: "auto") - Specify the log format ("auto", "zerolog" or "human").
 * `--log-level` (string: "info") - Change the level used for logging.
 * `--log-use-color` (bool: true) - Use ANSI colors in logging output.

diff --git a/docs/guides/high-availability.md b/docs/guides/high-availability.md
@@ -0,0 +1,26 @@
+# Sherpa High Availability
+
+Sherpa supports a multi-server mode for high availability. This mode protects against outages by running multiple Sherpa servers. High availability mode is automatically enabled when using a data store that supports it.
+
+You can tell whether a data store supports high availability mode by starting the server and checking the `HAEnabled` value returned by the `/v1/system/leader` endpoint. If it is `true`, Sherpa will automatically use HA mode.
+
+To be highly available, one of the Sherpa server nodes grabs a lock within the data store. The successful server node then becomes the active node; all other nodes become standby nodes. At this point, if the standby nodes receive a request, they will redirect the client depending on the current configuration and state of the cluster. Due to this architecture, HA does not enable increased scalability. In general, the bottleneck of Sherpa is the data store itself, not Sherpa core.
+
+## Client Redirection
+
+The standby nodes will redirect the client to the active node's advertise address using a 307 status code.
+
+What the `cluster-advertise-addr` value should be set to depends on how Sherpa is set up. There are two common scenarios: Sherpa servers accessed directly by clients, and Sherpa servers accessed via a load balancer.
+
+In both cases, the `cluster-advertise-addr` should be a full URL including scheme (http/https), not simply an IP address and port.
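The full-URL requirement can be checked before starting a server. The following is only a sketch of such a pre-flight check, not Sherpa's own validation; `is_full_advertise_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def is_full_advertise_url(addr: str) -> bool:
    """Return True when addr has an http/https scheme and a host part."""
    try:
        parsed = urlparse(addr)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# A full URL passes; a bare IP address and port does not.
ok = is_full_advertise_url("https://a.sherpa.mycompany.com:8000")
bad = is_full_advertise_url("127.0.0.1:8000")
```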
+
+### Direct Access
+
+When clients are able to access Sherpa directly, the `cluster-advertise-addr` for each node should be that node's address. For instance, if there are two Sherpa nodes A (accessed via https://a.sherpa.mycompany.com:8000) and B (accessed via https://b.sherpa.mycompany.com:8000), node A would set its `cluster-advertise-addr` to https://a.sherpa.mycompany.com:8000 and node B would set its `cluster-advertise-addr` to https://b.sherpa.mycompany.com:8000.
+
+This way, when A is the active node, any requests received by node B will cause it to redirect the client to node A's `cluster-advertise-addr` at https://a.sherpa.mycompany.com:8000, and vice-versa.
+
+### Behind Load Balancers
+Sometimes clients use a load balancer as the initial way to reach one of the Sherpa servers but also have direct access to each Sherpa node. In this case, the Sherpa servers should be set up as described in the Direct Access section above, since for redirection purposes the clients have direct access.
+
+If the only access to the Sherpa servers is via the load balancer, the `cluster-advertise-addr` on each node should be the same: the address of the load balancer. Clients that reach a standby node will be redirected back to the load balancer, which by then should have been updated with the address of the current leader. If it has not, the client can enter a redirect loop, so this setup is not recommended when it can be avoided.
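One way a client could guard against such a loop is to cap the number of redirects it follows. The sketch below is an assumption about client-side handling, not Sherpa behavior: `follow_redirects`, the `send` callable and the address `https://lb.example.com` are all hypothetical, and `send` stands in for a real HTTP call so the example runs without a network.

```python
def follow_redirects(send, url, max_hops=3):
    """Follow 307 redirects until a non-redirect response or the hop limit.

    `send(url)` must return a `(status, location)` tuple. Returns the final
    `(url, status)` pair, or raises RuntimeError when more than `max_hops`
    redirects would be needed.
    """
    hops = 0
    while True:
        status, location = send(url)
        if status != 307:
            return url, status
        if hops == max_hops:
            raise RuntimeError(f"more than {max_hops} redirects; possible redirect loop")
        hops += 1
        url = location


LB_URL = "https://lb.example.com/v1/policies"  # hypothetical load balancer address


def stale_load_balancer(url):
    # Simulates a load balancer that keeps routing to a standby node,
    # which redirects the client straight back to the load balancer.
    return 307, LB_URL


try:
    follow_redirects(stale_load_balancer, LB_URL)
    loop_detected = False
except RuntimeError:
    loop_detected = True
```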