concepts: Document caveats with HPAs and PDBs #461

Merged
merged 15 commits into from
Oct 6, 2023
15 changes: 10 additions & 5 deletions modules/concepts/pages/operations/index.adoc
@@ -31,18 +31,23 @@ Sometimes you want to quickly shut down a product or update the Stackable operators without all of the products
restarting at the same time. You can achieve this using the following methods:

1. Quickly stop and start a whole product using `stopped` as described in xref:operations/cluster_operations.adoc[].
2. Prevent any changes to your deployed product using `reconciliationPaused` as described in xref:operations/cluster_operations.adoc[].

== Performance

1. *Compute resources*: You can configure the resources available to every product using xref:concepts:resources.adoc[]. The defaults are
very restrained, as you should be able to spin up multiple products on your laptop.
2. *Autoscaling*: Although not yet supported natively by the platform, you can use a
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale[HorizontalPodAutoscaler] to dynamically scale the number of Pods
running for a given rolegroup based upon resource usage. To achieve this you must *not* configure any replicas on the rolegroup.
Afterwards you can deploy a HorizontalPodAutoscaler as usual (a minimal sketch is shown after this list). Please note that doing so is experimental and not officially supported by the
platform. Later platform versions will support autoscaling natively with sensible defaults and make it easy to enable and configure.
3. *Co-location*: You can not only use xref:operations/pod_placement.adoc[] to achieve more resilience, but also to co-locate products
that communicate frequently with each other. One example is placing HBase regionservers on the same Kubernetes node
as the HDFS datanodes. Our operators already take this into account and co-locate connected services. However, if
you are not satisfied with the automatically created affinities you can use xref:operations/pod_placement.adoc[] to
configure your own.
4. *Dedicated nodes*: If you want certain services to run on dedicated nodes you can also use xref:operations/pod_placement.adoc[]
to force the Pods to be scheduled on those nodes. This is especially helpful if you e.g. have Kubernetes nodes with
16 cores and 64 GB of memory, as you could allocate nearly 100% of these node resources to your Spark executors or Trino workers.
In this case it is important that you https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/[taint]
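To illustrate the autoscaling approach from item 2 above, here is a minimal HorizontalPodAutoscaler sketch targeting the StatefulSet a Stackable operator creates for a rolegroup. The cluster, role and rolegroup names as well as the CPU target are illustrative assumptions only, and the rolegroup itself must not have `replicas` set.

[source,yaml]
----
# Minimal sketch only: assumes a Trino cluster "simple-trino" whose "worker"
# role has a rolegroup "default" with no replicas configured. The StatefulSet
# name and the metric target are assumptions, not operator-guaranteed values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-trino-worker-default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: simple-trino-worker-default # StatefulSet created for the rolegroup
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
----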
21 changes: 21 additions & 0 deletions modules/concepts/pages/operations/pod_disruptions.adoc
@@ -15,6 +15,27 @@ The defaults depend on the individual product and can be found below the "Operations" section of the respective product's documentation.
They are based on our knowledge of each product's fault tolerance.
In some cases they may be a little pessimistic, but they can be adjusted as documented in the following sections.

In general we split product roles into the following two categories, which serve as guidelines for the default values we apply:

=== Multiple replicas to increase availability

For these roles (e.g. ZooKeeper servers, HDFS journalnodes and namenodes, or HBase masters) we only allow a single Pod to be unavailable. Consider 7 ZooKeeper replicas, of which 4 are needed to form a quorum. If we allowed 2 to be unavailable,
there would be no single point of failure (as at least 5 nodes remain available), but only a single spare node would be left. The reason you chose 7 instead of 5
ZooKeeper replicas might be that you always want at least 2 spares. So increasing the number of allowed disruptions as you increase the number of replicas
is probably not what you want when you added the replicas to increase availability.
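As a rough sketch of what this means in practice, a PodDisruptionBudget that allows only a single unavailable Pod looks like the following. The name and label selector are assumptions for illustration; the operator manages the actual object for you.

[source,yaml]
----
# Illustrative sketch of a PDB for an availability-focused role such as
# ZooKeeper servers. Name and labels are assumed, not actual operator output.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-zk-server
spec:
  maxUnavailable: 1 # only one server Pod may be voluntarily disrupted at a time
  selector:
    matchLabels:
      app.kubernetes.io/name: zookeeper
      app.kubernetes.io/instance: simple-zk
      app.kubernetes.io/component: server
----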

=== Multiple replicas to increase performance

For these roles (e.g. HDFS datanodes, HBase regionservers or Trino workers) we allow more than a single Pod to be unavailable, as otherwise rolling re-deployments could take a very long time.

IMPORTANT: The operators calculate the number of Pods for a given role by adding up the number of replicas of every rolegroup that is part of that role.
If there are no replicas defined on a rolegroup, one Pod is assumed for that rolegroup, as the created Kubernetes objects
(StatefulSets or Deployments) also default to a single replica. However, if there are
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/[HorizontalPodAutoscalers] in place, the number of replicas of a rolegroup
can change dynamically. In that case the operators might falsely assume that rolegroups have fewer Pods than they actually have. This is a pessimistic approach,
as the number of allowed disruptions normally stays the same or even increases when the number of Pods increases. So this should be safe, but in some cases
more Pods *could* have been allowed to be unavailable, so rolling re-deployments can take a bit longer than needed.
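To make the counting rule concrete, consider the following hypothetical rolegroup layout. The role and rolegroup names and replica counts are made up purely to illustrate the arithmetic.

[source,yaml]
----
# Hypothetical example used only to illustrate how Pods are counted per role.
workers:
  roleGroups:
    default:
      replicas: 3   # counted as 3 Pods
    highMemory:
      replicas: 2   # counted as 2 Pods
    burst: {}       # no replicas set, counted as 1 Pod (StatefulSet default)
# The operator therefore assumes 3 + 2 + 1 = 6 Pods for the "workers" role when
# writing the PodDisruptionBudget, even if an HPA later scales a rolegroup up.
----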

== Influencing and disabling PDBs

You can configure