ci: convert k8s charts deployment -> statefulset
Updates the helm charts used for testnet deployments to use a
StatefulSet [0], rather than a Deployment [1], as the representation
for a Penumbra fullnode/validator. The goal is to leverage the k8s API
as fully as possible for our workloads, which are indeed stateful in the
sense that they require attached storage and cannot maintain their
identity without that storage.

We also benefit from ordered rollouts, meaning that future minor version
bumps will be applied sequentially, and paused if any node fails to
become ready. This will ensure more predictable behavior as we move
toward chain upgrades.

When performing a chain upgrade, the manual steps taken by a human
operator are now significantly simpler. In addition to the conversion to
StatefulSets, the relevant charts now boast a new feature called
"maintenanceMode", defaulting to false, which places nodes in a
suspended state so that a human operator can run `pd migrate`. This mode
encapsulates a number of finicky manual steps: overriding the command
for both pd and cometbft to "sleep infinity", altering the
securityContext to run as the root user for volume permissions, and then
undoing all of that in reverse order.

[0] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
[1] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
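The resulting operator workflow can be sketched as follows. The release name, namespace, and pod name below are hypothetical assumptions, and the commands are printed rather than executed, since the exact invocations depend on how the chart was installed:

```shell
#!/bin/bash
# Sketch of a chain-upgrade workflow using maintenanceMode.
# Release name, namespace, and pod name are illustrative assumptions;
# `step` prints each command instead of running it.
set -euo pipefail

release="penumbra-devnet"
ns="devnet"
chart="./deployments/charts/penumbra-network"

step() { printf '%s\n' "$*"; }

# 1. Suspend pd and cometbft: their commands become `sleep infinity`,
#    and the pd container switches to uid 0 for volume permissions.
step helm upgrade "$release" "$chart" -n "$ns" --reuse-values --set maintenanceMode=true

# 2. Run the migration inside the now-idle pd container.
step kubectl -n "$ns" exec "${release}-val-0" -c pd -- pd migrate

# 3. Undo the overrides and resume normal operation.
step helm upgrade "$release" "$chart" -n "$ns" --reuse-values --set maintenanceMode=false
```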
conorsch committed Jan 22, 2024
1 parent cd3db0c commit 1423b35
Showing 21 changed files with 504 additions and 469 deletions.
14 changes: 14 additions & 0 deletions deployments/charts/README.md
@@ -0,0 +1,14 @@
# Helm charts for Penumbra

These helm charts are used to deploy test infrastructure via CI.
A given network deployment is composed of three charts:

* `penumbra-network`, which runs `pd testnet generate` to create genesis
and configure genesis validators
* `penumbra-node`, which runs fullnodes joined to the network, and also
exposes HTTPS frontends so their RPCs are accessible.
* `penumbra-metrics`, which runs a grafana/prometheus setup scraping
the metrics endpoints of the nodes and validators, and exposes
the grafana dashboards over HTTPS.

These charts are posted publicly as a reference.
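As a rough sketch, a full stack could be stood up chart by chart. The release names, namespace, and ordering here are assumptions, and the commands are echoed rather than executed:

```shell
# Print a hypothetical install sequence for the three charts:
# network generation first, then nodes, then metrics.
# Release names and namespace are illustrative assumptions.
charts=(penumbra-network penumbra-node penumbra-metrics)
for chart in "${charts[@]}"; do
  echo "helm install ${chart} ./deployments/charts/${chart} --namespace testnet"
done
```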
19 changes: 1 addition & 18 deletions deployments/charts/penumbra-network/Chart.yaml
@@ -1,24 +1,7 @@
apiVersion: v2
name: penumbra-network
description: A Helm chart for Kubernetes
description: Generate a fresh network config for Penumbra, and deploy its genesis validators.

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
13 changes: 0 additions & 13 deletions deployments/charts/penumbra-network/TODO

This file was deleted.

Empty file.
12 changes: 8 additions & 4 deletions deployments/charts/penumbra-network/templates/job-generate.yaml
@@ -31,7 +31,8 @@ spec:
# to each PV after network generation.
{{ range $i,$e := until $count }}
{{ $val_name := printf "%s-val-%d" (include "penumbra-network.fullname" $) $i }}
{{ $pvc_name := printf "%s-config" $val_name }}
# The `pvc_name` must match the volumes created by the StatefulSet.
{{ $pvc_name := printf "penumbra-config-%s" $val_name }}
- name: {{ $val_name }}-config
{{- if $.Values.persistence.enabled }}
persistentVolumeClaim:
@@ -81,20 +82,23 @@ spec:
--proposal-voting-blocks {{ .Values.network.proposal_voting_blocks }} \
{{- end }}
--validators-input-file /penumbra/validators.json \
{{- if .Values.network.external_addresses }}
--external-addresses {{ .Values.network.external_addresses }}
{{- end }}
echo "Looks like we're dealing with '{{ $count }}' vals"
# copy validator configs to volume mounts
{{ range $i,$e := until $count }}
{{ $val_name := printf "%s-val-%d" (include "penumbra-network.fullname" $) $i }}
cp -av /penumbra-config/testnet_data/node{{ $i }}/ /penumbra-config/{{ $val_name }}/
>&2 printf 'Configuring validator %d/%d...\n' "{{ $i }}" "{{ $count }}"
# rename subdir to "node0" so we don't have to look up val ordinal when specifying homedir.
mv -v /penumbra-config/testnet_data/node{{ $i }} /penumbra-config/{{ $val_name }}/node0
# set ownership for pd user
chown -R 1000:1000 /penumbra-config/{{ $val_name }}
# set ownership for cometbft configs to match cometbft container "tmuser" uid/gid
chown -R 100:1000 /penumbra-config/{{ $val_name }}/node{{ $i }}/cometbft
chown -R 100:1000 /penumbra-config/{{ $val_name }}/node0/cometbft
ls -lsR /penumbra-config
{{ end }}
12 changes: 8 additions & 4 deletions deployments/charts/penumbra-network/templates/pvc.yaml
@@ -2,12 +2,12 @@
{{- if .Values.persistence.enabled }}
# Shared volume for generating network data. Per-validator configs
# will be copied out of this shared volume, into separate PVCs.
{{ $pvc_name := printf "%s-shared-config" (include "penumbra-network.fullname" .) }}
{{ $shared_pvc_name := printf "%s-shared-config" (include "penumbra-network.fullname" .) }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ $pvc_name }}
name: {{ $shared_pvc_name }}
labels:
app.kubernetes.io/component: genesis-generator
{{- include "penumbra-network.labels" . | nindent 4 }}
@@ -28,11 +28,15 @@ spec:
storageClassName: {{ .Values.persistence.storageClassName }}
{{- end }}

# Per-validator config for state.
# Also provision PVCs for each validator. Normally we'd let the StatefulSet
# volumeClaimTemplate handle this, but we need the PVCs available in a pre-install hook,
# so we create them with helm annotations in a loop. The names of the PVCs must match
# those in the VCTs.
{{ $count := (.Values.network.num_validators | int) }}
{{ range $i,$e := until $count }}
{{ $val_name := printf "%s-val-%d" (include "penumbra-network.fullname" $) $i }}
{{ $pvc_name := printf "%s-config" $val_name }}
# The `pvc_name` must match the PVC created by the StatefulSet.
{{ $pvc_name := printf "penumbra-config-%s" $val_name }}
---
apiVersion: v1
kind: PersistentVolumeClaim
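The naming requirement called out in the comments above follows from how the StatefulSet controller derives PVC names: `<volumeClaimTemplate name>-<pod name>`, where the pod name is `<statefulset name>-<ordinal>`. A minimal sketch, using a hypothetical chart fullname:

```shell
# StatefulSet PVC names follow <claim template>-<sts name>-<ordinal>;
# pre-created PVCs must match this pattern exactly so the controller
# adopts them instead of provisioning fresh ones.
# The fullname below is a hypothetical value.
claim_template="penumbra-config"     # volumeClaimTemplates[0].metadata.name
sts_name="penumbra-devnet-val"       # "<fullname>-val"

pvc_for_ordinal() {
  printf '%s-%s-%d\n' "$claim_template" "$sts_name" "$1"
}

pvc_for_ordinal 0   # penumbra-config-penumbra-devnet-val-0
```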
4 changes: 2 additions & 2 deletions deployments/charts/penumbra-network/templates/service.yaml
@@ -14,7 +14,7 @@ metadata:
spec:
type: ClusterIP
selector:
app: {{ $val_name }}
statefulset.kubernetes.io/pod-name: {{ $val_name }}
{{- include "penumbra-network.selectorLabels" $ | nindent 4 }}
ports:
- protocol: TCP
@@ -58,7 +58,7 @@ spec:
protocol: TCP
targetPort: 26656
selector:
app: {{ $val_name }}
statefulset.kubernetes.io/pod-name: {{ $val_name }}
{{- include "penumbra-network.selectorLabels" $ | nindent 4 }}
type: LoadBalancer
{{ end }}
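The new selector relies on a label the StatefulSet controller applies automatically: every pod it creates gets `statefulset.kubernetes.io/pod-name=<pod name>`, which lets each per-validator Service target exactly one replica. A sketch of the label value a Service would match, with a hypothetical StatefulSet name:

```shell
# Each StatefulSet pod automatically carries the label
#   statefulset.kubernetes.io/pod-name=<pod name>
# enabling one Service per ordinal. Names are illustrative.
sts_name="penumbra-devnet-val"

selector_for_ordinal() {
  printf 'statefulset.kubernetes.io/pod-name=%s-%d\n' "$sts_name" "$1"
}

selector_for_ordinal 1   # statefulset.kubernetes.io/pod-name=penumbra-devnet-val-1
```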
@@ -1,12 +1,11 @@
{{- if .Values.only_lb_svc }}
{{- else }}
{{ $val0_name := printf "%s-val-0" (include "penumbra-network.fullname" $) }}
{{ $count := (.Values.network.num_validators | int) }}
{{ range $i,$e := until $count }}
{{ $val_name := printf "%s-val-%d" (include "penumbra-network.fullname" $) $i }}
{{ $val_name := printf "%s-val" (include "penumbra-network.fullname" $) }}
{{ $pvc_name := "penumbra-config" }}
---
apiVersion: apps/v1
kind: Deployment
kind: StatefulSet
metadata:
name: {{ $val_name }}
labels:
@@ -15,13 +14,23 @@ metadata:
app.kubernetes.io/part-of: {{ include "penumbra-network.part_of" $ }}
{{- include "penumbra-network.labels" $ | nindent 4 }}
spec:
replicas: 1
replicas: {{ $count }}
volumeClaimTemplates:
- metadata:
name: {{ $pvc_name }}
labels:
app.kubernetes.io/component: genesis-validator
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: {{ .Values.persistence.size }}
selector:
matchLabels:
app.kubernetes.io/component: genesis-validator
{{- include "penumbra-network.selectorLabels" $ | nindent 6 }}
strategy:
type: Recreate
updateStrategy:
type: RollingUpdate
template:
metadata:
{{- with $.Values.podAnnotations }}
@@ -58,30 +67,24 @@ spec:
serviceAccountName: {{ include "penumbra-network.serviceAccountName" $ }}
securityContext:
{{- toYaml $.Values.podSecurityContext | nindent 8 }}
volumes:
- name: {{ $val_name }}-config
{{- if $.Values.persistence.enabled }}
persistentVolumeClaim:
claimName: {{ $val_name }}-config
{{- else }}
emptyDir: {}
{{- end }}

containers:
- name: pd
securityContext:
{{- toYaml $.Values.securityContext | nindent 12 }}
image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
imagePullPolicy: {{ $.Values.image.pullPolicy }}
command:
{{- if .Values.maintenanceMode }}
- sleep
- infinity
{{- else }}
- /usr/bin/pd
- start
- --grpc-bind
- "0.0.0.0:8080"
- --metrics-bind
- "0.0.0.0:9000"
- --home
- "/penumbra-config/{{ $val_name }}/node{{ $i }}/pd"
- "/penumbra-config/{{ $val_name }}/node0/pd"
{{- end }}
env:
{{- toYaml $.Values.containerEnv | nindent 12 }}
ports:
@@ -101,17 +104,24 @@ spec:
initialDelaySeconds: 20
resources:
{{- toYaml $.Values.resources | nindent 12 }}
securityContext:
runAsUser: {{ .Values.maintenanceMode | ternary 0 .Values.securityContext.runAsUser }}
volumeMounts:
- name: {{ $val_name }}-config
- name: {{ $pvc_name }}
mountPath: /penumbra-config/{{ $val_name }}

- name: cometbft
image: "{{ $.Values.cometbft.image.repository }}:{{ $.Values.cometbft.image.tag }}"
imagePullPolicy: {{ $.Values.cometbft.image.pullPolicy }}
command:
{{- if .Values.maintenanceMode }}
- sleep
- infinity
{{- else }}
- cometbft
- start
- --proxy_app=tcp://127.0.0.1:26658
{{- end }}
ports:
- name: tm-p2p
containerPort: 26656
@@ -130,8 +140,8 @@ spec:
resources:
{{- toYaml $.Values.resources | nindent 12 }}
volumeMounts:
- name: {{ $val_name }}-config
subPath: node{{ $i }}/cometbft
- name: {{ $pvc_name }}
subPath: node0/cometbft
mountPath: /cometbft
{{- with $.Values.nodeSelector }}
nodeSelector:
@@ -146,4 +156,3 @@ spec:
{{- toYaml $ | nindent 8 }}
{{- end }}
{{- end }}
{{- end }}
13 changes: 11 additions & 2 deletions deployments/charts/penumbra-network/values.yaml
@@ -72,6 +72,12 @@ service:
externalTrafficPolicy: Local
port: 26656

# Whether to place the application in "maintenance mode", effectively stopping pd and cometbft,
# allowing an administrator to inspect and munge local state, e.g. to perform a chain upgrade.
# Makes two changes: 1) sets the `command` for the containers to `sleep infinity`; and 2) sets
# the uid for the pd container to 0/root.
maintenanceMode: false

# configure PVCs for disk data
persistence:
enabled: false
@@ -105,13 +111,16 @@ podAnnotations: {}
podSecurityContext: {}
# fsGroup: 2000

securityContext: {}
securityContext:
# The Penumbra container sets 1000 as default UID. We'll use that by default.
# See also `maintenanceMode=true`, which overrides this to 0.
runAsUser: 1000

# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000

# N.B. Only `IngressRoute`, a custom CRD specific to Traefik ingress controller
# is supported. This is because a traditional Ingress object doesn't allow us
92 changes: 92 additions & 0 deletions deployments/charts/penumbra-node/files/pd-init
@@ -0,0 +1,92 @@
#!/bin/bash
# Entrypoint script to build args for Penumbra's pd,
# based on StatefulSet k8s ordinal.
set -euo pipefail


if ! [[ $HOSTNAME =~ -([0-9]+)$ ]] ; then
>&2 echo "ERROR: hostname did not have a numeric suffix"
exit 1
fi


# Disable shellcheck for unused variable; it can't tell we use the var
# in the jq command below.
# shellcheck disable=SC2034
statefulset_ordinal="${BASH_REMATCH[1]}"

# Raw Helm vars translated to JSON representation in this file.
node_info_filepath="/opt/penumbra/nodes.json"

>&2 echo "Configuring node '$statefulset_ordinal' with node info:"
jq < "$node_info_filepath"

# Unpack the JSON Helm vars as Bash env vars.
function get_var() {
local v
local json_address
json_address="${1:-}"
shift 1
v="$(jq -r ".[$statefulset_ordinal].$json_address" "$node_info_filepath")"
if [[ $v = "null" ]]; then
v=""
fi
echo "$v"
}

external_address_flag=""
external_address="$(get_var "external_address")"
if [[ -n $external_address ]] ; then
external_address_flag="--external-address $external_address"
fi

moniker_flag=""
moniker="$(get_var "moniker")"
if [[ -n $moniker ]] ; then
moniker_flag="--moniker $moniker"
fi

seed_mode="$(get_var "seed_mode")"
if [[ "$seed_mode" = "true" ]] ; then
seed_mode="true"
else
seed_mode="false"
fi

# we must write into a subdir of the volumeMount, because the "--testnet-dir" arg
# to "pd testnet join" must point to a non-existent directory, and the volumeMount
# will always exist.
#
if ! test -d /penumbra-config/testnet_data ; then
echo "No pre-existing testnet data, pulling fresh info"
# shellcheck disable=SC2086
pd testnet --testnet-dir /penumbra-config/testnet_data join \
--tendermint-p2p-bind 0.0.0.0:26656 \
--tendermint-rpc-bind 0.0.0.0:26657 \
$external_address_flag \
$moniker_flag \
"$PENUMBRA_BOOTSTRAP_URL"

if [[ "$PENUMBRA_COMETBFT_INDEXER" = "psql" ]] ; then
sed -i -e "s#^indexer.*#indexer = \"psql\"\\npsql-conn = \"$COMETBFT_POSTGRES_CONNECTION_URL\"#" \
"/penumbra-config/testnet_data/node0/cometbft/config/config.toml"
fi
fi

# set ownership for pd user
chown -R 1000:1000 /penumbra-config/testnet_data

# apply external address. useful for a two-pass deploy, in which external ips
# are created after first deploy.
sed -i -e "s/external_address.*/external_address = \"$external_address\"/" /penumbra-config/testnet_data/node0/cometbft/config/config.toml
sed -i -e "s/moniker.*/moniker = \"$moniker\"/" /penumbra-config/testnet_data/node0/cometbft/config/config.toml

# configure peer settings
sed -i -e "s/max_num_inbound_peers.*/max_num_inbound_peers = $COMETBFT_CONFIG_P2P_MAX_NUM_INBOUND_PEERS/" /penumbra-config/testnet_data/node0/cometbft/config/config.toml
sed -i -e "s/max_num_outbound_peers.*/max_num_outbound_peers = $COMETBFT_CONFIG_P2P_MAX_NUM_OUTBOUND_PEERS/" /penumbra-config/testnet_data/node0/cometbft/config/config.toml

# configure seed node, defaulting to false if unspecified.
sed -i -e "s/^seed_mode.*/seed_mode = \"$seed_mode\"/" /penumbra-config/testnet_data/node0/cometbft/config/config.toml

# set ownership for cometbft configs to match cometbft container "tmuser" uid/gid
chown -R 100:1000 /penumbra-config/testnet_data/node0/cometbft
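The ordinal parsing at the top of pd-init can be exercised in isolation; StatefulSet pods receive hostnames of the form `<statefulset name>-<ordinal>`. A standalone sketch, using a hypothetical hostname:

```shell
# Mirror pd-init's ordinal extraction: StatefulSet pod hostnames end
# in "-<ordinal>". The hostname below is a hypothetical example.
ordinal_from_hostname() {
  local hostname="$1"
  if [[ $hostname =~ -([0-9]+)$ ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    >&2 echo "ERROR: hostname did not have a numeric suffix"
    return 1
  fi
}

ordinal_from_hostname "penumbra-devnet-node-2"   # 2
```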
