Allow dynamic reconfiguration and scaling without rolling restarts. #16

Open
wants to merge 5 commits into
base: master
6 changes: 3 additions & 3 deletions README.md
@@ -7,7 +7,7 @@ It requires Kubernetes 1.7 or greater.

## Limitations
1. Scaling is not currently supported. An ensemble's membership can not be updated in a safe way in
ZooKeeper 3.4.10 (The current stable release).
ZooKeeper 3.5.5 (The current stable release).
1. Observers are currently not supported. Contributions are welcome.
1. Persistent Volumes must be used. emptyDirs will likely result in a loss of data.

@@ -96,7 +96,7 @@ spec:
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
resources:
requests:
memory: "1Gi"
@@ -121,7 +121,7 @@ spec:
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
resources:
requests:
memory: "4Gi"
37 changes: 10 additions & 27 deletions docker/Dockerfile
@@ -5,36 +5,20 @@ ZK_DATA_LOG_DIR=/var/lib/zookeeper/log \
ZK_LOG_DIR=/var/log/zookeeper \
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

ARG GPG_KEY=C823E3E5B12AF29C67F81976F5CECB3CB5E9BD2D
ARG ZK_DIST=zookeeper-3.4.10
ARG ZK_VER=3.5.5
ARG ZK_DIST="zookeeper-$ZK_VER"

RUN set -x \
&& apt-get update \
&& apt-get install -y openjdk-8-jre-headless wget netcat-openbsd \
&& wget -q "http://www.apache.org/dist/zookeeper/$ZK_DIST/$ZK_DIST.tar.gz" \
&& wget -q "http://www.apache.org/dist/zookeeper/$ZK_DIST/$ZK_DIST.tar.gz.asc" \
&& export GNUPGHOME="$(mktemp -d)" \
&& gpg --keyserver ha.pool.sks-keyservers.net --recv-key "$GPG_KEY" \
&& gpg --batch --verify "$ZK_DIST.tar.gz.asc" "$ZK_DIST.tar.gz" \
&& tar -xzf "$ZK_DIST.tar.gz" -C /opt \
&& rm -r "$GNUPGHOME" "$ZK_DIST.tar.gz" "$ZK_DIST.tar.gz.asc" \
&& apt-get install -y openjdk-8-jre-headless wget netcat-openbsd --no-install-recommends apt-utils \
&& wget -q "http://www.apache.org/dist/zookeeper/stable/apache-$ZK_DIST-bin.tar.gz" \
&& tar -xzf "apache-$ZK_DIST-bin.tar.gz" \
&& rm ./apache-*.gz \
&& mv ./*$ZK_DIST* ./$ZK_DIST \
&& mv ./$ZK_DIST /opt \
&& ln -s /opt/$ZK_DIST /opt/zookeeper \
&& rm -rf /opt/zookeeper/CHANGES.txt \
/opt/zookeeper/README.txt \
/opt/zookeeper/NOTICE.txt \
/opt/zookeeper/CHANGES.txt \
/opt/zookeeper/README_packaging.txt \
/opt/zookeeper/build.xml \
/opt/zookeeper/config \
/opt/zookeeper/contrib \
/opt/zookeeper/dist-maven \
&& rm -rf /opt/zookeeper/*.txt \
/opt/zookeeper/docs \
/opt/zookeeper/ivy.xml \
/opt/zookeeper/ivysettings.xml \
/opt/zookeeper/recipes \
/opt/zookeeper/src \
/opt/zookeeper/$ZK_DIST.jar.asc \
/opt/zookeeper/$ZK_DIST.jar.md5 \
/opt/zookeeper/$ZK_DIST.jar.sha1 \
&& apt-get autoremove -y wget \
&& rm -rf /var/lib/apt/lists/*

@@ -51,5 +35,4 @@ RUN set -x \
&& chown -R "$ZK_USER:$ZK_USER" /opt/$ZK_DIST $ZK_DATA_DIR $ZK_LOG_DIR $ZK_DATA_LOG_DIR /tmp/zookeeper \
&& ln -s /opt/zookeeper/conf/ /usr/etc/zookeeper \
&& ln -s /opt/zookeeper/bin/* /usr/bin \
&& ln -s /opt/zookeeper/$ZK_DIST.jar /usr/share/zookeeper/ \
&& ln -s /opt/zookeeper/lib/* /usr/share/zookeeper
2 changes: 1 addition & 1 deletion docker/Makefile
@@ -1,4 +1,4 @@
VERSION=1.0-3.4.10
VERSION=1.0-3.5.5
PROJECT_ID=google_containers
PROJECT=gcr.io/${PROJECT_ID}

2 changes: 1 addition & 1 deletion docker/README.md
@@ -1,6 +1,6 @@
# Docker Image
The docker image contained in this repository is comprised of a base Ubuntu 16.04 image using the latest release of the
OpenJDK JRE based on the 1.8 JVM and the latest stable release of ZooKeeper, 3.4.10. Ubuntu is a much larger image than
OpenJDK JRE based on the 1.8 JVM and the latest stable release of ZooKeeper, 3.5.5. Ubuntu is a much larger image than
BusyBox or Alpine, but these images contain musl or uClibc. This requires a custom version of OpenJDK to be built
against a libc runtime other than glibc. No vendor of the ZooKeeper software supplies or verifies the software against
such a JVM, and, while Alpine or BusyBox would provide smaller images, we have prioritized a well known environment.
63 changes: 57 additions & 6 deletions docker/scripts/start-zookeeper
@@ -73,6 +73,10 @@
# --log_level The log level for the ZooKeeper server. Either FATAL,
# ERROR, WARN, INFO, DEBUG. The default is INFO.

# --skip_acl Skips ACL checks. This results in a boost in throughput,
# but opens up full access to the data tree to everyone.
# The default is false.


USER=`whoami`
HOST=`hostname -s`
@@ -93,6 +97,9 @@ MAX_CLIENT_CNXNS=60
SNAP_RETAIN_COUNT=3
PURGE_INTERVAL=0
SERVERS=1
STANDALONE_ENABLED=false
RECONFIG_ENABLED=true
SKIP_ACL=false

function print_usage() {
echo "\
@@ -152,8 +159,23 @@ Starts a ZooKeeper server based on the supplied options.
--min_session_timeout The minimum time in milliseconds for a client session
timeout. The default value is 20 * tick time.

--standalone_enabled When set to false, a single server can be started in
replicated mode, a lone participant can run with observers,
and a cluster can reconfigure down to one node, and up
from one node. ZooKeeper's own default is true; this script
defaults to false.

--reconfig_enabled This controls the enabling or disabling of Dynamic
Reconfiguration feature. The default value is true.

--log_level The log level for the ZooKeeper server. Either FATAL,
ERROR, WARN, INFO, DEBUG. The default is INFO.

--skip_acl Skips ACL checks. This results in a boost in throughput,
but opens up full access to the data tree to everyone.
The default is false.
"
}

@@ -172,22 +194,24 @@ function create_data_dirs() {
mkdir -p $LOG_DIR
chown -R $USER:$USER $LOG_DIR
fi
if [ ! -f $ID_FILE ] && [ $SERVERS -gt 1 ]; then
echo $MY_ID >> $ID_FILE
fi
}

function print_servers() {
if [[ $MY_ID -gt $SERVERS ]]; then
SERVERS=$MY_ID
fi

for (( i=1; i<=$SERVERS; i++ ))
do
echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT"
echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT:participant;$CLIENT_PORT"
done
}

function create_config() {
rm -f $CONFIG_FILE
local DYNAMIC_CONFIG_FILE="$CONFIG_FILE.dynamic"
echo "#This file was autogenerated DO NOT EDIT" >> $CONFIG_FILE
echo "clientPort=$CLIENT_PORT" >> $CONFIG_FILE
echo "dataDir=$DATA_DIR" >> $CONFIG_FILE
echo "dataLogDir=$DATA_LOG_DIR" >> $CONFIG_FILE
echo "tickTime=$TICK_TIME" >> $CONFIG_FILE
@@ -198,8 +222,23 @@ function create_config() {
echo "maxSessionTimeout=$MAX_SESSION_TIMEOUT" >> $CONFIG_FILE
echo "autopurge.snapRetainCount=$SNAP_RETAIN_COUNT" >> $CONFIG_FILE
echo "autopurge.purgeInterval=$PURGE_INTERVAL" >> $CONFIG_FILE
if [ $SERVERS -gt 1 ]; then
echo "standaloneEnabled=$STANDALONE_ENABLED" >> $CONFIG_FILE
echo "reconfigEnabled=$RECONFIG_ENABLED" >> $CONFIG_FILE

# clientPort is somewhat deprecated; however, zkServer.sh status still checks for it.
echo "clientPort=$CLIENT_PORT" >> $CONFIG_FILE

if [ "$SKIP_ACL" = true ]; then
echo "skipACL=yes" >> $CONFIG_FILE
fi
if [ "$RECONFIG_ENABLED" = false ]; then
# Append Servers to Config if Dynamic Reconfiguration is Not Enabled
print_servers >> $CONFIG_FILE
else
echo "reconfigEnabled=$RECONFIG_ENABLED" >> $CONFIG_FILE
echo "dynamicConfigFile=$DYNAMIC_CONFIG_FILE" >> $CONFIG_FILE
# Append Servers to Dynamic Config File
print_servers >> $DYNAMIC_CONFIG_FILE
fi
cat $CONFIG_FILE >&2
}
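
For reference, the entries that `print_servers` writes follow the ZooKeeper 3.5 server format `server.N=host:quorumPort:electionPort:role;clientPort`. The following is a minimal standalone sketch of that output, assuming a StatefulSet named `zk`, a domain of `zk-hs.default.svc.cluster.local`, and three servers. These values are illustrative and are not taken from this PR.

```shell
#!/usr/bin/env bash
# Illustrative values only; the real script derives NAME and DOMAIN from the
# Pod hostname and takes the ports and server count from its flags.
NAME="zk"
DOMAIN="zk-hs.default.svc.cluster.local"
SERVER_PORT=2888
ELECTION_PORT=3888
CLIENT_PORT=2181
SERVERS=3

# Emit one participant entry per ensemble member, mirroring print_servers.
function print_servers() {
    local i
    for (( i=1; i<=SERVERS; i++ )); do
        echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT:participant;$CLIENT_PORT"
    done
}

print_servers
```

With dynamic reconfiguration enabled, lines of this shape go into the file named by `dynamicConfigFile` rather than into the static configuration file.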
@@ -208,6 +247,9 @@ function create_jvm_props() {
rm -f $JAVA_ENV_FILE
echo "ZOO_LOG_DIR=$LOG_DIR" >> $JAVA_ENV_FILE
echo "JVMFLAGS=\"-Xmx$HEAP -Xms$HEAP\"" >> $JAVA_ENV_FILE
if [ "$SKIP_ACL" = true ]; then
echo "SERVER_JVMFLAGS=-Dzookeeper.skipACL=yes" >> $JAVA_ENV_FILE
fi
}

function create_log_props() {
@@ -279,9 +321,18 @@ while getopts "$optspec" optchar; do
min_session_timeout=*)
MIN_SESSION_TIMEOUT=${OPTARG##*=}
;;
standalone_enabled=*)
STANDALONE_ENABLED=${OPTARG##*=}
;;
reconfig_enabled=*)
RECONFIG_ENABLED=${OPTARG##*=}
;;
log_level=*)
LOG_LEVEL=${OPTARG##*=}
;;
skip_acl=*)
SKIP_ACL=${OPTARG##*=}
;;
*)
echo "Unknown option --${OPTARG}" >&2
exit 1
@@ -312,7 +363,7 @@ if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
NAME=${BASH_REMATCH[1]}
ORD=${BASH_REMATCH[2]}
else
echo "Fialed to parse name and ordinal of Pod"
echo "Failed to parse name and ordinal of Pod"
exit 1
fi

86 changes: 86 additions & 0 deletions docker/scripts/zookeeper-dynamic-reconfig
@@ -0,0 +1,86 @@
#!/usr/bin/env bash

# This script assumes you have skipACL configured

MYID=$(tail -n 1 $ZK_DATA_DIR/myid)
HOST=`hostname -s`
DOMAIN=`hostname -d`
CLIENT_PORT=$ZK_CS_SERVICE_PORT
SERVER_PORT=2888
ELECTION_PORT=3888
ACTION=add

if [[ $MYID -eq "1" ]]
then
# No need to dynamically add initial zk participant
exit 0
fi

# Sync with Leader
/opt/zookeeper/bin/zkCli.sh sync

function print_usage() {
echo "\
Usage: zookeeper-dynamic-reconfig [ACTION=add|remove] [OPTIONS]
Add a member to, or remove a member from, a ZooKeeper cluster.
--election_port The port on which the ZooKeeper process will perform
leader election. The default is 3888.

--server_port The port on which the ZooKeeper process will listen for
requests from other servers in the ensemble. The
default is 2888.
"
}


shopt -s nocasematch

if [[ $# -eq 0 ]]; then
print_usage
exit;
elif [[ $1 =~ "add" ]]; then
ACTION=add
elif [[ $1 =~ "remove" ]]; then
ACTION=remove
else
print_usage
exit;
fi

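# Start option parsing after the ACTION argument; otherwise getopts stops at
# the first non-option argument and never sees the --options.
OPTIND=2
optspec=":hv-:"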
while getopts "$optspec" optchar; do

case "${optchar}" in
-)
case "${OPTARG}" in
election_port=*)
ELECTION_PORT=${OPTARG##*=}
;;
server_port=*)
SERVER_PORT=${OPTARG##*=}
;;
*)
echo "Unknown option --${OPTARG}" >&2
exit 1
;;
esac;;
h)
print_usage
exit
;;
v)
echo "Parsing option: '-${optchar}'" >&2
;;
*)
if [ "$OPTERR" != 1 ] || [ "${optspec:0:1}" = ":" ]; then
echo "Non-option argument: '-${OPTARG}'" >&2
fi
;;
esac
done

if [[ $1 =~ "add" ]]; then
/opt/zookeeper/bin/zkCli.sh reconfig -add server.$MYID=$HOSTNAME.$DOMAIN:$SERVER_PORT:$ELECTION_PORT\;$CLIENT_PORT
else
/opt/zookeeper/bin/zkCli.sh reconfig -remove $MYID
fi
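
The argument passed to `zkCli.sh reconfig -add` can be checked without a running ensemble. The following is a minimal sketch, assuming a hypothetical helper `build_server_spec` (not part of this PR) and illustrative host values; it prints the command rather than running it.

```shell
#!/usr/bin/env bash
# build_server_spec is a hypothetical helper, not part of this PR.
# Arguments: id host domain server_port election_port client_port
function build_server_spec() {
    echo "server.$1=$2.$3:$4:$5;$6"
}

# Illustrative values; the script reads the id from $ZK_DATA_DIR/myid
# and the host and domain from the hostname command.
SPEC="$(build_server_spec 4 zk-3 zk-hs.default.svc.cluster.local 2888 3888 2181)"

# Dry run: print the command instead of executing it.
echo "/opt/zookeeper/bin/zkCli.sh reconfig -add \"$SPEC\""
```

Quoting the whole server specification keeps the shell from treating `;` as a command separator, which is the same job the `\;` escape does in the script.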
6 changes: 4 additions & 2 deletions helm/zookeeper/README.md
@@ -182,12 +182,14 @@ This parameter controls when the image is pulled from the repository.

## Scaling

ZooKeeper can not be safely scaled in versions prior to 3.5.x. There are manual procedures for scaling an ensemble, but
ZooKeeper can now be scaled safely: the stable 3.5.5 release supports dynamic ensemble reconfiguration.

~~ZooKeeper can not be safely scaled in versions prior to 3.5.x. There are manual procedures for scaling an ensemble, but
as noted in the [ZooKeeper 3.5.2 documentation](https://zookeeper.apache.org/doc/r3.5.2-alpha/zookeeperReconfig.html) these
procedures require a rolling restart, are known to be error prone, and often result in a data loss.

While ZooKeeper 3.5.x does allow for dynamic ensemble reconfiguration (including scaling membership), the current status
of the release is still alpha, and it is not recommended for production use.
of the release is still alpha, and it is not recommended for production use.~~

## Limitations
* StatefulSet and PodDisruptionBudget are beta resources.
2 changes: 1 addition & 1 deletion helm/zookeeper/values-micro.yaml
@@ -6,7 +6,7 @@ Storage: "10Gi"
ServerPort: 2888
LeaderElectionPort: 3888
ClientPort: 2181
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
ImagePullPolicy: "Always"
TickTimeMs: 2000
InitTicks: 10
2 changes: 1 addition & 1 deletion helm/zookeeper/values-mini.yaml
@@ -6,7 +6,7 @@ Storage: "10Gi"
ServerPort: 2888
LeaderElectionPort: 3888
ClientPort: 2181
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
ImagePullPolicy: "Always"
TickTimeMs: 2000
InitTicks: 10
2 changes: 1 addition & 1 deletion helm/zookeeper/values.yaml
@@ -6,7 +6,7 @@ Storage: "250Gi"
ServerPort: 2888
LeaderElectionPort: 3888
ClientPort: 2181
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
Image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
ImagePullPolicy: "Always"
TickTimeMs: 2000
InitTicks: 10
4 changes: 2 additions & 2 deletions manifests/README.md
@@ -128,7 +128,7 @@ zookeeper_mini.yaml manifest to decrease the memory resource request and the jvm
```yaml
name: kubernetes-zookeeper
imagePullPolicy: Always
image: "gcr.io/google_samples/kubernetes-zookeeper:1.0-3.4.10"
image: "gcr.io/google_samples/kubernetes-zookeeper:1.0-3.5.5"
resources:
requests:
memory: "512Gi"
@@ -276,7 +276,7 @@ spec:
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.5.5"
resources:
requests:
memory: "4Gi"