Support restart of service pods #2378

Merged 2 commits on Nov 28, 2023
@@ -60,7 +60,7 @@ Part of the above parameters are described as follows:
| :------------------------------------- | :------------------------------ | :----------------------------------------- |
| `image.nebulaOperator.image` | `vesoft/nebula-operator:{{operator.tag}}` | The image of NebulaGraph Operator, version of which is {{operator.release}}. |
| `image.nebulaOperator.imagePullPolicy` | `IfNotPresent` | The image pull policy in Kubernetes. |
| `imagePullSecrets` | - | The image pull secret in Kubernetes. For example, `imagePullSecrets[0].name="vesoft"`. |
| `kubernetesClusterDomain` | `cluster.local` | The cluster domain. |
| `controllerManager.create` | `true` | Whether to enable the controller-manager component. |
| `controllerManager.replicas` | `2` | The number of controller-manager replicas. |
@@ -78,4 +78,18 @@ The following example shows how to enable AdmissionWebhook when you install Nebu
helm install nebula-operator nebula-operator/nebula-operator --namespace=<nebula-operator-system> --set admissionWebhook.create=true
```

Verify that NebulaGraph Operator is installed with the specified configuration:

```bash
helm get values nebula-operator -n <nebula-operator-system>
```

Example output:

```yaml
USER-SUPPLIED VALUES:
admissionWebhook:
  create: true
```

For more information about `helm install`, see [Helm Install](https://helm.sh/docs/helm/helm_install/).
@@ -26,4 +26,18 @@ This topic introduces how to update the configuration of NebulaGraph Operator.
helm upgrade nebula-operator nebula-operator/nebula-operator --namespace=nebula-operator-system --version={{operator.release}} --set admissionWebhook.create=true
```

For more information, see [Helm upgrade](https://helm.sh/docs/helm/helm_upgrade/).

4. Check whether the configuration of NebulaGraph Operator is updated successfully.

```bash
helm get values nebula-operator -n nebula-operator-system
```

Example output:

```yaml
USER-SUPPLIED VALUES:
admissionWebhook:
  create: true
```
@@ -53,7 +53,7 @@

Output:

```yaml
Release "nebula-operator" has been upgraded. Happy Helming!
NAME: nebula-operator
LAST DEPLOYED: Tue Apr 16 02:21:08 2022
@@ -255,7 +255,9 @@ Using NebulaGraph Operator to install NebulaGraph clusters enables automated clu
--version={{operator.release}} \
# Specify the namespace for the NebulaGraph cluster.
--namespace="${NEBULA_CLUSTER_NAMESPACE}" \
# Configure the Secret for pulling images from the private repository.
--set imagePullSecrets[0].name="<image-pull-secret>" \
# Customize the cluster name.
--set nameOverride="${NEBULA_CLUSTER_NAME}" \
--set nebula.storageClassName="${STORAGE_CLASS_NAME}" \
# Specify the version for the NebulaGraph cluster.
@@ -12,6 +12,7 @@ Operator triggers a rolling update of the NebulaGraph cluster under the followin

- The version of the NebulaGraph cluster changes.
- The configuration of the NebulaGraph cluster changes.
- NebulaGraph cluster services are restarted.

## Specify a rolling update strategy

@@ -0,0 +1,195 @@
# Restart service Pods in a NebulaGraph cluster on K8s

!!! note

Restarting NebulaGraph cluster service Pods is an Alpha feature.

During routine maintenance, you might need to restart a service Pod in the NebulaGraph cluster, for example when the Pod is in an abnormal state or when you want to force a restart. Restarting a Pod essentially restarts the service process. To maintain high availability, NebulaGraph Operator supports gracefully restarting all Pods of the Graph, Meta, or Storage service, and gracefully restarting a single Pod of the Storage service.

## Prerequisites

A NebulaGraph cluster is created in a K8s environment. For details, see [Create a NebulaGraph cluster](../4.1.installation/4.1.1.cluster-install.md).

## Restart all Pods of a certain service type

To perform a graceful rolling restart of all Pods of a certain service type in the cluster, add the annotation `nebula-graph.io/restart-timestamp`, set to the current time, to the StatefulSet controller of the corresponding service.

When NebulaGraph Operator detects that the value of the `nebula-graph.io/restart-timestamp` annotation on the StatefulSet controller has changed, it triggers a graceful rolling restart of all Pods of that service type in the cluster.

In the following example, the annotation is added to the StatefulSet controller of the Graph service so that all Graph service Pods are restarted one by one.

Assume that the cluster name is `nebula` and the cluster resources are in the `default` namespace. Perform the following steps:

1. Check the name of the StatefulSet controller.

```bash
kubectl get statefulset
```

Example output:

```bash
NAME READY AGE
nebula-graphd 2/2 33s
nebula-metad 3/3 69s
nebula-storaged 3/3 69s
```

2. Get the current timestamp.

```bash
date -u +%s
```

Example output:

```bash
1700547115
```

3. Overwrite the timestamp annotation of the StatefulSet controller to trigger the graceful rolling restart operation.

```bash
kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite
```

Example output:

```bash
statefulset.apps/nebula-graphd annotated
```

4. Observe the restart process.

```bash
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w
```

Example output:

```bash
NAME READY STATUS RESTARTS AGE
nebula-graphd-0 1/1 Running 0 9m37s
nebula-graphd-1 0/1 Running 0 17s
nebula-graphd-1 1/1 Running 0 20s
nebula-graphd-0 1/1 Terminating 0 9m40s
nebula-graphd-0 0/1 Terminating 0 9m41s
nebula-graphd-0 0/1 Terminating 0 9m42s
nebula-graphd-0 0/1 Terminating 0 9m42s
nebula-graphd-0 0/1 Terminating 0 9m42s
nebula-graphd-0 0/1 Pending 0 0s
nebula-graphd-0 0/1 Pending 0 0s
nebula-graphd-0 0/1 ContainerCreating 0 0s
nebula-graphd-0 0/1 Running 0 2s
```

The above output shows the status of the Graph service Pods during the restart process.

5. Verify that the StatefulSet controller annotation is updated.

```bash
kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"
```

Example output:

```yaml
nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
nebula-graph.io/restart-timestamp: "1700547115"
nebula-graph.io/restart-timestamp: "1700547815"
```

The above output indicates that the annotation of the StatefulSet controller has been updated and that all Graph service Pods have been restarted.
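
Steps 2 and 3 above can be combined so that the timestamp is generated and applied in one go. The following is a minimal sketch, not part of NebulaGraph Operator: `restart_cmd` is a hypothetical helper, and it assumes the StatefulSet controllers are named `<cluster>-<component>`, as in the example output of step 1. It only prints the `kubectl annotate` command so that you can review it before running it.

```bash
#!/usr/bin/env bash
# restart_cmd is a hypothetical helper for illustration only.
# Usage: restart_cmd <cluster-name> <graphd|metad|storaged> [timestamp]
restart_cmd() {
  local cluster="$1" component="$2"
  # Default to the current Unix time, as produced in step 2.
  local ts="${3:-$(date -u +%s)}"

  case "$component" in
    graphd|metad|storaged) ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac

  # Assumption: StatefulSet controllers are named <cluster>-<component>.
  printf 'kubectl annotate statefulset %s-%s nebula-graph.io/restart-timestamp="%s" --overwrite\n' \
    "$cluster" "$component" "$ts"
}

# Print the command for the Graph service of the cluster named "nebula".
restart_cmd nebula graphd
```

Printing the command instead of executing it keeps the sketch safe to try against any environment. After checking the output, run the printed command as in step 3.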


## Restart a single Storage service Pod

To gracefully restart a single Storage service Pod, add the annotation `nebula-graph.io/restart-ordinal` to the StatefulSet controller of the Storage service, with the value set to the ordinal number of the Pod you want to restart. This triggers a graceful restart or state transition for that specific Storage service Pod. The annotation is removed automatically after the Pod is restarted.

In the following example, the annotation is set to the ordinal number `1`, which triggers a graceful restart of the `nebula-storaged-1` Storage service Pod.

Assume that the cluster name is `nebula` and the cluster resources are in the `default` namespace. Perform the following steps:

1. Check the name of the StatefulSet controller.

```bash
kubectl get statefulset
```

Example output:

```bash
NAME READY AGE
nebula-graphd 2/2 33s
nebula-metad 3/3 69s
nebula-storaged 3/3 69s
```

2. Get the ordinal number of the Storage service Pod.

```bash
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged
```

Example output:

```bash
NAME READY STATUS RESTARTS AGE
nebula-storaged-0 1/1 Running 0 13h
nebula-storaged-1 1/1 Running 0 13h
nebula-storaged-2 1/1 Running 0 13h
nebula-storaged-3 1/1 Running 0 13h
nebula-storaged-4 1/1 Running 0 13h
nebula-storaged-5 1/1 Running 0 13h
nebula-storaged-6 1/1 Running 0 13h
nebula-storaged-7 1/1 Running 0 13h
nebula-storaged-8 1/1 Running 0 13h
```

3. Add the annotation to the `nebula-storaged` StatefulSet controller to trigger a graceful restart of the `nebula-storaged-1` Pod.

```bash
kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"
```

Example output:

```bash
statefulset.apps/nebula-storaged annotated
```

4. Observe the restart process.

```bash
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w
```

Example output:

```bash
NAME READY STATUS RESTARTS AGE
nebula-storaged-0 1/1 Running 0 13h
nebula-storaged-1 1/1 Running 0 13h
nebula-storaged-2 1/1 Running 0 13h
nebula-storaged-3 1/1 Running 0 13h
nebula-storaged-4 1/1 Running 0 13h
nebula-storaged-5 1/1 Running 0 12h
nebula-storaged-6 1/1 Running 0 12h
nebula-storaged-7 1/1 Running 0 12h
nebula-storaged-8 1/1 Running 0 12h


nebula-storaged-1 1/1 Running 0 13h
nebula-storaged-1 1/1 Terminating 0 13h
nebula-storaged-1 0/1 Terminating 0 13h
nebula-storaged-1 0/1 Terminating 0 13h
nebula-storaged-1 0/1 Terminating 0 13h
nebula-storaged-1 0/1 Terminating 0 13h
nebula-storaged-1 0/1 Pending 0 0s
nebula-storaged-1 0/1 Pending 0 0s
nebula-storaged-1 0/1 ContainerCreating 0 0s
nebula-storaged-1 0/1 Running 0 1s
nebula-storaged-1 1/1 Running 0 10s
```

The above output indicates that the `nebula-storaged-1` Storage service Pod has been successfully restarted.
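
The ordinal number used in step 3 is the numeric suffix of the Pod name. The following sketch shows one way to derive it from a Pod name and build the matching command. `pod_ordinal` and `storaged_restart_cmd` are hypothetical helpers, not part of NebulaGraph Operator, and the sketch assumes the usual StatefulSet Pod naming of `<statefulset-name>-<ordinal>`. It prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the ordinal from a StatefulSet Pod name.
pod_ordinal() {
  local ordinal="${1##*-}"   # text after the last hyphen
  case "$ordinal" in
    ''|*[!0-9]*) echo "no ordinal in Pod name: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "$ordinal"
}

# Hypothetical helper: print the annotate command for a single Pod restart.
storaged_restart_cmd() {
  local pod="$1" ordinal
  ordinal="$(pod_ordinal "$pod")" || return 1
  # ${pod%-*} strips the ordinal suffix, leaving the StatefulSet name.
  printf 'kubectl annotate statefulset %s nebula-graph.io/restart-ordinal="%s"\n' \
    "${pod%-*}" "$ordinal"
}

storaged_restart_cmd nebula-storaged-1
# prints: kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"
```

Validating the suffix before building the command avoids annotating the StatefulSet controller with a non-numeric value when a Pod name is mistyped.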
@@ -62,7 +62,7 @@ scheduler:
| :------------------------------------- | :------------------------------ | :----------------------------------------- |
| `image.nebulaOperator.image` | `vesoft/nebula-operator:{{operator.tag}}` | The image of NebulaGraph Operator, version {{operator.release}}. |
| `image.nebulaOperator.imagePullPolicy` | `IfNotPresent` | The image pull policy. |
| `imagePullSecrets` | `[]` | The image pull secret, for example `imagePullSecrets[0].name="vesoft"`. |
| `kubernetesClusterDomain` | `cluster.local` | The cluster domain. |
| `controllerManager.create` | `true` | Whether to enable controller-manager. |
| `controllerManager.replicas` | `2` | The number of controller-manager replicas. |
@@ -77,4 +77,20 @@ scheduler:

```bash
helm install nebula-operator nebula-operator/nebula-operator --namespace=<nebula-operator-system> --set admissionWebhook.create=true
```

Verify that AdmissionWebhook is enabled:

```bash
helm get values nebula-operator -n <nebula-operator-system>
```

Example output:

```yaml
USER-SUPPLIED VALUES:
admissionWebhook:
  create: true
```

For more information about `helm install`, see [Helm Install](https://helm.sh/docs/helm/helm_install/).
@@ -26,4 +26,20 @@
helm upgrade nebula-operator nebula-operator/nebula-operator --namespace=nebula-operator-system --version={{operator.release}} --set admissionWebhook.create=true
```

For more information, see [Helm upgrade](https://helm.sh/docs/helm/helm_upgrade/).

4. Check whether the configuration of NebulaGraph Operator is updated successfully.

```bash
helm get values nebula-operator -n nebula-operator-system
```

Example output:

```yaml
USER-SUPPLIED VALUES:
admissionWebhook:
  create: true
```


@@ -55,7 +55,7 @@

Output:

```yaml
Release "nebula-operator" has been upgraded. Happy Helming!
NAME: nebula-operator
LAST DEPLOYED: Tue Nov 16 02:21:08 2021
@@ -260,13 +260,14 @@
--version={{operator.release}} \
# Specify the namespace for the {{nebula.name}} cluster.
--namespace="${NEBULA_CLUSTER_NAMESPACE}" \
# Configure the Secret for pulling images from the private repository.
--set imagePullSecrets[0].name="<image-pull-secret>" \
# Customize the cluster name.
--set nameOverride="${NEBULA_CLUSTER_NAME}" \
--set nebula.storageClassName="${STORAGE_CLASS_NAME}" \
# Specify the version for the {{nebula.name}} cluster.
--set nebula.version=v{{nebula.release}}
```

7. Check the startup status of the {{nebula.name}} cluster Pods.

@@ -12,6 +12,7 @@ Operator triggers a rolling update of the {{nebula.name}} cluster under the following circumstances:

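- The version of the {{nebula.name}} cluster changes.
- The configuration of the {{nebula.name}} cluster changes.
- The services of the {{nebula.name}} cluster are restarted.

## Specify a rolling update strategy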
