[ISSUE-48][FEATURE][FOLLOW UP] add docs for operator (#261)
### What changes were proposed in this pull request?
For issue #48 
I add docs about design and usage for operator.

### Why are the changes needed?
Add doc for operator

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Just doc
wangao1236 authored Oct 13, 2022
1 parent 1ee6820 commit 8be8390
Showing 15 changed files with 514 additions and 4 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -230,6 +230,13 @@ The jar for MapReduce is located in <RSS_HOME>/jars/client/mr/rss-client-mr-XXXX
Note that the RssMRAppMaster will automatically disable slow start (i.e., `mapreduce.job.reduce.slowstart.completedmaps=1`)
and job recovery (i.e., `yarn.app.mapreduce.am.job.recovery.enable=false`)

### Deploy In Kubernetes

We provide an operator for deploying Uniffle in Kubernetes environments.

For details, see the following document:

[operator docs](docs/operator)

## Configuration

74 changes: 74 additions & 0 deletions deploy/kubernetes/operator/examples/configuration.yaml
@@ -0,0 +1,74 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: rss-configuration
  namespace: kube-system
data:
  coordinator.conf: |-
    rss.coordinator.app.expired 60000
    rss.coordinator.exclude.nodes.file.path /data/rssadmin/rss/coo
    rss.coordinator.server.heartbeat.timeout 30000
    rss.jetty.http.port 19996
    rss.rpc.server.port 19997
  log4j.properties: |-
    log4j.rootCategory=INFO, RollingAppender
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.Threshold=INFO
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
    log4j.appender.RollingAppender.File=./logs/rss.log
    log4j.appender.RollingAppender.MaxFileSize=50MB
    log4j.appender.RollingAppender.MaxBackupIndex=10
    log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %t %c{1} %M - %m%n
  server.conf: |-
    rss.coordinator.quorum rss-coordinator-rss-demo-0:19997,rss-coordinator-rss-demo-1:19997
    rss.jetty.http.port 19996
    rss.rpc.executor.size 500
    rss.rpc.message.max.size 1073741824
    rss.rpc.server.port 19997
    rss.server.app.expired.withoutHeartbeat 120000
    rss.server.buffer.capacity 60g
    rss.server.commit.timeout 600000
    rss.server.disk.capacity 3g
    rss.server.event.size.threshold.l1 128m
    rss.server.event.size.threshold.l2 192m
    rss.server.event.size.threshold.l3 256m
    rss.server.flush.cold.storage.threshold.size 128m
    rss.server.flush.thread.alive 6
    rss.server.flush.threadPool.size 12
    rss.server.hadoop.dfs.client.socket-timeout 15000
    rss.server.hadoop.dfs.replication 2
    rss.server.hdfs.base.path hdfs://${your-hdfs-path}
    rss.server.health.check.enable false
    rss.server.heartbeat.interval 10000
    rss.server.heartbeat.timeout 60000
    rss.server.memory.shuffle.highWaterMark.percentage 70.0
    rss.server.memory.shuffle.lowWaterMark.percentage 10.0
    rss.server.pending.event.timeoutSec 600
    rss.server.preAllocation.expired 120000
    rss.server.read.buffer.capacity 5g
    rss.server.shuffle.expired.timeout.ms 120000
    rss.server.write.retry.max 2
    rss.storage.basePath /data1/rssdata,/data10/rssdata,/data11/rssdata,/data12/rssdata,/data2/rssdata,/data3/rssdata,/data4/rssdata,/data5/rssdata,/data6/rssdata,/data7/rssdata,/data8/rssdata,/data9/rssdata
    rss.storage.type MEMORY_LOCALFILE
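
The entries under `coordinator.conf` and `server.conf` above are plain `key value` lines. As a minimal sketch, assuming that whitespace separates a key from its value and that lines starting with `#` are comments (both are assumptions made for illustration, not something this example file states), a hypothetical helper such as `parse_rss_conf` could load that text into a dictionary, for instance to sanity-check ports before applying the ConfigMap:

```python
def parse_rss_conf(text: str) -> dict:
    """Parse 'key value' lines into a dict.

    Hypothetical helper: blank lines and '#' comment lines are skipped,
    which is an assumption about the format rather than documented behavior.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace; the rest is the value.
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


sample = """\
rss.jetty.http.port 19996
rss.rpc.server.port 19997
rss.storage.type MEMORY_LOCALFILE
"""

conf = parse_rss_conf(sample)
print(conf["rss.rpc.server.port"])
```

A dictionary keeps only the last value when a key is repeated, which may or may not match how the servers themselves treat duplicates.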
33 changes: 33 additions & 0 deletions deploy/kubernetes/operator/examples/full-restart/README.md
@@ -0,0 +1,33 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

# Full Restart of Shuffle Servers

To restart all shuffle server pods, set the `.spec.shuffleServer.sync` field to `true` and
set the `.spec.shuffleServer.upgradeStrategy.type` field to `FullRestart`.

```yaml
spec:
  shuffleServer:
    sync: true
    upgradeStrategy:
      type: "FullRestart"
```
Unlike a full upgrade, a full restart does not require changing the configuration or the image.
See the [example](rss-full-restart.yaml).
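
The two fields above can also be expressed as a JSON merge patch, so that only those fields change and the rest of the rss object is left untouched. The following is a minimal sketch of building such a patch body; the function name `full_restart_patch` is hypothetical, and how the patch is submitted to the cluster is deliberately left out:

```python
import json


def full_restart_patch() -> dict:
    """Build a merge-patch body that touches only the full-restart fields."""
    return {
        "spec": {
            "shuffleServer": {
                "sync": True,
                "upgradeStrategy": {"type": "FullRestart"},
            }
        }
    }


# Serialize the patch; JSON renders Python's True as true.
patch_body = json.dumps(full_restart_patch(), indent=2)
print(patch_body)
```

Because a merge patch only lists the fields to change, fields such as the image and the ConfigMap name stay as they are, which matches the intent of a restart without modification.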
@@ -22,7 +22,7 @@ metadata:
   name: rss-full-restart-demo
   namespace: kube-system
 spec:
-  configMapName: rss-full-restart-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
38 changes: 38 additions & 0 deletions deploy/kubernetes/operator/examples/full-upgrade/README.md
@@ -0,0 +1,38 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

# Full Upgrade of Shuffle Servers

To upgrade all shuffle servers, first update the configuration files in the ConfigMap.

Then edit the rss object as follows:

+ update `.spec.shuffleServer.image` to the new shuffle server image version
+ set the `.spec.shuffleServer.sync` field to `true`
+ update the `.spec.shuffleServer.upgradeStrategy` field:
  + set `.spec.shuffleServer.upgradeStrategy.type` to `FullUpgrade`

```yaml
spec:
  shuffleServer:
    image: "${rss-shuffle-server-image}"
    sync: true
    upgradeStrategy:
      type: "FullUpgrade"
```
See the [example](rss-full-upgrade.yaml).
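
Since a full upgrade depends on three fields being set together, it can help to check an edited spec before applying it. The following is a minimal sketch of such a check; `full_upgrade_problems` is a hypothetical helper, not part of the operator, and it only looks at the fields named in the list above:

```python
def full_upgrade_problems(spec: dict) -> list:
    """Return human-readable problems found in a full-upgrade spec (empty if none)."""
    server = spec.get("shuffleServer", {})
    problems = []
    if not server.get("image"):
        problems.append("shuffleServer.image is not set")
    if server.get("sync") is not True:
        problems.append("shuffleServer.sync must be true")
    if server.get("upgradeStrategy", {}).get("type") != "FullUpgrade":
        problems.append("shuffleServer.upgradeStrategy.type must be FullUpgrade")
    return problems


spec = {
    "shuffleServer": {
        "image": "example/shuffle-server:new",  # placeholder image name
        "sync": True,
        "upgradeStrategy": {"type": "FullUpgrade"},
    }
}
print(full_upgrade_problems(spec))
```

The check does not verify that the ConfigMap was updated first; that step still has to be done by hand.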
@@ -22,7 +22,7 @@ metadata:
   name: rss-full-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-full-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
42 changes: 42 additions & 0 deletions deploy/kubernetes/operator/examples/partition-upgrade/README.md
@@ -0,0 +1,42 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

# Partition Upgrade of Shuffle Servers

To upgrade shuffle servers in partition mode, edit the rss object as follows:

+ update `.spec.shuffleServer.image` to the new shuffle server image version
+ set the `.spec.shuffleServer.sync` field to `true`
+ update the `.spec.shuffleServer.upgradeStrategy` field:
  + set `.spec.shuffleServer.upgradeStrategy.type` to `PartitionUpgrade`
  + set `.spec.shuffleServer.upgradeStrategy.partition`, which has the same meaning
    as the `.spec.updateStrategy.rollingUpdate.partition` field
    of a [StatefulSet workload](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/): replicas
    whose index is less than this value keep the old version, and replicas whose index is greater than or
    equal to this value are updated to the new version

```yaml
spec:
  shuffleServer:
    image: "${rss-shuffle-server-image}"
    sync: true
    upgradeStrategy:
      type: "PartitionUpgrade"
      partition: 2
```
See the [example](rss-partition-upgrade.yaml).
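
To make the partition rule concrete, the sketch below splits replica indices into those that keep the old version and those that are upgraded. It only illustrates the rule as stated above; `split_by_partition` is a hypothetical name, and the replica count of 4 is an assumed value for the example:

```python
def split_by_partition(replicas: int, partition: int):
    """Split replica indices 0..replicas-1 by the partition rule.

    Indices below `partition` keep the old version; indices at or above
    it are upgraded.
    """
    if replicas < 0 or partition < 0:
        raise ValueError("replicas and partition must be non-negative")
    kept = [i for i in range(replicas) if i < partition]
    upgraded = [i for i in range(replicas) if i >= partition]
    return kept, upgraded


kept, upgraded = split_by_partition(replicas=4, partition=2)
print(kept, upgraded)  # [0, 1] [2, 3]
```

With `partition: 2` and four replicas, the replicas with index 2 and 3 move to the new image while 0 and 1 stay on the old one. Setting the partition to 0 would upgrade every replica.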
@@ -22,7 +22,7 @@ metadata:
   name: rss-parition-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-parition-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
38 changes: 38 additions & 0 deletions deploy/kubernetes/operator/examples/specific-upgrade/README.md
@@ -0,0 +1,38 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

# Specific Upgrade of Shuffle Servers

To upgrade only specific shuffle server replicas, edit the rss object as follows:

- update `.spec.shuffleServer.image` to the new shuffle server image version
- set the `.spec.shuffleServer.sync` field to `true`
- update the `.spec.shuffleServer.upgradeStrategy` field:
  - set `.spec.shuffleServer.upgradeStrategy.type` to `SpecificUpgrade`
  - set `.spec.shuffleServer.upgradeStrategy.specificNames` to the names of the pods to upgrade

```yaml
spec:
  shuffleServer:
    image: "${rss-shuffle-server-image}"
    sync: true
    upgradeStrategy:
      type: "SpecificUpgrade"
      specificNames: [ "rss-shuffle-server-demo-0" ]
```
See the [example](rss-specific-upgrade.yaml).
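
Because `specificNames` is a list of literal pod names, a typo can leave it naming a pod that does not exist. As a minimal sketch, assuming the current shuffle server pod names have already been collected by some other means, a hypothetical helper such as `unknown_names` could flag entries that match no existing pod:

```python
def unknown_names(specific_names, existing_pods):
    """Return the entries of specific_names that match no existing pod, in order."""
    existing = set(existing_pods)
    return [name for name in specific_names if name not in existing]


# Pod names below are illustrative; the second list stands in for the
# names actually present in the cluster.
requested = ["rss-shuffle-server-demo-0", "rss-shuffle-server-demo-9"]
present = ["rss-shuffle-server-demo-0", "rss-shuffle-server-demo-1"]
print(unknown_names(requested, present))  # ['rss-shuffle-server-demo-9']
```

An empty result means every requested name corresponds to a known pod. How the operator itself reacts to an unknown name is not covered here.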
@@ -22,7 +22,7 @@ metadata:
   name: rss-specific-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-specific-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
Binary file added docs/asset/rss-crd-state-transition.png
35 changes: 35 additions & 0 deletions docs/operator/README.md
@@ -0,0 +1,35 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

# Uniffle Operator

The __[Uniffle Operator](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/)__ manages
Apache Uniffle clusters within Kubernetes.

The operator's API is currently at version `v1alpha1`. While we do not anticipate changing it in
backwards-incompatible ways, there is no such guarantee yet.

## Documentation

Please visit the following pages for documentation on using and developing the Uniffle Operator:

- [Installation](install.md): step-by-step instructions for getting the Uniffle operator running on a cluster
- [Design & Usage](design.md): an overview of the operator's design and detailed usage of the CRD

### Examples

Example uses of each CRD have been [provided](examples.md).