Skip to content

Commit

Permalink
fix: Fix namespace-scoped install (#212)
Browse files Browse the repository at this point in the history
#### Motivation

There are some problems when installing in namespace-scoped mode:
- Though the docs mentioned that the RBAC should be changed, the install script currently applies the controller `ClusterRole` and `ClusterRoleBinding` (including namespace resource permission) regardless of whether the `--namespace-scope-mode` option is provided
- The combination of the `NAMESPACE_SCOPE=true` env var and namespace resource permissions results in a broken/conflicted state where the controller will "mostly" operate in cluster scoped mode and watch / delete resources in other namespaces

#### Modifications
- Split RBAC manifests into cluster-scope and namespace-scope variants, the latter including `Role`/`RoleBinding` instead of Cluster level equivalents
- Have install script apply the appropriate RBAC based on the namespace-scope-mode cli setting (in addition to setting the controller env var as is already done)
- Change the controller init logic to always use namespace mode if the env var is set
- Update documentation accordingly

#### Result

Namespace scoped install works properly, won't touch any other namespaces.

It would probably also be good to add some safeguard checks to detect/prohibit cluster scope installs in more than one namespace, but that's outside of the scope of this PR.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
njhill authored Aug 20, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
1 parent d879588 commit a125958
Showing 30 changed files with 285 additions and 36 deletions.
1 change: 0 additions & 1 deletion config/default/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -92,5 +92,4 @@ apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../crd
- ../rbac
- ../manager
17 changes: 17 additions & 0 deletions config/rbac/cluster-scope/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
resources:
- ../common
- role.yaml
- role_binding.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright 2021 IBM Corporation
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright 2021 IBM Corporation
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -13,8 +13,6 @@
# limitations under the License.
resources:
- service-account.yaml
- role.yaml
- role_binding.yaml
- leader_election_role.yaml
- leader_election_role_binding.yaml
- restricted_scc_role.yaml
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
17 changes: 17 additions & 0 deletions config/rbac/namespace-scope/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
resources:
- ../common
- role.yaml
- role_binding.yaml
166 changes: 166 additions & 0 deletions config/rbac/namespace-scope/role.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: modelmesh-controller-role
rules:
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- ""
resources:
- services
- services/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- apps
resources:
- deployments
- deployments/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- monitoring.coreos.com
resources:
- servicemonitors
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- serving.kserve.io
resources:
- inferenceservices
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- serving.kserve.io
resources:
- inferenceservices/finalizers
verbs:
- get
- patch
- update
- apiGroups:
- serving.kserve.io
resources:
- inferenceservices/status
verbs:
- get
- patch
- update
- apiGroups:
- serving.kserve.io
resources:
- predictors
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- serving.kserve.io
resources:
- predictors/finalizers
verbs:
- get
- patch
- update
- apiGroups:
- serving.kserve.io
resources:
- predictors/status
verbs:
- get
- patch
- update
- apiGroups:
- serving.kserve.io
resources:
- servingruntimes
- servingruntimes/finalizers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- serving.kserve.io
resources:
- servingruntimes/status
verbs:
- get
- patch
- update
24 changes: 24 additions & 0 deletions config/rbac/namespace-scope/role_binding.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: modelmesh-controller-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: modelmesh-controller-role
subjects:
- kind: ServiceAccount
name: modelmesh-controller
4 changes: 2 additions & 2 deletions docs/install/README.md
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@

- **S3-compatible object storage** - Before models can be deployed, a remote S3-compatible datastore is needed from which to pull the model data. This could be for example an [IBM Cloud Object Storage](https://www.ibm.com/cloud/object-storage) instance, or a locally running [MinIO](https://github.com/minio/minio) deployment. Note that this is not required to be in place prior to the initial installation.

We provide an install script to quickly run ModelMesh Serving with a provisioned etcd server. This may be useful for experimentation or development but should not be used in production.
We provide an install script `--quickstart` option to quickly run ModelMesh Serving with a provisioned etcd server. This may be useful for experimentation or development but should not be used in production.

## Cluster Scope or Namespace Scope

@@ -19,7 +19,7 @@ ModelMesh Serving can be used in either cluster scope or namespace mode.
- **Cluster scope mode** - Its components can exist in multiple user namespaces which are controlled by one instance of ModelMesh Serving Controller in the control plane namespace. Only one ModelMesh Serving instance can be installed within a Kubernetes cluster. A namespace label `modelmesh-enabled` needs to be "true" to enable a user namespace for ModelMesh Serving.
- **Namespace scope mode** - All of its components must exist within a single namespace and only one instance of ModelMesh Serving can be installed per namespace. Multiple ModelMesh Serving instances can be installed in separate namespaces within the cluster.

The default configuration is for the cluster scope mode. Change RBAC permissions, cluster role to role, and cluster role binding to role binding, to deploy ModelMesh Serving in the namespace scope mode.
The default configuration is for the cluster scope mode. Use the `--namespace-scope-mode` option of the install script for namespace scope.

## Deployed Components

42 changes: 20 additions & 22 deletions main.go
Original file line number Diff line number Diff line change
@@ -171,7 +171,6 @@ func main() {
var leaseDuration time.Duration
var leaseRenewDeadline time.Duration
var leaseRetryPeriod time.Duration
var clusterScopeMode bool
flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
@@ -187,43 +186,42 @@ func main() {
"Duration the Leader elector clients should wait between tries of actions.")
flag.Parse()

mgrNamespace := ""
trueString := "true"

// Controller can be in namespace or cluster scope mode depending on an env variable
clusterScopeMode = os.Getenv(NamespaceScopeEnvVar) != trueString
clusterScopeMode := os.Getenv(NamespaceScopeEnvVar) != "true"

// Here we check whether RBAC is set for cluster scope
err = cl.Get(context.Background(), client.ObjectKey{Name: "foo"}, &corev1.Namespace{})
hasClusterPermissions := err == nil || errors.IsNotFound(err)

if err == nil || errors.IsNotFound(err) {
// Controller has cluster permissions
if clusterScopeMode {
setupLog.Info("Controller operating in cluster scope mode, will attempt to watch all namespaces")
} else {
// Config mismatch, namespace mode with cluster permissions, will continue with a log
setupLog.Info("In namespace scope mode but controller has cluster scope permissions, continue")
}
} else {
// Controller has namespace permissions
if clusterScopeMode {
if clusterScopeMode {
if !hasClusterPermissions {
// Config mismatch, cluster mode without cluster permissions, exit
setupLog.Error(fmt.Errorf("Insufficient permission for controller"), "In cluster scope mode but controller has namespace scope permissions, exit")
setupLog.Error(nil, "In cluster scope mode but controller does not have cluster scope permissions, exiting")
os.Exit(1)
} else {
mgrNamespace = ControllerNamespace
setupLog.Info("Controller operating in own-namespace only mode")
}
setupLog.Info("Controller operating in cluster scope mode, will attempt to watch/manage all namespaces")
} else {
// Namespace-scope mode configured
setupLog.Info("Controller operating in namespace scope (own-namespace only) mode",
"namespace", ControllerNamespace)

if hasClusterPermissions {
setupLog.Error(nil, "Warning: In namespace scope mode but controller has permission to access cluster namespace resources")
}
}

mgrOpts := ctrl.Options{
Scheme: scheme,
MetricsBindAddress: metricsAddr,
Namespace: mgrNamespace,
Port: 9443,
HealthProbeBindAddress: probeAddr,
}

if !clusterScopeMode {
// Set manager to operate scoped to our namespace
mgrOpts.Namespace = ControllerNamespace
}

if enableLeaderElection {
if leaderElectionType == "lease" {
setupLog.Info("using leader-with-lease election method")
@@ -317,7 +315,7 @@ func main() {
registryMap[registryKey] = registryValue
setupLog.Info(fmt.Sprintf("Reconciliation of %s is enabled", resourceName))
return true
} else if envVarVal == trueString {
} else if envVarVal == "true" {
// If env var is explicitly true, require that specified CRD is present
setupLog.Error(err, fmt.Sprintf("Unable to access %s Custom Resource", resourceName))
os.Exit(1)
17 changes: 15 additions & 2 deletions scripts/delete.sh
Original file line number Diff line number Diff line change
@@ -82,9 +82,12 @@ if [[ -n $namespace ]]; then
fi

# Ensure the namespace is overridden for all the resources
cd default
pushd default
kustomize edit set namespace "$namespace"
cd ..
popd
pushd rbac/namespace-scope
kustomize edit set namespace "$namespace"
popd

# Older versions of kustomize have different load restrictor flag formats.
# Can be removed once Kubeflow installation stops requiring v3.2.
@@ -115,7 +118,17 @@ if [[ ! -z $user_ns_array ]]; then
rm runtimes.yaml
fi

# Determine whether a modelmesh-controller-rolebinding clusterrolebinding exists and is
# associated with the service account in this namespace. If not, don't delete the cluster level RBAC.
set +e
crb_ns=$(kubectl get clusterrolebinding modelmesh-controller-rolebinding -o json | jq -r .subjects[0].namespace)
set -e
if [[ "$crb_ns" == "$namespace" ]]; then
echo "deleting cluster scope RBAC"
kustomize build rbac/cluster-scope | kubectl delete -f - --ignore-not-found=true
fi
kustomize build default | kubectl delete -f - --ignore-not-found=true
kustomize build rbac/namespace-scope | kubectl delete -f - --ignore-not-found=true
kustomize build runtimes ${kustomize_load_restrictor_arg} | kubectl delete -f - --ignore-not-found=true
kubectl delete -f dependencies/quickstart.yaml --ignore-not-found=true
kubectl delete -f dependencies/fvt.yaml --ignore-not-found=true
27 changes: 22 additions & 5 deletions scripts/install.sh
Original file line number Diff line number Diff line change
@@ -44,7 +44,7 @@ function showHelp() {
echo "Kubernetes namespaces."
echo
echo "Expects cluster-admin authority and Kube cluster access to be configured prior to running."
echo "Also requires Etcd secret 'model-serving-etcd' to be created in namespace already."
echo "Also requires etcd secret 'model-serving-etcd' to be created in namespace already."
}

die() {
@@ -220,9 +220,15 @@ else
fi

# Ensure the namespace is overridden for all the resources
cd default
pushd default
kustomize edit set namespace "$namespace"
cd ..
popd
pushd rbac/namespace-scope
kustomize edit set namespace "$namespace"
popd
pushd rbac/cluster-scope
kustomize edit set namespace "$namespace"
popd

# Clean up previous instances but do not fail if they do not exist
cp dependencies/quickstart.yaml .
@@ -233,6 +239,10 @@ if [[ $delete == "true" ]]; then
info "Deleting any previous ModelMesh Serving instances and older CRD with serving.kserve.io api group name"
kubectl delete crd/predictors.serving.kserve.io --ignore-not-found=true
kubectl delete crd/servingruntimes.serving.kserve.io --ignore-not-found=true
kustomize build rbac/namespace-scope | kubectl delete -f - --ignore-not-found=true
if [[ $namespace_scope_mode != "true" ]]; then
kustomize build rbac/cluster-scope | kubectl delete -f - --ignore-not-found=true
fi
kustomize build default | kubectl delete -f - --ignore-not-found=true
kubectl delete -f quickstart.yaml --ignore-not-found=true
kubectl delete -f fvt.yaml --ignore-not-found=true
@@ -259,7 +269,7 @@ if [[ $fvt == "true" ]]; then
fi

if ! kubectl get secret model-serving-etcd >/dev/null; then
die "Could not find Etcd kube secret 'model-serving-etcd'. This is a prerequisite for running ModelMesh Serving install."
die "Could not find etcd kube secret 'model-serving-etcd'. This is a prerequisite for running ModelMesh Serving install."
else
echo "model-serving-etcd secret found"
fi
@@ -268,7 +278,14 @@ info "Creating storage-config secret if it does not exist"
kubectl create -f default/storage-secret.yaml 2>/dev/null || :
kubectl get secret storage-config

info "Installing ModelMesh Serving CRDs, RBACs, and controller"
info "Installing ModelMesh Serving RBACs (namespace_scope_mode=$namespace_scope_mode)"
if [[ $namespace_scope_mode == "true" ]]; then
kustomize build rbac/namespace-scope | kubectl apply -f -
else
kustomize build rbac/cluster-scope | kubectl apply -f -
fi

info "Installing ModelMesh Serving CRDs and controller"
kustomize build default | kubectl apply -f -

if [[ $dev_mode_logging == "true" ]]; then

0 comments on commit a125958

Please sign in to comment.