Commit
upgrade: retry if default DSCI creation fails
After removing leader election, the operator fails to start if it is
instructed to create the default DSCI. It looks like the webhook is not
ready by that time:

```
create default DSCI CR.
{"level":"error","ts":"2024-05-13T09:25:58Z","logger":"setup","msg":"unable to create initial setup for the operator","error":"Internal error occurred: failed calling webhook \"operator.opendatahub.io\": failed to call webhook: Post \"https://opendatahub-operator-controller-manager-service.oo-2ts9m.svc:443/validate-opendatahub-io-v1?timeout=10s\": no endpoints available for service \"opendatahub-operator-controller-manager-service\"","stacktrace":"main.main.func1\n\t/workspace/main.go:200\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/remote-source/operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/manager.go:336\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/remote-source/operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
```

Leader election added some delay, which hid the problem. The problem
does not happen in the default configuration since it explicitly
disables DSCI creation in the manifests:

```
      containers:
      - command:
        - /manager
        env:
          - name: DISABLE_DSC_CONFIG
            value: 'true'
        args:
          - --operator-name=opendatahub
        image: controller:latest
```

Make a wrapper function cluster.CreateWithRetry for client.Object
creation with a timeout. Use a hardcoded 5s polling interval, which
just seems reasonable, and take the timeout in minutes as the
parameter.

This requires disabling the nilerr linter, since for the polling
function an error in creation is a valid condition: it is what the
function waits to disappear.

Fixes: 3610b0b ("feat: remove leader election for operator (#1000)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>