fix: ensures SMCP is created before any other features #1118

@bartoszmajsak bartoszmajsak commented Jul 12, 2024

Description

With the #1052 refactoring, the order in which features are added to the Registry was accidentally changed. As a result, the metrics collection feature fails: it expects the SMCP to be created first, but the creation now runs afterwards. The setup is eventually consistent, as the reconcile will retry, so this is not a bug per se, but it results in unnecessary errors.

This fix ensures features are ordered as before and leverages .EnabledWhen instead of wrapping features in ifs.
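
To illustrate why insertion order matters and how a predicate can replace an `if` wrapper, here is a minimal, self-contained sketch. It is not the operator's actual code: the `Feature` and `Registry` types, the `ApplyAll` and `applyMesh` helpers, and the `EnabledWhen` signature are hypothetical stand-ins that only borrow the feature names seen in the logs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Feature is a hypothetical stand-in for the operator's feature type.
type Feature struct {
	Name    string
	Apply   func() error
	enabled func() bool // nil means "always enabled"
}

// EnabledWhen attaches a predicate that is evaluated at apply time,
// so the feature keeps its position in the registry even when disabled.
func (f *Feature) EnabledWhen(pred func() bool) *Feature {
	f.enabled = pred
	return f
}

// Registry applies features strictly in the order they were added.
type Registry struct{ features []*Feature }

func (r *Registry) Add(fs ...*Feature) { r.features = append(r.features, fs...) }

// ApplyAll runs every enabled feature in insertion order and returns
// the names of the features that were applied successfully.
func (r *Registry) ApplyAll() ([]string, error) {
	var applied []string
	for _, f := range r.features {
		if f.enabled != nil && !f.enabled() {
			continue // skipped, but ordering of the rest is unaffected
		}
		if err := f.Apply(); err != nil {
			return applied, fmt.Errorf("feature %s: %w", f.Name, err)
		}
		applied = append(applied, f.Name)
	}
	return applied, nil
}

// applyMesh registers control plane creation first, then the optional
// metrics feature, which fails if the control plane does not exist yet.
func applyMesh(metricsEnabled bool) (string, error) {
	smcpCreated := false
	r := &Registry{}
	r.Add(
		&Feature{Name: "mesh-control-plane-creation", Apply: func() error {
			smcpCreated = true
			return nil
		}},
		(&Feature{Name: "mesh-metrics-collection", Apply: func() error {
			if !smcpCreated {
				return errors.New("service mesh control plane not found")
			}
			return nil
		}}).EnabledWhen(func() bool { return metricsEnabled }),
	)
	applied, err := r.ApplyAll()
	return strings.Join(applied, ","), err
}

func main() {
	applied, err := applyMesh(true)
	fmt.Println(applied, err)
}
```

In this sketch, swapping the two `Add` arguments reproduces the kind of "not found" error shown in the BEFORE log, while keeping creation first and gating the optional feature with a predicate avoids it.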

How Has This Been Tested?

  • create DSCI with Service Mesh enabled
  • observe logs
BEFORE: Reconcile log
[
  {
    "level": "info",
    "ts": "2024-07-12T07:52:53Z",
    "logger": "features",
    "msg": "waiting for control plane components to be ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "error",
    "ts": "2024-07-12T07:52:55Z",
    "logger": "features",
    "msg": "failed waiting for control plane being ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system",
    "error": "failed to find Service Mesh Control Plane: servicemeshcontrolplanes.maistra.io \"data-science-smcp\" not found",
    "stacktrace": "github.com/opendatahub-io/opendatahub-operator/v2/pkg/feature/servicemesh.EnsureServiceMeshInstalled\n\t/workspace/pkg/feature/servicemesh/conditions.go:55\ngithub.com/opendatahub-io/opendatahub-operator/v2/pkg/feature.(*Feature).applyFeature\n\t/workspace/pkg/feature/feature.go:110\ngithub.com/opendatahub-io/opendatahub-operator/v2/pkg/feature.(*Feature).Apply\n\t/workspace/pkg/feature/feature.go:93\ngithub.com/opendatahub-io/opendatahub-operator/v2/pkg/feature.(*FeaturesHandler).Apply\n\t/workspace/pkg/feature/handler.go:66\ngithub.com/opendatahub-io/opendatahub-operator/v2/pkg/feature.HandlerWithReporter[...].Apply\n\t/workspace/pkg/feature/handler.go:138\ngithub.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).configureServiceMesh\n\t/workspace/controllers/dscinitialization/servicemesh_setup.go:41\ngithub.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).Reconcile\n\t/workspace/controllers/dscinitialization/dscinitialization_controller.go:267\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:52:55Z",
    "logger": "features",
    "msg": "waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:53:21Z",
    "logger": "features",
    "msg": "done waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system"
  },
  {
    "level": "error",
    "ts": "2024-07-12T07:53:21Z",
    "logger": "opendatahub.controllers.DSCInitialization",
    "msg": "failed applying service mesh resources",
    "error": "1 error occurred:\n\t* failed applying FeatureHandler features. cause: 1 error occurred:\n\t* 2 errors occurred:\n\t* failed to find Service Mesh Control Plane: servicemeshcontrolplanes.maistra.io \"data-science-smcp\" not found\n\t* service mesh control plane is not ready\n\n\n\n\n\n",
    "stacktrace": "github.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).configureServiceMesh\n\t/workspace/controllers/dscinitialization/servicemesh_setup.go:43\ngithub.com/opendatahub-io/opendatahub-operator/v2/controllers/dscinitialization.(*DSCInitializationReconciler).Reconcile\n\t/workspace/controllers/dscinitialization/dscinitialization_controller.go:267\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"
  },
  {
    "level": "error",
    "ts": "2024-07-12T07:53:21Z",
    "msg": "Reconciler error",
    "controller": "dscinitialization",
    "controllerGroup": "dscinitialization.opendatahub.io",
    "controllerKind": "DSCInitialization",
    "DSCInitialization": {
      "name": "default-dsci"
    },
    "namespace": "",
    "name": "default-dsci",
    "reconcileID": "929c88d0-cec4-4466-b844-3e3cb288ddd2",
    "error": "1 error occurred:\n\t* failed applying FeatureHandler features. cause: 1 error occurred:\n\t* 2 errors occurred:\n\t* failed to find Service Mesh Control Plane: servicemeshcontrolplanes.maistra.io \"data-science-smcp\" not found\n\t* service mesh control plane is not ready\n\n\n\n\n\n",
    "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:53:21Z",
    "logger": "features",
    "msg": "waiting for control plane components to be ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:53:23Z",
    "logger": "features",
    "msg": "done waiting for control plane components to be ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system"
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:53:23Z",
    "logger": "features",
    "msg": "waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T07:53:25Z",
    "logger": "features",
    "msg": "done waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system"
  },
AFTER: Reconcile log
[
  {
    "level": "info",
    "ts": "2024-07-12T10:11:48+02:00",
    "logger": "features",
    "msg": "waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T10:11:54+02:00",
    "logger": "features",
    "msg": "done waiting for pods to become ready",
    "feature": "mesh-control-plane-creation",
    "namespace": "istio-system"
  },
  {
    "level": "info",
    "ts": "2024-07-12T10:11:56+02:00",
    "logger": "features",
    "msg": "waiting for control plane components to be ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T10:11:58+02:00",
    "logger": "features",
    "msg": "done waiting for control plane components to be ready",
    "feature": "mesh-metrics-collection",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system"
  },
  {
    "level": "info",
    "ts": "2024-07-12T10:12:03+02:00",
    "logger": "features",
    "msg": "waiting for control plane components to be ready",
    "feature": "mesh-control-plane-external-authz",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system",
    "duration (s)": 300
  },
  {
    "level": "info",
    "ts": "2024-07-12T10:12:06+02:00",
    "logger": "features",
    "msg": "done waiting for control plane components to be ready",
    "feature": "mesh-control-plane-external-authz",
    "control-plane": "data-science-smcp",
    "namespace": "istio-system"
  },
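
The "observe logs" step can also be checked mechanically. The following sketch assumes the reconcile log has been saved as a well-formed JSON array (the excerpts above are truncated and would need a closing `]`); the `entry` type, the `featureOrder` helper, and the `sample` input are hypothetical and only mirror the `level` and `feature` fields visible in the logs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry holds the log fields this check cares about.
type entry struct {
	Level   string `json:"level"`
	Feature string `json:"feature"`
}

// featureOrder returns feature names in order of first appearance,
// together with the number of error-level entries in the log.
func featureOrder(raw []byte) ([]string, int, error) {
	var entries []entry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, 0, err
	}
	seen := map[string]bool{}
	var order []string
	errCount := 0
	for _, e := range entries {
		if e.Level == "error" {
			errCount++
		}
		if e.Feature != "" && !seen[e.Feature] {
			seen[e.Feature] = true
			order = append(order, e.Feature)
		}
	}
	return order, errCount, nil
}

// sample is a shortened, hand-written input shaped like the AFTER log.
const sample = `[
  {"level": "info", "msg": "waiting for pods to become ready", "feature": "mesh-control-plane-creation"},
  {"level": "info", "msg": "waiting for control plane components to be ready", "feature": "mesh-metrics-collection"}
]`

func main() {
	order, errCount, err := featureOrder([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(order, errCount)
}
```

With the fix applied, one would expect `mesh-control-plane-creation` to appear before `mesh-metrics-collection` and the error count to be zero.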

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work


openshift-ci bot commented Jul 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


zdtsw commented Jul 12, 2024

/test opendatahub-operator-e2e

@openshift-merge-bot openshift-merge-bot bot merged commit d6f25c7 into opendatahub-io:incubation Jul 12, 2024
8 checks passed

@israel-hdez israel-hdez left a comment

I think I noticed this... but because of the pre/post conditions, my brain told me "meh, it doesn't matter"... xD

VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Jul 24, 2024
…o#1118)

With the opendatahub-io#1052 refactoring, the order of features added to the Registry was
accidentally changed. As a result, the metrics collection feature fails:
it expects the SMCP to be created first, but the creation runs
afterwards. The setup is eventually consistent, as the reconcile will
retry, so this is not a bug per se, but it results in unnecessary errors.

This fix ensures features are ordered as before and leverages
`.EnabledWhen` instead of wrapping features in `if`s.

(cherry picked from commit d6f25c7)