Update incubation with downstream changes (red-hat-data-services#783)
* fix(oauth-dashboard): update APIversion when patch oauth-client (red-hat-data-services#136)

add more comment and error message

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit d688f25)

* Update kube-aggregator version

(cherry picked from commit a0c7864)

* fix(kserve): check on multiple depends operators if all pre-installed (red-hat-data-services#744) (red-hat-data-services#119)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 57c4b82)

* [backport]: changes from rhods_2.4 to rhods_2.5 (red-hat-data-services#129)

* [cherry-pick]: split workbenches image into 2 params.env file

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update opendatahub label

(cherry picked from commit 3e975f9)
(cherry picked from commit 9f8b649)

* Update Codeflare manifests path

(cherry picked from commit 014396c)
(cherry picked from commit 5f1c0d4)

* Move creation of default DSC

(cherry picked from commit ab33109)
(cherry picked from commit 00ddd6c)

* update(manifests): enable kserve, modelmesh and workbenches

- dashboard and modelmesh-monitoring still from odh-manifests

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Fix cherry-pick for dsci

* fix(mm): set the new logic for modelmesh

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Fix the KF deployment:

* fix(monitoring): do the switch for dev mode to not send alert

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 001cad1)

* refactor: reduce alert level for codeflare operator

* Update(manifests): for monitoring

- remove https:// for dashboard target
- add nwp from odh-deployer
- fix: wrong service name for operator, this is defined in CSV
- port: do not use https but 8080

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Fix manifests for monitoring

(cherry picked from commit 85883f102bc15f2343c0f6afe253a29a4ff3f64f)

* Revert changes to prometheus port

Changes to the prometheus port make the route inaccessible

* fix rebase

* fix(dsci): missing label on namespaces (red-hat-data-services#98)

- add SM which is in modelmesh-monitoring into operator monitoring
- add roles which are in modelmesh-monitoring into ours too
- apply 3 labels to both monitoring and application namespace (which is what v1 does)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(monitoring): typo (red-hat-data-services#101)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update(monitoring)

- remove hardcoded app. namespace in segment manifests
- remove hardcoded monitoring. namespace in base manifests
- add placeholder to inject monitoring namespace in Servicemonitor

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* uplift: package version

- github.com/operator-framework/operator-lifecycle-manager/releases/tag/v0.26.0
- github.com/openshift/api to latest v0.0.0

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Remove odh csv

* fix(crd): do not set ownerreference on CRD (red-hat-data-services#725)

- we already covered the case where a component is set from Managed to Removed
- this covers the case where a component is set as Managed and the
DSC CR is deleted
- so if we do not set it in the first place, it won't get deleted

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit e9461e0)

* Fix DSCI Patch

* update(monitoring): metrics (red-hat-data-services#107)

* update(monitoring):

- add log in pod for QE to see it is dev mode cluster
- add two metrics:
	I do not think they are used in this config,
	but they are present in the v1 config, so I added them back
- move recording for workbench to correct rule file
- remove operator-alerting.rules it is not used in v1 to keep it simple

- fix: openshift-monitoring is using web as port name and our port

- add more comments for the config  and comments out not needed config
- add egress for odh monitoring and add cluster monitoring NS for ingress

- keep rhods_aggregate_availability from prometheusrules along with 2 users
   reason for this: PSI does not get metrics from non openshift-* or kube-* namespaces into cluster-monitoring prometheus, as
cluster-monitoring prometheus-k8s only uses prometheusrule, not servicemonitor?

-  from test result:
	if our monitoring ns not set cluster-monitoring, there is no targets on federation2 and no rhods_aggreated_in metrics

- fix(monitoring): removed duplicated alerts of dashboard in workbenches

- add UWM ns for operator ingress

- according to the doc, when UWM is enabled there should be no custom Prometheus; this conflict might be why we cannot see metrics from odh monitoring in cluster-monitoring prometheus?

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Remove DSCI explicit naming

* Fix regression in Prometheus Deployment

* Remove os.exit for custom functions

* Delete legacy blackbox exporter

* fix(monitoring): add missing role and rolebinding for prometheus (red-hat-data-services#112)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(monitoring): missing add new files into kustomization (red-hat-data-services#113)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* cleanup(monitoring): after previous 2 commits this is not needed/useful (red-hat-data-services#114)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(monitoring): do not set odh monitoring namespace when applying manifests in "monitoring/base" (red-hat-data-services#115)

* fix(monitoring): do not set our monitoring namespace when applying to the monitoring/base folder
- hardcode our monitoring namespace for all needed manifests

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* revert: label changes made in upgrade PR

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(monitoring): cannot load dashboard record rules (red-hat-data-services#123)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(monitoring): when DSC is removed, entry in rule_files should be
cleaned up

- match does not work with * in the string, need to use (.*)
- add (-) in the front to differentiate the rule_file entry from the real rules

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* cherry-pick: edson's rhods-12939 from odh + debug + timeout tuning

comment out ExponentialBackoffWithContext for now to test
do not add v2 into markedDeletion list

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(upgrade): modelmesh monitoring deployment need deletion as well

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: add statefulset

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* cherrypick: upstream 748 fix no reconcile when no error return

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* RHODS-12956: removing CR update from the operator reconciliation loop to avoid infinite loop (red-hat-data-services#128)

* chore

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Vaishnavi Hire <vhire@redhat.com>
Co-authored-by: Dimitri Saridakis <dimitri.saridakis@gmail.com>
Co-authored-by: Edson Tirelli <ed.tirelli@gmail.com>
(cherry picked from commit 81ebc87)
(cherry picked from commit 7525f99)

* fix(rebase): in previous commits (red-hat-data-services#131)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 15b2db8)
(cherry picked from commit 0062ba3)

* [rhods-2.5] Add Predicate for Prometheus Configmap (red-hat-data-services#134)

* Add Predicate for Prometheus Configmap

(cherry picked from commit 35f4136)

* fix(linter)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 6ecf2b5)
(cherry picked from commit deeccb7)

* fix(monitoring): only set prometheus as part-of label from component (red-hat-data-services#135)

this will reduce the necessary updates on the configmap

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 13c28ec)
(cherry picked from commit b809260)

* update: set kserve  as Managed by default DSC (red-hat-data-services#130)

- keep modelmesh in clean install removed(flip from previous managed)
- keep modelmesh from old version as-was
set OSSM and serving both as default Managed
- update docs with default status and missing new components
- fix nilpointer in DSCI

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit f7c2713)
(cherry picked from commit 4f3cd66)

* fix(monitoring): do not add component rules till service is up (red-hat-data-services#137)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit d85efc5)
(cherry picked from commit 8cca478)

* fix(secret): do not delete secret if cannot find (red-hat-data-services#140)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 60f0419)
(cherry picked from commit 5e3731b)

* chore: typo (red-hat-data-services#141)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 0f9fe32)
(cherry picked from commit b8926f7)

* Add defaults for Kserve for new install

(cherry picked from commit 8bd2782)
(cherry picked from commit 81433ba)

* Revert "Update defaults for modelmesh" (red-hat-data-services#146)

(cherry picked from commit e5a27c4)
(cherry picked from commit 7389619)

* fix(mm-monitoring): revert the code logic but set to disable as delete (red-hat-data-services#153)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix(dsc): stop watching validationwebhook for non-create/delete events (red-hat-data-services#150)

* fix(dsc): stop watching validationwebhook for non-create/delete events
* update: remove CRD in the DSC watch and cleanup debug
* fix: add more ignore on the label changes
---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Revert "Remove modelmesh monitoring"

This reverts commit 91dd78f.

fix(modelmesh): remove wrong check on the deployment of modelmesh (red-hat-data-services#148)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Retain existing DSCI values

Explicitly add Servicemesh in default dsci

Update defaults for modelmesh

(cherry picked from commit 6eb6d4a)
(cherry picked from commit a4788f3)

* fix: update default name for DSC in initialization-resource

- add missing default config for serving of kserve in sample
- set modelmesh in sample and init as Managed

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 633d9f6)

* fix: do not force check if servicemesh is set to managed in DSCI (red-hat-data-services#154)

* fix: do not force check if servicemesh is set to managed in DSCI

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: add supported value for serverless and servicemesh

- currently removed and unmanaged are the same logic

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: do not remove resources if it has label

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 3a672d4)

* Fix lint

* fix: rebase incubation

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* cherry-pick: red-hat-data-services#157

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: for ODH to resolve
https://issues.redhat.com/browse/RHOAIENG-157

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: CSV

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: linter

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
VaishnaviHire and zdtsw authored Dec 12, 2023
1 parent 42b2bdd commit f756e40
Showing 38 changed files with 693 additions and 1,579 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -162,7 +162,7 @@ run: manifests generate fmt vet ## Run a controller from your host.
go run ./main.go

.PHONY: image-build
image-build: unit-test ## Build image with the manager.
image-build: # unit-test ## Build image with the manager.
$(IMAGE_BUILDER) build --no-cache -f Dockerfiles/Dockerfile ${IMAGE_BUILD_FLAGS} -t $(IMG) .

.PHONY: image-push
2 changes: 2 additions & 0 deletions README.md
@@ -225,6 +225,8 @@ spec:
managementState: Managed
workbenches:
managementState: Managed
trustyai:
managementState: Managed
```

2. Enable only Dashboard and Workbenches
@@ -262,9 +262,10 @@ spec:
type: string
type: object
managementState:
default: Removed
default: Managed
enum:
- Managed
- Unmanaged
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
@@ -118,6 +118,7 @@ spec:
default: Removed
enum:
- Managed
- Unmanaged
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
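
Both CRD hunks above add `Unmanaged` to the `enum`, while the `pattern` still lists `Force`. Since a schema value has to satisfy every constraint, `Force` matches the pattern but is still rejected by the enum. A minimal Go sketch of that combined check follows; `validManagementState` is a hypothetical helper for illustration, not part of the operator or of the Kubernetes API machinery.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern and enum copied from the CRD schema shown in the diff.
var managementStatePattern = regexp.MustCompile(`^(Managed|Unmanaged|Force|Removed)$`)

var managementStateEnum = map[string]bool{
	"Managed":   true,
	"Unmanaged": true,
	"Removed":   true,
}

// validManagementState (hypothetical) accepts a value only when it passes
// both the pattern and the enum, mirroring how the two constraints combine.
func validManagementState(s string) bool {
	return managementStatePattern.MatchString(s) && managementStateEnum[s]
}

func main() {
	for _, s := range []string{"Managed", "Unmanaged", "Removed", "Force", "managed"} {
		fmt.Printf("%-10s valid=%v\n", s, validManagementState(s))
	}
}
```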
19 changes: 18 additions & 1 deletion bundle/manifests/opendatahub-operator.clusterserviceversion.yaml
@@ -29,7 +29,16 @@ metadata:
"managementState": "Managed"
},
"kserve": {
"managementState": "Removed"
"managementState": "Managed",
"serving": {
"ingressGateway": {
"certificate": {
"type": "SelfSigned"
}
},
"managementState": "Managed",
"name": "knative-serving"
}
},
"modelmeshserving": {
"managementState": "Managed"
@@ -64,6 +73,14 @@ metadata:
"monitoring": {
"managementState": "Managed",
"namespace": "opendatahub"
},
"serviceMesh": {
"controlPlane": {
"metricsCollection": "Istio",
"name": "data-science-smcp",
"namespace": "istio-system"
},
"managementState": "Managed"
}
}
},
2 changes: 2 additions & 0 deletions components/README.md
@@ -36,6 +36,8 @@ can be found [here](https://github.com/opendatahub-io/opendatahub-operator/tree/
GetComponentName() string
GetManagementState() operatorv1.ManagementState
SetImageParamsMap(imageMap map[string]string) map[string]string
UpdatePrometheusConfig(cli client.Client, enable bool, component string) error
WaitForDeploymentAvailable(ctx context.Context, r *rest.Config, c string, n string, i int, t int) error
}
```
### Add reconcile and Events
17 changes: 14 additions & 3 deletions components/codeflare/codeflare.go
@@ -3,16 +3,19 @@
package codeflare

import (
"context"
"fmt"
"path/filepath"

operatorv1 "github.com/openshift/api/operator/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"

dsciv1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/dscinitialization/v1"
"github.com/opendatahub-io/opendatahub-operator/v2/components"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/deploy"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/monitoring"
)

var (
@@ -53,7 +56,7 @@ func (c *CodeFlare) GetComponentName() string {
return ComponentName
}

func (c *CodeFlare) ReconcileComponent(cli client.Client, owner metav1.Object, dscispec *dsciv1.DSCInitializationSpec, _ bool) error {
func (c *CodeFlare) ReconcileComponent(ctx context.Context, cli client.Client, resConf *rest.Config, owner metav1.Object, dscispec *dsciv1.DSCInitializationSpec, _ bool) error {
var imageParamMap = map[string]string{
"odh-codeflare-operator-controller-image": "RELATED_IMAGE_ODH_CODEFLARE_OPERATOR_IMAGE", // no need mcad, embedded in cfo
"namespace": dscispec.ApplicationsNamespace,
@@ -78,7 +81,7 @@ func (c *CodeFlare) ReconcileComponent(cli client.Client, owner metav1.Object, d
}

if found, err := deploy.OperatorExists(cli, dependentOperator); err != nil {
return err
return fmt.Errorf("operator exists throws error %v", err)
} else if found {
return fmt.Errorf("operator %s is found. Please uninstall the operator before enabling %s component",
dependentOperator, ComponentName)
@@ -102,14 +105,22 @@ func (c *CodeFlare) ReconcileComponent(cli client.Client, owner metav1.Object, d

// CloudServiceMonitoring handling
if platform == deploy.ManagedRhods {
if enabled {
// first check if the service is up, so prometheus wont fire alerts when it is just startup
if err := monitoring.WaitForDeploymentAvailable(ctx, resConf, ComponentName, dscispec.ApplicationsNamespace, 20, 2); err != nil {
return fmt.Errorf("deployment for %s is not ready to server: %w", ComponentName, err)
}
fmt.Printf("deployment for %s is done, updating monitoring rules\n", ComponentName)
}

// inject prometheus codeflare*.rules in to /opt/manifests/monitoring/prometheus/prometheus-configs.yaml
if err = c.UpdatePrometheusConfig(cli, enabled && monitoringEnabled, ComponentName); err != nil {
return err
}
if err = deploy.DeployManifestsFromPath(cli, owner,
filepath.Join(deploy.DefaultManifestPath, "monitoring", "prometheus", "apps"),
dscispec.Monitoring.Namespace,
ComponentName+"prometheus", true); err != nil {
"prometheus", true); err != nil {
return err
}
}
38 changes: 36 additions & 2 deletions components/component.go
@@ -1,6 +1,7 @@
package components

import (
"context"
"fmt"
"os"
"path/filepath"
@@ -9,6 +10,7 @@ import (
operatorv1 "github.com/openshift/api/operator/v1"
"gopkg.in/yaml.v2"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"

dsciv1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/dscinitialization/v1"
@@ -78,13 +80,14 @@ type ManifestsConfig struct {
}

type ComponentInterface interface {
ReconcileComponent(cli client.Client, owner metav1.Object, DSCISpec *dsciv1.DSCInitializationSpec, currentComponentStatus bool) error
ReconcileComponent(ctx context.Context, cli client.Client, resConf *rest.Config, owner metav1.Object, DSCISpec *dsciv1.DSCInitializationSpec, currentComponentStatus bool) error
Cleanup(cli client.Client, DSCISpec *dsciv1.DSCInitializationSpec) error
GetComponentName() string
GetManagementState() operatorv1.ManagementState
SetImageParamsMap(imageMap map[string]string) map[string]string
OverrideManifests(platform string) error
UpdatePrometheusConfig(cli client.Client, enable bool, component string) error
// WaitForDeploymentAvailable(ctx context.Context, r *rest.Config, c string, n string, i int, t int) error
}

// UpdatePrometheusConfig update prometheus-configs.yaml to include/exclude <component>.rules
@@ -106,7 +109,7 @@ func (c *Component) UpdatePrometheusConfig(_ client.Client, enable bool, compone
DeadManSnitchRules string `yaml:"deadmanssnitch-alerting.rules"`
CFRRules string `yaml:"codeflare-recording.rules"`
CRARules string `yaml:"codeflare-alerting.rules"`
DashboardRRules string `yaml:"rhods-dashboard-recording.rule"`
DashboardRRules string `yaml:"rhods-dashboard-recording.rules"`
DashboardARules string `yaml:"rhods-dashboard-alerting.rules"`
DSPRRules string `yaml:"data-science-pipelines-operator-recording.rules"`
DSPARules string `yaml:"data-science-pipelines-operator-alerting.rules"`
@@ -181,3 +184,34 @@ func (c *Component) UpdatePrometheusConfig(_ client.Client, enable bool, compone

return err
}

// WaitForDeploymentAvailable to check if component deployment from 'namepsace' is ready within 'timeout' before apply prometheus rules for the component
// func (c *Component) WaitForDeploymentAvailable(ctx context.Context, restConfig *rest.Config, componentName string, namespace string, interval int, timeout int) error {
// resourceInterval := time.Duration(interval) * time.Second
// resourceTimeout := time.Duration(timeout) * time.Minute
// return wait.PollUntilContextTimeout(context.TODO(), resourceInterval, resourceTimeout, true, func(ctx context.Context) (bool, error) {
// clientset, err := kubernetes.NewForConfig(restConfig)
// if err != nil {
// return false, fmt.Errorf("error getting client %w", err)
// }
// componentDeploymentList, err := clientset.AppsV1().Deployments(namespace).List(context.TODO(), metav1.ListOptions{
// LabelSelector: "app.opendatahub.io/" + componentName,
// })
// if err != nil {
// if errors.IsNotFound(err) {
// return false, nil
// }
// }
// isReady := false
// if len(componentDeploymentList.Items) != 0 {
// for _, deployment := range componentDeploymentList.Items {
// if deployment.Status.ReadyReplicas == deployment.Status.Replicas {
// isReady = true
// } else {
// isReady = false
// }
// }
// }
// return isReady, nil
// })
// }
39 changes: 29 additions & 10 deletions components/dashboard/dashboard.go
@@ -13,13 +13,15 @@ import (
v1 "k8s.io/api/core/v1"
apierrs "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"

dsciv1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/dscinitialization/v1"
"github.com/opendatahub-io/opendatahub-operator/v2/components"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/cluster"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/common"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/deploy"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/monitoring"
)

var (
@@ -78,7 +80,13 @@ func (d *Dashboard) GetComponentName() string {
}

//nolint:gocyclo
func (d *Dashboard) ReconcileComponent(cli client.Client, owner metav1.Object, dscispec *dsciv1.DSCInitializationSpec, currentComponentStatus bool) error {
func (d *Dashboard) ReconcileComponent(ctx context.Context,
cli client.Client,
resConf *rest.Config,
owner metav1.Object,
dscispec *dsciv1.DSCInitializationSpec,
currentComponentExist bool,
) error {
var imageParamMap = map[string]string{
"odh-dashboard-image": "RELATED_IMAGE_ODH_DASHBOARD_IMAGE",
}
@@ -92,9 +100,11 @@ func (d *Dashboard) ReconcileComponent(cli client.Client, owner metav1.Object, d

// Update Default rolebinding
if enabled {
if err := d.cleanOauthClientSecrets(cli, dscispec, currentComponentStatus); err != nil {
// cleanup OAuth client related secret and CR if dashboard is in 'installed falas' status
if err := d.cleanOauthClient(cli, dscispec, currentComponentExist); err != nil {
return err
}

// Download manifests and update paths
if err := d.OverrideManifests(string(platform)); err != nil {
return err
@@ -162,13 +172,21 @@ func (d *Dashboard) ReconcileComponent(cli client.Client, owner metav1.Object, d
}
// CloudService Monitoring handling
if platform == deploy.ManagedRhods {
if enabled {
// first check if the service is up, so prometheus wont fire alerts when it is just startup
if err := monitoring.WaitForDeploymentAvailable(ctx, resConf, ComponentNameSupported, dscispec.ApplicationsNamespace, 20, 3); err != nil {
return fmt.Errorf("deployment for %s is not ready to server: %w", ComponentName, err)
}
fmt.Printf("deployment for %s is done, updating monitoring rules\n", ComponentNameSupported)
}

if err := d.UpdatePrometheusConfig(cli, enabled && monitoringEnabled, ComponentNameSupported); err != nil {
return err
}
if err = deploy.DeployManifestsFromPath(cli, owner,
filepath.Join(deploy.DefaultManifestPath, "monitoring", "prometheus", "apps"),
dscispec.Monitoring.Namespace,
ComponentName+"prometheus", true); err != nil {
"prometheus", true); err != nil {
return err
}
}
@@ -263,25 +281,26 @@ func (d *Dashboard) deployConsoleLink(cli client.Client, owner metav1.Object, na
return nil
}

func (d *Dashboard) cleanOauthClientSecrets(cli client.Client, dscispec *dsciv1.DSCInitializationSpec, currentComponentStatus bool) error {
func (d *Dashboard) cleanOauthClient(cli client.Client, dscispec *dsciv1.DSCInitializationSpec, currentComponentExist bool) error {
// Remove previous oauth-client secrets
// Check if component is going from state of `Not Installed --> Installed`
// Assumption: Component is currently set to enabled
if !currentComponentStatus {
name := "dashboard-oauth-client"
if !currentComponentExist {
fmt.Println("Cleanup any left secret")
// Delete client secrets from previous installation
oauthClientSecret := &v1.Secret{}
err := cli.Get(context.TODO(), client.ObjectKey{
Namespace: dscispec.ApplicationsNamespace,
Name: "dashboard-oauth-client",
Name: name,
}, oauthClientSecret)
if err != nil {
if !apierrs.IsNotFound(err) {
return err
return fmt.Errorf("error getting secret %s: %w", name, err)
}
} else {
err := cli.Delete(context.TODO(), oauthClientSecret)
if err != nil {
return fmt.Errorf("error deleting oauth client secret: %v", err)
if err := cli.Delete(context.TODO(), oauthClientSecret); err != nil {
return fmt.Errorf("error deleting secret %s in namespace %s : %w", name, dscispec.ApplicationsNamespace, err)
}
}
}
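
The `cleanOauthClient` hunk follows a get-then-delete pattern in which a missing secret is not an error. A minimal sketch of that control flow, using an in-memory store instead of a Kubernetes client: `secretStore`, `memStore`, `errNotFound`, and `cleanSecret` are all hypothetical names, with `errNotFound` playing the role that `apierrs.IsNotFound` plays in the real code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for "object does not exist".
var errNotFound = errors.New("not found")

// secretStore is a hypothetical, minimal view of the client calls used.
type secretStore interface {
	Get(namespace, name string) error
	Delete(namespace, name string) error
}

// memStore is an in-memory secretStore keyed by "namespace/name".
type memStore map[string]bool

func (m memStore) Get(ns, name string) error {
	if !m[ns+"/"+name] {
		return errNotFound
	}
	return nil
}

func (m memStore) Delete(ns, name string) error {
	if !m[ns+"/"+name] {
		return errNotFound
	}
	delete(m, ns+"/"+name)
	return nil
}

// cleanSecret deletes the secret if it exists and treats "not found" as success.
func cleanSecret(s secretStore, ns, name string) error {
	if err := s.Get(ns, name); err != nil {
		if errors.Is(err, errNotFound) {
			return nil // nothing left over from a previous installation
		}
		return fmt.Errorf("error getting secret %s: %w", name, err)
	}
	if err := s.Delete(ns, name); err != nil {
		return fmt.Errorf("error deleting secret %s in namespace %s: %w", name, ns, err)
	}
	return nil
}

func main() {
	store := memStore{"opendatahub/dashboard-oauth-client": true}
	fmt.Println(cleanSecret(store, "opendatahub", "dashboard-oauth-client"))
	// The second call finds nothing and is a no-op.
	fmt.Println(cleanSecret(store, "opendatahub", "dashboard-oauth-client"))
}
```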
23 changes: 21 additions & 2 deletions components/datasciencepipelines/datasciencepipelines.go
@@ -3,15 +3,19 @@
package datasciencepipelines

import (
"context"
"fmt"
"path/filepath"

operatorv1 "github.com/openshift/api/operator/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/rest"
"sigs.k8s.io/controller-runtime/pkg/client"

dsciv1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/dscinitialization/v1"
"github.com/opendatahub-io/opendatahub-operator/v2/components"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/deploy"
"github.com/opendatahub-io/opendatahub-operator/v2/pkg/monitoring"
)

var (
@@ -50,7 +54,13 @@ func (d *DataSciencePipelines) GetComponentName() string {
return ComponentName
}

func (d *DataSciencePipelines) ReconcileComponent(cli client.Client, owner metav1.Object, dscispec *dsciv1.DSCInitializationSpec, _ bool) error {
func (d *DataSciencePipelines) ReconcileComponent(ctx context.Context,
cli client.Client,
resConf *rest.Config,
owner metav1.Object,
dscispec *dsciv1.DSCInitializationSpec,
_ bool,
) error {
var imageParamMap = map[string]string{
"IMAGES_APISERVER": "RELATED_IMAGE_ODH_ML_PIPELINES_API_SERVER_IMAGE",
"IMAGES_ARTIFACT": "RELATED_IMAGE_ODH_ML_PIPELINES_ARTIFACT_MANAGER_IMAGE",
@@ -88,13 +98,22 @@ func (d *DataSciencePipelines) ReconcileComponent(cli client.Client, owner metav
}
// CloudService Monitoring handling
if platform == deploy.ManagedRhods {
if enabled {
// first check if the service is up, so prometheus wont fire alerts when it is just startup
// only 1 replica should be very quick
if err := monitoring.WaitForDeploymentAvailable(ctx, resConf, ComponentName, dscispec.ApplicationsNamespace, 10, 1); err != nil {
return fmt.Errorf("deployment for %s is not ready to server: %w", ComponentName, err)
}
fmt.Printf("deployment for %s is done, updating monitoring rules\n", ComponentName)
}

if err := d.UpdatePrometheusConfig(cli, enabled && monitoringEnabled, ComponentName); err != nil {
return err
}
if err = deploy.DeployManifestsFromPath(cli, owner,
filepath.Join(deploy.DefaultManifestPath, "monitoring", "prometheus", "apps"),
dscispec.Monitoring.Namespace,
ComponentName+"prometheus", true); err != nil {
"prometheus", true); err != nil {
return err
}
}