Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rhoai] Update Component rules in Prometheus #1511

Merged
merged 6 commits into from
Jan 21, 2025

Conversation

VaishnaviHire
Copy link
Member

@VaishnaviHire VaishnaviHire commented Jan 16, 2025

(cherry picked from commit ef677d0 and 05c7947)

Description

JIRA: RHOAIENG-14787

How Has This Been Tested?

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

@VaishnaviHire VaishnaviHire force-pushed the sync_rhoai_monitoing branch 4 times, most recently from 1232e67 to 85c34cb Compare January 17, 2025 14:38
@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

@VaishnaviHire VaishnaviHire requested review from zdtsw and lburgazzoli and removed request for etirelli and gzaronikas January 17, 2025 16:32
@grdryn
Copy link
Member

grdryn commented Jan 17, 2025

/retest

@openshift-ci openshift-ci bot added the lgtm label Jan 18, 2025
@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

@openshift-ci openshift-ci bot removed the lgtm label Jan 18, 2025
Copy link

codecov bot commented Jan 18, 2025

Codecov Report

Attention: Patch coverage is 1.04167% with 190 lines in your changes missing coverage. Please review.

Project coverage is 20.15%. Comparing base (0860779) to head (1b836c0).
Report is 2 commits behind head on rhoai.

Files with missing lines Patch % Lines
...rvices/monitoring/monitoring_controller_actions.go 0.00% 73 Missing ⚠️
.../dscinitialization/dscinitialization_controller.go 4.76% 40 Missing ⚠️
pkg/controller/predicates/resources/resources.go 0.00% 37 Missing ⚠️
pkg/controller/handlers/handlers.go 0.00% 15 Missing ⚠️
pkg/controller/predicates/component/component.go 0.00% 11 Missing ⚠️
controllers/services/monitoring/monitoring.go 0.00% 10 Missing ⚠️
...llers/services/monitoring/monitoring_controller.go 0.00% 3 Missing ⚠️
pkg/upgrade/upgrade.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            rhoai    #1511      +/-   ##
==========================================
- Coverage   20.36%   20.15%   -0.22%     
==========================================
  Files         161      161              
  Lines       10915    11040     +125     
==========================================
+ Hits         2223     2225       +2     
- Misses       8453     8576     +123     
  Partials      239      239              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zdtsw zdtsw force-pushed the sync_rhoai_monitoing branch from 1987845 to 74fb9c3 Compare January 20, 2025 09:43
},
}

var DSCDeletionPredicate = predicate.Funcs{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may not be doing what it is expected as by default all the methods return true : https://github.com/kubernetes-sigs/controller-runtime/blob/release-0.17/pkg/predicate/predicate.go#L54-L99

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this newly added code seems a mistake.
it is not used anywhere,
a similar one was already in the DSCI,
but you are right.
we probaby no need this predicate on DSC deletion at all. r.watchDSCResource is already dealt with that case.

@zdtsw zdtsw marked this pull request as draft January 21, 2025 09:52
VaishnaviHire and others added 3 commits January 21, 2025 11:35
(cherry picked from commit ef677d0)

(cherry picked from commit ca07cac)

Update Component rules in Prometheus

(cherry picked from commit bcdd7c8)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit cab3a3a)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit e0b7c39)
@zdtsw zdtsw force-pushed the sync_rhoai_monitoing branch from 74fb9c3 to d1563c3 Compare January 21, 2025 10:41
@zdtsw zdtsw marked this pull request as ready for review January 21, 2025 10:50
@openshift-ci openshift-ci bot requested review from robotmaxtron and zdtsw January 21, 2025 10:50
- move if to switch...case
- add .status.condition.MonitoringReady type
- change Monitoring CR .status.condition Reason and Message and Type name
- remove unused predicate var from DSCI
- change check on promethus deployment ready
- update: change to use Apply than Create
- update: add or remove prom rules
- add field manager for monitoring CR to DSCI
- add isComponentReady()
- update predicate for monitoring on DSC change on both .spec.components and .status.condition

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 05c7947)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@zdtsw zdtsw force-pushed the sync_rhoai_monitoing branch from d1563c3 to f27e9e5 Compare January 21, 2025 10:52
@biswassri
Copy link
Contributor

/test opendatahub-operator-e2e

zdtsw added 2 commits January 21, 2025 14:00
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@openshift-ci openshift-ci bot added the lgtm label Jan 21, 2025
Copy link

openshift-ci bot commented Jan 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zdtsw zdtsw merged commit 83abe0d into opendatahub-io:rhoai Jan 21, 2025
11 checks passed
MarianMacik pushed a commit to MarianMacik/opendatahub-operator that referenced this pull request Jan 22, 2025
…flux/component-updates/odh-dashboard-v2-17

chore(deps): update odh-dashboard-v2-17 to 723b43b
MarianMacik pushed a commit to MarianMacik/opendatahub-operator that referenced this pull request Jan 22, 2025
* Update Component rules in Prometheus

(cherry picked from commit ef677d0)

(cherry picked from commit ca07cac)

Update Component rules in Prometheus

(cherry picked from commit bcdd7c8)

* update: address some comments

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit cab3a3a)

* update: add monitoring CR status on condition of prom deployment

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit e0b7c39)

* update: address comments for monitoring component (opendatahub-io#1520)

- move if to switch...case
- add .status.condition.MonitoringReady type
- change Monitoring CR .status.condition Reason and Message and Type name
- remove unused predicate var from DSCI
- change check on promethus deployment ready
- update: change to use Apply than Create
- update: add or remove prom rules
- add field manager for monitoring CR to DSCI
- add isComponentReady()
- update predicate for monitoring on DSC change on both .spec.components and .status.condition

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 05c7947)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: add missing cache for selfmanaged cluster because of monitoring is created there as well

* fix: wrong name in manifestss

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
VaishnaviHire added a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Jan 28, 2025
* [sync] : use label selector to pick namespace (opendatahub-io#1427) (opendatahub-io#1516)

* feat: use label selector to pick namespace (opendatahub-io#1427)

* feat: use label selector to pick namespace

- for clean installation:
  - if user keep using default pre-defined namespace: same as before
  - if user want to use a different namespace: they will need to create project
    themselves, and add label "opendatahub.io/application-namespace": "true"
    then install ODH operator and create DSCI by fill in this namespace
    or install RHOAI and delete auto created DSCI and create DSCI by fill in this namespace
- add more resource kinds into cache due to change use component CR with owns and watches
- add check when start up if we get multiple customized application ns then exit
- add check in DSCI controller:
  for customized app ns: check if it exists and has label, if not exit as error
  (do not handle managed case for now since we do not support that yet)
  force security label on it
- rename createOdhNamespace to createOperatorResource and split into small functions
- add more detail error in DSCI, example:
    conditions:
    - lastHeartbeatTime: '2025-01-17T15:00:29Z'
      lastTransitionTime: '2025-01-17T15:00:29Z'
      message: DSCI must used the same namespace which has opendatahub.io/application-namespace=true label
      reason: ReconcileFailed
      status: Unknown
    or
      conditions:
    - lastHeartbeatTime: '2025-01-17T15:03:15Z'
      lastTransitionTime: '2025-01-17T15:03:15Z'
      message: 'only support max. one namespace with label: opendatahub.io/application-namespace=true'
      reason: ReconcileFailed
      status: Unknown

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit e838877)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: networkpolicy

- rhoai both self and managed need networkpolicy on redhat-ods-operator namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update:

- fix return of error
- add openshift-operators into list, subscriptoin get need it

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Fix kueue watch event handler for CRDs (opendatahub-io#1529) (opendatahub-io#1534)

cherry pick from main 2cbb627

* [rhoai] Update Component rules in Prometheus (opendatahub-io#1511)

* Update Component rules in Prometheus

(cherry picked from commit ef677d0)

(cherry picked from commit ca07cac)

Update Component rules in Prometheus

(cherry picked from commit bcdd7c8)

* update: address some comments

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit cab3a3a)

* update: add monitoring CR status on condition of prom deployment

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit e0b7c39)

* update: address comments for monitoring component (opendatahub-io#1520)

- move if to switch...case
- add .status.condition.MonitoringReady type
- change Monitoring CR .status.condition Reason and Message and Type name
- remove unused predicate var from DSCI
- change check on promethus deployment ready
- update: change to use Apply than Create
- update: add or remove prom rules
- add field manager for monitoring CR to DSCI
- add isComponentReady()
- update predicate for monitoring on DSC change on both .spec.components and .status.condition

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit 05c7947)
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: add missing cache for selfmanaged cluster because of monitoring is created there as well

* fix: wrong name in manifestss

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>

* Avoid modelmesh and Kserve loop on updating shared CRDs (opendatahub-io#1548)

The KServe and ModelMesh are shipping the same CRDs as part of
their manifests but with different versions, this cause the
respective component reconcilers to keep trying to install
their respective version, ending in a infinite loop.

This commit does not solve the underlying problem of having two
components shipping the same CRDs with different versions, but
avoids the infinite reconcile loop. The CRDs that is actually
installed on the cluster is undefined, the latest one that is
applied wins

(cherry picked from commit 36b829c)

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Luca Burgazzoli <lburgazzoli@users.noreply.github.com>
Co-authored-by: Vaishnavi Hire <vhire@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants