Commit

Merge pull request opendatahub-io#124 from zdtsw-forking/fix_revert_monitoring_rules

fix(monitoring): when DSC is removed, entries in rule_files should be cleaned up without panic
etirelli authored Nov 20, 2023
2 parents dafe183 + 786f815 commit c8a6d91
Showing 2 changed files with 20 additions and 11 deletions.
20 changes: 10 additions & 10 deletions config/monitoring/prometheus/apps/prometheus-configs.yaml
@@ -7,8 +7,8 @@ metadata:
 data:
 prometheus.yml: |
 rule_files:
-- 'operator-recording.rules'
-- 'deadmanssnitch-alerting.rules'
+- operator-recording.rules
+- deadmanssnitch-alerting.rules
 global:
 scrape_interval: 10s
@@ -333,14 +333,14 @@ data:
 - name: DeadManSnitch
 interval: 1m
 rules:
-- alert: DeadManSnitch
-expr: vector(1)
-labels:
-severity: critical
-namespace: redhat-ods-monitoring
-annotations:
-description: This is a DeadManSnitch to ensure RHODS monitoring and alerting pipeline is online.
-summary: Alerting DeadManSnitch
+- alert: DeadManSnitch
+expr: vector(1)
+labels:
+severity: critical
+namespace: redhat-ods-monitoring
+annotations:
+description: This is a DeadManSnitch to ensure RHODS monitoring and alerting pipeline is online.
+summary: Alerting DeadManSnitch
 codeflare-recording.rules: |
 groups:
11 changes: 10 additions & 1 deletion controllers/dscinitialization/monitoring.go
@@ -50,9 +50,18 @@ func (r *DSCInitializationReconciler) configureManagedMonitoring(ctx context.Con
 		}
 	}
 	if initial == "revertbackup" {
+		// TODO: implement with a better solution
+		// to have - before component name is to filter out the real rules file line
+		// e.g line of "workbenches-recording.rules: |"
 		err := common.MatchLineInFile(filepath.Join(prometheusConfigPath, "prometheus-configs.yaml"),
 			map[string]string{
-				"*.rules: ": "",
+				"(.*)-(.*)workbenches(.*).rules": "",
+				"(.*)-(.*)rhods-dashboard(.*).rules": "",
+				"(.*)-(.*)codeflare(.*).rules": "",
+				"(.*)-(.*)data-science-pipelines-operator(.*).rules": "",
+				"(.*)-(.*)model-mesh(.*).rules": "",
+				"(.*)-(.*)odh-model-controller(.*).rules": "",
+				"(.*)-(.*)ray(.*).rules": "",
 			})
 		if err != nil {
 			r.Log.Error(err, "error to remove previous enabled component rules")
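
The effect of the new map keys can be checked in isolation with Go's standard regexp package. The sketch below is not the operator's code: matchesRuleEntry is a hypothetical helper, and it assumes (the diff does not show this) that common.MatchLineInFile treats each map key as a regular expression applied line by line. Under that assumption, the old key "*.rules: " cannot even be compiled, because the leading "*" has nothing to repeat, while the new keys match a rule_files list entry (which has a "-" before the component name) and skip the rules file definition line.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesRuleEntry is a hypothetical helper for illustration only.
// It compiles pattern and reports whether it matches line.
func matchesRuleEntry(pattern, line string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(line), nil
}

func main() {
	// Old key from the diff: rejected by regexp.Compile
	// (regexp.MustCompile would panic on it instead).
	_, err := matchesRuleEntry("*.rules: ", "workbenches-recording.rules: |")
	fmt.Println("old key compiles:", err == nil)

	// New key from the diff, tested against two sample lines.
	pattern := "(.*)-(.*)workbenches(.*).rules"

	// A rule_files list entry: "-" comes before the component name.
	entry, _ := matchesRuleEntry(pattern, "  - workbenches-recording.rules")
	fmt.Println("list entry matches:", entry)

	// The rules file definition line: no "-" before the component name.
	definition, _ := matchesRuleEntry(pattern, "  workbenches-recording.rules: |")
	fmt.Println("definition line matches:", definition)
}
```

Note that the "." before "rules" in each key is unescaped, so it matches any character, not only a literal dot. That is harmless for these file names but worth keeping in mind if the patterns are reused.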
