feat: Add ModelRegistry component (opendatahub-io#775) (opendatahub-io#776)

Squashed commit due to buildability since ComponentInterface has
changed.

Other patches squashed as well to avoid double squashing due to
merge policy.

modelregistry: regenerate autogenerated files

Run `make generate manifests` after all the changes

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

feat: Add ModelRegistry component (opendatahub-io#775) (opendatahub-io#776)

* feat: Add ModelRegistry component (opendatahub-io#775)

* fix: Fix modelregistry odh overlays path

* fix: fix dsc_create_test tests err nil check

* fix: refactor ModelRegistry.ReconcileComponent for new parameters

* chore: added modelregistry to README.md

* fix: add missing rbac rules for deploymentconfigs and daemonsets

* chore: code lint cleanup

* fix: added check for nil DevFlags in model-registry component

* fix: add nil check for dscispec.DevFlags in model-registry ReconcileComponent

* fix: remove RBAC rules for daemonsets and deploymentconfigs

* fix(chore): fix lint errors in dsc_deletion_test.go

(cherry picked from commit 112d3f1)
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
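
The "odh overlays path" fix listed above concerns how the component locates its kustomize directory. A minimal sketch of that resolution, following what `OverrideManifests` does in this commit: the default `overlays/odh` is used unless a devFlags manifest sets `sourcePath`. The constant `defaultManifestPath` and its value `/opt/manifests` are assumptions standing in for `deploy.DefaultManifestPath`, and `kustomizePath` is a hypothetical helper name, not part of the operator.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const (
	// defaultManifestPath is an assumed value standing in for deploy.DefaultManifestPath.
	defaultManifestPath = "/opt/manifests"
	// componentName matches ComponentName in the diff.
	componentName = "model-registry-operator"
)

// kustomizePath (hypothetical helper) returns the directory kustomize builds from:
// the default overlay, unless a non-empty sourcePath overrides it.
func kustomizePath(sourcePath string) string {
	overlay := "overlays/odh"
	if sourcePath != "" {
		overlay = sourcePath
	}
	return filepath.Join(defaultManifestPath, componentName, overlay)
}

func main() {
	fmt.Println(kustomizePath(""))             // default overlay
	fmt.Println(kustomizePath("overlays/dev")) // devFlags override
}
```

Using `filepath.Join` for both cases, rather than string concatenation for the default, keeps separators normalized regardless of how `sourcePath` is written.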

modelregistry: partial: chore: removes SetImageParamsMap from ComponentInterface (opendatahub-io#897)

Partial application of already applied

commit d10a764
Author: Bartosz Majsak <bartosz.majsak@gmail.com>
Date:   Thu Mar 7 15:43:37 2024 +0100

    chore: removes SetImageParamsMap from ComponentInterface (opendatahub-io#897)

    As it's not used by any component, acting as a simple pass-return loop.

    This makes the API contract a bit cleaner.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

modelregistry: partial: chore: remove the need of passing rest config (opendatahub-io#895)

Partial application of already applied

commit ca7fa98
Author: Bartosz Majsak <bartosz.majsak@gmail.com>
Date:   Fri Mar 8 17:40:54 2024 +0100

    chore: remove the need of passing rest config (opendatahub-io#895)

    * chore: fixes ComponentInterface docs

    By removing reference to non-existing func. This function has been in
    use outside of this component.

    * fix: removes rest config

    As we are already using client.Client interface we do not have to
    instantiate other typed clients to e.g. list resources using their own
    funcs. Generic client.Client is sufficient for these needs.

    Additionally this change adds ctx propagation for these calls.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

modelregistry: partial: feat(logger): for both controller level and component level (opendatahub-io#837)

Partial application of already applied

commit d8a83a2
Author: Wen Zhou <wenzhou@redhat.com>
Date:   Mon Apr 1 22:06:16 2024 +0200

    feat(logger): for both controller level and component level (opendatahub-io#837)

    * feat(logger): for both controller level and component level

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update(logger): use logr instead of uber's zap

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: do not log error only print

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: use zap.Options for both and tune levels

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: move setting into common function

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>
    Signed-off-by: Zhou, Wen <wenzhou@redhat.com>

    ---------

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>
    Signed-off-by: Zhou, Wen <wenzhou@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

update(modelregistry): rename image name (opendatahub-io#877)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
(cherry picked from commit b4e4d6f)

modelregistry: partial: chore: cleanup duplicated functions packages and add more for godoc (opendatahub-io#981)

Partial application of already applied

commit 96c85f2
Author: Wen Zhou <wenzhou@redhat.com>
Date:   Tue Apr 23 14:05:24 2024 +0200

    chore: cleanup duplicated functions packages and add more for godoc (opendatahub-io#981)

    * chore: cleanup duplicated functions/package and add godoc

    - move GetPlatform() from deploy package to cluster package
    - move const ManagedRhods SelfManagedRhods OpenDataHub from deploy to cluster package
    - move WaitForDeploymentAvailable() monitoring package to cluster package
    - remove monitoring package
    - move UpdatePodSecurityRolebinding() from common package to cluster package
    - deprecate GetDomain from common package, to only use GetDomain from cluster package.
    - remove gvk package, move its GVK to cluster package
    - move DeleteExistingSubscription() from deploy package to upgrade package
    - do not export getSubscription()

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: remove gvk into one file but under cluster package

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: rename variable, removing GVK from it

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    * update: move gvk into a sub package under cluster

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

    ---------

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

feat(mr): create namespace for Model Registry (opendatahub-io#930)

* feat(mr): create namespace for smm

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: rebase

Signed-off-by: Zhou, Wen <wenzhou@redhat.com>

* update: code review comments

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix(doc): wrong comments

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: remove label to keep namespace even when operator is uninstalled

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Zhou, Wen <wenzhou@redhat.com>
(cherry picked from commit 1188ce1)

feat(mr): add model registry odh extras manifests, fixes RHOAIENG-5112 (opendatahub-io#953)

(cherry picked from commit 7c3e81b)

modelregistry: partial: chore: Open up util functions for context propagation (opendatahub-io#1033)

Partial application of already applied

commit 105adae
Author: Aslak Knutsen <aslak@4fs.no>
Date:   Tue Jun 4 15:16:21 2024 +0200

    chore: Open up util functions for context propagation (opendatahub-io#1033)

    context should be determined by the caller and propagated
    down the call chain.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
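
The principle in the commit message above (the caller owns the context and every function below passes it along) can be sketched with the standard library alone. All names here (`waitReady`, `reconcile`, `run`) are hypothetical and only illustrate the pattern; they are not the operator's functions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitReady (hypothetical leaf helper) polls until ready() reports true
// or the caller's context is done, whichever comes first.
func waitReady(ctx context.Context, ready func() bool, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if ready() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// reconcile (hypothetical mid-level function) creates no context of its own;
// it forwards the one it was given.
func reconcile(ctx context.Context, ready func() bool) error {
	if err := waitReady(ctx, ready, 5*time.Millisecond); err != nil {
		return fmt.Errorf("component not ready: %w", err)
	}
	return nil
}

// run plays the caller: it alone decides the deadline.
func run(timeoutMillis int, ready func() bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	return reconcile(ctx, ready)
}

func main() {
	fmt.Println(run(50, func() bool { return true }))
	fmt.Println(run(20, func() bool { return false }))
}
```

Because the deadline is set in one place, a cancelled or timed-out reconcile stops every nested wait without each helper needing its own timeout parameter.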

modelregistry: partial: chore: remove duplicated platform call in each component (opendatahub-io#1055)

Partial application of already applied

commit 1b04761
Author: Wen Zhou <wenzhou@redhat.com>
Date:   Fri Jun 14 14:47:33 2024 +0200

    chore: remove duplicated platform call in each component (opendatahub-io#1055)

    - get in DSC and pass into component

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

modelregistry: update api docs

run `make api-docs`
add +groupName=datasciencecluster.opendatahub.io

On backporting of
1b86e42 ("Update readme.md  (opendatahub-io#890)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

modelregistry: partial: chore(lint): enable contextcheck and containedctx (opendatahub-io#1070)

Partial application of already applied:

commit 06e21a4
Author: Luca Burgazzoli <lburgazzoli@users.noreply.github.com>
Date:   Tue Jun 25 17:15:13 2024 +0200

    chore(lint): enable contextcheck and containedctx (opendatahub-io#1070)

    * chore(lint): enable contextcheck

    Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>

    * chore(lint): enable containedctx

    Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>

    * Fix PR review findings

    * Fix rebase

    ---------

    Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

refactor: dashboard with new manifests structure (opendatahub-io#1065)

Partial application of already applied:

commit 438f4c2
Author: Wen Zhou <wenzhou@redhat.com>
Date:   Tue Jul 2 16:56:25 2024 +0200

    refactor: dashboard with new manifests structure (opendatahub-io#1065)

    * refactor: dashboard with new manifests structure

    - change type of platform, skip convert to string
    - add more support for ApplyParam() to
      not only take ENV but also anything from ExtraParamMaps
    * update: simplify override function
    * update: add value for Unknown platform
    ---------

    Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

feat: add managed model registry prometheus config handling logic, part of RHOAIENG-4273 (opendatahub-io#1150)

(cherry picked from commit 72fc80f)

Adjusted Kueue and TrainingOperator rules

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

feat: add default cert for model registry, fixes RHOAIENG-9909 (opendatahub-io#1165)

Conflicts: ApplyParams arguments due to missing:
  d84cd33 ("update: remove unnecessary param from ApplyParams() (opendatahub-io#1180)")

* feat: add default cert for model registry, fixes RHOAIENG-9909

* fix: fixed lint errors

* fix: add servicemesh feature check for MR, add MR enable check in e2e default cert test

* fix: changed MR servicemesh status check to look for Managed state

* fix: ignore missing model-registry default cert if already removed

(cherry picked from commit 4c411a6)

feat: add servicemeshmember for model registry namespace, fixes RHOAIENG-11831 (opendatahub-io#1202)

* feat: add servicemeshmember for model registry namespace, fixes RHOAIENG-11831

* fix: ignore error if MR smm already exists

* code cleanup for readability

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>

* Avoid shadowing package name in variable

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>

* chore: rename createServicemeshMember to enrollToServiceMesh, add log messages

---------

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
(cherry picked from commit 8f3d013)

feat: add managed model registry prometheus job, metrics, and alerting rules, fixes RHOAIENG-4273

(cherry picked from commit f811d67)
dhirajsb authored and ykaliuta committed Aug 27, 2024
1 parent c4446d6 commit 83e3e39
Showing 19 changed files with 636 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -317,6 +317,8 @@ spec:
      managementState: Managed
    workbenches:
      managementState: Managed
    modelregistry:
      managementState: Managed
```

2. Enable only Dashboard and Workbenches
4 changes: 4 additions & 0 deletions apis/datasciencecluster/v1/datasciencecluster_types.go
@@ -31,6 +31,7 @@ import (
	"github.com/opendatahub-io/opendatahub-operator/v2/components/kserve"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/kueue"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/modelmeshserving"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/modelregistry"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/ray"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/trainingoperator"
	"github.com/opendatahub-io/opendatahub-operator/v2/components/trustyai"
@@ -80,6 +81,9 @@ type Components struct {

	//Training Operator component configuration.
	TrainingOperator trainingoperator.TrainingOperator `json:"trainingoperator,omitempty"`

	// ModelRegistry component configuration.
	ModelRegistry modelregistry.ModelRegistry `json:"modelregistry,omitempty"`
}

// DataScienceClusterStatus defines the observed state of DataScienceCluster.
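
The new field's `json:"modelregistry,omitempty"` tag is what ties the `modelregistry:` key in the README example to the Go type. A minimal sketch of that mapping with `encoding/json`, using simplified, hypothetical stand-in types (`component`, `clusterComponents`) rather than the operator's real structs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// component is a simplified, hypothetical stand-in for components.Component.
type component struct {
	ManagementState string `json:"managementState,omitempty"`
}

// clusterComponents is a hypothetical stand-in for the Components struct above.
type clusterComponents struct {
	ModelRegistry component `json:"modelregistry,omitempty"`
}

// encode returns the JSON form of c.
func encode(c clusterComponents) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	managed, _ := encode(clusterComponents{ModelRegistry: component{ManagementState: "Managed"}})
	fmt.Println(managed) // {"modelregistry":{"managementState":"Managed"}}

	// omitempty does not drop a zero-valued struct field in encoding/json,
	// so the key is still emitted with an empty object.
	zero, _ := encode(clusterComponents{})
	fmt.Println(zero) // {"modelregistry":{}}
}
```

The second case is worth knowing when reading serialized resources: an empty `modelregistry: {}` entry does not by itself mean the component was configured.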
1 change: 1 addition & 0 deletions apis/datasciencecluster/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.
@@ -385,6 +385,49 @@ spec:
      pattern: ^(Managed|Unmanaged|Force|Removed)$
      type: string
  type: object
modelregistry:
  description: ModelRegistry component configuration.
  properties:
    devFlags:
      description: Add developer fields
      properties:
        manifests:
          description: List of custom manifests for the given component
          items:
            properties:
              contextDir:
                default: ""
                description: contextDir is the relative path to
                  the folder containing manifests in a repository
                type: string
              sourcePath:
                default: ""
                description: 'sourcePath is the subpath within contextDir
                  where kustomize builds start. Examples include
                  any sub-folder or path: `base`, `overlays/dev`,
                  `default`, `odh` etc'
                type: string
              uri:
                default: ""
                description: uri is the URI point to a git repo
                  with tag/branch. e.g https://github.com/org/repo/tarball/<tag/branch>
                type: string
            type: object
          type: array
      type: object
    managementState:
      description: "Set to one of the following values: \n - \"Managed\"
        : the operator is actively managing the component and trying
        to keep it active. It will only upgrade the component if
        it is safe to do so \n - \"Removed\" : the operator is actively
        managing the component and will not install it, or if it
        is installed, the operator will try to remove it"
      enum:
      - Managed
      - Removed
      pattern: ^(Managed|Unmanaged|Force|Removed)$
      type: string
  type: object
ray:
  description: Ray component configuration.
  properties:
29 changes: 29 additions & 0 deletions bundle/manifests/rhods-operator.clusterserviceversion.yaml
@@ -46,6 +46,9 @@ metadata:
    "modelmeshserving": {
      "managementState": "Managed"
    },
    "modelregistry": {
      "managementState": "Removed"
    },
    "ray": {
      "managementState": "Managed"
    },
@@ -1032,6 +1035,32 @@ spec:
  - update
  - use
  - watch
- apiGroups:
  - modelregistry.opendatahub.io
  resources:
  - modelregistries
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - modelregistry.opendatahub.io
  resources:
  - modelregistries/finalizers
  verbs:
  - update
- apiGroups:
  - modelregistry.opendatahub.io
  resources:
  - modelregistries/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
2 changes: 2 additions & 0 deletions components/component.go
@@ -133,6 +133,8 @@ func (c *Component) UpdatePrometheusConfig(_ client.Client, enable bool, compone
			TrustyAIARules      string `yaml:"trustyai-alerting.rules"`
			KserveRRules        string `yaml:"kserve-recording.rules"`
			KserveARules        string `yaml:"kserve-alerting.rules"`
			ModelRegistryRRules string `yaml:"model-registry-operator-recording.rules"`
			ModelRegistryARules string `yaml:"model-registry-operator-alerting.rules"`
		} `yaml:"data"`
	}
	var configMap ConfigMap
211 changes: 211 additions & 0 deletions components/modelregistry/modelregistry.go
@@ -0,0 +1,211 @@
// Package modelregistry provides utility functions to config ModelRegistry, an ML Model metadata repository service
// +groupName=datasciencecluster.opendatahub.io
package modelregistry

import (
	"context"
	"errors"
	"fmt"
	"path/filepath"
	"strings"
	"text/template"

	"github.com/go-logr/logr"
	operatorv1 "github.com/openshift/api/operator/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	dsciv1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/dscinitialization/v1"
	infrav1 "github.com/opendatahub-io/opendatahub-operator/v2/apis/infrastructure/v1"
	"github.com/opendatahub-io/opendatahub-operator/v2/components"
	"github.com/opendatahub-io/opendatahub-operator/v2/pkg/cluster"
	"github.com/opendatahub-io/opendatahub-operator/v2/pkg/conversion"
	"github.com/opendatahub-io/opendatahub-operator/v2/pkg/deploy"

	_ "embed"
)

const DefaultModelRegistryCert = "default-modelregistry-cert"

var (
	ComponentName = "model-registry-operator"
	Path          = deploy.DefaultManifestPath + "/" + ComponentName + "/overlays/odh"
	// we should not apply this label to the namespace, as it triggered namespace deletion during operator uninstall
	// modelRegistryLabels = cluster.WithLabels(
	//	labels.ODH.OwnedNamespace, "true",
	// ).
	ModelRegistriesNamespace = "odh-model-registries"
)

// Verifies that ModelRegistry implements ComponentInterface.
var _ components.ComponentInterface = (*ModelRegistry)(nil)

// ModelRegistry struct holds the configuration for the ModelRegistry component.
// +kubebuilder:object:generate=true
type ModelRegistry struct {
	components.Component `json:""`
}

func (m *ModelRegistry) OverrideManifests(ctx context.Context, _ cluster.Platform) error {
	// If devflags are set, update default manifests path
	if len(m.DevFlags.Manifests) != 0 {
		manifestConfig := m.DevFlags.Manifests[0]
		if err := deploy.DownloadManifests(ctx, ComponentName, manifestConfig); err != nil {
			return err
		}
		// If overlay is defined, update paths
		defaultKustomizePath := "overlays/odh"
		if manifestConfig.SourcePath != "" {
			defaultKustomizePath = manifestConfig.SourcePath
		}
		Path = filepath.Join(deploy.DefaultManifestPath, ComponentName, defaultKustomizePath)
	}

	return nil
}

func (m *ModelRegistry) GetComponentName() string {
	return ComponentName
}

func (m *ModelRegistry) ReconcileComponent(ctx context.Context, cli client.Client, logger logr.Logger,
	owner metav1.Object, dscispec *dsciv1.DSCInitializationSpec, platform cluster.Platform, _ bool) error {
	l := m.ConfigComponentLogger(logger, ComponentName, dscispec)
	var imageParamMap = map[string]string{
		"IMAGES_MODELREGISTRY_OPERATOR": "RELATED_IMAGE_ODH_MODEL_REGISTRY_OPERATOR_IMAGE",
		"IMAGES_GRPC_SERVICE":           "RELATED_IMAGE_ODH_MLMD_GRPC_SERVER_IMAGE",
		"IMAGES_REST_SERVICE":           "RELATED_IMAGE_ODH_MODEL_REGISTRY_IMAGE",
	}
	enabled := m.GetManagementState() == operatorv1.Managed
	monitoringEnabled := dscispec.Monitoring.ManagementState == operatorv1.Managed

	if enabled {
		// return error if ServiceMesh is not enabled, as it's a required feature
		if dscispec.ServiceMesh == nil || dscispec.ServiceMesh.ManagementState != operatorv1.Managed {
			return errors.New("ServiceMesh needs to be set to 'Managed' in DSCI CR, it is required by Model Registry")
		}

		if err := m.createDependencies(ctx, cli, dscispec); err != nil {
			return err
		}

		if m.DevFlags != nil {
			// Download manifests and update paths
			if err := m.OverrideManifests(ctx, platform); err != nil {
				return err
			}
		}

		// Update image parameters only when we do not have customized manifests set
		if (dscispec.DevFlags == nil || dscispec.DevFlags.ManifestsUri == "") && (m.DevFlags == nil || len(m.DevFlags.Manifests) == 0) {
			extraParamsMap := map[string]string{
				"DEFAULT_CERT": DefaultModelRegistryCert,
			}
			if err := deploy.ApplyParams(Path, imageParamMap, false, extraParamsMap); err != nil {
				return fmt.Errorf("failed to update image from %s : %w", Path, err)
			}
		}

		// Create model registries namespace
		// We do not delete this namespace even when ModelRegistry is Removed or when operator is uninstalled.
		ns, err := cluster.CreateNamespace(ctx, cli, ModelRegistriesNamespace)
		if err != nil {
			return err
		}
		l.Info("created model registry namespace", "namespace", ModelRegistriesNamespace)
		// create servicemeshmember here, for now until post MVP solution
		err = enrollToServiceMesh(ctx, cli, dscispec, ns)
		if err != nil {
			return err
		}
		l.Info("created model registry servicemesh member", "namespace", ModelRegistriesNamespace)
	} else {
		err := m.removeDependencies(ctx, cli, dscispec)
		if err != nil {
			return err
		}
	}

	// Deploy ModelRegistry Operator
	if err := deploy.DeployManifestsFromPath(ctx, cli, owner, Path, dscispec.ApplicationsNamespace, m.GetComponentName(), enabled); err != nil {
		return err
	}
	l.Info("apply manifests done")

	// Create additional model registry resources, componentEnabled=true because these extras are never deleted!
	if err := deploy.DeployManifestsFromPath(ctx, cli, owner, Path+"/extras", dscispec.ApplicationsNamespace, m.GetComponentName(), true); err != nil {
		return err
	}
	l.Info("apply extra manifests done")

	// CloudService Monitoring handling
	if platform == cluster.ManagedRhods {
		if enabled {
			if err := cluster.WaitForDeploymentAvailable(ctx, cli, ComponentName, dscispec.ApplicationsNamespace, 10, 1); err != nil {
				return fmt.Errorf("deployment for %s is not ready to serve: %w", ComponentName, err)
			}
			l.Info("deployment is done, updating monitoring rules")
		}
		if err := m.UpdatePrometheusConfig(cli, enabled && monitoringEnabled, ComponentName); err != nil {
			return err
		}
		if err := deploy.DeployManifestsFromPath(ctx, cli, owner,
			filepath.Join(deploy.DefaultManifestPath, "monitoring", "prometheus", "apps"),
			dscispec.Monitoring.Namespace,
			"prometheus", true); err != nil {
			return err
		}
		l.Info("updating SRE monitoring done")
	}
	return nil
}

func (m *ModelRegistry) createDependencies(ctx context.Context, cli client.Client, dscispec *dsciv1.DSCInitializationSpec) error {
	// create DefaultModelRegistryCert
	if err := cluster.PropagateDefaultIngressCertificate(ctx, cli, DefaultModelRegistryCert, dscispec.ServiceMesh.ControlPlane.Namespace); err != nil {
		return err
	}
	return nil
}

func (m *ModelRegistry) removeDependencies(ctx context.Context, cli client.Client, dscispec *dsciv1.DSCInitializationSpec) error {
	// delete DefaultModelRegistryCert
	certSecret := corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      DefaultModelRegistryCert,
			Namespace: dscispec.ServiceMesh.ControlPlane.Namespace,
		},
	}
	// ignore error if the secret has already been removed
	if err := cli.Delete(ctx, &certSecret); client.IgnoreNotFound(err) != nil {
		return err
	}
	return nil
}

//go:embed resources/servicemesh-member.tmpl.yaml
var smmTemplate string

func enrollToServiceMesh(ctx context.Context, cli client.Client, dscispec *dsciv1.DSCInitializationSpec, namespace *corev1.Namespace) error {
	tmpl, err := template.New("servicemeshmember").Parse(smmTemplate)
	if err != nil {
		return fmt.Errorf("error parsing servicemeshmember template: %w", err)
	}
	builder := strings.Builder{}
	controlPlaneData := struct {
		Namespace    string
		ControlPlane *infrav1.ControlPlaneSpec
	}{Namespace: namespace.Name, ControlPlane: &dscispec.ServiceMesh.ControlPlane}

	if err = tmpl.Execute(&builder, controlPlaneData); err != nil {
		return fmt.Errorf("error executing servicemeshmember template: %w", err)
	}

	unstrObj, err := conversion.StrToUnstructured(builder.String())
	if err != nil {
		return fmt.Errorf("error converting servicemeshmember template: %w", err)
	}
	// Report an unexpected object count separately so that %w never wraps a nil error.
	if len(unstrObj) != 1 {
		return fmt.Errorf("error converting servicemeshmember template: expected 1 object, got %d", len(unstrObj))
	}

	return client.IgnoreAlreadyExists(cli.Create(ctx, unstrObj[0]))
}
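
The last step of `enrollToServiceMesh` has two distinct failure modes: the conversion returns an error, or it yields a number of objects other than one. A minimal sketch of reporting them separately, so that `%w` only ever wraps a non-nil error. `requireSingle` and `errBadYAML` are hypothetical names used for illustration, and plain strings stand in for the unstructured objects.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadYAML is a hypothetical sentinel standing in for a conversion failure.
var errBadYAML = errors.New("bad yaml")

// requireSingle (hypothetical helper) returns the only element of objs.
// A conversion error is wrapped with %w; a wrong count gets its own message.
func requireSingle(objs []string, err error) (string, error) {
	if err != nil {
		return "", fmt.Errorf("error converting template: %w", err)
	}
	if len(objs) != 1 {
		return "", fmt.Errorf("error converting template: expected 1 object, got %d", len(objs))
	}
	return objs[0], nil
}

func main() {
	_, err := requireSingle([]string{"a", "b"}, nil)
	fmt.Println(err)

	_, err = requireSingle(nil, errBadYAML)
	fmt.Println(err, errors.Is(err, errBadYAML))
}
```

Folding both checks into one `err != nil || len(objs) != 1` branch would format a nil error with `%w` in the count case, which produces an unhelpful message and nothing for `errors.Is` to match.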
9 changes: 9 additions & 0 deletions components/modelregistry/resources/servicemesh-member.tmpl.yaml
@@ -0,0 +1,9 @@
apiVersion: maistra.io/v1
kind: ServiceMeshMember
metadata:
  name: default
  namespace: {{.Namespace}}
spec:
  controlPlaneRef:
    namespace: {{ .ControlPlane.Namespace }}
    name: {{ .ControlPlane.Name }}
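
To see what this template produces, here is a minimal sketch that renders it with `text/template` the same way `enrollToServiceMesh` does, but without any Kubernetes dependencies. The `controlPlane` type is a hypothetical stand-in for `infrav1.ControlPlaneSpec`, `renderSMM` is a hypothetical helper, and the control plane name and namespace passed in `main` are illustrative values, not defaults taken from this commit.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// smmTemplate mirrors the ServiceMeshMember template shown above.
const smmTemplate = `apiVersion: maistra.io/v1
kind: ServiceMeshMember
metadata:
  name: default
  namespace: {{.Namespace}}
spec:
  controlPlaneRef:
    namespace: {{ .ControlPlane.Namespace }}
    name: {{ .ControlPlane.Name }}
`

// controlPlane is a hypothetical stand-in for infrav1.ControlPlaneSpec.
type controlPlane struct {
	Name      string
	Namespace string
}

// renderSMM (hypothetical helper) fills the template for the given namespace
// and control plane and returns the resulting YAML.
func renderSMM(namespace string, cp *controlPlane) (string, error) {
	tmpl, err := template.New("servicemeshmember").Parse(smmTemplate)
	if err != nil {
		return "", fmt.Errorf("error parsing servicemeshmember template: %w", err)
	}
	data := struct {
		Namespace    string
		ControlPlane *controlPlane
	}{Namespace: namespace, ControlPlane: cp}

	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", fmt.Errorf("error executing servicemeshmember template: %w", err)
	}
	return b.String(), nil
}

func main() {
	// Illustrative values only.
	out, err := renderSMM("odh-model-registries", &controlPlane{Name: "example-smcp", Namespace: "example-mesh"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The pointer field works because `text/template` follows pointers automatically when resolving `.ControlPlane.Namespace`.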
40 changes: 40 additions & 0 deletions components/modelregistry/zz_generated.deepcopy.go

Some generated files are not rendered by default.