Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metadata.managedFields must be nil error in DSC #1018

Merged

Conversation

VaishnaviHire
Copy link
Member

@VaishnaviHire VaishnaviHire commented May 20, 2024

Description

Jira Issue: https://issues.redhat.com/browse/RHOAIENG-7401

This PR also refactors the manageResource function.

How Has This Been Tested?

Testing Steps:

  1. Deploy a clusterwide resource without operator. E.g (Kuberay clusterrole)
  2. Deploy operator
  3. Create DSC with Kuberay set to Removed
  4. Verify Kuberay clusterrole is not removed
  5. Verify metadata.managedFields must be nil error is not triggered.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@openshift-ci openshift-ci bot requested review from HumairAK and ykaliuta May 20, 2024 15:59
@VaishnaviHire VaishnaviHire requested review from ajaypratap003 and AjayJagan and removed request for HumairAK May 20, 2024 15:59
Comment on lines 443 to 448
for _, owner := range ownerReferences {
if owner.Kind != "DataScienceCluster" && owner.Kind != "DataScienceInitialization" {
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if owerReerences is array, resource could have multiple ownerreference.
should it return false only when all not match DSC and DSCI?

for _, owner := range ownerReferences {
		if owner.Kind == "DataScienceCluster" ||  owner.Kind == "DataScienceInitialization" {
			return true
		}
return false

return nil
}

func hasInvalidOwnerReferences(ownerReferences []metav1.OwnerReference) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name of the func is a bit confusing. isOwnedByODHCRD() ?
so the idea is to check if resource does not have OwnerRef or it is owned by DSCI and DSC, then delete it?

@zdtsw
Copy link
Member

zdtsw commented May 21, 2024

i think there are two different cases with the metadata.managedFields error:

  1. as stated in the jira ticket, by the way of how users creates these resoruce
  2. for the component controllers, they could create resource with sdk but without setting ownerreference or labels
    this PR should have covered both cases, but on the safe side, we need to check with component owners to ensure they have their controller updated if they want such resource to be kept or deleted.

Copy link
Contributor

@ykaliuta ykaliuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the refactoring, but I would say splitting such code moves and tiny functional changes into 2 patches would make it a bit more review friendly as for me.

}
// Return if error getting resource in cluster
found, err := getResource(ctx, cli, obj)
if err != nil && !apierrs.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be

if client.IgnoreNotFound(err) != nil

makes less negations to handle. Not sure.

@@ -488,4 +388,132 @@ func OperatorExists(cli client.Client, operatorPrefix string) (bool, error) {
return false, nil
}

func getResource(ctx context.Context, cli client.Client, obj *unstructured.Unstructured) (*unstructured.Unstructured, error) {
found := obj.DeepCopy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it was done such a way at the first place? Get() does not need the object from manifests. But the logic was "if it is not deleted, not 'not found', but we could not fetch it, continue patching with the data we got from the manifests, not from the cluster".

Copy link
Contributor

@ykaliuta ykaliuta May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does

func getResource(ctx context.Context, cli client.Client, obj *unstructured.Unstructured) (*unstructured.Unstructured, error) {
	found := &unstructured.Unstructured{}
	err := cli.Get(ctx, types.NamespacedName{Name: obj.GetName(), Namespace: obj.GetNamespace()}, found)
	if err != nil {
		return nil, err
	}
	return found, nil
}

make it work as expected or I'm missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this should work. You are rightfound will never be nil. Thanks for catching this. Will update the function.

}

func handleDisabledComponent(ctx context.Context, cli client.Client, found *unstructured.Unstructured, componentName string, owner metav1.Object) error {
if found == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, will not work here since line 392 makes allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should since we pass it function argument

Copy link
Contributor

@ykaliuta ykaliuta May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is never nil even if the resource does not exist

}

func updateOwnerReferencesAndDelete(ctx context.Context, cli client.Client, found *unstructured.Unstructured, componentName string, owner metav1.Object) error {
existingOwnerReferences := found.GetOwnerReferences()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, was it a problem in the original code? It used obj, not found.


func hasInvalidOwnerReferences(ownerReferences []metav1.OwnerReference) bool {
for _, owner := range ownerReferences {
if owner.Kind != "DataScienceCluster" && owner.Kind != "DataScienceInitialization" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not "DSCInitialization", right?

}

func createResource(ctx context.Context, cli client.Client, obj *unstructured.Unstructured, owner metav1.Object) error {
if obj.GetKind() != "CustomResourceDefinition" && obj.GetKind() != "OdhDashboardConfig" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a change in logic with the old code, "OdhDashboardConfig" was handled here as well, right?

}

func skipUpdateOnWhitelistedFields(obj *unstructured.Unstructured, componentName string) error {
if componentName == "kserve" || componentName == "model-mesh" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A component added and label check removed comparing to the original code, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have both components in incubation branch - https://github.com/opendatahub-io/opendatahub-operator/blob/incubation/pkg/deploy/deploy.go#L292-L310

And the label check is not needed

@VaishnaviHire
Copy link
Member Author

i think there are two different cases with the metadata.managedFields error:

  1. as stated in the jira ticket, by the way of how users creates these resoruce
  2. for the component controllers, they could create resource with sdk but without setting ownerreference or labels
    this PR should have covered both cases, but on the safe side, we need to check with component owners to ensure they have their controller updated if they want such resource to be kept or deleted.

For (2), operator should not be responsible for managing and cleanup of resources created by controllers. We are not currently handling this and should not moving forward. The only resources managed by the operator should be the ones included in manifests or the resources created as part of preconditions and postconditions.

Let me know if you see a reason to manage resources that are created by a component controller

@AjayJagan
Copy link
Contributor

changes looks good to me :)

@zdtsw
Copy link
Member

zdtsw commented May 23, 2024

i think there are two different cases with the metadata.managedFields error:

  1. as stated in the jira ticket, by the way of how users creates these resoruce
  2. for the component controllers, they could create resource with sdk but without setting ownerreference or labels
    this PR should have covered both cases, but on the safe side, we need to check with component owners to ensure they have their controller updated if they want such resource to be kept or deleted.

For (2), operator should not be responsible for managing and cleanup of resources created by controllers. We are not currently handling this and should not moving forward. The only resources managed by the operator should be the ones included in manifests or the resources created as part of preconditions and postconditions.

Let me know if you see a reason to manage resources that are created by a component controller

i agree with you that we should not take ownership of these resource not done by our operator.
my concern, for example: end user want to use Ray but they do not use our Operator to manage/remove Ray.
they update Ray's yaml file to add "label" then do kubectl apply to create these resource => these resource are created in the cluster, with label but no ownerreference (as existingOwnerReferences == nil ?)
will our Operator remove these resource once Operator starts?

@VaishnaviHire
Copy link
Member Author

i think there are two different cases with the metadata.managedFields error:

  1. as stated in the jira ticket, by the way of how users creates these resoruce
  2. for the component controllers, they could create resource with sdk but without setting ownerreference or labels
    this PR should have covered both cases, but on the safe side, we need to check with component owners to ensure they have their controller updated if they want such resource to be kept or deleted.

For (2), operator should not be responsible for managing and cleanup of resources created by controllers. We are not currently handling this and should not moving forward. The only resources managed by the operator should be the ones included in manifests or the resources created as part of preconditions and postconditions.
Let me know if you see a reason to manage resources that are created by a component controller

i agree with you that we should not take ownership of these resource not done by our operator. my concern, for example: end user want to use Ray but they do not use our Operator to manage/remove Ray. they update Ray's yaml file to add "label" then do kubectl apply to create these resource => these resource are created in the cluster, with label but no ownerreference (as existingOwnerReferences == nil ?) will our Operator remove these resource once Operator starts?

Yes, operator will remove those resources. Manually adding odh labels is not a supported path for end users. So if a user add s the label to their self-managed resources, they can expect them to be managed by the operator

@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

This commit also updates how we fetch resources from cluster
@VaishnaviHire
Copy link
Member Author

/test opendatahub-operator-e2e

if err != nil {
return err
// Create resource if it doesn't exist
if apierrs.IsNotFound(err) || found == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This found check looks redundant. If the object lookup returns NotFound, it's created. Creation uses only manifest's object. (If it's not NotFound, line 216 returns. So, updateResource happens only when no error).

// Setting gvk is required to do Get request
found.SetGroupVersionKind(obj.GroupVersionKind())
err := cli.Get(ctx, types.NamespacedName{Name: obj.GetName(), Namespace: obj.GetNamespace()}, found)
return found, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It again ends up in found never nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need to return nil when we get ObjectNotFound error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, since we have error handling on the caller function, returning nil with any error should be ok (and actual object on non-error Get only)

if err := removeResourcesFromDeployment(obj); err != nil {
return err
}
// Return if error getting resource in cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is a bit misleading now since getResource does not return error if it was NotFound.

I would say the original logic flow is more clear to me. If getResource() returns (nil, error) if there was error including NotFound and returns (found, nil) if it could get the object, than original check here if IgnoreNotFound(err) != nil { return err } will catch all the errors and we can handle NotFound with found to nil check.

Then getResource() as

func getResource(ctx context.Context, cli client.Client, obj *unstructured.Unstructured) (*unstructured.Unstructured, error) {
	found := &unstructured.Unstructured{}
	// Setting gvk is required to do Get request
	found.SetGroupVersionKind(obj.GroupVersionKind())
	err := cli.Get(ctx, types.NamespacedName{Name: obj.GetName(), Namespace: obj.GetNamespace()}, found)
	if err != nil {
		return nil, err
	}
	return found, nil
}

will not break idiomatic way of propagating error.

But I'll not insist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I will update this. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

if apierrs.IsNotFound(err) {
return nil, nil
}
return found, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not NotFound it will return the error and blank object. Hope, it's ok for you. It does not break the code (the check in manageResource for err will filter it).

Patching resources is not required before deletion. Updated delete function
@zdtsw
Copy link
Member

zdtsw commented May 30, 2024

/test opendatahub-operator-e2e

Copy link

openshift-ci bot commented May 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AjayJagan, ykaliuta, zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [AjayJagan,ykaliuta,zdtsw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit f4e66f0 into opendatahub-io:incubation May 30, 2024
8 checks passed
VaishnaviHire added a commit to VaishnaviHire/opendatahub-operator that referenced this pull request May 30, 2024
)

* Fix metadata.managedFields must be nil error in DSC

* Address comments

This commit also updates how we fetch resources from cluster

* Fix getResource function

* addressed comments

* fix linters

* Update resource deletion

Patching resources is not required before deletion. Updated delete function

* Fix linters

(cherry picked from commit f4e66f0)
VaishnaviHire added a commit to VaishnaviHire/opendatahub-operator that referenced this pull request May 31, 2024
)

* Fix metadata.managedFields must be nil error in DSC

* Address comments

This commit also updates how we fetch resources from cluster

* Fix getResource function

* addressed comments

* fix linters

* Update resource deletion

Patching resources is not required before deletion. Updated delete function

* Fix linters

(cherry picked from commit f4e66f0)
VaishnaviHire referenced this pull request in red-hat-data-services/rhods-operator May 31, 2024
Fix metadata.managedFields must be nil error in DSC (#1018)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants