Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade: cleanup CreateWithRetry usage #1145

Merged

Conversation

ykaliuta
Copy link
Contributor

@ykaliuta ykaliuta commented Jul 30, 2024

Description

Make CreateWithRetry work for already existing object and remove extra handling for AlreadyExists in the caller code. This does not work for webhook case anyway.

How Has This Been Tested?

Tested webhook configuration:

  • enable automatic dsci creation, in config/manager/manager.yaml set DISABLE_DSC_CONFIG to false
  • install on clean cluster:
    • logs show Error creating object: Internal error occurred: failed calling webhook "operator.opendatahub.io": failed to call webhook: Post "https://opendatahub-operator-webhook-service.opendatahub-operator-system.svc:443/validate-opendatahub-io-v1?timeout=10s": no endpoints available for service "opendatahub-operator-webhook-service". Retrying...
  • after some time oc get dsci shows default-dsci Ready
  • restart the pod (delete)
    • logs show Resource default-dsci already exists. It will not be updated with default values
  • pod's status Running
  • dsci phase Ready

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Jul 30, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ykaliuta ykaliuta force-pushed the createwithretry branch 4 times, most recently from 36e1e4f to 4de8683 Compare July 30, 2024 22:40
@ykaliuta ykaliuta marked this pull request as ready for review July 30, 2024 22:54
@openshift-ci openshift-ci bot requested review from MarianMacik and Sara4994 July 30, 2024 22:54
@ykaliuta
Copy link
Contributor Author

/onhold

return err
}

switch {
case len(instances.Items) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but by removing this part, when there is no webhook, we are actually able to create 2nd instance, isnt it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand you move the logic "already one exist" and "not any exists yet" into CreateWithRetry() but not really cover the case "already one exist but now i wanna create another one with a different name" if webhook (say, downstream) is not in place. so this code wont be able to sync to downstream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from this ^ , i am fine with the change in CreateWithRetry()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are totally right, I mentioned it in the commit message BTW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took back already exists logic for DSCI. Did not introduce it for DSC. Removed the message from CreateWithRetry().

return fmt.Errorf("failed to create DataScienceCluster custom resource: %w", err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on line 122: If there exists an instance already, it patches the DSCISpec with default values

can we first clarify what we want: if exist then patch back to default or leave it as-is.
from code (even before this PR) looks like it is not gonna patch to default, because if one DSCI CR exist, then dont do anything regardless what is the name or if the spec is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great eye! I missed the comment. In the original patch 1ccbe05 ("Unset Tech Preview components by default (#708)") it patched with the default values. But then f756e40 ("Update incubation with downstream changes (#783)") changes it with the message I borrowed without changing the comment.

(Sorry, but due to squashing it's not obvious where the change came from, I could dig it to a4788f3 ("fix(mm-monitoring): revert the code logic but set to disable as delete (#153)") but it squashed again and has only line Retain existing DSCI values with no description or commit reference around.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. i will need to have a follow-up on this. probably talk with @VaishnaviHire when she is back.

@zdtsw zdtsw requested review from VaishnaviHire and removed request for MarianMacik and Sara4994 August 1, 2024 09:03
Copy link
Member

@zdtsw zdtsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@zdtsw zdtsw mentioned this pull request Aug 5, 2024
5 tasks
@ykaliuta
Copy link
Contributor Author

ykaliuta commented Aug 5, 2024

/retest-required

Intention of the function was to ensure that the object is created
and retry in case of webhook is temporary not available.

The original code lacks handling already existing object. Handling
with and without webhook configurations make the things more
complicated. Let's check Create() errors for different cases:

If webhook enabled:
  - no error (err == nil)
  - 500 InternalError likely if webhook is not available (yet)
  - 403 Forbidden if webhook blocks creation (check of existance)
  - some problem (real error)
else, if webhook disabled:
  - no error (err == nil)
  - 409 AlreadyExists if object exists
  - some problem (real error)

Check already existing object after Create() with a call to
Get(). It covers both with and without webhook configurations.

Reuse the same object for Get() to avoid fetching Gvk for it. It
doesn't make harm, the object structure is not used after creation.

500 InternalError is not 100% webhook problem, but consider it as a
reason for retry.

Fixes: e26100e ("upgrade: retry if default DSCI creation fails (opendatahub-io#1008)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
CreateWithRetry() now checks AlreadyExists condition inside, so skip
its handling.

sleep() is not needed for DSCI with proper working CreateWithRetry
since it is the actual point of existance of the function.

Keep checking number of DSCI instances for non-webhook configuration.

Basically, using CreateWithRetry for DSC is redundant from webhook
unavailability point of view since it is created after DSCI which
garantees working webhook. But it handles already existed object.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm label Aug 5, 2024
@openshift-ci openshift-ci bot added the lgtm label Aug 5, 2024
Copy link

openshift-ci bot commented Aug 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 5c755e6 into opendatahub-io:incubation Aug 5, 2024
8 checks passed
zdtsw referenced this pull request in zdtsw-forking/rhods-operator Aug 15, 2024
* cluster: CreateWithRetry: make work for already existing object

Intention of the function was to ensure that the object is created
and retry in case of webhook is temporary not available.

The original code lacks handling already existing object. Handling
with and without webhook configurations make the things more
complicated. Let's check Create() errors for different cases:

If webhook enabled:
  - no error (err == nil)
  - 500 InternalError likely if webhook is not available (yet)
  - 403 Forbidden if webhook blocks creation (check of existance)
  - some problem (real error)
else, if webhook disabled:
  - no error (err == nil)
  - 409 AlreadyExists if object exists
  - some problem (real error)

Check already existing object after Create() with a call to
Get(). It covers both with and without webhook configurations.

Reuse the same object for Get() to avoid fetching Gvk for it. It
doesn't make harm, the object structure is not used after creation.

500 InternalError is not 100% webhook problem, but consider it as a
reason for retry.

Fixes: e26100e ("upgrade: retry if default DSCI creation fails (red-hat-data-services#1008)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* upgrade: cleanup CreateWithRetry usage

CreateWithRetry() now checks AlreadyExists condition inside, so skip
its handling.

sleep() is not needed for DSCI with proper working CreateWithRetry
since it is the actual point of existance of the function.

Keep checking number of DSCI instances for non-webhook configuration.

Basically, using CreateWithRetry for DSC is redundant from webhook
unavailability point of view since it is created after DSCI which
garantees working webhook. But it handles already existed object.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
(cherry picked from commit 5c755e6)
openshift-merge-bot bot referenced this pull request in red-hat-data-services/rhods-operator Aug 16, 2024
* cluster: CreateWithRetry: make work for already existing object

Intention of the function was to ensure that the object is created
and retry in case of webhook is temporary not available.

The original code lacks handling already existing object. Handling
with and without webhook configurations make the things more
complicated. Let's check Create() errors for different cases:

If webhook enabled:
  - no error (err == nil)
  - 500 InternalError likely if webhook is not available (yet)
  - 403 Forbidden if webhook blocks creation (check of existance)
  - some problem (real error)
else, if webhook disabled:
  - no error (err == nil)
  - 409 AlreadyExists if object exists
  - some problem (real error)

Check already existing object after Create() with a call to
Get(). It covers both with and without webhook configurations.

Reuse the same object for Get() to avoid fetching Gvk for it. It
doesn't make harm, the object structure is not used after creation.

500 InternalError is not 100% webhook problem, but consider it as a
reason for retry.

Fixes: e26100e ("upgrade: retry if default DSCI creation fails (#1008)")

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* upgrade: cleanup CreateWithRetry usage

CreateWithRetry() now checks AlreadyExists condition inside, so skip
its handling.

sleep() is not needed for DSCI with proper working CreateWithRetry
since it is the actual point of existance of the function.

Keep checking number of DSCI instances for non-webhook configuration.

Basically, using CreateWithRetry for DSC is redundant from webhook
unavailability point of view since it is created after DSCI which
garantees working webhook. But it handles already existed object.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
(cherry picked from commit 5c755e6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants