-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: [RHOAIENG-4485] Fix feature tracker errors while uninstalling operator #1192
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -105,6 +105,7 @@ metadata: | |||
createdAt: "2024-08-09T10:12:27Z" | |||
olm.skipRange: '>=1.0.0 <2.16.0' | |||
operators.operatorframework.io/builder: operator-sdk-v1.31.0 | |||
operators.operatorframework.io/internal-objects: '[featuretrackers.features.opendatahub.io]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotation expects double quotes for the crd to be hidden in console
operators.operatorframework.io/internal-objects: '["featuretrackers.features.opendatahub.io"]'
Even when CRD is hidden, it does appear in list of operands during uninstall I see 2 errors -
Note There seems to be no way current to do ordered deletion of operands in OpenShift Web Console. |
@VaishnaviHire Before I dive deeper, quick question: who is deleting instances of |
Morning coffee later. Is it what is happening?
What we can do here is to make DSC and DSCI (depending on where features were created) owners of associated FTs and simplify the cleanup logic. Not sure this will mitigate the problem of trying to delete a resource that was already garbage collected. We have this already done on our feature branch waiting as stacked PR, so can open it against |
Well, it came there for reasons. What can be a solution then? There was discussion about ownership (@bartoszmajsak ) but IIUC it will cause DSC deletion if a user deletes DSCI "by accident" which is not desirable.
|
@ykaliuta I don't remember a discussion about DSCI accidental deletion - can you point me to it so that I can refresh my fish memory? :) |
|
here are two different ways of uninstallations:
the problem here lies in the way how 1) and 2) are not doing the same thing. but as you all have said abvoe ^, when console first snapshot all operands and try to delete all of them, some of these are already gone due to operator's intelligence . so the error catch all these "missing" ones. back to the change itself: as for the webhook part: and i have no good idea how to tackle this "UI uninstallation" without a big refactor of code, tbh. |
My 2 cents:
|
FWIW, you can actually do the same with |
mostly likely, yes. |
I doubt it's an option: if DSCI is deleted with its Features, components probably will fail. Some of them I remember make some decisions on initialization, so the cluster will be in a pretty undefined state. Correct me if I'm wrong. Would it be possible to merge DSCI configuration into DSC without changing API version? Probably, yes, it just adds fields and live should be possible.
|
No you are not wrong and we may need to work to improve components failure handling. My main point, is that we should try to make the operator resilient to failure and be able to gracefully handle a degraded service.
As long as the change is backward compatible, then yes is is possible. |
@bartoszmajsak Regarding the Feature Tracker error, Will merging two resources (DSC & DSCI) solve/fix the Feature tracker related errors while uninstalling via UI? while reproduction the issue as I can see and you have also mentioned the same here, the operator removes or cleans up FTs quickly whereas openshift console is trying remove the operands which are already removed. Could there be a way to notify/signal web console from the operator immediately after the FTs operands are removed? so that web console doesnt look for them again. @zdtsw @ykaliuta @VaishnaviHire |
I think if the FTs are listed in the |
i did a test with this proposal (fixed with "") and from UI, it does not show FTer anymore after installed our Operator. |
@lburgazzoli Well i tried with that annotation, though FT didnt show up in the UI(list of operands) , but while uninstalling i still see error releated to FT. some how it is still considering FT while uninstalling |
+1 |
ok, so we need to dig more into this |
Introduces ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed. When Features are grouped through means of `FeatureHandler`s, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used. In addition, this PR simplifies how Feature relies on `client.Client` instance. Instead of creating its own instance it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules etc. As a consequence the change of retry logic for status needed to adjusted. Besides a conflict it needs to be triggered on NotFound error, as shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching. > [!IMPORTANT] > > With DSCI and DSC becoming owners of particular FeatureTrackers > `func (c *Component) Cleanup` called in finalizer block becomes > somewhat speculative. The only use case where custom function is > invoked is unpatching SMCP authorization provider. This was based > on early assumption that we might want to plug into customer's > existing SMCP instance. It's unclear to me if this is still long-term > thinking so we might want to revisit need for this function. > > From the completness point of view, if we allow to create/manipulate > resources programatically as part of the Feature DSL, we should be able > to register cleanup counter-part functions, especially if we cannot simply > remove them (as this is handled through ownership of associated > FeatureTracker). > > There is, however, a need to perform cleanup when particular > component is "Removed" (not managed anymore). Right now this is > handled inside its Reconcile function. Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629) Relates to opendatahub-io#1192 (comment) Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
…1228) * feat(tracker): feature tracker can have ownerReferences Introduces ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed. When Features are grouped through means of `FeatureHandler`s, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used. In addition, this PR simplifies how Feature relies on `client.Client` instance. Instead of creating its own instance it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules etc. As a consequence the change of retry logic for status needed to adjusted. Besides a conflict it needs to be triggered on NotFound error, as shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching. > [!IMPORTANT] > > With DSCI and DSC becoming owners of particular FeatureTrackers > `func (c *Component) Cleanup` called in finalizer block becomes > somewhat speculative. The only use case where custom function is > invoked is unpatching SMCP authorization provider. This was based > on early assumption that we might want to plug into customer's > existing SMCP instance. It's unclear to me if this is still long-term > thinking so we might want to revisit need for this function. > > From the completness point of view, if we allow to create/manipulate > resources programatically as part of the Feature DSL, we should be able > to register cleanup counter-part functions, especially if we cannot simply > remove them (as this is handled through ownership of associated > FeatureTracker). > > There is, however, a need to perform cleanup when particular > component is "Removed" (not managed anymore). Right now this is > handled inside its Reconcile function. Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629) Relates to #1192 (comment) Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> * fix: no need to return dsci --------- Co-authored-by: Cameron Garrison <cgarriso@redhat.com>
…pendatahub-io#1228) * feat(tracker): feature tracker can have ownerReferences Introduces ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed. When Features are grouped through means of `FeatureHandler`s, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used. In addition, this PR simplifies how Feature relies on `client.Client` instance. Instead of creating its own instance it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules etc. As a consequence the change of retry logic for status needed to adjusted. Besides a conflict it needs to be triggered on NotFound error, as shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching. > [!IMPORTANT] > > With DSCI and DSC becoming owners of particular FeatureTrackers > `func (c *Component) Cleanup` called in finalizer block becomes > somewhat speculative. The only use case where custom function is > invoked is unpatching SMCP authorization provider. This was based > on early assumption that we might want to plug into customer's > existing SMCP instance. It's unclear to me if this is still long-term > thinking so we might want to revisit need for this function. > > From the completness point of view, if we allow to create/manipulate > resources programatically as part of the Feature DSL, we should be able > to register cleanup counter-part functions, especially if we cannot simply > remove them (as this is handled through ownership of associated > FeatureTracker). > > There is, however, a need to perform cleanup when particular > component is "Removed" (not managed anymore). Right now this is > handled inside its Reconcile function. Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629) Relates to opendatahub-io#1192 (comment) Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> * fix: no need to return dsci --------- Co-authored-by: Cameron Garrison <cgarriso@redhat.com> (cherry picked from commit 10dd554)
…pendatahub-io#1228) * feat(tracker): feature tracker can have ownerReferences Introduces ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed. When Features are grouped through means of `FeatureHandler`s, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used. In addition, this PR simplifies how Feature relies on `client.Client` instance. Instead of creating its own instance it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules etc. As a consequence the change of retry logic for status needed to adjusted. Besides a conflict it needs to be triggered on NotFound error, as shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching. > [!IMPORTANT] > > With DSCI and DSC becoming owners of particular FeatureTrackers > `func (c *Component) Cleanup` called in finalizer block becomes > somewhat speculative. The only use case where custom function is > invoked is unpatching SMCP authorization provider. This was based > on early assumption that we might want to plug into customer's > existing SMCP instance. It's unclear to me if this is still long-term > thinking so we might want to revisit need for this function. > > From the completness point of view, if we allow to create/manipulate > resources programatically as part of the Feature DSL, we should be able > to register cleanup counter-part functions, especially if we cannot simply > remove them (as this is handled through ownership of associated > FeatureTracker). > > There is, however, a need to perform cleanup when particular > component is "Removed" (not managed anymore). Right now this is > handled inside its Reconcile function. Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629) Relates to opendatahub-io#1192 (comment) Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> * fix: no need to return dsci --------- Co-authored-by: Cameron Garrison <cgarriso@redhat.com> (cherry picked from commit 10dd554)
…1228) * feat(tracker): feature tracker can have ownerReferences Introduces ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed. When Features are grouped through means of `FeatureHandler`s, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used. In addition, this PR simplifies how Feature relies on `client.Client` instance. Instead of creating its own instance it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules etc. As a consequence the change of retry logic for status needed to adjusted. Besides a conflict it needs to be triggered on NotFound error, as shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching. > [!IMPORTANT] > > With DSCI and DSC becoming owners of particular FeatureTrackers > `func (c *Component) Cleanup` called in finalizer block becomes > somewhat speculative. The only use case where custom function is > invoked is unpatching SMCP authorization provider. This was based > on early assumption that we might want to plug into customer's > existing SMCP instance. It's unclear to me if this is still long-term > thinking so we might want to revisit need for this function. > > From the completness point of view, if we allow to create/manipulate > resources programatically as part of the Feature DSL, we should be able > to register cleanup counter-part functions, especially if we cannot simply > remove them (as this is handled through ownership of associated > FeatureTracker). > > There is, however, a need to perform cleanup when particular > component is "Removed" (not managed anymore). Right now this is > handled inside its Reconcile function. Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629) Relates to opendatahub-io#1192 (comment) Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> * fix: no need to return dsci --------- Co-authored-by: Cameron Garrison <cgarriso@redhat.com> (cherry picked from commit 10dd554)
…flux/component-updates/odh-trustyai-service-operator-v2-17 chore(deps): update odh-trustyai-service-operator-v2-17 to 6b0de77
Description
How Has This Been Tested?
Screenshot or short clip
Merge criteria