Have pre-req retry. #571
A new image has been built to help with testing out this PR.
To use this image, run the following:
cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/571/head
git checkout -b pullrequest c3221fced0eb9209b9338c1165cd8ede9e1fdc22
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-571"
More instructions here on how to deploy and test a Data Science Pipelines Application.
Change to PR detected. A new PR build was completed.
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
/lgtm
/Approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: DharmitD
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
The issue resolved by this Pull Request:
Resolves https://issues.redhat.com/browse/RHOAIENG-2099
Description of your changes:
Fixes a bug where no error was returned, as it should have been, when the DB connection check failed to query the configured database.
Also adds a fallback reconcile requeue: when a DSPO health check fails for the object store or the database, DSPO requeues the DSPA and reconciles it again after 20s (the default, configurable at the DSPO level).
Other considerations
This adds infinite retry logic when prereqs fail. No exponential backoff is added here; the logic is pretty simple. My plan is to iterate on this and eventually add a maximum number of retry attempts, or some form of backoff time, in the status field.
Testing instructions
Deploy DSPO
Deploy multiple DSPAs
Inspect the DSPA status field and ensure the proper status messages/reasons are propagated into it
Try with a working DB, then an invalid endpoint.
Try to scale down the default MariaDB before it comes up, and inspect the behavior of the DSPO, then try scaling it back up again.