-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow for retry on typically transient k8s errors in both core controller and resolver for remote resolution #7894
allow for retry on typically transient k8s errors in both core controller and resolver for remote resolution #7894
Conversation
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
fd6ea81
to
db0ed3f
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
just pushed an extra-credit commit that properly generates like:
I can break it off into a separate PR if desired |
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
LGTM |
10a9dbe
to
e4973c0
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
e4973c0
to
0e94804
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
0e94804
to
4368c8a
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
4368c8a
to
84a6c2f
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
fab6011
to
e96f4a0
Compare
ok @chitrangpatel @afrittoli I have rebased on top of @chitrangpatel 's recently merged refactor |
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also make the corresponding changes from pkg/resolution/resolver/framework/reconciler.go
and its test
into pkg/remoteresolution/resolver/framework/reconciler.go
and its test file?
All the other files are LGTM!
Thats the newer framework. The current one will be deprecated at somepoint.
Thanks @chitrangpatel - as I mentioned on slack, I think we should only do the changes in the new framework, since that's what is actually deployed and used now, and I would not want to have to maintain both. The old framework is still around to provide some room for potential users to move to the new one, but I would not backport changes to it. @chitrangpatel should we mark the old resolvers as deprecated already? |
Yes, I'm happy to. I submitted a PR for that already. |
e96f4a0
to
018b243
Compare
ok @chitrangpatel @afrittoli I moved those changes from the deprecated resolution subpackage over to remoteresolution there was a minor goimport cleanup I left in the old deprecated package reconciler_test.go file I also squashed the commits and updated the git commit message accordingling PTAL / thanks |
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chitrangpatel, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR!
I added some minor comments, but nothing blocking.
/lgtm
@@ -645,8 +646,12 @@ func resolveTask( | |||
case errors.Is(err, remote.ErrRequestInProgress): | |||
return rt, err | |||
case err != nil: | |||
name := pipelineTask.TaskRef.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: since you've done the investigation, would you mind adding a comment to explain in which situations the name might be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of next push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops .... I pushed the rebase before this update @afrittoli and with @chitrangpatel lgtm the pr has merged
I'll handle this in and the unit test coverage item in a follow up PR I should be able to open today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #7950
errors.New("etcdserver: leader changed"), | ||
context.DeadlineExceeded, | ||
apierrors.NewConflict(pipeline.TaskRunResource, "", nil), | ||
apierrors.NewServerTimeout(pipeline.TaskRunResource, "", 0), | ||
apierrors.NewTimeoutError("", 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
paramString += fmt.Sprintf("name could not be marshalled: %s\n", err.Error()) | ||
continue | ||
} | ||
name = string(asJSON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: coverage shows no coverage for this line, can we add a test case with an object param that can be marshalled as JSON? It can be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I'll handle this in separate PR ... hopefully should have it open today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #7950
…ller and resolver for remote resolution During both sides of remote resolution (core controller and resolver) typically transient kubernetes errors were being treated as permanent knative errors and no attempts at trying to reconcile again were made, leading to failures which could be avoided. Then, while diagnosing this, discovered the TaskNotFoundError was missing the Task name when identification comes from params. That is also addressed.
018b243
to
cf47a44
Compare
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
/lgtm |
Changes
collaborating with @khrm here is an augmented version of #7893
Fixes #7909
During both sides of remote resolution (core controller and resolver) typically transient kubernetes errors were being treated as permanent knative errors and no attempts at trying to reconcile again were made, leading to failures which could be avoided.
These changes addresses that.
An example log snippet from the core controller
Accompanying log snippet from the resolver
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
/kind <type>
. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tepRelease Notes