Fix: do not fail TaskRun for concurrent modification errors #7467

Merged
tekton-robot merged 1 commit into tektoncd:main on Dec 11, 2023

Conversation

JeromeJu
Member

@JeromeJu JeromeJu commented Dec 7, 2023

Changes

This commit fixes a bug where a concurrent modification error raised while stopping sidecars would fail the TaskRun. This could cause otherwise successful Tasks to fail even though stopping the sidecars could have succeeded on retry.

/kind bug
fixes: #7452

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

fix: TaskRuns no longer fail on concurrent modification errors when stopping sidecars

@tekton-robot tekton-robot added kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesnt merit a release note. labels Dec 7, 2023
@tekton-robot tekton-robot requested review from abayer and jerop December 7, 2023 15:04
@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 7, 2023
@JeromeJu JeromeJu marked this pull request as draft December 7, 2023 15:04
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2023
@JeromeJu JeromeJu marked this pull request as ready for review December 7, 2023 15:05
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2023
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesnt merit a release note. labels Dec 7, 2023
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 84.1% -1.5

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 84.1% -1.5

@JeromeJu JeromeJu force-pushed the 7452-concurrent-modification branch from 1759704 to 433e847 Compare December 7, 2023 15:25
@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 7, 2023
@JeromeJu
Member Author

JeromeJu commented Dec 7, 2023

So far I have only managed to add a bare-minimum unit test, because it doesn't seem easy (or even possible) to mock an injection of the k8s concurrent modification error.

If there are any suggestions for adding more tests around this, I'm all ears 🙏
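
One possible way to exercise this path without a real API server is a fake that returns a conflict-shaped error for its first few calls. The sketch below uses only the standard library: `conflictError`, `flakyStopper`, and `stopWithRetry` are hypothetical stand-ins, not Tekton or client-go types, and the in-place retry loop is only for illustration (a reconciler would more likely requeue). In the real code base a similar injection might be possible with a client-go fake clientset reactor returning `k8serrors.NewConflict`, but that is an assumption and not what this PR does.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// optimisticLockMsg mirrors the message Kubernetes is understood to use for
// optimistic-lock conflicts; it is copied here only for illustration.
const optimisticLockMsg = "the object has been modified; please apply your changes to the latest version and try again"

// conflictError is a hypothetical stand-in for a Kubernetes 409 Conflict error.
type conflictError struct{ msg string }

func (e *conflictError) Error() string { return e.msg }

// flakyStopper is a hypothetical fake: its first `failures` calls return a
// concurrent modification error, and later calls succeed.
type flakyStopper struct {
	failures int
	calls    int
}

func (f *flakyStopper) StopSidecars() error {
	f.calls++
	if f.calls <= f.failures {
		return &conflictError{msg: `Operation cannot be fulfilled on pods "demo": ` + optimisticLockMsg}
	}
	return nil
}

// isConcurrentModification reports whether err is a conflict caused by an
// optimistic-lock failure.
func isConcurrentModification(err error) bool {
	var ce *conflictError
	return errors.As(err, &ce) && strings.Contains(ce.msg, optimisticLockMsg)
}

// stopWithRetry retries only on concurrent modification errors, up to
// maxAttempts; any other error is returned immediately.
func stopWithRetry(s *flakyStopper, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = s.StopSidecars(); err == nil || !isConcurrentModification(err) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	s := &flakyStopper{failures: 2}
	fmt.Println(stopWithRetry(s, 5), s.calls) // prints "<nil> 3"
}
```

A test built this way can assert both outcomes: the call eventually succeeds when the conflicts are transient, and the error is still surfaced when they outlast the retry budget.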

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 85.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 85.2% -0.4

@JeromeJu JeromeJu force-pushed the 7452-concurrent-modification branch from 433e847 to 5f07c37 Compare December 7, 2023 16:14
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 85.2% -0.4

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/taskrun/taskrun.go 85.6% 85.2% -0.4

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023
@@ -1014,6 +1020,28 @@ func isResourceQuotaConflictError(err error) bool {
	return k8ErrStatus.Details != nil && k8ErrStatus.Details.Kind == "resourcequotas"
}

const (
	// TODO(#7466) Currently this appears as a local constant due to upstream dependencies bump blocker.
Member

@Yongxuanzhang Yongxuanzhang Dec 8, 2023

~is 7466 a pr?~
Oh I see, #7466 this one

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester, Yongxuanzhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Yongxuanzhang,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@Yongxuanzhang Yongxuanzhang left a comment

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2023
// isConcurrentModificationError determines whether it is a concurrent
// modification error depending on its error type and error message.
func isConcurrentModificationError(err error) bool {
	if !k8serrors.IsConflict(err) {
Member

Is there a reason why just checking k8serrors.IsConflict() isn't sufficient and we have to check the exact optimisticLockErrorMsg?

Member Author

I think there are other cases where NewConflict() errors are returned, e.g. an invalid storage error or a UID mismatch, where an object may have been missing unexpectedly.

IIUC, it is probably safer to retry only on concurrent modifications in this case.

Member

Got it, makes sense.

@dibyom
Member

dibyom commented Dec 11, 2023

/test pull-tekton-pipeline-alpha-integration-tests

@tekton-robot tekton-robot merged commit fc02b29 into tektoncd:main Dec 11, 2023
9 checks passed
@JeromeJu
Member Author

/cherry-pick release-v0.53.x

@tekton-robot
Collaborator

@JeromeJu: new pull request created: #7479

In response to this:

/cherry-pick release-v0.53.x

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Should we have stopSidecar retryable?
5 participants