Rewrite flakey workspace-in-sidecar example #4349
i haven't seen a named pipe in so long 🤩 (if you want to try to get the CI to trigger this, one terribly hacky thing you could do is update the test logic to run this test a bunch of times - like 100 times)
/test check-pr-has-kind-label
@sbwsg i wonder if it's possible this is some kind of interaction between the volumeMounts specification and workspaces - probably a red herring but i was surprised to see this example wasn't using the sidecar workspaces feature (i guess it pre-dates it?)
@bobcatfish great idea about running the test lots of times, I'll give that a shot! The |
/test pull-tekton-pipeline-integration-tests
The workspace-in-sidecar example taskruns are the source of regular flakes in Pipelines' CI. The examples work by using files in a shared emptyDir volume to synchronize the behaviour of two containers. This commit introduces a named pipe for synchronizing behaviour between the two containers, removing one of the file polling loops. The size of the resource requests and shared volume has been set extremely low (just in case the errors are related to disk pressure or resource starvation on the kubelet) and extra log lines are also included to help narrow down where a freeze might be occurring in future.
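As a rough illustration of the named-pipe idea in the commit message above, here is a minimal Python sketch. It is not the example's actual script: the two threads stand in for the two containers, a temporary directory stands in for the shared emptyDir volume, and the names `ready.fifo`, `step` and `sidecar` are hypothetical. It assumes a Unix-like system, since `os.mkfifo` is not available on Windows.

```python
import os
import tempfile
import threading


def run_demo():
    """Synchronize two workers through a FIFO instead of polling for a file."""
    events = []
    # The temporary directory stands in for the shared emptyDir volume.
    with tempfile.TemporaryDirectory() as shared_dir:
        fifo_path = os.path.join(shared_dir, "ready.fifo")  # hypothetical name
        os.mkfifo(fifo_path)

        def sidecar():
            # Opening a FIFO for reading blocks until a writer opens it,
            # so no sleep-and-check polling loop is needed here.
            with open(fifo_path, "r") as pipe:
                message = pipe.readline().strip()
            events.append(f"sidecar received: {message}")

        def step():
            events.append("step sending: ready")
            # Opening for writing blocks until the reader has opened the FIFO.
            with open(fifo_path, "w") as pipe:
                pipe.write("ready\n")

        reader = threading.Thread(target=sidecar)
        writer = threading.Thread(target=step)
        reader.start()
        writer.start()
        writer.join(timeout=5)
        reader.join(timeout=5)
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The blocking open on each end is what replaces the polling loop: neither side proceeds until the other has arrived at the pipe.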
I updated the test runner to repeat the workspace-in-sidecar example 20 times per run, and then re-ran the suite 3 times, for 60 executions in total. I don't think that this PR necessarily solves the problem with the example, but hopefully the extra log lines I've added will help surface the real underlying problem when it rears its head again.
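The repetition approach described above can be sketched in Python as follows. This is not the project's test runner; `count_failures`, `prob_all_pass` and the stub checks are hypothetical names used only for illustration. The second helper shows why a clean batch of runs is weak evidence: under the assumption that runs fail independently at a fixed rate, a rare flake can easily stay hidden.

```python
from typing import Callable


def count_failures(check: Callable[[], bool], runs: int) -> int:
    """Run `check` `runs` times and return how many runs failed."""
    failures = 0
    for _ in range(runs):
        try:
            passed = check()
        except Exception:
            # Treat an unexpected error as a failed run.
            passed = False
        if not passed:
            failures += 1
    return failures


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Chance that every run passes, assuming independent runs."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    repeats_per_run = 20  # repetitions of the example per suite run
    suite_runs = 3        # number of times the suite was re-run
    total = repeats_per_run * suite_runs
    # Stub check that always passes; a real check would run the example.
    print(total, count_failures(lambda: True, total))
    # Assumed 1% flake rate, purely for illustration.
    print(round(prob_all_pass(0.01, total), 3))
```

With an assumed 1% failure rate, roughly half of all 60-run batches would pass cleanly, which is consistent with the caution that a quiet batch does not show the flake is fixed.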
/hold cancel
i guess you jinxed it @sbwsg X'D that's one way to reproduce an error i suppose XD
/test pull-tekton-pipeline-alpha-integration-tests Frustratingly the tests that failed during this alpha integration run weren't the |
The isolated workspaces example failed because it took longer than 1 minute to complete.
The Edit: OK so the 50 separate pods is likely because there are two copies of |
/test pull-tekton-pipeline-alpha-integration-tests
excited to see this fix as i just hit this issue in another pr! thanks @sbwsg 🎉
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jerop
thank you @sbwsg for detailed explanation 🙏 hoping not to see this flake again 🤣
/lgtm
/test pull-tekton-pipeline-alpha-integration-tests
flake while fixing flake 🤔
/test pull-tekton-pipeline-alpha-integration-tests
Sigh, ok clearly my changes don't fix the flake :D Based on timestamps from the taskrun and pod here's the order of operations:
Couple observations here:
Changes
Issue: #4169
The workspace-in-sidecar example taskruns are the source of regular flakes in Pipelines' CI. The examples work by using files in a shared emptyDir volume to synchronize the behaviour of two containers.
This commit introduces a named pipe for synchronizing behaviour between the two containers, removing one of the file polling loops. The size of the shared volume has been set extremely low (just in case the problem is related to disk pressure on the kubelet) and extra log lines are also included to help narrow down where a freeze might be occurring.
Adding a hold to see if we can get the CI to fail on workspace-in-sidecar
examples for debugging.
/hold
/kind flake