test-cmd flake: quota.sh:32: executing 'oc describe appliedclusterresourcequota/for-deads-by-annotation -n bar --as deads' #11560
Comments
@deads2k that's a very incriminating test, if I ever saw one :)
yeah, never put your own name in a test case :) (I just hit this also)
@stevekuznetsov this is starting to look like a good candidate for the "if it's not fixed in a day, back it out" philosophy.
@stevekuznetsov not yet, the leads call hasn't happened yet. i'm just building the case :)
Depends on what you mean by "back it out". I think that commenting out the test (which @stevekuznetsov has proposed in the past) would be the wrong thing to do. This was likely caused by #11394 and so @mfojtik should have his chance to fix it before the pull is reverted (which I would be ok with). But I don't see anyone referencing it in this issue, so I'm not sure what you'd back out.
Even saying the magic words got the likely culprit PR mentioned and the right person assigned... not sure I even needed to think about what to back out :)
@deads2k again, it needs to be discussed w/ the team, but i'm prepared to say
I think it's important to recognize the huge productivity cost flakes impose on teams trying to test/merge code. Every time someone hits a flake they have to investigate why things failed, look up the associated issue, and then wait for another round of test/merge processing. That's pretty costly, not to mention how discouraging/frustrating it is for developers. I don't feel our current flake triaging/prioritizing process is addressing that.
This condition shouldn't be likely. Even if we don't run it all the time, a bisect to find the offender ought to be something the test infrastructure can answer in a couple of hours: spawn 10+ concurrent jobs for every likely commit and see which ones flake on the new problem. "I don't know" seems like an unreasonable thing to say. For things like this, if the test got disabled, I think it's unlikely anyone would have chased it down. As I noted in my comment, 2 and 3 seem pretty reasonable to me, but simply "let's disable this test" doesn't seem like a good path forward.
it's not an issue with the test, it's a bug:
That's neat. I still don't think the right answer is to comment out the test.
yeah, that's a concern. obviously if we disable a test we need a high-priority tracking issue (p1/blocker) to ensure the test does get re-enabled at some point in the future.
to reproduce, run this over and over until the last output shows the mismatch between the total and the sum of the individual namespaces:

oc delete project foo bar
oc delete clusterresourcequota --all
oc create clusterquota for-deads-by-annotation --project-annotation-selector=openshift.io/requester=deads --hard=secrets=50
oc new-project foo --as=deads
oc new-project bar --as=deads
oc get clusterresourcequota/for-deads-by-annotation -o yaml
one possibility:
- checkAttributes()/checkQuotas() assume they are free to modify the quotas returned from the accessor
- if checkQuotas() fails to update a quota, it refetches from the accessor and recurses into checkQuotas(), reapplying the request attributes to the returned quotas
- the clusterresourcequota accessor does not return copies, but the original fields from the cache
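The aliasing hazard described above can be sketched in a few lines of Go. This is a minimal, hypothetical reproduction, not the actual origin code: the `Quota` type, the cache, and the accessor functions are all illustrative stand-ins. The point is that when an accessor hands back the cached object itself, a caller that speculatively applies request usage and then retries sees its own mutation on "refetch" — the shared cache is silently corrupted — whereas a deep copy keeps the cache pristine across retries.

```go
package main

import "fmt"

// Quota is a hypothetical stand-in for a clusterresourcequota's status.
type Quota struct {
	Used map[string]int64
}

// cache simulates the accessor's shared store.
var cache = map[string]*Quota{
	"for-deads-by-annotation": {Used: map[string]int64{"secrets": 9}},
}

// getNoCopy returns the cached object directly (the suspected bug):
// any caller mutation is visible to every later fetch.
func getNoCopy(name string) *Quota { return cache[name] }

// getDeepCopy returns an independent copy, safe for speculative mutation.
func getDeepCopy(name string) *Quota {
	cp := &Quota{Used: map[string]int64{}}
	for k, v := range cache[name].Used {
		cp.Used[k] = v
	}
	return cp
}

func main() {
	// Simulate the admission path: apply the request's usage, then
	// pretend the update failed and refetch.
	q := getNoCopy("for-deads-by-annotation")
	q.Used["secrets"]++ // speculative application of the request

	// The "refetch" sees the speculative mutation: cache corrupted.
	fmt.Println(getNoCopy("for-deads-by-annotation").Used["secrets"]) // 10, not 9

	// Reset, then do the same through a deep copy: cache stays intact.
	cache["for-deads-by-annotation"].Used["secrets"] = 9
	q2 := getDeepCopy("for-deads-by-annotation")
	q2.Used["secrets"]++
	fmt.Println(getNoCopy("for-deads-by-annotation").Used["secrets"]) // still 9
}
```

Under this reading, the fix is exactly what #11621's title suggests: have the accessor (or its callers) deep-copy before mutating.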
clusterresourcequota status totals are getting out of sync with the sum of the namespaces. still debugging why. mustfix for 1.4
found it in deep copies: #11621
github.com/openshift/origin/test/cmd/quota/clusterquota.test/cmd/quota.sh:32: executing 'oc describe appliedclusterresourcequota/for-deads-by-annotation -n bar --as deads' expecting any result and text 'secrets.*1[0-9]'; re-trying every 0.2s until completion or 60.000s
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/7682/