
test-cmd flake: quota.sh:32: executing 'oc describe appliedclusterresourcequota/for-deads-by-annotation -n bar --as deads' #11560

Closed
csrwng opened this issue Oct 25, 2016 · 16 comments · Fixed by #11595
Assignees: mfojtik
Labels: area/tests, kind/bug, kind/test-flake, priority/P1
Milestone: 1.4.0

Comments

csrwng commented Oct 25, 2016

github.com/openshift/origin/test/cmd/quota/clusterquota

test/cmd/quota.sh:32: executing 'oc describe appliedclusterresourcequota/for-deads-by-annotation -n bar --as deads' expecting any result and text 'secrets.*1[0-9]'; re-trying every 0.2s until completion or 60.000s

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/7682/

csrwng added the priority/P2, area/tests, and kind/test-flake labels on Oct 25, 2016

csrwng commented Oct 25, 2016

@deads2k that's a very incriminating test, if I ever saw one :)

bparees commented Oct 25, 2016

yeah, never put your own name in a test case :) (I just hit this also)

bparees commented Oct 26, 2016

@stevekuznetsov this is starting to look like a good candidate for the "if it's not fixed in a day, back it out" philosophy.

stevekuznetsov commented Oct 26, 2016

@bparees not sure if we got buy-in on that or if I'm the one to do it, but yes I agree with you. @deads2k any updates?

bparees commented Oct 26, 2016

@stevekuznetsov not yet, the leads call hasn't happened yet. i'm just building the case :)

deads2k assigned mfojtik and unassigned deads2k on Oct 26, 2016

deads2k commented Oct 26, 2016

> @stevekuznetsov this is starting to look like a good candidate for the "if it's not fixed in a day, back it out" philosophy.

Depends on what you mean by "back it out". I think that commenting out the test (which @stevekuznetsov has proposed in the past) would be the wrong thing to do. This was likely caused by #11394 and so @mfojtik should have his chance to fix it before the pull is reverted (which I would be ok with).

But I don't see anyone referencing it in this issue, so I'm not sure what you'd back out.

stevekuznetsov commented Oct 26, 2016

> But I don't see anyone referencing it in this issue, so I'm not sure what you'd back out.

Even saying the magic words got the likely culprit PR mentioned and the right person assigned... not sure I even needed to think about what to back out :)

bparees commented Oct 26, 2016

@deads2k again, it needs to be discussed w/ the team, but i'm prepared to say:

  1. if we don't know what broke the test, the test gets disabled while the person who owns the test investigates
  2. if we know what broke it, the pull that broke it gets reverted immediately. it never should have gone in (it should have failed testing due to the flakes it introduced) so why would we leave it in?
  3. in a slightly less harsh world, i'd be amenable to an immediate fix being delivered in lieu of reverting, but if the fix is not known immediately, then reverting the PR seems like the right thing to do.

I think it's important to recognize the huge productivity cost flakes impose on teams trying to test/merge code. Every time someone hits a flake, they have to investigate why things failed, look up the associated issue, and then wait for another round of test/merge processing. That's pretty costly, not to mention how discouraging/frustrating it is for developers. I don't really feel like our current flake triaging/prioritizing process is addressing that.

deads2k commented Oct 26, 2016

> 1. if we don't know what broke the test, the test gets disabled while the person who owns the test investigates

This condition shouldn't be likely. Even if we don't run it all the time, a bisect to find the offender ought to be something the test infrastructure can answer in a couple of hours: spawn 10+ concurrent jobs for every likely commit and see which ones flake on the new problem. "I don't know" seems like an unreasonable thing to say. For things like this, if the test got disabled, I think it's unlikely anyone would have chased it down.
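
A rough Go sketch of that bisect idea (hypothetical throughout: the commit list, run count, and ./hack/test-cmd.sh invocation are placeholders, and real CI jobs would each need an isolated environment rather than sharing one worktree):

    // Hypothetical flake-bisect driver: for each suspect commit, run the
    // flaky suite several times concurrently and count failures. The
    // script path and commits are placeholders, not real origin tooling.
    package main

    import (
    	"fmt"
    	"os/exec"
    	"sync"
    )

    // flakeCount checks out a commit and reports how many of n
    // concurrent runs of the test script fail.
    func flakeCount(commit string, n int) int {
    	if err := exec.Command("git", "checkout", commit).Run(); err != nil {
    		panic(err)
    	}
    	var wg sync.WaitGroup
    	failures := make(chan struct{}, n)
    	for i := 0; i < n; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			if exec.Command("./hack/test-cmd.sh").Run() != nil {
    				failures <- struct{}{}
    			}
    		}()
    	}
    	wg.Wait()
    	close(failures)
    	return len(failures)
    }

    func main() {
    	for _, c := range []string{"commitA", "commitB"} { // likely commits
    		fmt.Printf("%s: %d/10 runs flaked\n", c, flakeCount(c, 10))
    	}
    }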

As I noted in my comment, 2 and 3 seem pretty reasonable, but simply "let's disable this test" doesn't seem like a good path forward to me.

liggitt commented Oct 26, 2016

it's not an issue with the test, it's a bug. Note in the output below that the status total reports 24 secrets used, while the two namespaces sum to only 18 (9 + 9):

- apiVersion: v1
  kind: ClusterResourceQuota
  metadata:
    creationTimestamp: 2016-10-26T14:12:57Z
    name: for-deads-by-annotation
    resourceVersion: "1965"
    selfLink: /oapi/v1/clusterresourcequotas/for-deads-by-annotation
    uid: 4b6272f8-9b86-11e6-aa70-acbc32c1ca87
  spec:
    quota:
      hard:
        secrets: "50"
    selector:
      annotations:
        openshift.io/requester: deads
      labels: null
  status:
    namespaces:
    - namespace: foo
      status:
        hard:
          secrets: "50"
        used:
          secrets: "9"
    - namespace: bar
      status:
        hard:
          secrets: "50"
        used:
          secrets: "9"
    total:
      hard:
        secrets: "50"
      used:
        secrets: "24"

deads2k commented Oct 26, 2016

> it's not a test break, it's a bug:

That's neat. I still don't think the right answer is to comment out the test.

bparees commented Oct 26, 2016

> For things like this, if the test got disabled, I think it's unlikely anyone would have chased it down.

yeah, that's a concern. Obviously if we disable a test, we need a high-priority tracking issue (p1/blocker) to ensure the test does get re-enabled at some point in the future.

liggitt commented Oct 26, 2016

to reproduce, run this over and over until the last output shows the mismatch between the total and the sum of the individual namespaces:

    oc delete project foo bar
    oc delete clusterresourcequota --all
    oc create clusterquota for-deads-by-annotation --project-annotation-selector=openshift.io/requester=deads --hard=secrets=50
    oc new-project foo --as=deads
    oc new-project bar --as=deads
    oc get clusterresourcequota/for-deads-by-annotation -o yaml

liggitt commented Oct 26, 2016

one possibility:

checkAttributes()/checkQuotas() assume they are free to modify the quotas returned from the accessor

if checkQuotas() fails to update a quota, it refetches from the accessor, and recurses into checkQuotas(), reapplying the request attributes to the returned quotas

the clusterresourcequota accessor does not return copies, but original fields from the cache
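
To make that concrete, here is a minimal, self-contained Go sketch of the suspected pattern; the Quota, cacheAccessor, and admit names are illustrative stand-ins, not the actual origin code:

    // Sketch of the suspected aliasing bug: the accessor hands back a
    // pointer into its cache, the admission path mutates the quota in
    // place, and a conflict-retry refetches the already-mutated object,
    // applying the request's usage twice.
    package main

    import "fmt"

    type Quota struct {
    	Name string
    	Used map[string]int64
    }

    type cacheAccessor struct {
    	cache map[string]*Quota
    }

    // Buggy: returns the cached object itself, so callers share state.
    func (a *cacheAccessor) Get(name string) *Quota {
    	return a.cache[name]
    }

    // Safe variant: a deep copy the caller may mutate freely.
    func (a *cacheAccessor) GetCopy(name string) *Quota {
    	orig := a.cache[name]
    	used := make(map[string]int64, len(orig.Used))
    	for k, v := range orig.Used {
    		used[k] = v
    	}
    	return &Quota{Name: orig.Name, Used: used}
    }

    // admit mimics checkQuotas-style logic: apply the request's usage,
    // try to persist, and on failure refetch and retry.
    func admit(a *cacheAccessor, name string, delta int64, tryUpdate func(*Quota) error) error {
    	q := a.Get(name) // BUG: q aliases the cache
    	q.Used["secrets"] += delta
    	if err := tryUpdate(q); err != nil {
    		// The "fresh" object already carries the first attempt's
    		// mutation, so the delta lands a second time.
    		q = a.Get(name)
    		q.Used["secrets"] += delta
    		return tryUpdate(q)
    	}
    	return nil
    }

    func main() {
    	a := &cacheAccessor{cache: map[string]*Quota{
    		"for-deads-by-annotation": {Name: "for-deads-by-annotation", Used: map[string]int64{"secrets": 18}},
    	}}
    	calls := 0
    	failOnce := func(*Quota) error { // first update hits a conflict
    		calls++
    		if calls == 1 {
    			return fmt.Errorf("conflict")
    		}
    		return nil
    	}
    	_ = admit(a, "for-deads-by-annotation", 3, failOnce)
    	// Prints 24, not the expected 21: the same drift between the
    	// total and the per-namespace sums seen in the YAML above.
    	fmt.Println(a.cache["for-deads-by-annotation"].Used["secrets"])
    }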

liggitt added the kind/bug and priority/P1 labels and removed priority/P2 on Oct 26, 2016
liggitt added this to the 1.4.0 milestone on Oct 26, 2016

liggitt commented Oct 26, 2016

clusterresourcequota status totals are getting out of sync with the sum of the namespaces. still debugging why. must-fix for 1.4

deads2k commented Oct 27, 2016

found it in deep copies: #11621
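
In terms of the sketch in the earlier comment, a deep-copy fix amounts to never handing callers the cached object (illustrative only, not the actual diff in #11621):

    // In admit(), fetch a private copy instead of the cached object:
    q := a.GetCopy(name) // retries now start from unmutated state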
