Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Wait" is overly used in e2e/conformance #1178

Closed
mattmoor opened this issue Jun 12, 2018 · 22 comments · Fixed by #11680
Closed

"Wait" is overly used in e2e/conformance #1178

mattmoor opened this issue Jun 12, 2018 · 22 comments · Fixed by #11680
Assignees
Labels
area/test-and-release It flags unit/e2e/conformance/perf test issues for product features good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/spec Discussion of how a feature should be exposed to customers. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Issues which should be fixed (post-triage)

Comments

@mattmoor
Copy link
Member

The conformance (and possibly e2e) tests have a number of checks that poll for success. This masks problems where we say we're Ready but aren't actually.

We should change the e2e test to use Wait methods a bit more sparingly and then concretely check the state of the system.

If this is too flaky, we can consider adding retries, but the idea of "Wait" feels wrong to me.

/assign @mattmoor

I'll see how flaky this is after we get it re-enabled with Wait in place.

@google-prow-robot google-prow-robot added area/test-and-release It flags unit/e2e/conformance/perf test issues for product features kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 12, 2018
@mattmoor mattmoor removed their assignment Jun 13, 2018
@mattmoor
Copy link
Member Author

Sent out a PoC (linked above).

google-prow-robot pushed a commit that referenced this issue Jun 15, 2018
Every check we perform shouldn't be a WaitFor; we should WaitFor readiness and then assert things work in-the-now.  This creates parallel methods for checking things, which has the same signature as their `WaitFor` siblings.

This uses these methods for the recently sunk checks in the routing conformance test.

Related: #1178
@mattmoor
Copy link
Member Author

It is notable that the small change I made in the PoC above caught a bug I almost introduced in #1322.
The Revision's status would have converged to what we expected after a resync, but because we were "Check"ing vs. "Wait"ing it blocked my PR. I'd like to see us tighten up more of this checking to hopefully squeeze out other issues we might be missing.

/kind API
/kind spec
/assign @jonjohnsonjr

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2019
@knative-housekeeping-robot

Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle rotten

@knative-prow-robot knative-prow-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 21, 2020
@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

@knative-prow-robot
Copy link
Contributor

@knative-housekeeping-robot: Closing this issue.

In response to this:

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vagababov
Copy link
Contributor

/reopen
Still need to fix this
/kind good-first-issue

@knative-prow-robot
Copy link
Contributor

@vagababov: Reopened this issue.

In response to this:

/reopen
Still need to fix this
/kind good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

@knative-prow-robot knative-prow-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 26, 2020
@mattmoor
Copy link
Member Author

/lifecycle frozen

@knative-prow-robot knative-prow-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 26, 2020
@Cynocracy
Copy link
Contributor

Is it enough to just scan for occurrences of Wait() and see if there is extra state checking that can be done?
Or just the TODOs you mentioned, maybe?

@mattmoor
Copy link
Member Author

This history of this is that "Ready" didn't used to mean "Ready", so we'd wait for Ready and then we'd wait for it to really be ready.

Things like this should be "Check" not "WaitFor" because we already waited for Ready, and that should mean something:

_, err := pkgTest.WaitForEndpointState(

I'd probably start with the TODOs, but in general WaitForEndpointState is probably another good thing to grep for.

@Cynocracy
Copy link
Contributor

Sounds good. thanks for the info! If I have spare cycles I think this sounds interesting to pick up,.

@markusthoemmes
Copy link
Contributor

FWIW this is related: #5573. Sadly we still haven't fixed that downstream.

@mattmoor
Copy link
Member Author

Here's a great example fwiw:

// Ready does not actually mean Ready for a Route just yet.
// See https://github.com/knative/serving/issues/1582

@dprotaso dprotaso added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. and removed kind/good-first-issue labels Jan 19, 2021
@evankanderson
Copy link
Member

This probably either needs a target list or some other explicit criteria on when it can be closed.

/assign @dprotaso

/triage needs-user-input

@knative-prow-robot knative-prow-robot added the triage/needs-user-input Issues which are waiting on a response from the reporter label Mar 22, 2021
@dprotaso
Copy link
Member

/unassign @dprotaso
/assign @shinigambit will take a look at this

@knative-prow-robot
Copy link
Contributor

@dprotaso: GitHub didn't allow me to assign the following users: at, this, will, take, a, look.

Note that only knative members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/unassign @dprotaso
/assign @shinigambit will take a look at this

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@evankanderson
Copy link
Member

/remove-triage needs-user-input
/triage accepted

@knative-prow-robot knative-prow-robot added triage/accepted Issues which should be fixed (post-triage) and removed triage/needs-user-input Issues which are waiting on a response from the reporter labels Jun 23, 2021
nak3 added a commit to nak3/serving that referenced this issue Sep 17, 2021
nak3 added a commit to nak3/serving that referenced this issue Sep 17, 2021
@dprotaso dprotaso removed this from the Ice Box milestone Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test-and-release It flags unit/e2e/conformance/perf test issues for product features good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/spec Discussion of how a feature should be exposed to customers. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
Development

Successfully merging a pull request may close this issue.