Brief 404 errors after creating a route. #3312

Closed
markusthoemmes opened this issue Feb 25, 2019 · 19 comments · Fixed by #4734 or #4976
Labels: area/API, area/networking, kind/bug

@markusthoemmes
Contributor

markusthoemmes commented Feb 25, 2019

In what area(s)?

/area API
/area networking

What version of Knative?

HEAD

Expected Behavior

When a route/kservice is Ready, I expect it to be routable immediately without issues.

Actual Behavior

There is a brief window of 404s until the networking state has propagated to the routers and is consistent.

Steps to Reproduce the Problem

A reproducer is supposed to be produced by #3287

@markusthoemmes markusthoemmes added the kind/bug label Feb 25, 2019
@knative-prow-robot knative-prow-robot added the area/API and area/networking labels Feb 25, 2019
@wtam2018
Contributor

wtam2018 commented Mar 7, 2019

I am interested in working on this issue.
BTW, how do I become a Member so I can assign issues to myself?

@markusthoemmes markusthoemmes changed the title from "Brief 404/503 errors after creating a route." to "Brief 404 errors after creating a route." Mar 7, 2019
@vagababov
Contributor

There are also:

  • 503s (no healthy upstream)
  • 500s (Error getting active endpoint: timeout waiting for the revision to become ready)

But those are supposed to be fixed with the revision controlled handoff (hopefully), since we blame them on network programming currently.

@markusthoemmes
Contributor Author

I took these sources out specifically because there are other issues covering them.

@vagababov The handoff will only solve issues around scale-to-zero, no? This issue is about errors right after creating new routes, not necessarily after reprogramming.

@vagababov
Contributor

By "during creation", do you mean the point in time after the route is "Ready" but Istio is not yet?

@markusthoemmes
Contributor Author

@vagababov not necessarily. You're describing a wider range of issues. This one is potentially narrower in scope and thus might need a different (maybe simpler?) fix.

@tanzeeb
Contributor

tanzeeb commented May 9, 2019

Hey @wtam2018, are you working on this issue? If not, I wouldn't mind taking a look.

@wtam2018
Contributor

I think @markusthoemmes was working on it.

@markusthoemmes
Contributor Author

I'm not. @tanzeeb I think you can go ahead.

@mdemirhan
Contributor

@tanzeeb will you take this one?

@mattmoor
Member

@wtam2018 @tanzeeb Do you have interest in pursuing this either in 0.7 or 0.8?

I'm still not sure I fully understand what the problem is here?

@tcnghia
Contributor

tcnghia commented Jun 12, 2019

I also reported #1582 to find a workaround for missing Status in VirtualService.

@markusthoemmes
Contributor Author

@mattmoor The "problem" is that a Ready route is not necessarily reachable yet, because the networking state can still be inconsistent. The goal would be for us to ditch the retries on 404s in our tests as well.
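
To make the test-side workaround concrete, here is a minimal sketch of what "retrying on 404s" amounts to. It is an illustration only, not the code in the Knative test suite (which is written in Go); `wait_until_routable`, `fetch_status`, and the simulated responses are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def wait_until_routable(
    fetch_status: Callable[[], int],
    attempts: int = 10,
    delay_seconds: float = 0.0,
) -> int:
    """Call fetch_status until it returns something other than 404.

    Returns the number of 404 responses seen before the first non-404.
    Raises TimeoutError if every attempt returns 404.
    """
    for seen_404s in range(attempts):
        if fetch_status() != 404:
            return seen_404s
        time.sleep(delay_seconds)
    raise TimeoutError(f"still 404 after {attempts} attempts")


# Hypothetical route that answers 404 twice before the routers catch up.
responses = iter([404, 404, 200])
retries_needed = wait_until_routable(lambda: next(responses))
print(retries_needed)  # 2
```

If Ready really implied routable, `retries_needed` would always be 0 and a helper like this could be deleted from the tests.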

@mattmoor
Member

Right, so the main source of this that I'm aware of is lack of status from Istio. We've talked about a prober for this, but my interpretation of an earlier comment was that you thought there was more than just this? Just trying to figure that out.
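
As a rough sketch of the prober idea mentioned here, and assuming nothing about the design that was eventually adopted: the controller would withhold Ready until every gateway actually serves the new route. The function `route_is_programmed`, the `probe` callable, and the gateway names below are all hypothetical.

```python
from typing import Callable, Iterable


def route_is_programmed(
    gateways: Iterable[str],
    probe: Callable[[str], int],
) -> bool:
    """Return True only if every gateway answers the probe with a non-404 status."""
    return all(probe(gateway) != 404 for gateway in gateways)


# Hypothetical state: one of two gateways has not received the new route yet.
statuses = {"gateway-a": 200, "gateway-b": 404}
print(route_is_programmed(statuses, statuses.__getitem__))  # False

# Once the lagging gateway is programmed, the route may be reported Ready.
statuses["gateway-b"] = 200
print(route_is_programmed(statuses, statuses.__getitem__))  # True
```

The probe is passed in as a callable so that the readiness decision stays separate from how a gateway is actually contacted.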

@markusthoemmes
Contributor Author

Nope, you're spot on. I think this is only network programming. There was a discussion above that conflated this with 503s etc. Some of the 503s might have been from the same source, but I wanted to view both issues in isolation as far as possible.

@mattmoor
Member

Ok, I think @tcnghia had some trick in mind for how to probe this. I'll bug him to write it up, unless folks have their own ideas. Let's try to settle on a plan as 0.7 wraps up, and maybe we can nail this in 0.8 (if it has an owner).

@mattmoor
Member

Looks like @tcnghia assigned @JRBANCEL yesterday, so maybe JR has bandwidth. Can one of you write up a plan and run through it at next week's #networking WG?

@tcnghia tcnghia removed this from the Needs Triage milestone Jun 22, 2019
@tcnghia tcnghia added this to the Serving 0.8 milestone Jun 22, 2019
@tcnghia
Contributor

tcnghia commented Jun 22, 2019

Plan shared and reviewed at the 6/20 networking WG meetup:
https://docs.google.com/document/d/1mXDrRhVOf48qRR7-4fZMTkMHKoOGZJtrRGavGloVjGs/edit#

@JRBANCEL
Contributor

/reopen
While this is fixed by #4734, I still need to remove the 404 retrying logic in the tests before marking this as fully fixed.

@knative-prow-robot
Contributor

@JRBANCEL: Reopened this issue.

