Introduce a delay for transition of Ready condition from Unknown to False #6784
Comments
From a high level, this is not something that we will be able to guarantee in all cases. I would rather focus this discussion on the current race and flake at hand.
What is the 'Spec' when this occurs? Do you see this happen on updates or only on Service creation? We currently serialize changes to prevent this when using a BYO Revision name, since a Revision can be referenced directly in the same YAML in which it is created, but we have not done so for normal Revisions because their names are randomized. Looking at the reconcilers, I suspect that this might happen on Service creation when using 'latestRevision', but I would not expect it to happen on an update if the previous Revision reached 'Ready=True'. Does this match what you are seeing?
Thanks for coming back to this issue. In the meantime, we increased our debug output for client E2E errors, and it turns out that the issue is probably something completely different: it's not so much a race as an optimistic locking error when updating the You can find the full log here, but the events that lead to a
This happens just when creating a service with
I'm creating a new issue for what we found. Feel free to close this issue if you think this kind of delay doesn't make sense.
I carried over the issue to #6837
Issues go stale after 90 days of inactivity. Send feedback to the Knative Productivity Slack channel or file an issue in knative/test-infra. /lifecycle stale
/remove-lifecycle stale
This issue is stale because it has been open for 90 days with no
Describe the feature
Please set `ConditionReady` to `ConditionFalse` only when it's clear that the multi-step reconciliation has failed overall. The use case is when a user creates a `Service` and multiple dependent resources, such as a `Route` and a `Revision`, are then created in parallel. However, the route can only become ready once the revision is ready, so there is a chance of a race. Currently, when the route can't find the referenced revision because of this race, the overall `Service` goes from `ConditionUnknown` to `ConditionFalse` for the ready condition, but switches to `ConditionTrue` as soon as the revision is ready and the route is reconciled.

This confuses clients that wait synchronously on a service creation (or update) and return immediately with "ok" for a transition `unknown -> true`, or with an error for `unknown -> false`. In the situation above, such a client falsely detects an error, even though the overall action very quickly reconciles to `ready == true` (after the temporary false state).

For the Knative client, this caused a 50% flake rate in the E2E tests, which is now solved by introducing an error window during which the client waits for an eventual `true` ready state.

It would also be very helpful for other clients if the server side exercised this kind of patience, so that the first transition to `false` indicates an error of the combined reconciliation step.

// cc: @dprotaso @evankanderson