Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't consider ANY pod that is not terminating #1957

Closed
wants to merge 1 commit into from

Conversation

aojea
Copy link
Member

@aojea aojea commented Feb 15, 2023

The neg syncer was doing some wrong assumptions:

  1. Endpoints that are not Ready can have Pods with deletionTimestamp not nil.
    The Endpoint and EndpointSlice controller will remove the Pod from the corresponding object once it gets deleted (DeletionTimestamp != nil)

  2. Any pod with deletionTimestamp == nil is valid, this is the most dangerous assumption, since a pod can be Unready and be used.

/kind bug
/assign @swetharepakula @bowei

The neg syncer was doing some wrong assumptions:

1. Endpoints that are not Ready can have Pods with deletionTimestamp
not nil.
The Endpoint and EndpointSlice controller will remove the Pod from
the corresponding object once it gets deleted (DeletionTimestamp != nil)

2. Any pod with deletionTimestamp == nil is valid, this is the most
dangerous assumption, since a pod can be Unready and be used.
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 15, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 15, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 15, 2023
// Only consider endpoints that are Ready.
// TODO: In order to consider Terminating endpoints, the AddressData type should add a new field to
// carry this information based on the EndpointSlices Conditions Terminating = true and Serving = true.
// Terminating endpoints should only be used with ExternalTrafficPolicy = Local and if there are any
Copy link
Member

@swetharepakula swetharepakula Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the L7 case we don't consider Local or Cluster as we always just program the pod endpoint. It also means there is no backup if the traffic is routed to that pod and the pod is removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to complete the existing one, I don't feel it fits too

Copy link
Member

@swetharepakula swetharepakula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one nit on the comment because I don't think it completely applies to this part of the code.

The only issue with this is that Endpoints do not have the terminating status on the endpoint itself, which is why this logic was added. I will be creating a PR to remove the entire endpoints support, which would remove this code as well. However until we do that, I do not think we should remove this due to correctness of the Endpoints path. I will try to have that PR ready by the end of the week, so we can merge this one in too.

@aojea
Copy link
Member Author

aojea commented Feb 16, 2023

The only issue with this is that Endpoints do not have the terminating status on the endpoint itself, which is why this logic was added.

this logic is wrong, or I'm reading it wrong, the terminating status means pod.deletionTimestsamp != nil, and this logic does not consider those pods, the existing logic may try to do that, but is not doing it

In addition, it doesn't check the phase of the pods, that means that can be adding notReady or failing pods

@aojea
Copy link
Member Author

aojea commented Feb 16, 2023

The only issue with this is that Endpoints do not have the terminating status on the endpoint itself, which is why this logic was added.

This is a new EndpointData that Alan added to consolidate slices and Endpoints, we can add Terminating status there if we need it, but besides that, this is not handling terminating ... but I'd love if you can double and triple check my assumptions

@aojea
Copy link
Member Author

aojea commented Feb 16, 2023

/hold

talked offline, @swetharepakula has to land different changes around this code and she'll fix this during that refactor

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 16, 2023
@swetharepakula
Copy link
Member

This has been added as part of #1973.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants