
PreStop hook logic seems incorrect #4698

Closed
forestoden opened this issue Jul 27, 2021 · 2 comments · Fixed by #4801
Assignees
pebrc
Labels
>bug Something isn't working v1.8.0

Comments

@forestoden

Bug Report

What did you do?
When terminating an ES Pod, the PreStop hook script pre-stop-hook-script.sh attempts to wait for the POD_IP to be removed from the list of Endpoints of the headless service.
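Paraphrased, the hook's wait loop looks something like this (a sketch based on the behavior described here, not the operator's exact script):

start=$(date +%s)
while getent hosts "$HEADLESS_SERVICE_NAME" | grep -q "$POD_IP"; do
  # give up once PRE_STOP_MAX_WAIT_SECONDS have elapsed; Kubernetes then
  # proceeds with termination anyway
  if [ $(( $(date +%s) - start )) -ge "$PRE_STOP_MAX_WAIT_SECONDS" ]; then
    exit 1
  fi
  sleep 1
done
# grace period after the IP disappears from DNS
sleep "$PRE_STOP_ADDITIONAL_WAIT_SECONDS"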

I replicated what the PreStop hook does in an Ubuntu pod in the same namespace with the following:

while true; do getent hosts search-test-es-client | grep "$ES_POD_IP"; sleep 0.5; done

When modifying PRE_STOP_MAX_WAIT_SECONDS and PRE_STOP_ADDITIONAL_WAIT_SECONDS, you can observe that the IP remains in the Service until PRE_STOP_MAX_WAIT_SECONDS elapses, at which point the PreStop hook terminates unsuccessfully and Kubernetes continues with the termination. This holds true even if you set PRE_STOP_MAX_WAIT_SECONDS to a substantial value; I went as high as 180s, which is more than enough time.
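For reference, these timeouts can be overridden via environment variables on the elasticsearch container; a minimal sketch (the values shown are illustrative):

podTemplate:
  spec:
    containers:
    - name: elasticsearch
      env:
      # read by the pre-stop hook script at termination time
      - name: PRE_STOP_MAX_WAIT_SECONDS
        value: "180"
      - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS
        value: "30"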

What did you expect to see?
I would expect to see the POD_IP get removed from the output of getent hosts $HEADLESS_SERVICE_NAME | grep $POD_IP before the Pod is terminated.

What did you see instead? Under which circumstances?
It appears (although documentation on headless services is sparse) that headless services do not remove the IP from the list of Endpoint IPs until after the Pod is terminated, rather than when it enters Terminating status. As a result, the PreStop hook simply waits PRE_STOP_MAX_WAIT_SECONDS and then lets termination continue, while the IP remains in service for some time afterwards. The logic here does not seem to work as intended, and given this behavior of headless Services I am not sure how it could.
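The same can be observed from outside the Pod by watching the Endpoints object while a Pod terminates (the Pod name below follows the usual StatefulSet naming and is illustrative):

# terminal 1: watch the headless service's endpoints
kubectl get endpoints search-test-es-client -o wide -w
# terminal 2: delete one of the Pods backing the service
kubectl delete pod search-test-es-client-0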

Environment

  • ECK version:

    1.3.0

  • Kubernetes information:
    GKE 1.19


kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:58:09Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.10-gke.1600", GitCommit:"7b8e568a7fb4c9d199c2ba29a5f7d76f6b4341c2", GitTreeState:"clean", BuildDate:"2021-05-07T09:18:53Z", GoVersion:"go1.15.10b5", Compiler:"gc", Platform:"linux/amd64"}
  • Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: search-test
spec:
  auth: {}
  http:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
    tls:
      certificate: {}
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - config:
      node.data: false
      node.ingest: false
      node.master: true
      node.store.allow_mmap: true
    count: 1
    name: master-nodes
    podTemplate:
      metadata:
      spec:
        containers:
        - name: elasticsearch
        initContainers:
        - image: busybox:latest
          command:
          - sh
          - -c
          - |
            sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
  - config:
      node.data: true
      node.ingest: false
      node.master: false
      node.store.allow_mmap: true
    count: 1
    name: data-nodes
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
        initContainers:
        - image: busybox:latest
          command:
          - sh
          - -c
          - |
            sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
  - config:
      node.data: false
      node.ingest: false
      node.master: false
      node.store.allow_mmap: true
    count: 2
    name: client
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
        initContainers:
        - image: busybox:latest
          command:
          - sh
          - -c
          - |
            sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
  transport:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
  updateStrategy:
    changeBudget: {}
  version: 6.8.13
@botelastic botelastic bot added the triage label Jul 27, 2021
@david-kow david-kow added the >bug Something isn't working label Aug 5, 2021
@botelastic botelastic bot removed the triage label Aug 5, 2021
@pebrc
Collaborator

pebrc commented Aug 23, 2021

I have a suspicion that this PreStop script has been ineffective since the IPv6 PR, where the following change was introduced that keeps endpoints in the service even if the corresponding Pod is no longer ready (which is the case when it is terminating):

8cf990a#diff-f8806c3bf97eb22be2e2915e4fbe06d5a1e85f335e026cdc46a6e96f4d63da59R57-R58
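For context, that change sets PublishNotReadyAddresses on the headless Service; expressed as a manifest, the relevant part looks like this (a sketch of the field only, not the operator's full Service definition):

apiVersion: v1
kind: Service
metadata:
  name: search-test-es-client  # name taken from the manifest above, for illustration
spec:
  clusterIP: None                 # headless
  publishNotReadyAddresses: true  # endpoints stay published for unready (e.g. terminating) Pods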

I think this change was somewhat accidental: I was exploring switching to DNS-based discovery for Elasticsearch, an idea we later gave up on due to issues with DNS resolution and re-use of Pod IPs, but the PublishNotReadyAddresses change was merged nonetheless.

The simplest fix could therefore be to just revert that change.

@pebrc pebrc added the v1.8.0 label Aug 30, 2021
@pebrc pebrc self-assigned this Aug 30, 2021
@pebrc
Collaborator

pebrc commented Aug 31, 2021

Things are a bit more complicated, unfortunately. We are relying on unready Pods being published for node discovery from clients (aka sniffing): #3182

If we were to go back to only publishing ready Pods in the headless service we would break that capability.

I am thinking we could have a simpler version of the pre-stop hook that just waits for PRE_STOP_MAX_WAIT_SECONDS before allowing termination of the Pod. I am not even sure that this pre-stop hook ever worked the way it was intended. All clients would typically use the $CLUSTER_NAME-es-http service, which is a ClusterIP service, i.e. not relying on DNS anyway but (typically) on iptables managed by kube-proxy and fed by the endpoints API directly. So the only connections that relied on the headless service should have been inter-node connections in Elasticsearch itself. However, Elasticsearch has its own discovery mechanism, which we feed with a list of IP addresses, not DNS names. A minimal sketch of such a simplified hook follows.
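Illustrative only, including the default value:

#!/usr/bin/env bash
# simplified pre-stop hook: wait a fixed period so clients and kube-proxy
# can observe the endpoint change, then allow termination to proceed
sleep "${PRE_STOP_MAX_WAIT_SECONDS:-30}"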
