Slow re-election when elected master pod is deleted #63
So it doesn't look like I'm alone! I found issue helm/charts#8785 that looks identical to this issue. For that issue, PR helm/charts#10687 was proposed and also submitted to this repo in PR #41. Unfortunately, I didn't have success when deploying the multi-node example with those PRs. Here's a summary of the problem.
Whereas issue helm/charts#8785 talks about a total cluster outage where even reads are not possible, I'm thankfully not seeing that; read-only calls keep working while the cluster has no elected master.

The easiest solution to the connection timeout would be to just keep the pod running for a while after shutting down Elasticsearch. I tried to do this with a preStop lifecycle hook, but that runs before the container is even sent its SIGTERM, so it can't run anything after Elasticsearch stops. So, how can we actually run code after Elasticsearch stops? We can modify the container's entrypoint to send Elasticsearch to the background, trap the SIGTERM upon pod termination, forward that to Elasticsearch, and then sleep! For example, inside of the container spec:
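A minimal sketch of the idea, assuming the stock image entrypoint at `/usr/local/bin/docker-entrypoint.sh` and an arbitrary 5-second linger (the image tag and paths are illustrative, not taken from this thread):

```yaml
containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1  # illustrative tag
    command:
      - /bin/bash
      - -c
      - |
        # Run the image's normal entrypoint in the background so we can trap signals.
        /usr/local/bin/docker-entrypoint.sh eswrapper &
        pid=$!
        # On pod deletion, forward SIGTERM to Elasticsearch, wait for it to exit,
        # then linger so peers see "connection refused" instead of a 30s timeout.
        trap 'kill -TERM "$pid"; wait "$pid"; sleep 5' TERM
        wait "$pid"
```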
Deploying our master nodes like this allows the outer pod to sit in a terminating state for 5 seconds after Elasticsearch stops, so that other masters properly get a refused connection rather than timing out. As a result, writes and the `_cat/master` call only hang for a few seconds while the new master is elected.

An alternative approach is to deploy a dummy sidecar container alongside the master nodes that just waits indefinitely. To keep the pod alive for a bit after Elasticsearch stops, we add a preStop lifecycle hook to the sidecar that simply sleeps for a few seconds.
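A sketch of that sidecar, with the container name, image, and 5-second sleep as placeholder assumptions:

```yaml
# appended to the master pods' containers list, next to the elasticsearch container
- name: keep-alive
  image: busybox
  command: ["sh", "-c", "tail -f /dev/null"]   # the sidecar itself just waits forever
  lifecycle:
    preStop:
      exec:
        # preStop runs before this container receives its SIGTERM, so the
        # sidecar (and with it the pod and its IP) outlives Elasticsearch.
        command: ["sh", "-c", "sleep 5"]
```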
This allows for the Elasticsearch containers to terminate before the sidecar container can, as documented in the pod termination flow. For fun, we can also avoid the lifecycle hook altogether and trap the SIGTERM like we did above:
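The trap-based variant of the same sidecar might look like this (again just a sketch; the image and sleep length are placeholders):

```yaml
- name: keep-alive
  image: busybox
  command:
    - sh
    - -c
    - |
      # Catch the SIGTERM sent on pod deletion and linger before exiting.
      trap 'sleep 5; exit 0' TERM
      while true; do sleep 1; done
```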
Note that in this case the ordering of the two containers' shutdowns is no longer guaranteed by the termination flow: both containers receive SIGTERM at the same time, so we're relying on Elasticsearch shutting down well within the sidecar's sleep window.

Apart from these workarounds, another possible fix is the currently open issue kubernetes/kubernetes#28969. If I'm understanding the sticky IPs proposal correctly, pod IPs in a StatefulSet would remain the same for each replica. In our case the eligible masters should then see no connection timeouts at all, since the old master pod would come back with the same IP address.
I haven't tried to reproduce, but it's possible that lowering the ping timeout will help here. If not, we should look into what's causing this problem and pair up with the Elasticsearch team to find a solution. I spent a very short time googling for similar issues and found this forum post where docker/docker-compose shows similar behavior: the network interface is destroyed and the next pings wait for 30 seconds.
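For reference, the 30-second figure matches the zen fault-detection defaults in 6.x, so the tweak would presumably be along these lines (the setting names are my reading of the suggestion, and the values are arbitrary):

```yaml
# elasticsearch.yml on the master-eligible nodes (illustrative values)
discovery.zen.fd.ping_timeout: 5s   # default is 30s
discovery.zen.fd.ping_retries: 2    # default is 3
```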
Firstly, thank you so much for writing up such a detailed, thorough issue! The fact that things work properly when you kill the process directly is the most interesting part here. It is very possible that Kubernetes is doing something beyond a plain SIGTERM when the pod is deleted. I also have a couple of questions about your setup.
@Crazybus Rolling updates seem to have the same issue. After setting an extra env var and deploying, my requests still hang when the rolling update gets to the elected master pod. Killing the process directly still takes less than a second, so I don't think the problem is in how the pod is signaled.

I think the forum post @jordansissel linked is identical to this issue. For convenience, here's David's response:
In both environments, the old master's network is gone before re-election can finish. I tried lowering the ping timeouts, but that didn't reliably help. Looking carefully through the previous Helm chart issue again, I found @kimxogus linked to an issue he opened, elastic/elasticsearch#36822, which has the extra suggestion of tweaking the discovery timeouts as well.

So far the only reliable workaround for me seems to be keeping the pod alive for a few extra seconds after Elasticsearch terminates, so the other eligible masters can send their final ping.
@andreykaipov thanks again for all of the investigation and information you are adding! This is super useful and I feel like I now understand what is going on. From an Elasticsearch point of view things are actually working as expected: the master disappears, and the remaining nodes wait for the 30 second ping timeout. Elasticsearch is configured by default to wait 30 seconds for a ping to time out, which is different from reaching a host that immediately refuses the connection. This issue is not unique to this helm-chart or to Kubernetes; it will apply to any immutable infrastructure setup where the active master (or at least its IP address) is deleted directly after stopping it. Even with a workaround in place there is still going to be around 3 seconds of write downtime while a new master is being elected. There are some ideal "perfect world" fixes to this problem.
I'm going to sync with the Elasticsearch team to see how feasible they would be. I'm also making a note to test this in Elasticsearch 7, because it now uses a different discovery method which may or may not be affected by this. Out of all of the workarounds you suggested, I think the easiest to maintain is going to be having a dummy sidecar container. Instead of doing a 5 second sleep, it could actually wait for the pod to no longer be the master when shutting down. Or, even better, it could wait until a new master has been elected before allowing the pod to be deleted. Note: none of the below is tested, just an idea of how to solve this without relying on a hardcoded sleep time.
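A rough sketch of that idea follows; the container name, image, env wiring, `_cat/master` parsing, and timeout are all assumptions for illustration, not a tested implementation:

```yaml
- name: master-termination-fix
  image: centos:7                        # anything with bash and curl works
  command: ["bash", "-c", "sleep infinity"]
  env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name       # assumes node.name is set to the pod name
  lifecycle:
    preStop:
      exec:
        command:
          - bash
          - -c
          - |
            # Hold up deletion of this pod until _cat/master reports some other
            # node, or until we run out of attempts. Once the local Elasticsearch
            # stops answering, this simply waits out the window, which still keeps
            # the pod's IP reachable while the rest of the cluster re-elects.
            # Keep the total below terminationGracePeriodSeconds (30s by default).
            for i in $(seq 1 20); do
              MASTER="$(curl -s http://localhost:9200/_cat/master | awk '{print $NF}')"
              if [ -n "$MASTER" ] && [ "$MASTER" != "$NODE_NAME" ]; then
                exit 0
              fi
              sleep 1
            done
```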
It is still affected by this. The problem is that the new master attempts to get all the nodes in the cluster to reconnect to the old master as part of winning its first election, and waits for this to time out before proceeding; that wait is the problem described in elastic/elasticsearch#29025. There's a related discussion here. This is not really affected by the changes to how discovery works in 7.0.

In 6.x the best solution we have is to reduce the connection timeout. If your cluster is not talking to remote systems then the connect timeout can reasonably be very short.

In 7.x the same advice is basically true (at the time of writing), but there is another option too: if you want to shut the master down then you can trigger an election first by excluding the current master from the voting configuration.
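In chart terms, the 6.x tweak would presumably be a one-line elasticsearch.yml override; the setting name and the one-second value below are assumptions for illustration, not values specified in this thread:

```yaml
# elasticsearch.yml (6.x). Only sensible when the cluster has no slow links,
# e.g. no remote-cluster connections that need a longer connect timeout.
transport.tcp.connect_timeout: 1s
```

In 7.x the voting exclusion mentioned above is a REST call, roughly `POST /_cluster/voting_config_exclusions/<node-name>` before shutting the master down and `DELETE /_cluster/voting_config_exclusions` afterwards, which a preStop hook could issue.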
Here is what worked for me, based on the above suggestion. I mounted a script in the container and run it instead of the docker entrypoint, shown below. It still takes 30s to time out the old master, but the new master seemed to be operational within 4 seconds of the old master shutting down.

```bash
if [[ -z $NODE_MASTER || "$NODE_MASTER" = "true" ]] ; then
  # NOTE: the branch bodies are an assumed reconstruction (entrypoint path and
  # 5-second linger are guesses), not necessarily the author's exact script.
  /usr/local/bin/docker-entrypoint.sh eswrapper &
  pid=$!
  trap 'kill -TERM "$pid"; wait "$pid"; sleep 5' TERM
  wait "$pid"
else
  exec /usr/local/bin/docker-entrypoint.sh eswrapper   # other roles: stock entrypoint
fi
```
I really appreciate all the attention this issue has received! The workaround I decided to go with was wrapping the base Elasticsearch image with an entrypoint that traps some common exit signals and allows a handler to run after Elasticsearch stops (see https://github.com/qoqodev/elasticsearch-hooks-docker). In this case, the "post-stop" handler would just sleep, or wait until a new master has been elected. However, it looks like @DaveCTurner recently closed out the upstream issue elastic/elasticsearch#29025 with PR elastic/elasticsearch#31547 that should fix the slow re-election, so no workarounds should be necessary! Whenever those changes make it into an Elasticsearch release, whether it's 6.x or 7.x, I'll be glad to test it out! 😄
Indeed, we've had a few failed attempts to fix elastic/elasticsearch#29025, and this very thread prompted us to look at it again. The fix is elastic/elasticsearch#39629, which has been backported.
I think this is the same problem as helm/charts#8785.
This has been merged into master but not yet released. I'm leaving this open until it is released and others have confirmed that this solution properly resolves the issue.
This has been merged and released. Thanks everyone for all of the help investigating and for contributing the fix!
This commit removes the `masterTerminationFix` sidecar container introduced in elastic#63 to fix slow election issues when the master node is deleted. This workaround is no longer needed since Elasticsearch 7.2.
First of all - thank you guys for the chart!
I was playing around with the multi-node example and experienced some odd behavior. Here's how I'm reproducing the issue.
After the multi-node example is deployed, open up the multi-data service to your local machine in one terminal, watch the call to `/_cat/master` in another terminal, and in a third terminal delete whichever pod is the elected master:
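Something along these lines reproduces the setup (the service name comes from the multi-node example; the pod in the last command is whichever one `_cat/master` reports):

```sh
# terminal 1: expose the data nodes locally
kubectl port-forward svc/multi-data 9200:9200

# terminal 2: poll the elected master once a second
watch -n 1 'curl -s localhost:9200/_cat/master'

# terminal 3: delete the currently elected master pod, e.g.
kubectl delete pod multi-master-1
```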
The API call in the second terminal will now hang. After 30 seconds the request times out, and we might see an error for a split second. Soon after, the cluster recovers and the API call from the second terminal starts responding again. The logs from another master node before and after the re-election show the same 30-second gap before a new master takes over.
I figured Kubernetes might be killing the pods too abruptly, so I followed the instructions at https://www.elastic.co/guide/en/elasticsearch/reference/6.6/stopping-elasticsearch.html to stop Elasticsearch. Sure enough, if we kill the Elasticsearch process inside the elected master pod directly, the re-election is quick!
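An approximation of that experiment, assuming Elasticsearch runs as PID 1 in the container and using an illustrative pod name:

```sh
# Send SIGTERM to the Elasticsearch process itself; the pod and its IP stay up
# while the container restarts, so peers get an immediate connection refusal.
kubectl exec multi-master-0 -- bash -c 'kill -TERM 1'
```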
Assuming `multi-master-2` is the new master, notice how the API call from the second terminal only hangs for around 3 seconds this time!
Reading through the docs for the termination of pods (https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods), Kubernetes does in fact send a SIGTERM to the container, so I'm guessing deleting a pod does something more than just sending a SIGTERM, and that extra something is what Elasticsearch doesn't like.