
etcd timeouts and election problems #915

Closed
WillPlatnick opened this issue Aug 1, 2014 · 20 comments

@WillPlatnick

Hello,
I am new to etcd. I built a cluster last night using 0.4.6 on Debian Wheezy, and I am experiencing timeouts. The cluster has 3 active members and 12 standbys.

I am really wondering why I am having timeouts. These are bare-metal servers on the same gigabit LAN. While running these tests I was also running ping tests, and there was never any increase in latency. There is 12+ GB of memory free on each of these servers, and according to our graphs, CPU on each box has not gone above 15%. The network pipe is nowhere near saturated. These boxes are active application servers, so we would know through the many metrics we track if there were issues with our network connection.

I've read the tuning guide, and for a gigabit LAN, it seems we really shouldn't have to do any tuning at all. Average latency (as determined by ICMP) is 50 microseconds.

Where do I go from here? What information would be most helpful?

[etcd] Jul 31 15:15:29.411 INFO      | app03: warning: heartbeat near election timeout: 168.757749ms
[etcd] Jul 31 15:17:29.570 INFO      | app03: warning: heartbeat near election timeout: 161.028528ms
[etcd] Jul 31 15:19:29.720 INFO      | app03: warning: heartbeat near election timeout: 160.664617ms
[etcd] Jul 31 15:23:30.030 INFO      | app03: warning: heartbeat near election timeout: 169.964666ms
[etcd] Jul 31 15:27:30.336 INFO      | app03: warning: heartbeat near election timeout: 176.612334ms
[etcd] Jul 31 15:29:30.493 INFO      | app03: warning: heartbeat near election timeout: 183.208334ms
[etcd] Jul 31 15:31:30.653 INFO      | app03: warning: heartbeat near election timeout: 193.302576ms
[etcd] Jul 31 15:33:30.815 INFO      | app03: warning: heartbeat near election timeout: 205.103987ms
[etcd] Jul 31 15:35:30.971 INFO      | app03: warning: heartbeat near election timeout: 160.976343ms
[etcd] Jul 31 15:39:31.276 INFO      | app03: warning: heartbeat near election timeout: 166.767389ms
[etcd] Jul 31 15:41:31.430 INFO      | app03: warning: heartbeat near election timeout: 169.963436ms
[etcd] Jul 31 15:43:31.588 INFO      | app03: warning: heartbeat near election timeout: 178.110018ms
[etcd] Jul 31 15:45:31.735 INFO      | app03: warning: heartbeat near election timeout: 174.857319ms
[etcd] Jul 31 15:49:32.052 INFO      | app03: warning: heartbeat near election timeout: 192.572154ms
[etcd] Jul 31 15:51:32.219 INFO      | app03: warning: heartbeat near election timeout: 208.909646ms
[etcd] Jul 31 15:55:32.538 INFO      | app03: warning: heartbeat near election timeout: 178.398576ms
[etcd] Jul 31 15:57:32.707 INFO      | app03: warning: heartbeat near election timeout: 197.037774ms
[etcd] Jul 31 15:59:32.860 INFO      | app03: state changed from 'follower' to 'candidate'.
[etcd] Jul 31 15:59:32.861 INFO      | app03: leader changed from 'app01' to ''.
[etcd] Jul 31 15:59:32.862 INFO      | app03: state changed from 'candidate' to 'leader'.
[etcd] Jul 31 15:59:32.862 INFO      | app03: leader changed from '' to 'app03'.

Current etcd usage from ps

root      8522  0.7  0.2  99936 89168 ?        Sl   Jul30  10:23 /usr/bin/etcd -config /etc/etcd/etcd.conf

We're also occasionally seeing missed heartbeats (a few per hour, with more in the middle of the afternoon today, which is one of our slower times). These started immediately after starting the cluster.

[etcd] Jul 31 19:27:49.027 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Jul 31 19:44:34.067 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
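
As a starting point for the "what information would be most helpful" question, etcd exposes its runtime statistics over the client API; a minimal sketch, assuming it is run on one of the members and that the cluster uses the default client port 4001 shown in the config later in this thread:

# Pull etcd's built-in statistics from a member (client port 4001 assumed).
curl -s http://127.0.0.1:4001/v2/stats/self     # role, current leader, uptime, append counters
curl -s http://127.0.0.1:4001/v2/stats/store    # get/set/compare-and-swap counters, watcher count
# Only meaningful when run against the current leader:
curl -s http://127.0.0.1:4001/v2/stats/leader   # per-follower latency and failure counts
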
@WillPlatnick
Author

While writing this, we suffered another failover.

Here's the log from the server that was the master

[etcd] Jul 31 19:14:34.516 INFO      | app01: state changed from 'follower' to 'candidate'.
[etcd] Jul 31 19:14:34.516 INFO      | app01: leader changed from 'app03' to ''.
[etcd] Jul 31 19:14:34.517 INFO      | app01: state changed from 'candidate' to 'leader'.
[etcd] Jul 31 19:14:34.517 INFO      | app01: leader changed from '' to 'app01'.
[etcd] Jul 31 19:14:34.567 INFO      | app01: warning: heartbeat time out peer="app03" missed=1 backoff="2s"
[etcd] Jul 31 19:27:49.027 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Jul 31 19:44:34.067 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Jul 31 19:53:51.331 INFO      | app01: state changed from 'leader' to 'follower'.
[etcd] Jul 31 19:53:51.331 INFO      | app01: term #7 started.
[etcd] Jul 31 19:53:51.331 INFO      | app01: leader changed from 'app01' to ''.

Here is the log from the server that took over as master

[etcd] Jul 31 19:14:34.516 INFO      | app02: leader changed from 'app03' to ''.
[etcd] Jul 31 19:15:47.974 INFO      | app02: warning: heartbeat near election timeout: 206.743366ms
[etcd] Jul 31 19:17:48.151 INFO      | app02: warning: heartbeat near election timeout: 183.463733ms
[etcd] Jul 31 19:19:48.326 INFO      | app02: warning: heartbeat near election timeout: 208.428745ms
[etcd] Jul 31 19:21:48.504 INFO      | app02: warning: heartbeat near election timeout: 186.814225ms
[etcd] Jul 31 19:23:48.681 INFO      | app02: warning: heartbeat near election timeout: 213.856737ms
[etcd] Jul 31 19:29:49.225 INFO      | app02: warning: heartbeat near election timeout: 207.1543ms
[etcd] Jul 31 19:31:49.391 INFO      | app02: warning: heartbeat near election timeout: 173.001583ms
[etcd] Jul 31 19:33:49.587 INFO      | app02: warning: heartbeat near election timeout: 219.494006ms
[etcd] Jul 31 19:35:49.749 INFO      | app02: warning: heartbeat near election timeout: 181.632944ms
[etcd] Jul 31 19:37:49.920 INFO      | app02: warning: heartbeat near election timeout: 202.313727ms
[etcd] Jul 31 19:39:50.086 INFO      | app02: warning: heartbeat near election timeout: 168.645856ms
[etcd] Jul 31 19:41:50.261 INFO      | app02: warning: heartbeat near election timeout: 193.292717ms
[etcd] Jul 31 19:43:50.440 INFO      | app02: warning: heartbeat near election timeout: 222.085741ms
[etcd] Jul 31 19:45:50.639 INFO      | app02: warning: heartbeat near election timeout: 221.036645ms
[etcd] Jul 31 19:47:50.815 INFO      | app02: warning: heartbeat near election timeout: 197.009793ms
[etcd] Jul 31 19:49:50.994 INFO      | app02: warning: heartbeat near election timeout: 220.67798ms
[etcd] Jul 31 19:51:51.157 INFO      | app02: warning: heartbeat near election timeout: 189.628257ms
[etcd] Jul 31 19:53:51.325 INFO      | app02: state changed from 'follower' to 'candidate'.
[etcd] Jul 31 19:53:51.325 INFO      | app02: leader changed from 'app01' to ''.
[etcd] Jul 31 19:53:51.326 INFO      | app02: state changed from 'candidate' to 'leader'.
[etcd] Jul 31 19:53:51.326 INFO      | app02: leader changed from '' to 'app02'.

I'm looking at my ping log, and the highest latency logged is half a millisecond.

Here is my current leader info

{
    "followers": {
        "app01": {
            "counts": {
                "fail": 0,
                "success": 10626
            },
            "latency": {
                "average": 4.188319562676455,
                "current": 0.356693,
                "maximum": 172.631804,
                "minimum": 0.288977,
                "standardDeviation": 11.928285111683419
            }
        },
        "app03": {
            "counts": {
                "fail": 0,
                "success": 10588
            },
            "latency": {
                "average": 4.949293496883256,
                "current": 0.394716,
                "maximum": 340.710121,
                "minimum": 0.326099,
                "standardDeviation": 15.646117752647621
            }
        }
    },
    "leader": "app02"
}
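
For what it's worth, the maximum follower latencies above (roughly 173 ms and 341 ms) are in the same range as the "near election timeout" values in the warning logs, so one quick check is to pull just those maxima out of the leader stats. A sketch, assuming jq is installed and the client port is 4001:

curl -s http://127.0.0.1:4001/v2/stats/leader \
  | jq '.followers | to_entries[] | {peer: .key, max_latency_ms: .value.latency.maximum}'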

@WillPlatnick changed the title from "etcd timeouts" to "etcd timeouts and election problems" on Aug 1, 2014
@WillPlatnick
Author

I should also mention that nothing is talking to the etcd cluster yet, so it's not being bogged down processing commands or anything like that. My leader changed 10+ times while I slept last night.

@WillPlatnick
Author

Here are my self statistics

app01

{
    "leaderInfo": {
        "leader": "app02",
        "startTime": "2014-08-01T10:51:17.327092451-05:00",
        "uptime": "1m55.957327286s"
    },
    "name": "app01",
    "recvAppendRequestCnt": 1259105,
    "recvBandwidthRate": 900.1873044457652,
    "recvPkgRate": 20.100196593631022,
    "sendAppendRequestCnt": 2982741,
    "startTime": "2014-07-30T20:29:53.913685961-05:00",
    "state": "follower"
}

app02

{
    "leaderInfo": {
        "leader": "app02",
        "startTime": "2014-08-01T10:51:17.374857263-05:00",
        "uptime": "2m44.543917287s"
    },
    "name": "app02",
    "recvAppendRequestCnt": 2166321,
    "sendAppendRequestCnt": 1115667,
    "sendBandwidthRate": 1773.0685496022056,
    "sendPkgRate": 39.6039434800582,
    "startTime": "2014-07-30T20:49:59.389824018-05:00",
    "state": "leader"
}

app03

{
    "leaderInfo": {
        "leader": "app02",
        "startTime": "2014-08-01T10:51:17.323766255-05:00",
        "uptime": "3m8.317772415s"
    },
    "name": "app03",
    "recvAppendRequestCnt": 678255,
    "recvBandwidthRate": 900.3151676261217,
    "recvPkgRate": 20.10080749332712,
    "sendAppendRequestCnt": 885476,
    "startTime": "2014-07-31T19:14:34.293943442-05:00",
    "state": "follower"
}
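
To catch the overnight leader changes with timestamps, one option is to poll the self stats in a loop on each node. A rough sketch, again assuming jq and the local client port 4001:

# Log state and current leader once a minute to spot unplanned elections.
while true; do
  echo "$(date '+%F %T') $(curl -s http://127.0.0.1:4001/v2/stats/self | jq -r '.state + " leader=" + .leaderInfo.leader')"
  sleep 60
done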

@WillPlatnick
Author

I looked at my state information, and I am seeing gets increase even though no clients should be connected. I ran strace and I am seeing this over and over again in the network traffic:

5251  accept4(9, {sa_family=AF_INET6, sin6_port=htons(21101), inet_pton(AF_INET6, "::ffff:10.17.185.139", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 11
5251  getsockname(11, {sa_family=AF_INET6, sin6_port=htons(7001), inet_pton(AF_INET6, "::ffff:10.16.145.49", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
5251  setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
5251  accept4(9, 0xc2164e9af0, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
5251  --- SIGPIPE (Broken pipe) @ 0 (0) ---
5251  accept4(9, {sa_family=AF_INET6, sin6_port=htons(14356), inet_pton(AF_INET6, "::ffff:10.17.185.130", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 11
5251  getsockname(11, {sa_family=AF_INET6, sin6_port=htons(7001), inet_pton(AF_INET6, "::ffff:10.16.145.49", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
5251  setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
5251  accept4(9, 0xc2164e4f50, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
5251  accept4(9, {sa_family=AF_INET6, sin6_port=htons(9108), inet_pton(AF_INET6, "::ffff:10.16.145.48", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 11
5251  getsockname(11, {sa_family=AF_INET6, sin6_port=htons(7001), inet_pton(AF_INET6, "::ffff:10.16.145.49", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
5251  setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
5251  accept4(9, 0xc2164d4230, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
5251  --- SIGPIPE (Broken pipe) @ 0 (0) ---
5251  accept4(9, {sa_family=AF_INET6, sin6_port=htons(9332), inet_pton(AF_INET6, "::ffff:10.16.145.60", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 11
5251  getsockname(11, {sa_family=AF_INET6, sin6_port=htons(7001), inet_pton(AF_INET6, "::ffff:10.16.145.49", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
5251  setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
5251  accept4(9, 0xc2164c4850, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
5220  accept4(9, {sa_family=AF_INET6, sin6_port=htons(38526), inet_pton(AF_INET6, "::ffff:10.16.145.58", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 11
5220  getsockname(11, {sa_family=AF_INET6, sin6_port=htons(7001), inet_pton(AF_INET6, "::ffff:10.16.145.49", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
5220  setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
5220  accept4(9, 0xc2164b7af0, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
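
For reference, a trace like the one above can be reproduced by attaching strace to the running etcd process and filtering to network-related syscalls (assuming root access and a single etcd process on the box):

sudo strace -f -tt -e trace=network -p "$(pidof etcd)"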

@yichengq
Contributor

yichengq commented Aug 1, 2014

@WillPlatnick
Could you share the logs of these three nodes and the standby nodes with us?
Have you tried using 3 active members only? Does it work well overnight?
It would also be helpful to share the cluster config info with us: curl http://127.0.0.1:7001/v2/admin/config
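
Besides the admin config endpoint above, the 0.4 admin API also serves the machine list on the peer port, which helps confirm which nodes are active peers and which are standbys; a sketch, hedged in that the exact response shape may vary across 0.4.x releases:

curl -s http://127.0.0.1:7001/v2/admin/config     # activeSize, removeDelay, syncInterval
curl -s http://127.0.0.1:7001/v2/admin/machines   # per-node name, state (peer or standby), peer/client URLs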

@WillPlatnick
Author

@unihorn I will blow my configs away and start a new cluster, stand by

@WillPlatnick
Author

@unihorn
Here is my cluster config

{
    "activeSize": 3,
    "removeDelay": 180,
    "syncInterval": 5
}

I will give more info if/as things degrade.

@WillPlatnick
Author

So far, it's going just as it did when I installed it earlier.

Here's how it starts: with missed heartbeats, even though LAN communication is fine and these boxes are not taxed at all. I'll update with more logs as things progress.

[etcd] Aug  1 14:44:58.100 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Aug  1 14:49:28.815 INFO      | app01: warning: heartbeat time out peer="app03" missed=1 backoff="2s"
[etcd] Aug  1 14:49:38.269 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Aug  1 14:49:43.350 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Aug  1 14:49:44.665 INFO      | app01: warning: heartbeat time out peer="app03" missed=1 backoff="2s"
[etcd] Aug  1 14:49:49.550 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Aug  1 14:49:51.215 INFO      | app01: warning: heartbeat time out peer="app03" missed=1 backoff="2s"
[etcd] Aug  1 14:49:51.620 INFO      | app01: warning: heartbeat time out peer="app02" missed=5 backoff="4s"
[etcd] Aug  1 14:49:53.316 INFO      | app01: warning: heartbeat time out peer="app03" missed=6 backoff="4s"
[etcd] Aug  1 14:50:03.915 INFO      | app01: warning: heartbeat time out peer="app03" missed=1 backoff="2s"
[etcd] Aug  1 14:50:13.050 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"
[etcd] Aug  1 14:50:48.050 INFO      | app01: warning: heartbeat time out peer="app02" missed=1 backoff="2s"

@yichengq
Contributor

yichengq commented Aug 4, 2014

@WillPlatnick
Is CPU usage always low for all of the etcd processes?
Does the cluster still consist of 3 peers and 12 standbys?
Could you share the system logs of etcd with me? That would be super helpful.

@WillPlatnick
Author

@unihorn
The cluster is now just 3 active members, as per your request. Here are the etcd logs for all 3 servers in this gist. As you can see, we are going through constant elections, about once per minute right now.

https://gist.github.com/WillPlatnick/d4be17b1704ebb4ae511

CPU usage is higher than it should be, given that we're not querying etcd at all; it's just sitting there. It doesn't start off this high, it grows over time.

root      1400  4.3  1.0 345016 334352 ?       Sl   Aug01 239:54 /usr/bin/etcd -config /etc/etcd/etcd.conf
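
To confirm that the growth is steady rather than spiky, a simple sampling loop works; a sketch, assuming a single etcd process on the host:

# Sample etcd's resident memory (KB) and CPU% every five minutes.
while true; do
  echo "$(date '+%F %T') $(ps -o rss=,pcpu= -p "$(pidof etcd)")"
  sleep 300
done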

Here is my store information. Please note getsSuccess is growing, even though nothing is querying it.

{
    "compareAndDeleteFail": 0,
    "compareAndDeleteSuccess": 0,
    "compareAndSwapFail": 0,
    "compareAndSwapSuccess": 0,
    "createFail": 0,
    "createSuccess": 3,
    "deleteFail": 0,
    "deleteSuccess": 0,
    "expireCount": 0,
    "getsFail": 6,
    "getsSuccess": 502135,
    "setsFail": 0,
    "setsSuccess": 1,
    "updateFail": 0,
    "updateSuccess": 0,
    "watchers": 0
}
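
The growth rate of getsSuccess is easy to eyeball with watch, assuming jq is available:

watch -n 5 "curl -s http://127.0.0.1:4001/v2/stats/store | jq .getsSuccess"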

Here's my config

{
    "activeSize": 3,
    "removeDelay": 180,
    "syncInterval": 5
}

And here is an example of leader statistics:

{
    "followers": {
        "app01": {
            "counts": {
                "fail": 125,
                "success": 1610594
            },
            "latency": {
                "average": 5.244450236231638,
                "current": 0.744285,
                "maximum": 468.251326,
                "minimum": 0.172269,
                "standardDeviation": 16.623390540371723
            }
        },
        "app02": {
            "counts": {
                "fail": 260,
                "success": 1609752
            },
            "latency": {
                "average": 4.569998826313543,
                "current": 0.578571,
                "maximum": 742.599151,
                "minimum": 0.188828,
                "standardDeviation": 15.5175726296063
            }
        }
    },
    "leader": "app03"
}

@yichengq
Contributor

yichengq commented Aug 7, 2014

@WillPlatnick
Could you check the disk utilization on the three machines? etcd does a lot of sync and write operations on disk.
Could you also share your etcd configuration?

The number of getsSuccess is reasonable, because cluster info is also fetched from the store.

My guess is that you may have turned off the snapshot feature, and that may be the reason for the frequent leader changes: it makes etcd keep all entries in memory, which slows down the heap allocator.
That doesn't fully explain a heartbeat miss at least once every five minutes, though.
I started 3 etcd machines on AWS using m1.small instances. They didn't report any heartbeat misses in 8 hours, which should be the normal case.
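
A quick way to check the disk utilization in question is iostat from the sysstat package (assuming it is installed); sustained high %util or await on the device backing the data dir would point at the disk as the culprit:

# Extended per-device stats, refreshed every 5 seconds.
iostat -dx 5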

@WillPlatnick
Author

Here is what the server looked like over the last 3 days:
https://www.evernote.com/shard/s5/sh/d8a44797-67bb-4aa6-a3a4-1178a39ac627/0d75285b02adbce9b90038b4427bca8c

Here's the disk usage:
https://www.evernote.com/shard/s5/sh/b768741e-9800-489f-a1ee-8c9af81a806b/6d59d4236224dc269aed7eb5b7ffebc4

And my config:

addr = "10.x.x.x:4001"
ca_file = ""
cert_file = ""
cors = []
cpu_profile_file = ""
data_dir = "/etc/etcd"
discovery = "https://discovery.etcd.io/xxxxxxxxxx"
http_read_timeout = 10.0
http_write_timeout = 10.0
key_file = ""
peers = []
peers_file = ""
max_cluster_size = 3
max_result_buffer = 1024
max_retry_attempts = 3
name = "app01"
snapshot = false
verbose = false
very_verbose = false

[peer]
addr = "10.x.x.x:7001"
ca_file = ""
cert_file = ""
key_file = ""

[cluster]
active_size = 3
remove_delay = 180.0
sync_interval = 5.0

@yichengq
Contributor

yichengq commented Aug 8, 2014

@WillPlatnick
The network bandwidth used by a 3-node etcd cluster is about 10 KB/s. Could network usage from other processes account for the delay fluctuation?

I think that if you start etcd with snapshot = true, it could solve the frequent leader changes. If etcd doesn't snapshot, it keeps all data in memory and the allocator slows everything down.
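
For reference, enabling it amounts to flipping the snapshot line in the config file shown earlier and restarting etcd; a sketch, assuming the config path from the ps output above and that etcd runs under a standard init service (the restart step is an assumption about how etcd is supervised on these hosts):

sudo sed -i 's/^snapshot = false/snapshot = true/' /etc/etcd/etcd.conf
sudo service etcd restart    # adjust to however etcd is actually supervised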

@WillPlatnick
Author

We have a gigabit pipe, the kernel has plenty of open ports, and ulimits seem fine, so I don't think so. We would also see it reflected on our app side, which we don't.

I have turned on snapshot (I just grabbed the default config file from your documentation, so that may be a good thing to have turned on by default) and I will let you know if frequent elections still happen.

@WillPlatnick
Author

I'm happy to say that since setting snapshot to true, the election issue seems to have stopped!

I am still getting missed heartbeats, though. What can I do to gather more info on this?

@yichengq
Contributor

@WillPlatnick
Do the machines run other processes also? Is it possible that they consume too many resources?

@WillPlatnick
Author

No, I shared a graph earlier that shows system resource usage. These boxes are not taxed at all when this happens.


@carmstrong

@unihorn Should snapshot = true be the default, then? We've run into this more than once ourselves, and it has been reported by our users.

@yichengq
Contributor

@carmstrong Yes, the default config file is an old one and has been updated upstream.

@WillPlatnick I have no idea why that happens. We will possibly try to add more logging for it.

@kelseyhightower
Contributor

@WillPlatnick Sorry we are getting back to this issue so late, but I'm glad to see things have improved for you. As far as the timeouts are concerned, there are just so many reasons this can happen. I would recommend following the tuning guide and bumping the timeouts little by little until the issue goes away.

Finding the root cause of the missed heartbeats will require some in-depth troubleshooting. If this is a production environment, I would strongly recommend running etcd on a dedicated set of nodes without normal workloads.
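
For the "bump the timeouts little by little" step, the tuning guide exposes the Raft timing as command-line flags given in milliseconds. The flag names below are as recalled for the 0.4.x series (later releases renamed them to -heartbeat-interval and -election-timeout), so confirm them against etcd -h before relying on this sketch:

# Raise both values gradually; keep the election timeout several times the heartbeat timeout.
etcd -config /etc/etcd/etcd.conf \
     -peer-heartbeat-timeout=100 \
     -peer-election-timeout=500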

With that said, I'm closing this ticket out due to age and the fact that things have improved for you.

@xiang90 mentioned this issue Dec 15, 2014