[ERR] serf: Rejected coordinate from HOST: round trip time not in valid range, duration -99.611868ms is not a positive value less than 10s #3704

mnuic · 2017-11-21T09:09:50Z

`consul version` for both Client and Server

Client: consul 1.0.1
Server: consul 1.0.1

`consul info` for both Client and Server

Client:

same as server

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 32
	services = 45
build:
	prerelease =
	revision = 9564c29
	version = 1.0.1
consul:
	bootstrap = true
	known_datacenters = 7
	leader = true
	leader_addr = 10.0.66.150:8300
	server = true
raft:
	applied_index = 16074526
	commit_index = 16074526
	fsm_pending = 0
	last_contact = 0
	last_log_index = 16074526
	last_log_term = 15
	last_snapshot_index = 16070464
	last_snapshot_term = 15
	latest_configuration = [{Suffrage:Voter ID:386b24e2-c793-cd40-49dd-4116232b96bd Address:10.0.66.150:8300}]
	latest_configuration_index = 1
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 15
runtime:
	arch = amd64
	cpu_count = 8
	goroutines = 472
	max_procs = 8
	os = linux
	version = go1.9.2
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 15
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 886
	members = 11
	query_queue = 0
	query_time = 1

Operating system and Environment details

Ubuntu 16.04.3 LTS, Docker 17.09

Description of the Issue (and unexpected/desired result)

After upgrading Consul to version 1.0.1, the logs started to fill with messages like these:

a.b.c.d     2017/11/21 09:01:38 [ERR] serf: Rejected coordinate from HOST1: round trip time not in valid range, duration -206.486µs is not a positive value less than 10s
a.b.c.d     2017/11/21 09:02:14 [ERR] serf: Rejected coordinate from HOST2: round trip time not in valid range, duration -99.611868ms is not a positive value less than 10s
a.b.c.d     2017/11/21 09:04:28 [ERR] serf: Rejected coordinate from HOST3: round trip time not in valid range, duration -765.777µs is not a positive value less than 10s

Logs

No logs other than those mentioned above.

slackpad · 2017-11-21T09:23:06Z

Hi @mnuic, we tracked that down, but the fix didn't make it into this release cycle. We will pick it up in the next minor release of Consul via hashicorp/memberlist#139. Sorry for the log noise - these messages can be safely ignored.

mnuic · 2017-11-21T09:29:06Z

@slackpad thank you for the info! Will wait for the next release for production use.

sofax · 2017-12-12T07:37:52Z

I'm afraid this is more than just log noise. Consul 1.0.1 breaks our test environment, whereas v0.9.3 works flawlessly. The error messages mentioned above are the only ones we see.

slackpad · 2017-12-12T08:57:08Z

@sofax can you provide more details about what is broken for you?

sofax · 2017-12-12T09:10:10Z

@slackpad:
It may or may not be related to this issue - all I can say is that we don't see any other error messages.

Here is the scenario:
We have some integration tests for service health checks, e.g. one with two instances of service A, where initially both instances return an unhealthy state. Then service instance #2 is set to "healthy" (i.e. its health check resource returns a healthy state), which - as expected - makes it available via Consul. However, service instance #1 is suddenly available too, even though its health check resource still returns "unhealthy".

This does not happen with Consul 0.9.3.

slackpad · 2017-12-12T14:30:35Z

@sofax thanks, that's definitely not related to this error. Can you please open a new issue with some more details about how your test works, and we will take a look?

sofax · 2017-12-15T09:01:06Z

@slackpad:
Thanks - I think it turned out that the problem lies in our configuration (and in a misinterpretation of the documentation, or in a configuration example we found on the Internet that was based on Consul > 0.9.3). We had added the `id` field to the check definition in both instances with the same value. v0.9.3 apparently did not interpret that property at all, so it simply ignored it and assigned an automatic ID to the checks instead. v1.0.1 does interpret it, but instead of treating the ID as local to the service instance (which IMO makes more sense), it seems to give the ID global scope, so assigning the same ID to health checks for different service instances (of the same service) won't work.
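For anyone hitting the same configuration problem, here is a minimal sketch of a pre-flight check that flags check IDs shared by more than one service instance. It assumes, as described above, that a check `id` has to be unique across everything registered with one agent. The helper name `find_duplicate_check_ids`, the service names, and the URLs are hypothetical and only illustrate the idea; the dictionaries loosely mirror the shape of a service definition, not our real configuration.

```python
from collections import defaultdict


def find_duplicate_check_ids(service_definitions):
    """Return {check_id: [service ids]} for check IDs used by more than one service."""
    owners = defaultdict(list)
    for service in service_definitions:
        check = service.get("check") or {}
        check_id = check.get("id")
        if check_id is not None:
            owners[check_id].append(service["id"])
    # Keep only the IDs that are claimed by two or more service instances.
    return {cid: ids for cid, ids in owners.items() if len(ids) > 1}


# Hypothetical definitions: both instances reuse the check id "health".
broken = [
    {"id": "service-a-1", "name": "service-a",
     "check": {"id": "health", "http": "http://localhost:8081/health", "interval": "10s"}},
    {"id": "service-a-2", "name": "service-a",
     "check": {"id": "health", "http": "http://localhost:8082/health", "interval": "10s"}},
]

# Same definitions, with the check id made unique per instance.
fixed = [
    {"id": "service-a-1", "name": "service-a",
     "check": {"id": "service-a-1-health", "http": "http://localhost:8081/health", "interval": "10s"}},
    {"id": "service-a-2", "name": "service-a",
     "check": {"id": "service-a-2-health", "http": "http://localhost:8082/health", "interval": "10s"}},
]

print(find_duplicate_check_ids(broken))
print(find_duplicate_check_ids(fixed))
```

Prefixing the check ID with the service instance ID is just one way to keep the IDs distinct; omitting the `id` field and letting Consul assign one may work equally well.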

bitmask777 · 2018-01-02T22:19:50Z

I've upgraded to 1.0.2 (on Windows) and am still seeing these messages, even though per the changelog this issue (GH-3704) is fixed in 1.0.2.

Snippet from the log after the upgrade to 1.0.2:

2018/01/02 22:07:16 [ERR] serf: Rejected coordinate from host1: round trip time not in valid range, duration 0s is not a positive value less than 10s
2018/01/02 22:07:17 [ERR] serf: Rejected coordinate from host2: round trip time not in valid range, duration 0s is not a positive value less than 10s
2018/01/02 22:07:18 [ERR] serf: Rejected coordinate from host3: round trip time not in valid range, duration 0s is not a positive value less than 10s

I do see that the description of the fix seems specific to a negative value. Perhaps a value of 0 is an uncovered edge case?

I performed my upgrade in a test cluster. Unfortunately I can't proceed with a production upgrade until I resolve this issue since it creates so much noise in the logs.
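To illustrate the zero-value case: the error text itself states the rule, namely that the round trip time must be a positive value less than 10s. The sketch below restates that rule in Python. It is not Serf's actual Go implementation; the names `MAX_RTT` and `is_valid_rtt` are made up for this example, and the only thing taken from the logs is the rule and the sample durations.

```python
from datetime import timedelta

# Upper bound quoted in the log message ("less than 10s").
MAX_RTT = timedelta(seconds=10)


def is_valid_rtt(rtt: timedelta) -> bool:
    """Accept only strictly positive round trip times below MAX_RTT."""
    return timedelta(0) < rtt < MAX_RTT


# Durations taken from the log excerpts in this thread, plus one ordinary value.
samples = {
    "-99.611868ms": timedelta(milliseconds=-99.611868),
    "-206.486µs": timedelta(microseconds=-206.486),
    "0s": timedelta(0),
    "1ms": timedelta(milliseconds=1),
}

for label, rtt in samples.items():
    print(label, "accepted" if is_valid_rtt(rtt) else "rejected")
```

Under a strict "greater than zero" comparison, a measured RTT of exactly 0s is rejected just like a negative one, which would be consistent with a fix that only addresses negative measurements.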

slackpad added the type/bug Feature does not function as expected label Nov 21, 2017

slackpad added this to the 1.0.2 milestone Nov 21, 2017

slackpad self-assigned this Nov 21, 2017

slackpad closed this as completed in 9f29894 Nov 21, 2017

slackpad mentioned this issue Jan 9, 2018: Zero RTT measurement errors on Windows #3789 (Closed)
