panic when dns query on non-server node #3407

Closed
MorphBonehunter opened this issue Aug 19, 2017 · 8 comments

@MorphBonehunter

consul version for both Client and Server

Client: Consul v0.9.2
Server: Consul v0.9.2

consul info for both Client and Server

Client:

agent:
	check_monitors = 38
	check_ttls = 0
	checks = 38
	services = 12
build:
	prerelease = 
	revision = 75ca2ca
	version = 0.9.2
consul:
	known_servers = 1
	server = false
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 96
	max_procs = 4
	os = linux
	version = go1.8.3
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 25
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 99
	members = 3
	query_queue = 0
	query_time = 1

Server:

agent:
	check_monitors = 28
	check_ttls = 0
	checks = 28
	services = 11
build:
	prerelease = 
	revision = 75ca2ca
	version = 0.9.2
consul:
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 192.168.200.4:8300
	server = true
raft:
	applied_index = 682209
	commit_index = 682209
	fsm_pending = 0
	last_contact = 0
	last_log_index = 682209
	last_log_term = 133
	last_snapshot_index = 680555
	last_snapshot_term = 132
	latest_configuration = [{Suffrage:Voter ID:192.168.200.4:8300 Address:192.168.200.4:8300}]
	latest_configuration_index = 599656
	num_peers = 0
	protocol_version = 2
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 133
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 131
	max_procs = 2
	os = linux
	version = go1.8.3
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 25
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 99
	members = 3
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 7
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Using pre-compiled binaries from releases.hashicorp.com on Arch Linux

Description of the Issue (and unexpected/desired result)

I've built a test environment with 1 Consul server node ("bootstrap_expect": 1) and 2 non-server nodes. If I query DNS on the leader there is no problem, but if I try the same on a non-server node, Consul crashes.
This can be reproduced every time on amd64 and on arch = arm (my second non-server node).

Reproduction steps

on server: dig -p 8600 @localhost -> no problem
on non-server: dig -p 8600 @localhost -> consul crashes with a panic

Log Fragments or Link to gist

log from the non-server after entering dig -p 8600 @localhost webserver.service.test.consul

Aug 19 12:35:24 evelyn consul[13385]:     2017/08/19 12:35:24 [DEBUG] agent: Node info in sync
Aug 19 12:35:24 evelyn consul[13385]:     2017/08/19 12:35:24 [DEBUG] dns: request for {webserver.service.test.consul. 6 1} (90.263µs) from client 127.0.0.1:37078 (udp)
Aug 19 12:35:24 evelyn consul[13385]: panic: runtime error: slice bounds out of range
Aug 19 12:35:24 evelyn consul[13385]: goroutine 150 [running]:
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/agent.(*DNSServer).nameservers(0xc420050b40, 0xc4202bd201, 0xc4201260d4, 0xc42024b080, 0x1, 0x1, 0x1a85340, 0xc420254200)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:294 +0xb69
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/agent.(*DNSServer).handleQuery(0xc420050b40, 0x1a85340, 0xc420254200, 0xc4200e8630)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:217 +0x636
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/agent.(*DNSServer).(github.com/hashicorp/consul/agent.handleQuery)-fm(0x1a85340, 0xc420254200, 0xc4200e8630)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:81 +0x48
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.HandlerFunc.ServeDNS(0xc420126130, 0x1a85340, 0xc420254200, 0xc4200e8630)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:84 +0x44
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*ServeMux).ServeDNS(0xc420126110, 0x1a85340, 0xc420254200, 0xc4200e8630)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:210 +0x65
Aug 19 12:35:24 evelyn consul[13385]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*Server).serve(0xc420067c70, 0x1a7b1e0, 0xc42011b5f0, 0x1a757a0, 0xc420126110, 0xc420600000, 0x49, 0xffff, 0xc420128090, 0xc42024aea0, ...)
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:577 +0x2fb
Aug 19 12:35:24 evelyn consul[13385]: created by github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*Server).serveUDP
Aug 19 12:35:24 evelyn consul[13385]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:531 +0x2d5
Aug 19 12:35:24 evelyn systemd[1]: consul.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 19 12:35:24 evelyn systemd[1]: consul.service: Unit entered failed state.
Aug 19 12:35:24 evelyn systemd[1]: consul.service: Failed with result 'exit-code'.
Aug 19 12:35:34 evelyn systemd[1]: consul.service: Service hold-off time over, scheduling restart.
@magiconair
Contributor

I'll have a look

@magiconair
Contributor

magiconair commented Aug 20, 2017

This is code we've changed recently. Can you post curl localhost:8500/v1/agent/self?pretty from both the server and the non-server?

@magiconair magiconair added type/bug Feature does not function as expected type/crash The issue description contains a golang panic and stack trace labels Aug 20, 2017
@magiconair magiconair self-assigned this Aug 20, 2017
@MorphBonehunter
Author

Ok, no problem.
While experimenting further with Consul, I've changed the "domain" and the "datacenter" in the meantime, but this does not change the crash:

Aug 20 09:58:04 evelyn consul[14548]: panic: runtime error: slice bounds out of range
Aug 20 09:58:04 evelyn consul[14548]: goroutine 152775 [running]:
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/agent.(*DNSServer).nameservers(0xc420374840, 0xc4203d0501, 0xc42012ed64, 0xc4206a1860, 0x1, 0x1, 0x1a85340, 0xc420134000)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:294 +0xb69
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/agent.(*DNSServer).handleQuery(0xc420374840, 0x1a85340, 0xc420134000, 0xc4200f63f0)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:217 +0x636
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/agent.(*DNSServer).(github.com/hashicorp/consul/agent.handleQuery)-fm(0x1a85340, 0xc420134000, 0xc4200f63f0)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/agent/dns.go:81 +0x48
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.HandlerFunc.ServeDNS(0xc42012eda0, 0x1a85340, 0xc420134000, 0xc4200f63f0)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:84 +0x44
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*ServeMux).ServeDNS(0xc42012ed80, 0x1a85340, 0xc420134000, 0xc4200f63f0)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:210 +0x65
Aug 20 09:58:04 evelyn consul[14548]: github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*Server).serve(0xc4204640d0, 0x1a7b1e0, 0xc420358030, 0x1a757a0, 0xc42012ed80, 0xc4204dc000, 0x49, 0xffff, 0xc420130150, 0xc4206a17e0, ...)
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:577 +0x2fb
Aug 20 09:58:04 evelyn consul[14548]: created by github.com/hashicorp/consul/vendor/github.com/miekg/dns.(*Server).serveUDP
Aug 20 09:58:04 evelyn consul[14548]:         /gopath/src/github.com/hashicorp/consul/vendor/github.com/miekg/dns/server.go:531 +0x2d5
Aug 20 09:58:04 evelyn systemd[1]: consul.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

So here is the output from curl.
server:

{
    "Config": {
        "DevMode": false,
        "Performance": {
            "RaftMultiplier": 0
        },
        "Bootstrap": true,
        "BootstrapExpect": 0,
        "Server": true,
        "NonVotingServer": false,
        "Datacenter": "underverse",
        "DataDir": "/var/lib/consul/",
        "DNSRecursor": "",
        "DNSRecursors": [],
        "DNSConfig": {
            "NodeTTL": 120000000000,
            "ServiceTTL": {
                "*": 30000000000
            },
            "AllowStale": true,
            "EnableTruncate": false,
            "UDPAnswerLimit": 3,
            "MaxStale": 315360000000000000,
            "OnlyPassing": false,
            "DisableCompression": false,
            "RecursorTimeout": 2000000000
        },
        "Domain": "net",
        "HTTPConfig": {
            "BlockEndpoints": null,
            "ResponseHeaders": null
        },
        "DisableKeyringFile": false,
        "EncryptVerifyIncoming": true,
        "EncryptVerifyOutgoing": true,
        "LogLevel": "warn",
        "NodeID": "d362ea44-979d-6b4e-1691-cefdf149be90",
        "DisableHostNodeID": true,
        "NodeName": "pandora",
        "ClientAddr": "0.0.0.0",
        "BindAddr": "192.168.200.4",
        "SerfWanBindAddr": "",
        "SerfLanBindAddr": "",
        "AdvertiseAddr": "192.168.200.4",
        "AdvertiseAddrs": {
            "SerfLan": null,
            "SerfLanRaw": "",
            "SerfWan": null,
            "SerfWanRaw": "",
            "RPC": null,
            "RPCRaw": ""
        },
        "AdvertiseAddrWan": "192.168.200.4",
        "TranslateWanAddrs": false,
        "Ports": {
            "DNS": 8600,
            "HTTP": 8500,
            "HTTPS": -1,
            "SerfLan": 8301,
            "SerfWan": 8302,
            "Server": 8300,
            "RPC": 0
        },
        "Addresses": {
            "DNS": "",
            "HTTP": "",
            "HTTPS": "",
            "RPC": ""
        },
        "TaggedAddresses": {
            "lan": "192.168.200.4",
            "wan": "192.168.200.4"
        },
        "LeaveOnTerm": false,
        "SkipLeaveOnInt": true,
        "Autopilot": {
            "CleanupDeadServers": null,
            "LastContactThresholdRaw": "",
            "MaxTrailingLogs": null,
            "ServerStabilizationTimeRaw": "",
            "RedundancyZoneTag": "",
            "DisableUpgradeMigration": null,
            "UpgradeVersionTag": ""
        },
        "Telemetry": {
            "StatsiteAddr": "",
            "StatsdAddr": "",
            "StatsitePrefix": "consul",
            "DisableHostname": false,
            "PrefixFilter": null,
            "FilterDefault": true,
            "DogStatsdAddr": "",
            "DogStatsdTags": null,
            "CirconusAPIApp": "",
            "CirconusAPIURL": "",
            "CirconusSubmissionInterval": "",
            "CirconusCheckSubmissionURL": "",
            "CirconusCheckID": "",
            "CirconusCheckForceMetricActivation": "",
            "CirconusCheckInstanceID": "",
            "CirconusCheckSearchTag": "",
            "CirconusCheckTags": "",
            "CirconusCheckDisplayName": "",
            "CirconusBrokerID": "",
            "CirconusBrokerSelectTag": ""
        },
        "Protocol": 2,
        "RaftProtocol": 0,
        "EnableDebug": false,
        "VerifyIncoming": false,
        "VerifyIncomingRPC": false,
        "VerifyIncomingHTTPS": false,
        "VerifyOutgoing": false,
        "VerifyServerHostname": false,
        "CAFile": "",
        "CAPath": "",
        "CertFile": "",
        "KeyFile": "",
        "ServerName": "",
        "TLSMinVersion": "tls10",
        "TLSCipherSuitesRaw": "",
        "TLSPreferServerCipherSuites": false,
        "StartJoin": [],
        "StartJoinWan": [],
        "RetryMaxAttempts": 0,
        "RetryIntervalRaw": "",
        "RetryJoinWan": [],
        "RetryMaxAttemptsWan": 0,
        "RetryIntervalWanRaw": "",
        "ReconnectTimeoutLan": 0,
        "ReconnectTimeoutLanRaw": "",
        "ReconnectTimeoutWan": 0,
        "ReconnectTimeoutWanRaw": "",
        "EnableUI": false,
        "UIDir": "",
        "PidFile": "",
        "EnableSyslog": false,
        "SyslogFacility": "LOCAL0",
        "RejoinAfterLeave": false,
        "EnableScriptChecks": true,
        "CheckUpdateInterval": 300000000000,
        "CheckReapInterval": 30000000000,
        "CheckDeregisterIntervalMin": 60000000000,
        "ACLDatacenter": "",
        "ACLTTL": 30000000000,
        "ACLTTLRaw": "",
        "ACLDefaultPolicy": "allow",
        "ACLDisabledTTL": 120000000000,
        "ACLDownPolicy": "extend-cache",
        "EnableACLReplication": false,
        "ACLEnforceVersion8": true,
        "Watches": null,
        "DisableRemoteExec": true,
        "DisableUpdateCheck": false,
        "DisableAnonymousSignature": false,
        "DisableCoordinates": false,
        "Revision": "75ca2ca",
        "Version": "0.9.2",
        "VersionPrerelease": "",
        "UnixSockets": {
            "Usr": "",
            "Grp": "",
            "Perms": ""
        },
        "SessionTTLMin": 0,
        "SessionTTLMinRaw": "",
        "DeprecatedHTTPAPIResponseHeaders": null,
        "DeprecatedRetryJoinEC2": {
            "Region": "",
            "TagKey": "",
            "TagValue": ""
        },
        "DeprecatedRetryJoinGCE": {
            "ProjectName": "",
            "ZonePattern": "",
            "TagValue": "",
            "CredentialsFile": ""
        },
        "DeprecatedRetryJoinAzure": {
            "TagName": "",
            "TagValue": ""
        }
    },
    "Coord": {
        "Vec": [
            0.012794973706575149,
            -0.037002499041168056,
            -0.004569816807829783,
            0.008440454588208928,
            0.051824919600846044,
            0.061080374500702646,
            0.06625808345721815,
            -0.04034362628376743
        ],
        "Error": 0.029365068203238143,
        "Adjustment": 0.001931106570462026,
        "Height": 0.0018465479909626499
    },
    "Member": {
        "Name": "pandora",
        "Addr": "192.168.200.4",
        "Port": 8301,
        "Tags": {
            "bootstrap": "1",
            "build": "0.9.2:75ca2ca",
            "dc": "underverse",
            "id": "d362ea44-979d-6b4e-1691-cefdf149be90",
            "port": "8300",
            "raft_vsn": "2",
            "role": "consul",
            "vsn": "2",
            "vsn_max": "3",
            "vsn_min": "2",
            "wan_join_port": "8302"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 5,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 5,
        "DelegateCur": 4
    },
    "Stats": {
        "agent": {
            "check_monitors": "28",
            "check_ttls": "0",
            "checks": "28",
            "services": "11"
        },
        "build": {
            "prerelease": "",
            "revision": "75ca2ca",
            "version": "0.9.2"
        },
        "consul": {
            "bootstrap": "true",
            "known_datacenters": "1",
            "leader": "true",
            "leader_addr": "192.168.200.4:8300",
            "server": "true"
        },
        "raft": {
            "applied_index": "774605",
            "commit_index": "774605",
            "fsm_pending": "0",
            "last_contact": "0",
            "last_log_index": "774605",
            "last_log_term": "134",
            "last_snapshot_index": "767162",
            "last_snapshot_term": "134",
            "latest_configuration": "[{Suffrage:Voter ID:192.168.200.4:8300 Address:192.168.200.4:8300}]",
            "latest_configuration_index": "599656",
            "num_peers": "0",
            "protocol_version": "2",
            "protocol_version_max": "3",
            "protocol_version_min": "0",
            "snapshot_version_max": "1",
            "snapshot_version_min": "0",
            "state": "Leader",
            "term": "134"
        },
        "runtime": {
            "arch": "amd64",
            "cpu_count": "2",
            "goroutines": "121",
            "max_procs": "2",
            "os": "linux",
            "version": "go1.8.3"
        },
        "serf_lan": {
            "coordinate_resets": "0",
            "encrypted": "false",
            "event_queue": "0",
            "event_time": "26",
            "failed": "0",
            "health_score": "0",
            "intent_queue": "0",
            "left": "0",
            "member_time": "107",
            "members": "3",
            "query_queue": "0",
            "query_time": "1"
        },
        "serf_wan": {
            "coordinate_resets": "0",
            "encrypted": "false",
            "event_queue": "0",
            "event_time": "1",
            "failed": "0",
            "health_score": "0",
            "intent_queue": "0",
            "left": "0",
            "member_time": "7",
            "members": "1",
            "query_queue": "0",
            "query_time": "1"
        }
    },
    "Meta": {}
}

non-server:

{
    "Config": {
        "DevMode": false,
        "Performance": {
            "RaftMultiplier": 0
        },
        "Bootstrap": false,
        "BootstrapExpect": 0,
        "Server": false,
        "NonVotingServer": false,
        "Datacenter": "underverse",
        "DataDir": "/var/lib/consul/",
        "DNSRecursor": "",
        "DNSRecursors": [],
        "DNSConfig": {
            "NodeTTL": 120000000000,
            "ServiceTTL": {
                "*": 30000000000
            },
            "AllowStale": true,
            "EnableTruncate": false,
            "UDPAnswerLimit": 3,
            "MaxStale": 315360000000000000,
            "OnlyPassing": false,
            "DisableCompression": false,
            "RecursorTimeout": 2000000000
        },
        "Domain": "net",
        "HTTPConfig": {
            "BlockEndpoints": null,
            "ResponseHeaders": null
        },
        "DisableKeyringFile": false,
        "EncryptVerifyIncoming": true,
        "EncryptVerifyOutgoing": true,
        "LogLevel": "warn",
        "NodeID": "ce562920-27f5-4236-c610-d0f2674a6ab9",
        "DisableHostNodeID": true,
        "NodeName": "evelyn",
        "ClientAddr": "127.0.0.1",
        "BindAddr": "0.0.0.0",
        "SerfWanBindAddr": "",
        "SerfLanBindAddr": "",
        "AdvertiseAddr": "10.153.200.5",
        "AdvertiseAddrs": {
            "SerfLan": null,
            "SerfLanRaw": "",
            "SerfWan": null,
            "SerfWanRaw": "",
            "RPC": null,
            "RPCRaw": ""
        },
        "AdvertiseAddrWan": "10.153.200.5",
        "TranslateWanAddrs": false,
        "Ports": {
            "DNS": 8600,
            "HTTP": 8500,
            "HTTPS": -1,
            "SerfLan": 8301,
            "SerfWan": 8302,
            "Server": 8300,
            "RPC": 0
        },
        "Addresses": {
            "DNS": "",
            "HTTP": "",
            "HTTPS": "",
            "RPC": ""
        },
        "TaggedAddresses": {
            "lan": "10.153.200.5",
            "wan": "10.153.200.5"
        },
        "LeaveOnTerm": true,
        "SkipLeaveOnInt": false,
        "Autopilot": {
            "CleanupDeadServers": null,
            "LastContactThresholdRaw": "",
            "MaxTrailingLogs": null,
            "ServerStabilizationTimeRaw": "",
            "RedundancyZoneTag": "",
            "DisableUpgradeMigration": null,
            "UpgradeVersionTag": ""
        },
        "Telemetry": {
            "StatsiteAddr": "",
            "StatsdAddr": "",
            "StatsitePrefix": "consul",
            "DisableHostname": false,
            "PrefixFilter": null,
            "FilterDefault": true,
            "DogStatsdAddr": "",
            "DogStatsdTags": null,
            "CirconusAPIApp": "",
            "CirconusAPIURL": "",
            "CirconusSubmissionInterval": "",
            "CirconusCheckSubmissionURL": "",
            "CirconusCheckID": "",
            "CirconusCheckForceMetricActivation": "",
            "CirconusCheckInstanceID": "",
            "CirconusCheckSearchTag": "",
            "CirconusCheckTags": "",
            "CirconusCheckDisplayName": "",
            "CirconusBrokerID": "",
            "CirconusBrokerSelectTag": ""
        },
        "Protocol": 2,
        "RaftProtocol": 0,
        "EnableDebug": false,
        "VerifyIncoming": false,
        "VerifyIncomingRPC": false,
        "VerifyIncomingHTTPS": false,
        "VerifyOutgoing": false,
        "VerifyServerHostname": false,
        "CAFile": "",
        "CAPath": "",
        "CertFile": "",
        "KeyFile": "",
        "ServerName": "",
        "TLSMinVersion": "tls10",
        "TLSCipherSuitesRaw": "",
        "TLSPreferServerCipherSuites": false,
        "StartJoin": [],
        "StartJoinWan": [],
        "RetryMaxAttempts": 0,
        "RetryIntervalRaw": "",
        "RetryJoinWan": [],
        "RetryMaxAttemptsWan": 0,
        "RetryIntervalWanRaw": "",
        "ReconnectTimeoutLan": 0,
        "ReconnectTimeoutLanRaw": "",
        "ReconnectTimeoutWan": 0,
        "ReconnectTimeoutWanRaw": "",
        "EnableUI": false,
        "UIDir": "",
        "PidFile": "",
        "EnableSyslog": false,
        "SyslogFacility": "LOCAL0",
        "RejoinAfterLeave": false,
        "EnableScriptChecks": true,
        "CheckUpdateInterval": 300000000000,
        "CheckReapInterval": 30000000000,
        "CheckDeregisterIntervalMin": 60000000000,
        "ACLDatacenter": "",
        "ACLTTL": 30000000000,
        "ACLTTLRaw": "",
        "ACLDefaultPolicy": "allow",
        "ACLDisabledTTL": 120000000000,
        "ACLDownPolicy": "extend-cache",
        "EnableACLReplication": false,
        "ACLEnforceVersion8": true,
        "Watches": null,
        "DisableRemoteExec": true,
        "DisableUpdateCheck": false,
        "DisableAnonymousSignature": false,
        "DisableCoordinates": false,
        "Revision": "75ca2ca",
        "Version": "0.9.2",
        "VersionPrerelease": "",
        "UnixSockets": {
            "Usr": "",
            "Grp": "",
            "Perms": ""
        },
        "SessionTTLMin": 0,
        "SessionTTLMinRaw": "",
        "DeprecatedHTTPAPIResponseHeaders": null,
        "DeprecatedRetryJoinEC2": {
            "Region": "",
            "TagKey": "",
            "TagValue": ""
        },
        "DeprecatedRetryJoinGCE": {
            "ProjectName": "",
            "ZonePattern": "",
            "TagValue": "",
            "CredentialsFile": ""
        },
        "DeprecatedRetryJoinAzure": {
            "TagName": "",
            "TagValue": ""
        }
    },
    "Coord": {
        "Vec": [
            0.016382470801779177,
            -0.04648184042843447,
            -0.005357491163239937,
            0.011283047926593553,
            0.06538375146262299,
            0.07685677755752458,
            0.08433750646975828,
            -0.05115800428465313
        ],
        "Error": 0.030374569450024886,
        "Adjustment": 0.0036972771671141594,
        "Height": 0.0003613624203095926
    },
    "Member": {
        "Name": "evelyn",
        "Addr": "10.153.200.5",
        "Port": 8301,
        "Tags": {
            "build": "0.9.2:75ca2ca",
            "dc": "underverse",
            "id": "ce562920-27f5-4236-c610-d0f2674a6ab9",
            "role": "node",
            "vsn": "2",
            "vsn_max": "3",
            "vsn_min": "2"
        },
        "Status": 1,
        "ProtocolMin": 1,
        "ProtocolMax": 5,
        "ProtocolCur": 2,
        "DelegateMin": 2,
        "DelegateMax": 5,
        "DelegateCur": 4
    },
    "Stats": {
        "agent": {
            "check_monitors": "38",
            "check_ttls": "0",
            "checks": "38",
            "services": "12"
        },
        "build": {
            "prerelease": "",
            "revision": "75ca2ca",
            "version": "0.9.2"
        },
        "consul": {
            "known_servers": "1",
            "server": "false"
        },
        "runtime": {
            "arch": "amd64",
            "cpu_count": "4",
            "goroutines": "91",
            "max_procs": "4",
            "os": "linux",
            "version": "go1.8.3"
        },
        "serf_lan": {
            "coordinate_resets": "0",
            "encrypted": "false",
            "event_queue": "0",
            "event_time": "26",
            "failed": "0",
            "health_score": "0",
            "intent_queue": "0",
            "left": "0",
            "member_time": "107",
            "members": "3",
            "query_queue": "0",
            "query_time": "1"
        }
    },
    "Meta": {}
}

@magiconair
Contributor

OK, I can reproduce it and found the root cause. I should have a fix by tomorrow. Thx for reporting this!

@slackpad
Contributor

slackpad commented Aug 20, 2017

Looking over this with @magiconair it comes down to a difference in behavior from ServerAddrs() on clients and servers. The servers are using the RPC router, which is based on the WAN information, so names are suffixed there with the datacenter. The LAN information on the clients doesn't have the suffix.
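
A minimal sketch of how that mismatch could produce the panic, assuming the code splits the server name at its last dot to separate the node name from the datacenter. The function names below are hypothetical and are not taken from the Consul source; the node names, datacenter and domain are the ones from the output posted above.

```go
package main

import (
	"fmt"
	"strings"
)

// unsafeFQDN is a hypothetical helper that assumes name has the form
// "NODE.DC". If name contains no dot, LastIndexByte returns -1 and the
// slice expression name[:-1] panics with "slice bounds out of range".
func unsafeFQDN(name, domain string) string {
	lastdot := strings.LastIndexByte(name, '.')
	return name[:lastdot] + ".node" + name[lastdot:] + "." + domain
}

// safeFQDN is a hypothetical variant that accepts both "NODE.DC" (as seen
// on servers) and plain "NODE" (as seen on clients).
func safeFQDN(name, dc, domain string) string {
	node := strings.TrimSuffix(name, "."+dc)
	return node + ".node." + dc + "." + domain
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	// Server view: the name carries the datacenter suffix.
	fmt.Println(unsafeFQDN("pandora.underverse", "net."))

	// Client view: no suffix, so the unchecked slice expression panics.
	fmt.Println(panics(func() { unsafeFQDN("evelyn", "net.") }))

	// The tolerant variant handles both forms.
	fmt.Println(safeFQDN("evelyn", "underverse", "net."))
	fmt.Println(safeFQDN("pandora.underverse", "underverse", "net."))
}
```

Guarding the index would stop the crash, but it would still leave two different naming schemes in play, which is what the proposal in the next paragraph avoids.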

Thinking about this, I think a better approach might be to perform an RPC to get the Consul servers. This would be a query similar to this:

https://github.com/hashicorp/consul/blob/v0.9.2/agent/dns.go#L684-L712

Using this constant to get the consul service name:

https://github.com/hashicorp/consul/blob/v0.9.2/agent/structs/catalog.go#L20

The benefit of this is that all the code moves into dns.go and we end up with identical behavior on the client and server. We also are using the consistent information in the state store to answer the request, which is behavior I think folks would expect for this. We could remove the ServerAddrs() method from the delegate and the downstream methods as well, so it would be a little simpler overall.
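
The shape of that proposal can be sketched roughly as follows. This is not the actual patch: `catalog`, `serviceNode` and `staticCatalog` are hypothetical stand-ins for the RPC call and its response, and the value "consul" for the service name constant is an assumption about the constant linked above.

```go
package main

import (
	"fmt"
	"sort"
)

// consulServiceName is assumed to match the linked constant.
const consulServiceName = "consul"

// serviceNode is a hypothetical, trimmed-down catalog entry.
type serviceNode struct {
	Node    string // node name, without any datacenter suffix
	Address string // address the node advertises
}

// catalog is a hypothetical stand-in for "ask the servers via RPC".
// Clients and servers would both go through the same interface.
type catalog interface {
	ServiceNodes(service string) ([]serviceNode, error)
}

// staticCatalog is a fake catalog used only for this sketch.
type staticCatalog map[string][]serviceNode

func (c staticCatalog) ServiceNodes(service string) ([]serviceNode, error) {
	return c[service], nil
}

// nameservers maps "NODE.node.DC.DOMAIN" to the address of every node
// that runs the consul service, i.e. every server.
func nameservers(c catalog, dc, domain string) (map[string]string, error) {
	nodes, err := c.ServiceNodes(consulServiceName)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string, len(nodes))
	for _, n := range nodes {
		out[n.Node+".node."+dc+"."+domain] = n.Address
	}
	return out, nil
}

func main() {
	c := staticCatalog{
		consulServiceName: {{Node: "pandora", Address: "192.168.200.4"}},
	}
	ns, err := nameservers(c, "underverse", "net.")
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(ns))
	for name := range ns {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, ns[name])
	}
}
```

Because the record name is assembled from the node name and the datacenter separately, no string splitting is needed and the result does not depend on whether the agent is a client or a server.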

magiconair added a commit that referenced this issue Aug 21, 2017
This patch replaces the code which determines the list of servers in the
current cluster with an RPC call to get the list of active consul
service instances which only run on servers.

This replaces the previous implementation which was more complex and
relied on serf messages which can provide a different view than the
consistent response from the raft log.

As a side effect it makes the implementation independent of the server
and the agent which means it works consistently across both. Different
behavior for server and agent was the root cause for the bug in
http://github.com/hashicorp/consul/issue/3047.

Fixes #3407
@magiconair
Contributor

I've implemented a patch based on @slackpad's comments in #3408 but I'm still a bit puzzled on how you triggered the panic.

To get into that codepath you have to explicitly query for either an SOA or an NS record which you are not doing with dig -p 8600 @localhost abc.service.dc1.consul.. I've set up a two node cluster with one server and one agent and registered a service and was able to query the ip address of the service with dig on both server and agent.

I was only able to trigger the panic with these two queries:

dig -p 8600 @localhost dc1.consul. ns
dig -p 8600 @localhost dc1.consul. soa

Maybe I'm missing something obvious or non-obvious like dig automatically querying for NS and/or SOA under certain circumstances.
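
One hint may already be in the debug log from the original report, which prints the request as {webserver.service.test.consul. 6 1}. Assuming the braces hold the name, the numeric query type and the numeric query class, the small sketch below decodes such a fragment using the standard DNS type codes. The helper name `describeQuestion` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// qtypeNames maps a few standard DNS query type codes to their mnemonics.
var qtypeNames = map[uint16]string{
	1:  "A",
	2:  "NS",
	5:  "CNAME",
	6:  "SOA",
	28: "AAAA",
	33: "SRV",
}

// describeQuestion parses a "{name qtype qclass}" fragment as it appears
// in the debug log and returns the name and the query type mnemonic.
func describeQuestion(s string) (name, qtype string, err error) {
	var t, c uint16
	s = strings.Trim(s, "{}")
	if _, err = fmt.Sscanf(s, "%s %d %d", &name, &t, &c); err != nil {
		return "", "", err
	}
	n, ok := qtypeNames[t]
	if !ok {
		n = fmt.Sprintf("TYPE%d", t)
	}
	return name, n, nil
}

func main() {
	name, qtype, err := describeQuestion("{webserver.service.test.consul. 6 1}")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, qtype)
}
```

Under that reading, type 6 is SOA and class 1 is IN, so the logged request would have been an SOA query rather than a plain address lookup.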

@MorphBonehunter
Author

@magiconair you are right, that was a mistake I made...
My full query was in the form of dig -p 8600 @localhost soa abc.service.dc1.consul and I really managed to copy/paste this query from my bash history on every single try, changing only the last part (I had to tcpdump it to realise my mistake!).
So sorry for the confusion 😢, the panic indeed occurs only on NS and SOA.

@magiconair
Contributor

No worries. I'm glad we found that out since that makes the difference between DNS completely broken or new feature broken. :)
