downloader: Lock-up during sync due to circular return logic #16539

Closed
veox opened this issue Apr 20, 2018 · 15 comments
@veox
Contributor

veox commented Apr 20, 2018

EDIT: Original title: Lock-up during initial sync

EDIT: See this comment for current best guess on cause.


System information

Geth version: v1.8.4-stable-2423ae01/linux-amd64/go1.10 (installed via Ubuntu PPA package)
OS & Version: Ubuntu 16.04.4 LTS (Xenial Xerus)
Machine: KVM VPS

% uname -a
Linux <hostname> 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Expected behaviour

Continuous fast-sync.

Running via systemd with:

/usr/bin/geth --pprof --metrics --datadir /home/geth/.ethereum --cache 4096 --txpool.pricelimit 31337000 --syncmode fast --ethstats "veox-geth-lightserv-new-RESYNC:$SECRET@ethstats.net"

Actual behaviour

After seemingly-normal operation, and dropping off "stalling" peers once in a while, non-debug log output stops, as shown in this log tail.

At this point, in console using geth attach:

> eth.syncing
{
  currentBlock: 4460991,
  highestBlock: 5402762,
  knownStates: 15834634,
  pulledStates: 15823975,
  startingBlock: 4327715
}
> admin.peers.length
25

Setting

> debug.vmodule("p2p=4,downloader=4")

results in

DEBUG[04-20|15:06:23] Recalculated downloader QoS values       rtt=5.195478857s confidence=1.000 ttl=15.586452156s

being printed repeatedly (the time changes - as expected; rtt/ttl values don't).

Forcibly disconnecting a peer using admin.removePeer("<enode>") works, a new peer is selected from the pool. In other words: p2p still works fine(-ish?).

Steps to reproduce the behaviour

Not sure; this is possibly related to networking conditions on the machine.

Happens anywhere between 5 minutes and 1 hour after starting the node.

Rambling

If I had to hazard a guess, I'd say the node corners itself into selecting peers so fast, that a small traffic spike on the VPS tower makes them all look just slow enough to be dropped.

After that, either the downloader fails to realise the sync-peers are no longer there; QoS fails at hysteresis; all remaining peers are malicious; or something of the sort.

Backtrace

See gist.

@veox
Contributor Author

veox commented Apr 20, 2018

I see that on every restart after such a failure, between the "shutting down" message and the actual shutdown, there's this error:

% journalctl --output=cat -b -u geth | grep ERR
ERROR[04-20|13:37:34] Failed to unregister sync peer           peer=d53f364380e3b4a1 err="peer is not registered"
ERROR[04-20|13:40:30] Failed to unregister sync peer           peer=543206370b63c847 err="peer is not registered"
ERROR[04-20|15:08:48] Failed to unregister sync peer           peer=f93f7593eb6bf220 err="peer is not registered"
ERROR[04-20|15:46:21] Failed to unregister sync peer           peer=5e1b61e2ac555a7a err="peer is not registered"

(Now running with debug.verbosity(4), because I don't know what modules I should be actually looking at.)


Grepping for those peer IDs (with some TRACEs from previous runs, when I was fiddling with the logger):

% journalctl --output=cat -b -u geth | grep -E "(d53f364380e3b4a1|543206370b63c847|f93f7593eb6bf220|5e1b61e2ac555a7a)"
ERROR[04-20|13:37:34] Failed to unregister sync peer           peer=d53f364380e3b4a1 err="peer is not registered"
TRACE[04-20|13:38:42] Received node data response              peer=543206370b63c847 count=300     dropped=false timeout=false
TRACE[04-20|13:38:42] Peer throughput measurements updated     peer=543206370b63c847 hps=13.076   bps=1.883  rps=0.607   sps=16.636 miss=0 rtt=8.817892115s
TRACE[04-20|13:38:42] Delivered new batch of data              peer=543206370b63c847 type=receipts  count=6
TRACE[04-20|13:38:42] Peer throughput measurements updated     peer=543206370b63c847 hps=18.804   bps=1.883  rps=0.607   sps=16.636 miss=0 rtt=8.209003701s
TRACE[04-20|13:38:42] Delivered new batch of data              peer=543206370b63c847 type=headers   count=192
TRACE[04-20|13:38:42] Requesting new batch of data             peer=543206370b63c847 type=headers   from=4207467
TRACE[04-20|13:38:43] Peer throughput measurements updated     peer=543206370b63c847 hps=18.804   bps=1.883  rps=0.607   sps=24.730 miss=0 rtt=7.69555938s
TRACE[04-20|13:38:43] Requesting new batch of data             peer=543206370b63c847 type=state     count=384
TRACE[04-20|13:38:43] Peer throughput measurements updated     peer=543206370b63c847 hps=18.804   bps=3.104  rps=0.607   sps=24.730 miss=0 rtt=7.167196521s
TRACE[04-20|13:38:43] Delivered new batch of data              peer=543206370b63c847 type=bodies    count=34:34
DEBUG[04-20|13:40:12] Stalling delivery, dropping              peer=543206370b63c847 type=receipts
DEBUG[04-20|13:40:12] Header download terminated               peer=543206370b63c847
ERROR[04-20|13:40:30] Failed to unregister sync peer           peer=543206370b63c847 err="peer is not registered"
DEBUG[04-20|13:40:30] Removing p2p peer                        id=543206370b63c847 conn=inbound duration=2m23.656s peers=20 req=false err="read tcp <my-ip-snipped>:30303-><their-ip-snipped>:43252: i/o timeout"
WARN [04-20|13:46:54] Stalling state sync, dropping peer       peer=5e1b61e2ac555a7a
DEBUG[04-20|13:46:54] Removing p2p peer                        id=5e1b61e2ac555a7a conn=inbound duration=13.707s   peers=15 req=false err="useless peer"
WARN [04-20|14:11:05] Stalling state sync, dropping peer       peer=f93f7593eb6bf220
DEBUG[04-20|15:08:48] Removing static node                     node=enode://f93f7593eb6bf2209ed1486e9b15b081a7085d7152dcfa36b62ae3ddcdda7b03c3c51ad8898a9a453e48dfad46d112735590ea9b5cb68a057fbb7b0a448a7015
ERROR[04-20|15:08:48] Failed to unregister sync peer           peer=f93f7593eb6bf220 err="peer is not registered"
DEBUG[04-20|15:08:48] Removing p2p peer                        id=f93f7593eb6bf220 conn=inbound duration=1h12m29.948s peers=24 req=false err="disconnect requested"
DEBUG[04-20|15:27:07] Unrequested node data                    peer=5e1b61e2ac555a7a len=2
DEBUG[04-20|15:27:07] Unrequested node data                    peer=5e1b61e2ac555a7a len=40
DEBUG[04-20|15:33:06] Header request timed out                 peer=5e1b61e2ac555a7a elapsed=6.000006s
ERROR[04-20|15:46:21] Failed to unregister sync peer           peer=5e1b61e2ac555a7a err="peer is not registered"
DEBUG[04-20|15:46:21] Removing p2p peer                        id=5e1b61e2ac555a7a conn=inbound duration=21m39.577s peers=6  req=false err="client quitting"

(Suggestions for egrep patterns welcome.)

That "Removing static node" message is from when I tried to manually remove the peer using admin.removePeer(<enode>), as described in the original report.

@veox
Contributor Author

veox commented Apr 20, 2018

Prior to every lock-up, there is also this:

% journalctl --output=cat -b -u geth | grep Rolled
WARN [04-20|12:48:38] Rolled back headers                      count=2048 header=4222970->4220922 fast=4185744->4185744 block=0->0
WARN [04-20|13:40:12] Rolled back headers                      count=2048    header=4234602->4232554 fast=4197105->4197105 block=0->0
WARN [04-20|14:11:13] Rolled back headers                      count=2048 header=4497699->4495651 fast=4459957->4459957 block=0->0
WARN [04-20|15:33:07] Rolled back headers                      count=2048 header=4635071->4633023 fast=4598545->4598545 block=0->0
WARN [04-20|16:14:56] Rolled back headers                      count=2048 header=4860764->4858716 fast=4838505->4838505 block=0->0
WARN [04-20|16:43:17] Rolled back headers                      count=2048 header=4939473->4937425 fast=4902319->4902319 block=0->0
WARN [04-20|16:56:02] Rolled back headers                      count=2048 header=4941607->4939559 fast=4905619->4905619 block=0->0

@veox
Contributor Author

veox commented Apr 20, 2018

The issue seems to disappear after the block sync is done; the node is currently mostly pulling in state, and hasn't experienced a lock-up (therefore hasn't been restarted) for over an hour.


EDIT: Untrue; they become more rare, but they don't disappear completely.

@flgh

flgh commented Apr 20, 2018

same issue here:
instance: Geth/v1.8.4-stable-2423ae01/linux-amd64/go1.10
cmdline: ./geth-1.8.4 --fast --cache 4096 --rpc --rpcaddr xxx.xxx.xxx.xxx --rpcvhosts * --nat extip:xxx.xxx.xxx.xxx --port 30304 --lightserv 1 --lightpeers 2 --rpcapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3 console

1st time after a Rolled back headers message,
2nd time:
INFO [04-20|21:43:02] Imported new block receipts count=679 elapsed=5.380s number=4786290 hash=716762…4a7038 size=61.98mB ignored=0
INFO [04-20|21:43:05] Imported new state entries count=1490 elapsed=5.710ms processed=13852964 pending=49488 retry=4 duplicate=2108 unexpected=6978
INFO [04-20|21:43:06] Imported new state entries count=1906 elapsed=11.368ms processed=13854870 pending=47850 retry=3 duplicate=2108 unexpected=6978
INFO [04-20|21:43:06] Imported new block receipts count=599 elapsed=4.256s number=4786889 hash=422d7f…04f93d size=53.36mB ignored=0
INFO [04-20|21:43:06] Imported new state entries count=1853 elapsed=10.791ms processed=13856723 pending=46175 retry=0 duplicate=2108 unexpected=6978
INFO [04-20|21:43:09] Imported new block receipts count=366 elapsed=2.297s number=4787255 hash=f8027a…fbdf2e size=31.49mB ignored=0
INFO [04-20|21:43:09] Imported new state entries count=1268 elapsed=5.896ms processed=13857991 pending=47443 retry=3 duplicate=2108 unexpected=6978
INFO [04-20|21:43:12] Imported new block receipts count=541 elapsed=2.908s number=4787796 hash=a3b931…406f54 size=44.17mB ignored=0
INFO [04-20|21:43:12] Imported new block receipts count=146 elapsed=679.101ms number=4787942 hash=0fbd56…fe5506 size=11.48mB ignored=0
INFO [04-20|21:43:13] Imported new block receipts count=24 elapsed=182.224ms number=4787966 hash=45d1f9…f74e67 size=1.71mB ignored=0
INFO [04-20|21:43:14] Imported new state entries count=1505 elapsed=15.724ms processed=13859496 pending=47645 retry=5 duplicate=2108 unexpected=6978
INFO [04-20|21:43:14] Imported new state entries count=1996 elapsed=11.544ms processed=13861492 pending=45424 retry=68 duplicate=2108 unexpected=6978

web3.eth.syncing
{
currentBlock: 4787966,
highestBlock: 5475341,
knownStates: 13906916,
pulledStates: 13861492,
startingBlock: 1398556
}

when stalled, %cpu goes down from ~200% to ~60%, disk i/o stays at ~40MB/s read ~40MB/s write

@flgh

flgh commented Apr 20, 2018

once again:
INFO [04-20|22:27:57] Imported new block receipts count=427 elapsed=5.318s number=4875235 hash=8d2951…e1cbb7 size=63.80mB ignored=0
INFO [04-20|22:28:00] Imported new state entries count=411 elapsed=1.502ms processed=14691362 pending=54926 retry=0 duplicate=35 unexpected=162
INFO [04-20|22:28:03] Imported new block headers count=2048 elapsed=3.958s number=4912894 hash=2f370f…4c9606 ignored=0
WARN [04-20|22:28:04] Rolled back headers count=2048 header=4912894->4910846 fast=4875235->4875235 block=0->0
INFO [04-20|22:28:04] Imported new state entries count=1446 elapsed=5.436ms processed=14692808 pending=54292 retry=0 duplicate=35 unexpected=162
INFO [04-20|22:28:05] Imported new block receipts count=396 elapsed=7.629s number=4875631 hash=a7a58f…dd5b50 size=55.68mB ignored=0

web3.eth.syncing
{
currentBlock: 4875631,
highestBlock: 5475934,
knownStates: 14747100,
pulledStates: 14692808,
startingBlock: 4787966
}
admin.peers.forEach(function(value){console.log(value.network.remoteAddress+"\t"+value.name)})
103.218.156.70:43104 Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.2
222.211.220.180:48448 Geth/v1.8.3-stable/darwin-amd64/go1.10
163.158.112.239:59764 Geth/v1.8.2-stable-b8b9f7f4/windows-amd64/go1.9.2
37.187.138.208:30303 Geth/v1.8.0-unstable-02aeb3d7/linux-amd64/go1.9.2
177.47.25.73:30303 Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.4
139.59.89.55:30303 Geth/v1.8.2-stable/linux-amd64/go1.8
188.242.205.9:36933 Geth/v1.8.4-stable-2423ae01/windows-amd64/go1.10.1
34.229.52.243:52098 Geth/v1.8.3-stable/linux-amd64/go1.10
207.140.28.134:64547 Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.4
190.2.131.22:30303 Geth/v1.8.1-stable-1e67410e/linux-amd64/go1.9.4
111.223.244.41:30303 Geth/v1.8.3-stable-329ac18e/linux-amd64/go1.10
98.236.39.57:64075 Geth/v1.8.2-stable-b8b9f7f4/windows-amd64/go1.9.2
47.74.13.140:56860 Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9
46.4.68.99:53836 Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9.2
94.23.209.80:30303 Geth/myethereumpool.com/v1.8.0-unstable-c6069a62/linux-amd64/go1.9.2
18.233.150.214:55530 Geth/v1.8.3-stable/linux-amd64/go1.10
13.57.23.192:58834 Geth/v1.7.3-stable/linux-amd64/go1.8.4
122.128.111.212:58598 Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.4
91.244.6.40:50797 Geth/v1.8.1-stable-1e67410e/darwin-amd64/go1.9.4
18.232.102.125:35860 Geth/v1.8.3-stable/linux-amd64/go1.10
184.82.140.49:51900 Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.4
54.250.244.254:58398 Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9.2
74.75.46.39:30303 Geth/v1.8.4-stable-2423ae01/windows-amd64/go1.10.1
62.76.96.6:30303 Geth/v1.6.7-stable/freebsd-amd64/go1.9
98.242.137.155:51510 Parity/v1.7.8-stable-d5fcf3b-20171025/x86_64-windows-msvc/rustc1.20.0
54.173.98.115:55624 Geth/v1.8.3-stable/linux-amd64/go1.10
undefined

DEBUG[04-20|22:36:53] Bad discv4 packet addr=66.90.137.190:30303 err="bad hash"
DEBUG[04-20|22:36:53] Recalculated downloader QoS values rtt=2.339245322s confidence=1.000 ttl=7.017742983s
DEBUG[04-20|22:36:53] Bad discv4 packet addr=37.187.142.200:30303 err="bad hash"
DEBUG[04-20|22:36:53] Bad discv4 packet addr=37.187.142.200:30303 err="bad hash"
DEBUG[04-20|22:36:53] Bad discv4 packet addr=37.139.18.135:46969 err="bad hash"
DEBUG[04-20|22:36:53] Bad discv4 packet addr=37.139.18.135:46969 err="bad hash"
DEBUG[04-20|22:36:53] Bad discv4 packet addr=46.166.165.70:30303 err="bad hash"
DEBUG[04-20|22:36:53] Bad discv4 packet addr=46.166.165.70:30303 err="bad hash"
DEBUG[04-20|22:36:54] Bad discv4 packet addr=13.93.211.84:30303 err="bad hash"
DEBUG[04-20|22:36:54] Bad discv4 packet addr=13.93.211.84:30303 err="bad hash"
DEBUG[04-20|22:36:54] Revalidated node b=15 id=2b74e7004f2f5db2
DEBUG[04-20|22:36:54] Bad discv4 packet addr=190.153.141.26:30303 err="bad hash"
DEBUG[04-20|22:36:54] Bad discv4 packet addr=161.202.29.153:30303 err="bad hash"
DEBUG[04-20|22:36:54] Bad discv4 packet addr=161.202.29.153:30303 err="bad hash"
DEBUG[04-20|22:36:54] Adding p2p peer name=Pirl/v1.8.1-stable-9... addr=37.187.24.200:60606 peers=27
DEBUG[04-20|22:36:54] Ethereum peer connected id=cf065474d8ab50df conn=dyndial name=Pirl/v1.8.1-stable-908baf27/linux-amd64/go1.10
DEBUG[04-20|22:36:54] Ethereum handshake failed id=cf065474d8ab50df conn=dyndial err="Genesis block mismatch - 29a742ba74d89fc2 (!= d4e56740f876aef8)"
DEBUG[04-20|22:36:54] Removing p2p peer id=cf065474d8ab50df conn=dyndial duration=13.648ms peers=26 req=true err="subprotocol error"
DEBUG[04-20|22:36:54] Bad discv4 packet addr=73.26.151.112:30303 err="bad hash"
DEBUG[04-20|22:36:55] Bad discv4 packet addr=47.156.184.37:30303 err="bad hash"
DEBUG[04-20|22:36:55] Adding p2p peer name=Geth/v1.8.2-stable-b... addr=159.65.167.139:30303 peers=27
DEBUG[04-20|22:36:55] Ethereum peer connected id=505c74d11a29bb8b conn=dyndial name=Geth/v1.8.2-stable-b8b9f7f4/linux-amd64/go1.9.4
DEBUG[04-20|22:36:55] Ethereum handshake failed id=505c74d11a29bb8b conn=dyndial err=EOF
DEBUG[04-20|22:36:55] Removing p2p peer id=505c74d11a29bb8b conn=dyndial duration=106.743ms peers=26 req=true err="subprotocol error"
DEBUG[04-20|22:36:55] Recalculated downloader QoS values rtt=2.339245322s confidence=1.000 ttl=7.017742983s
DEBUG[04-20|22:36:55] Bad discv4 packet addr=69.137.42.160:30303 err="bad hash"
DEBUG[04-20|22:36:55] Bad discv4 packet addr=69.137.42.160:30303 err="bad hash"
DEBUG[04-20|22:36:55] Bad discv4 packet addr=35.183.17.229:30303 err="bad hash"
DEBUG[04-20|22:36:55] Bad discv4 packet addr=35.183.17.229:30303 err="bad hash"
DEBUG[04-20|22:36:56] Adding p2p peer name=Geth/v1.8.4-stable/d... addr=218.102.216.95:30303 peers=27
DEBUG[04-20|22:36:56] Ethereum peer connected id=5ffd6db3b5704cd0 conn=dyndial name=Geth/v1.8.4-stable/darwin-amd64/go1.10.1
DEBUG[04-20|22:36:56] Bad discv4 packet addr=106.51.16.180:6796 err="bad hash"
DEBUG[04-20|22:36:56] Fetching batch of headers id=5ffd6db3b5704cd0 conn=dyndial count=1 fromnum=1920000 skip=0 reverse=false
DEBUG[04-20|22:36:56] Peer discarded announcement peer=aad2e4d0768efa04 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:56] Peer discarded announcement peer=290f087f29279aab number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:56] Bad discv4 packet addr=73.26.151.112:30303 err="bad hash"
DEBUG[04-20|22:36:56] Verified to be on the same side of the DAO fork id=5ffd6db3b5704cd0 conn=dyndial
DEBUG[04-20|22:36:56] Peer discarded announcement peer=d880025a24fe8e41 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:56] Peer discarded announcement peer=143ce4eed974fadf number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:57] Bad discv4 packet addr=46.39.167.51:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=46.39.167.51:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=78.171.68.225:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=78.171.68.225:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=40.118.3.223:30304 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=40.118.3.223:30304 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=24.85.99.142:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=24.85.99.142:30303 err="bad hash"
DEBUG[04-20|22:36:57] Discarded propagated block, too far away peer=cc5bafc20bf77b82 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:57] Bad discv4 packet addr=171.214.245.237:27648 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=171.214.245.237:27648 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=84.165.151.174:30303 err="bad hash"
DEBUG[04-20|22:36:57] Bad discv4 packet addr=84.165.151.174:30303 err="bad hash"
DEBUG[04-20|22:36:58] Peer discarded announcement peer=f25d29b773b2c9d9 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:58] Peer discarded announcement peer=f25d29b773b2c9d9 number=5476106 hash=aae813…076358 distance=5476106
DEBUG[04-20|22:36:58] Discarded propagated block, too far away peer=fb8158e0b4ec5c75 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:58] Recalculated downloader QoS values rtt=2.339245322s confidence=1.000 ttl=7.017742983s
DEBUG[04-20|22:36:58] Discarded propagated block, too far away peer=143ce4eed974fadf number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:36:58] Bad discv4 packet addr=37.139.18.135:46969 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=37.139.18.135:46969 err="bad hash"
DEBUG[04-20|22:36:58] Peer discarded announcement peer=bb5f974b9b29475a number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:58] Bad packet from 54.86.144.203:30303: bad prefix
DEBUG[04-20|22:36:58] Bad discv4 packet addr=190.153.141.26:30303 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=47.156.184.37:30303 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=65.181.55.82:27382 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=159.65.141.137:30303 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=159.65.141.137:30303 err="bad hash"
DEBUG[04-20|22:36:58] Bad discv4 packet addr=109.203.112.175:30303 err="bad hash"
DEBUG[04-20|22:36:58] invalid neighbour (59.58.56.140) from 23e8afc4bcd69322@109.203.112.175:30303: low port
DEBUG[04-20|22:36:58] Bad discv4 packet addr=109.203.112.175:30303 err="bad hash"
DEBUG[04-20|22:36:58] Peer discarded announcement peer=d880025a24fe8e41 number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:36:59] Bad discv4 packet addr=84.160.185.4:30303 err="bad hash"
DEBUG[04-20|22:36:59] Discarded propagated block, too far away peer=1adfa9dc038e0ef3 number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:36:59] Discarded propagated block, too far away peer=d880025a24fe8e41 number=5476108 hash=ed3e5f…d6336d distance=5476108
DEBUG[04-20|22:36:59] Peer discarded announcement peer=fb8158e0b4ec5c75 number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:36:59] Peer discarded announcement peer=cc5bafc20bf77b82 number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:36:59] Peer discarded announcement peer=d880025a24fe8e41 number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:36:59] Peer discarded announcement peer=0ac301696a4340e1 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:36:59] Peer discarded announcement peer=bb5f974b9b29475a number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:37:00] Peer discarded announcement peer=143ce4eed974fadf number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:37:00] Peer discarded announcement peer=143ce4eed974fadf number=5476108 hash=2c83d2…46fd3b distance=5476108
DEBUG[04-20|22:37:00] Bad discv4 packet addr=217.210.201.99:30303 err="bad hash"
DEBUG[04-20|22:37:00] Replaced dead node b=8 id=087bb67fc15d3d28 ip=118.193.191.101 r=708e8597c380ea98 rip=35.197.91.177
DEBUG[04-20|22:37:00] Peer discarded announcement peer=d880025a24fe8e41 number=5476108 hash=2c83d2…46fd3b distance=5476108
DEBUG[04-20|22:37:00] Bad discv4 packet addr=35.182.255.28:30303 err="bad hash"
DEBUG[04-20|22:37:00] Recalculated downloader QoS values rtt=2.339245322s confidence=1.000 ttl=7.017742983s
DEBUG[04-20|22:37:00] Peer discarded announcement peer=290f087f29279aab number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:37:00] Peer discarded announcement peer=fb8158e0b4ec5c75 number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:37:00] Bad discv4 packet addr=46.166.165.70:30303 err="bad hash"
DEBUG[04-20|22:37:00] Bad discv4 packet addr=46.166.165.70:30303 err="bad hash"
DEBUG[04-20|22:37:00] Peer discarded announcement peer=143ce4eed974fadf number=5476108 hash=ed3e5f…d6336d distance=5476108
DEBUG[04-20|22:37:00] Bad packet from 34.228.62.228:30303: bad prefix
DEBUG[04-20|22:37:01] Discarded propagated block, too far away peer=bb5f974b9b29475a number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:37:01] Peer discarded announcement peer=fb8158e0b4ec5c75 number=5476108 hash=ed3e5f…d6336d distance=5476108
DEBUG[04-20|22:37:01] Bad discv4 packet addr=52.43.69.3:30303 err="bad hash"
DEBUG[04-20|22:37:01] Peer discarded announcement peer=bb5f974b9b29475a number=5476108 hash=ed3e5f…d6336d distance=5476108
DEBUG[04-20|22:37:01] Bad discv4 packet addr=217.210.201.99:30303 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=206.189.6.48:30304 err="bad hash"
DEBUG[04-20|22:37:02] Peer discarded announcement peer=bb5f974b9b29475a number=5476108 hash=2c83d2…46fd3b distance=5476108
DEBUG[04-20|22:37:02] Bad discv4 packet addr=52.212.134.10:30303 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=52.212.134.10:30303 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=65.181.55.82:27382 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=85.248.228.251:30303 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=85.248.228.251:30303 err="bad hash"
DEBUG[04-20|22:37:02] Recalculated downloader QoS values rtt=2.339245322s confidence=1.000 ttl=7.017742983s
DEBUG[04-20|22:37:02] Bad discv4 packet addr=82.202.236.216:30303 err="bad hash"
DEBUG[04-20|22:37:02] Peer discarded announcement peer=f25d29b773b2c9d9 number=5476108 hash=2c83d2…46fd3b distance=5476108
DEBUG[04-20|22:37:02] Peer discarded announcement peer=f25d29b773b2c9d9 number=5476107 hash=ab5707…0d3567 distance=5476107
DEBUG[04-20|22:37:02] Peer discarded announcement peer=f25d29b773b2c9d9 number=5476107 hash=b22055…6491f1 distance=5476107
DEBUG[04-20|22:37:02] Discarded propagated block, too far away peer=f25d29b773b2c9d9 number=5476108 hash=2c83d2…46fd3b distance=5476108
DEBUG[04-20|22:37:02] Bad discv4 packet addr=192.169.153.139:30303 err="bad hash"
DEBUG[04-20|22:37:02] Bad discv4 packet addr=192.169.153.139:30303 err="bad hash"
DEBUG[04-20|22:37:02] invalid neighbour (93.120.235.221) from 0d22a51566d803e5@192.169.153.139:30303: low port
DEBUG[04-20|22:37:02] Peer discarded announcement peer=290f087f29279aab number=5476106 hash=aae813…076358 distance=5476106
DEBUG[04-20|22:37:02] Bad discv4 packet addr=177.45.243.237:30303 err="bad hash"
DEBUG[04-20|22:37:02] invalid neighbour (36.80.25.41) from 74fd3df0d235874d@177.45.243.237:30303: low port
DEBUG[04-20|22:37:02] Bad discv4 packet addr=177.45.243.237:30303 err="bad hash"
DEBUG[04-20|22:37:03] Peer discarded announcement peer=2e14aa28f29d73b6 number=5476107 hash=c31dd5…33b3af distance=5476107
DEBUG[04-20|22:37:03] Bad discv4 packet addr=95.165.144.143:30303 err="bad hash"
DEBUG[04-20|22:37:03] Bad discv4 packet addr=61.100.188.81:30303 err="bad hash"
DEBUG[04-20|22:37:03] Bad discv4 packet addr=61.100.188.81:30303 err="bad hash"
DEBUG[04-20|22:37:03] Bad discv4 packet addr=165.227.84.0:30303 err="bad hash"
DEBUG[04-20|22:37:03] Bad discv4 packet addr=47.156.184.37:30303 err="bad hash"

@reductionista
Contributor

reductionista commented Apr 20, 2018

I'm seeing a similar issue, deadlocking during state sync. At first I thought it was due to using an HDD for storage, which many have warned about; but even after switching to an SSD I still get the same deadlock.

There may be more than one issue here: sometimes, when I try to kill geth, it starts to shut down but never finishes, while other times it is able to shut itself down.

I recently saved a backtrace of an example of it trying to shut down after receiving SIGINT, and then sending it 9 more to force a panic & backtrace:

Backtrace

[Update: removed "suspicious" suggestion which turned out to be unrelated and probably not an issue.]

@CryptoKiddies

CryptoKiddies commented Apr 21, 2018

I was going to open an issue as well, but mine is incredibly similar. I was hoping the stalls were confined to the sync process, but alas no. The issue continues after sync.

System information

Geth version: v1.8.4-stable-2423ae01/go1.10 (installed via Ubuntu PPA package)
OS & Version: Ubuntu 16.04.4 LTS
Hardware: AWS t.medium 4GB 2 core

Expected

  1. Fast sync in 3-4 hours.
  2. Continuous operation after sync.

Actual

  1. The sync stalled repeatedly and required systemd restarts to get geth communicating with the network again. This of course disrupted my sync, which required 4 days.

  2. The system stalls again. In verbosity mode=4, I see the syslog output come to a stop. In verbosity mode=5, I still see txn validation activity, however the latest block number stays fixed and state freezes, as verified by checking eth.blockNumber.

  3. The stall happens more suddenly if I use --cache=1024, although I no longer see the past Out of Memory error behavior of v1.7.

[Screenshot taken 2018-04-20 at 7:18:24 pm]

I notice that a DHCPREQUEST follows interruption by 1 to 2 minutes, but I doubt this is relevant.

Steps to Reproduce Behavior

As the user stated above, possibly related to networking conditions on the machine. Happens anywhere between 2 minutes and 1 hour after starting the node.

Thoughts

I think this may involve a QoS issue. I notice that the nodes I end up networking with are not providing useful information and see continuous drops in the log output.

Does AWS throttle P2P connections? Could there be a problem with running Geth on AWS? Doubtful...

I need to roll out a production app very soon and need a stable version of geth. Systemd is useless in dealing with stalls, as there is no detectable failure for it to trigger restart. I would like to know if anyone recommends a specific stable version, v1.8.2 or v1.7 perhaps? Many thanks.

@reductionista
Contributor

reductionista commented Apr 21, 2018

After examining this more closely, I think I see what's causing the deadlock. It's due to a bug in this commit on April 16, 2018:

eth/downloader: wait for all fetcher goroutines to exit before termin…

The problem is that wgCancel.Done() is only called when each fetcher returns. But some of the fetchers (for example, the header fetcher and the state fetcher) can call dropPeer(). Inside dropPeer() (aka removePeer()), if the peer being dropped is the master peer, it calls d.Cancel(), which waits on wgCancel before it can return. That wait never finishes, because the fetcher that called dropPeer() is itself one of the goroutines wgCancel is waiting for.

@holiman
Contributor

holiman commented Apr 21, 2018

Thanks @reductionista for investigating! @fjl @karalabe ptal

veox changed the title from "Lock-up during initial sync" to "downloader: Lock-up during sync due to circular return logic" on Apr 21, 2018
@veox
Contributor Author

veox commented Apr 21, 2018

"I need to roll out a production app very soon and need a stable version of geth. (...) I would like to know if anyone recommends a specific stable version (...)?"

@GeeeCoin If @reductionista has correctly identified the cause, rolling back one patch release (to v1.8.3) should work. I've rolled back a different server to v1.8.0 and it seems stable; but that one had already fully synced previously.

@nionis

nionis commented Apr 22, 2018

Switched to v1.8.3 and the issue doesn't occur.

@karalabe
Member

@reductionista Very nice catch, thank you!

karalabe pushed a commit to reductionista/go-ethereum that referenced this issue Apr 23, 2018
@karalabe
Member

Should be fixed on master, pushed out in 1.8.5.

@veox
Contributor Author

veox commented Apr 23, 2018

Hey @reductionista, do you happen to have a public ENS name or ETH address?

I'd like to give a small tip.


reductionista.eth, I'd guess?..

@reductionista
Contributor

@veox Wow, that's awfully sweet of you. Yes, you can send it to reductionista.eth. Thanks!

TuitionCoin added a commit to FinTechToken/go-ethereum that referenced this issue May 8, 2018
* params, VERSION: v1.8.4 unstable

* core/vm: Fixed typos in core/vm/interpreter.go

* light: new CHT for ropsten (ethereum#16393)

* whisper: fix issue in topic list copy (ethereum#16381)

- Fixes ethereum#16271. What was appended was a pointer to
an object that changes during the iteration.
- The topic is allocated as a 4-byte array, fill partial topics
with 0s. Partial topics are currently disabled, but would
crash as they rely on the presence of byte number 3.

* core/state: uniform parameter style (ethereum#16398)

- Uniform code style.

* core/state: rework dirty handling to avoid quadratic overhead

* core/state: avoid linear overhead on journal dirty listing

* travis.yml: remove sudo requirement for PPA and Azure purge builders (ethereum#16404)

This is supposed to fix the FTP upload issue according to
travis-ci/travis-ci#9391.

* .gitattributes: enable solidity highlighting on github (ethereum#16425)

* crypto/secp256k1: catch curve parameter parse errors (ethereum#16392)

* core/state: avoid redundant addition to code size cache (ethereum#16427)

* cmd/geth: remove relOracle variable (ethereum#16434)

* eth: fix typos (ethereum#16414)

* accounts/abi: improve test coverage (ethereum#16044)

* README: change 'built in' to 'built-in'

* core/types: remove String methods from struct types (ethereum#16205)

Most of these methods did not contain all the relevant information
inside the object and were not using a similar formatting type.
Moreover, the existence of a suboptimal String method breaks usage
with more advanced data dumping tools like go-spew.

* Dockerfile: use non-privileged user account (ethereum#16052)

* cmd/geth: update template for 'geth bug' command (ethereum#16350)

* cmd/evm: print vm output when debug flag is on (ethereum#16326)

* bmt: fix comment typos (ethereum#16461)

* ethclient: remove empty object in newHeads subscription call (ethereum#16454)

* compression/rle: delete RLE compression (ethereum#16468)

* eth/downloader: flush state sync data before exit (ethereum#16280)

* state: handle nil in journal dirties

* core: add blockchain benchmarks

* cmd/puppeth: fix node deploys for updated dockerfile user

* Dockerfile.alltools: fix invalid command

* common: delete StringToAddress, StringToHash (ethereum#16436)

* common: delete StringToAddress, StringToHash

These functions are confusing because they don't parse hex, but use the
bytes of the string. This change removes them, replacing all uses of
StringToAddress(s) by BytesToAddress([]byte(s)).

* eth/filters: remove incorrect use of common.BytesToAddress

* build: add -e and -X flags to get more information on ethereum#16433 (ethereum#16443)

* core: remove stray account creations in state transition (ethereum#16470)

The 'from' and 'to' methods on StateTransitions are reader methods and
shouldn't have inadvertent side effects on state.

It is safe to remove the check in 'from' because account existence is
implicitly checked by the nonce and balance checks. If the account has
non-zero balance or nonce, it must exist. Even if the sender account has
nonce zero at the start of the state transition or no balance, the nonce
is incremented before execution and the account will be created at that
time.

It is safe to remove the check in 'to' because the EVM creates the
account if necessary.

Fixes ethereum#15119

* travis, appveyor: bump to Go 1.10.1

* travis.yml: add TEST_PACKAGES to speed up swarm testing (ethereum#16456)

This commit is meant to allow ecosystem projects such as ethersphere
to minimize CI build times by specifying an environment variable with
the packages to run tests on.

If the environment variable isn't defined the build script will test
all packages so this shouldn't affect the main go-ethereum repository.

* les: add ps.lock.Unlock() before return (ethereum#16360)

* core/state: fix bug in copy of copy State

* core/state: fix ripemd-cornercase in Copy

* core: txpool stable underprice drop order, perf fixes

* miner: remove contention on currentMu for pending data retrievals (ethereum#16497)

* ethdb: add leveldb write delay statistic (ethereum#16499)

* eth/downloader: wait for all fetcher goroutines to exit before terminating (ethereum#16509)

* cmd/clef, signer: initial poc of the standalone signer (ethereum#16154)

* signer: introduce external signer command

* cmd/signer, rpc: Implement new signer. Add info about remote user to Context

* signer: refactored request/response, made use of urfave.cli

* cmd/signer: Use common flags

* cmd/signer: methods to validate calldata against abi

* cmd/signer: work on abi parser

* signer: add mutex around UI

* cmd/signer: add json 4byte directory, remove passwords from api

* cmd/signer: minor changes

* cmd/signer: Use ErrRequestDenied, enable lightkdf

* cmd/signer: implement tests

* cmd/signer: made possible for UI to modify tx parameters

* cmd/signer: refactors, removed channels in ui comms, added UI-api via stdin/out

* cmd/signer: Made lowercase json-definitions, added UI-signer test functionality

* cmd/signer: update documentation

* cmd/signer: fix bugs, improve abi detection, abi argument display

* cmd/signer: minor change in json format

* cmd/signer: rework json communication

* cmd/signer: implement mixcase addresses in API, fix json id bug

* cmd/signer: rename fromaccount, update pythonpoc with new json encoding format

* cmd/signer: make use of new abi interface

* signer: documentation

* signer/main: remove redundant  option

* signer: implement audit logging

* signer: create package 'signer', minor changes

* common: add 0x-prefix to mixcaseaddress in json marshalling + validation

* signer, rules, storage: implement rules + ephemeral storage for signer rules

* signer: implement OnApprovedTx, change signing response (API BREAKAGE)

* signer: refactoring + documentation

* signer/rules: implement dispatching to next handler

* signer: docs

* signer/rules: hide json-conversion from users, ensure context is cleaned

* signer: docs

* signer: implement validation rules, change signature of call_info

* signer: fix log flaw with string pointer

* signer: implement custom 4byte database that saves submitted signatures

* signer/storage: implement aes-gcm-backed credential storage

* accounts: implement json unmarshalling of url

* signer: fix listresponse, fix gas->uint64

* node: make http/ipc start methods public

* signer: add ipc capability+review concerns

* accounts: correct docstring

* signer: address review concerns

* rpc: go fmt -s

* signer: review concerns+ baptize Clef

* signer,node: move Start-functions to separate file

* signer: formatting

* light: new CHTs (ethereum#16515)

* params: release Geth v1.8.4

* VERSION, params: begin v1.8.5 release cycle

* build: enable goimports and varcheck linters (ethereum#16446)

* core/asm: remove unused condition (ethereum#16487)

* cmd/utils: fix help template issue for subcommands (ethereum#16351)

* rpc: clean up IPC handler (ethereum#16524)

This avoids logging accept errors on shutdown and removes
a bit of duplication. It also fixes some goimports lint warnings.

* core/asm: accept uppercase instructions (ethereum#16531)

* all: fix various typos (ethereum#16533)

* fix typo

* fix typo

* fix typo

* rpc: handle HTTP response error codes (ethereum#16500)

* whisper/whisperv6: post returns the hash of sent message (ethereum#16495)

* ethclient: add DialContext and Close (ethereum#16318)

DialContext allows users to pass a Context object for cancellation.
Close closes the underlying RPC connection.

* vendor: update elastic/gosigar so that it compiles on OpenBSD (ethereum#16542)

* eth/downloader: fix for Issue ethereum#16539 (ethereum#16546)

* params: release Geth v1.8.5 - Dirty Derivative²

* VERSION, params: begin Geth 1.8.6 release cycle

* cmd/geth: update the copyright year in the geth command usage (ethereum#16537)

* Revert "Dockerfile.alltools: fix invalid command"

* Revert "cmd/puppeth: fix node deploys for updated dockerfile user"

* Dockerfile: revert the user change PR that broke all APIs

* Dockerfile: drop legacy discovery v5 port mappings

* params: release v1.8.6 to fix docker images

* VERSION, params: begin release cycle 1.8.7

* cmd/geth, mobile: add memsize to pprof server (ethereum#16532)

* cmd/geth, mobile: add memsize to pprof server

This is a temporary change, to be reverted before the next release.

* cmd/geth: fix variable name

* core/types: avoid duplicating transactions on changing signer (ethereum#16435)

* core/state: cache missing storage entries (ethereum#16584)

* cmd/utils: point users to --syncmode under DEPRECATED (ethereum#16572)

Indicate that --light and --fast options are replaced by --syncmode

* trie: remove unused `buf` parameter (ethereum#16583)

* core, eth: fix tracer dirty finalization

* travis.yml: remove obsolete brew-cask install

* whisper: Golint fixes in whisper packages (ethereum#16637)

* vendor: fix leveldb crash when bigger than 1 TiB

* core: ensure local transactions aren't discarded as underpriced

This fixes an issue where local transactions are discarded as
underpriced when the pool and queue are full.

* evm/main: use blocknumber from genesis

* accounts: golint updates for this or self warning (ethereum#16627)

* tests: golint fixes for tests directory (ethereum#16640)

* trie: golint iterator fixes (ethereum#16639)

* internal: golint updates for this or self warning (ethereum#16634)

* core: golint updates for this or self warning (ethereum#16633)

* build: Add ldflags -s -w when building aar

Smaller size on mobile is always good.
Might also solve our maven central upload problem

* cmd/clef: documentation about setup (ethereum#16568)

clef: documentation about setup

* params: release geth 1.8.7

* Zero Block Reward Post Byzantium

* Remove Difficulty Bomb

* 1 Second Block

* Lower Minimum Difficulty (#4)

* 1 Second Blocks (#5)

* Lower Minimum Difficulty

* One Second Blocks

* Lower minimum difficulty (#6)

* Enable geth compile.solidity for rpc (#7)
hackmod pushed a commit to OpenCommunityCoin/go-esn that referenced this issue Jul 9, 2018
mariameda pushed a commit to NiluPlatform/go-nilu that referenced this issue Aug 23, 2018
firmianavan pushed a commit to firmianavan/go-ethereum that referenced this issue Aug 28, 2018
TuitionCoin added a commit to FinTechToken/go-ethereum that referenced this issue Jun 16, 2019
* build: add -e and -X flags to get more information on ethereum#16433 (ethereum#16443)

* core: remove stray account creations in state transition (ethereum#16470)

The 'from' and 'to' methods on StateTransitions are reader methods and
shouldn't have inadvertent side effects on state.

It is safe to remove the check in 'from' because account existence is
implicitly checked by the nonce and balance checks. If the account has
non-zero balance or nonce, it must exist. Even if the sender account has
nonce zero at the start of the state transition or no balance, the nonce
is incremented before execution and the account will be created at that
time.

It is safe to remove the check in 'to' because the EVM creates the
account if necessary.

Fixes ethereum#15119

* travis, appveyor: bump to Go 1.10.1

* travis.yml: add TEST_PACKAGES to speed up swarm testing (ethereum#16456)

This commit is meant to allow ecosystem projects such as ethersphere
to minimize CI build times by specifying an environment variable with
the packages to run tests on.

If the environment variable isn't defined the build script will test
all packages so this shouldn't affect the main go-ethereum repository.

* les: add ps.lock.Unlock() before return (ethereum#16360)

* core/state: fix bug in copy of copy State

* core/state: fix ripemd-cornercase in Copy

* core: txpool stable underprice drop order, perf fixes

* miner: remove contention on currentMu for pending data retrievals (ethereum#16497)

* ethdb: add leveldb write delay statistic (ethereum#16499)

* eth/downloader: wait for all fetcher goroutines to exit before terminating (ethereum#16509)

* cmd/clef, signer: initial poc of the standalone signer (ethereum#16154)

* signer: introduce external signer command

* cmd/signer, rpc: Implement new signer. Add info about remote user to Context

* signer: refactored request/response, made use of urfave.cli

* cmd/signer: Use common flags

* cmd/signer: methods to validate calldata against abi

* cmd/signer: work on abi parser

* signer: add mutex around UI

* cmd/signer: add json 4byte directory, remove passwords from api

* cmd/signer: minor changes

* cmd/signer: Use ErrRequestDenied, enable lightkdf

* cmd/signer: implement tests

* cmd/signer: made possible for UI to modify tx parameters

* cmd/signer: refactors, removed channels in ui comms, added UI-api via stdin/out

* cmd/signer: Made lowercase json-definitions, added UI-signer test functionality

* cmd/signer: update documentation

* cmd/signer: fix bugs, improve abi detection, abi argument display

* cmd/signer: minor change in json format

* cmd/signer: rework json communication

* cmd/signer: implement mixcase addresses in API, fix json id bug

* cmd/signer: rename fromaccount, update pythonpoc with new json encoding format

* cmd/signer: make use of new abi interface

* signer: documentation

* signer/main: remove redundant  option

* signer: implement audit logging

* signer: create package 'signer', minor changes

* common: add 0x-prefix to mixcaseaddress in json marshalling + validation

* signer, rules, storage: implement rules + ephemeral storage for signer rules

* signer: implement OnApprovedTx, change signing response (API BREAKAGE)

* signer: refactoring + documentation

* signer/rules: implement dispatching to next handler

* signer: docs

* signer/rules: hide json-conversion from users, ensure context is cleaned

* signer: docs

* signer: implement validation rules, change signature of call_info

* signer: fix log flaw with string pointer

* signer: implement custom 4byte database that saves submitted signatures

* signer/storage: implement aes-gcm-backed credential storage

* accounts: implement json unmarshalling of url

* signer: fix listresponse, fix gas->uint64

* node: make http/ipc start methods public

* signer: add ipc capability+review concerns

* accounts: correct docstring

* signer: address review concerns

* rpc: go fmt -s

* signer: review concerns+ baptize Clef

* signer,node: move Start-functions to separate file

* signer: formatting

* light: new CHTs (ethereum#16515)

* params: release Geth v1.8.4

* VERSION, params: begin v1.8.5 release cycle

* build: enable goimports and varcheck linters (ethereum#16446)

* core/asm: remove unused condition (ethereum#16487)

* cmd/utils: fix help template issue for subcommands (ethereum#16351)

* rpc: clean up IPC handler (ethereum#16524)

This avoids logging accept errors on shutdown and removes
a bit of duplication. It also fixes some goimports lint warnings.

* core/asm: accept uppercase instructions (ethereum#16531)

* all: fix various typos (ethereum#16533)

* fix typo

* fix typo

* fix typo

* rpc: handle HTTP response error codes (ethereum#16500)

* whisper/whisperv6: post returns the hash of sent message (ethereum#16495)

* ethclient: add DialContext and Close (ethereum#16318)

DialContext allows users to pass a Context object for cancellation.
Close closes the underlying RPC connection.

* vendor: update elastic/gosigar so that it compiles on OpenBSD (ethereum#16542)

* eth/downloader: fix for Issue ethereum#16539 (ethereum#16546)

* params: release Geth v1.8.5 - Dirty Derivative²

* VERSION, params: begin Geth 1.8.6 release cycle

* cmd/geth: update the copyright year in the geth command usage (ethereum#16537)

* Revert "Dockerfile.alltools: fix invalid command"

* Revert "cmd/puppeth: fix node deploys for updated dockerfile user"

* Dockerfile: revert the user change PR that broke all APIs

* Dockerfile: drop legacy discovery v5 port mappings

* params: release v1.8.6 to fix docker images

* VERSION, params: begin release cycle 1.8.7

* cmd/geth, mobile: add memsize to pprof server (ethereum#16532)

* cmd/geth, mobile: add memsize to pprof server

This is a temporary change, to be reverted before the next release.

* cmd/geth: fix variable name

* core/types: avoid duplicating transactions on changing signer (ethereum#16435)

* core/state: cache missing storage entries (ethereum#16584)

* cmd/utils: point users to --syncmode under DEPRECATED (ethereum#16572)

Indicate that --light and --fast options are replaced by --syncmode

* trie: remove unused `buf` parameter (ethereum#16583)

* core, eth: fix tracer dirty finalization

* travis.yml: remove obsolete brew-cask install

* whisper: Golint fixes in whisper packages (ethereum#16637)

* vendor: fix leveldb crash when bigger than 1 TiB

* core: ensure local transactions aren't discarded as underpriced

This fixes an issue where local transactions are discarded as
underpriced when the pool and queue are full.

* evm/main: use blocknumber from genesis

* accounts: golint updates for this or self warning (ethereum#16627)

* tests: golint fixes for tests directory (ethereum#16640)

* trie: golint iterator fixes (ethereum#16639)

* internal: golint updates for this or self warning (ethereum#16634)

* core: golint updates for this or self warning (ethereum#16633)

* build: Add ldflags -s -w when building aar

Smaller size on mobile is always good.
Might also solve our maven central upload problem

* cmd/clef: documentation about setup (ethereum#16568)

clef: documentation about setup

* params: release geth 1.8.7

* VERSION, params: begin v1.8.8 release cycle

* log: changed if-else blocks to conform with golint (ethereum#16661)

* p2p: changed if-else blocks to conform with golint (ethereum#16660)

* les: changed if-else blocks to conform with golint (ethereum#16658)

* accounts: changed if-else blocks to conform with golint (ethereum#16654)

* rpc: golint error with context as last parameter (ethereum#16657)

* rpc/*: golint error with context as last parameter

* Update json.go

* metrics: golint updates for this or self warning (ethereum#16635)

* metrics/*: golint updates for this or self warning

* metrics/*: golint updates for this or self warning, updated pr from feedback

* consensus/ethash: fixed typo (ethereum#16665)

* event: golint updates for this or self warning (ethereum#16631)

* event/*: golint updates for this or self warning

* event/*: golint updates for this or self warning, pr updated per feedback

* eth: golint updates for this or self warning (ethereum#16632)

* eth/*:golint updates for this or self warning

* eth/*: golint updates for this or self warning, pr updated per feedback

* signer: fix golint errors (ethereum#16653)

* signer/*: golint fixes

Specifically naming and comment formatting for documentation

* signer/*: fixed naming error crashing build

* signer/*: corrected error

* signer/core: fix tiny error whitespace

* signer/rules: fix test refactor

* whisper/mailserver: pass init error to the caller (ethereum#16671)

* whisper/mailserver: pass init error to the caller

* whisper/mailserver: add returns to fmt.Errorf

* whisper/mailserver: check err in mailserver init test

* common: changed if-else blocks to conform with golint (ethereum#16656)

* mobile: add GetStatus Method for Receipt (ethereum#16598)

* core/rawdb: separate raw database access to own package (ethereum#16666)

* rlp: fix some golint warnings (ethereum#16659)

* p2p: fix some golint warnings (ethereum#16577)

* eth/filters: derive FilterCriteria from ethereum.FilterQuery (ethereum#16629)

* p2p/simulations/adapters: fix websocket log line parsing in exec adapter (ethereum#16667)

* build: specify the key to use when invoking gpg:sign-and-deploy-file (ethereum#16696)

* crypto: fix golint warnings (ethereum#16710)

* p2p: don't discard reason set by Disconnect (ethereum#16559)

Peer.run was discarding the reason for disconnection sent to the disc
channel by Disconnect.

* cmd: various golint fixes (ethereum#16700)

* cmd: various golint fixes

* cmd: update to pr change request

* cmd: update to pr change request

* eth: golint fixes to variable names (ethereum#16711)

* eth/filter: check nil pointer when unsubscribe (ethereum#16682)

* eth/filter: check nil pointer when unsubscribe

* eth/filters, accounts, rpc: abort system if subscribe failed

* eth/filter: add crit log before exit

* eth/filter, event: minor fixes

* whisper/shhclient: update call to shh_generateSymKeyFromPassword to pass a string (ethereum#16668)

* all: get rid of error when creating memory database (ethereum#16716)

* all: get rid of error when create mdb

* core: clean up variables definition

* all: inline mdb definition

* event: document select case slice use and add edge case test (ethereum#16680)

Feed keeps active subscription channels in a slice called 'f.sendCases'.
The Send method tracks the active cases in a local variable 'cases'
whose value is f.sendCases initially. 'cases' shrinks to a shorter
prefix of f.sendCases every time a send succeeds, moving the successful
case out of range of the active case list.

This can be confusing because the two slices share a backing array. Add
more comments to document what is going on. Also add a test for removing
a case that is in 'f.sendCases' but not 'cases'.

* travis: use Android NDK 16b (ethereum#16562)

* bmt: golint updates for this or self warning (ethereum#16628)

* bmt/*: golint updates for this or self warning

* Update bmt.go

* light: new CHT for mainnet and ropsten (ethereum#16736)

* params: release go-ethereum v1.8.8

* VERSION, params: start 1.8.9 release cycle

* accounts/abi: allow abi: tags when unpacking structs

Go code users can now tag event struct members with `abi:` to specify in what fields the event will be de-serialized.

See PR ethereum#16648 for details.

* travis: try to upgrade android builder to trusty

* p2p/enr: updates for discovery v4 compatibility (ethereum#16679)

This applies spec changes from ethereum/EIPs#1049 and adds support for
pluggable identity schemes.

Some care has been taken to make the "v4" scheme standalone. It uses
public APIs only and could be moved out of package enr at any time.

A couple of minor changes were needed to make identity schemes work:

- The sequence number is now updated in Set instead of when signing.
- Record is now copy-safe, i.e. calling Set on a shallow copy doesn't
  modify the record it was copied from.

* all: collate new transaction events together

* core, eth: minor txpool event cleanups

* travis, appveyor: bump Go release to 1.10.2

* core, consensus: fix some typos in comment code and output log

* eth: propagate blocks and transactions async

* trie: fixes to comply with golint (ethereum#16771)

* log: fixes for golint warnings (ethereum#16775)

* node: all golint warnings fixed (ethereum#16773)

* node: all golint warnings fixed

* node: rm per peter

* node: rm per peter

* vendor, ethdb: print warning log if leveldb is performing compaction (ethereum#16766)

* vendor: update leveldb package

* ethdb: print warning log if db is performing compaction

* ethdb: update annotation and log

* core/types: convert status type from uint to uint64 (ethereum#16784)

* trie: support proof generation from the iterator

* core/vm: fix typo in instructions.go (ethereum#16788)

* core: use a wrapped map to remove contention in `TxPool.Get`. (ethereum#16670)

* core: use a wrapped `map` and `sync.RWMutex` for `TxPool.all` to remove contention in `TxPool.Get`.

* core: Remove redundant `txLookup.Find` and improve comments on txLookup methods.

* trie: cleaner logic, one less func call

* eth, node, trie: fix minor typos (ethereum#16802)

* params: release go-ethereum v1.8.9

* VERSION, params: begin 1.8.10 release cycle

* ethereum: fix a typo in FilterQuery{} (ethereum#16827)

Fix a spelling mistake in comment

* eth/fetcher: reuse variables for hash and number (ethereum#16819)

* whisper/shhclient: update call to shh_post to expect string instead of bool (ethereum#16757)

Fixes ethereum#16756

* common: improve documentation comments (ethereum#16701)

This commit adds many comments and removes unused code.
It also removes the EmptyHash function, which had some uses
but was silly.

* core/vm: fix typo in comment

* p2p/discv5: add egress/ingress traffic metrics to discv5 udp transport (ethereum#16369)

* core: improve test for TransactionPriceNonceSort (ethereum#16413)

* trie: rename TrieSync to Sync and improve hexToKeybytes (ethereum#16804)

This removes a golint warning: type name will be used as trie.TrieSync by
other packages, and that stutters; consider calling this Sync.

In hexToKeybytes len(hex) is even and (even+1)/2 == even/2, remove the +1.

* core: fix transaction event asynchronicity

* params: release Geth 1.8.10 hotfix

* VERSION, params: begin 1.8.11 release cycle

* ethstats: fix last golint warning (ethereum#16837)

* console: squash golint warnings (ethereum#16836)

* rpc: use HTTP request context as top-level context (ethereum#16861)

* consensus/ethash: reduce keccak hash allocations (ethereum#16857)

Use Read instead of Sum to avoid internal allocations and
copying the state.

name                      old time/op  new time/op  delta
CacheGeneration-8          764ms ± 1%   579ms ± 1%  -24.22%  (p=0.000 n=20+17)
SmallDatasetGeneration-8  75.2ms ±12%  60.6ms ±10%  -19.37%  (p=0.000 n=20+20)
HashimotoLight-8          1.58ms ±11%  1.55ms ± 8%     ~     (p=0.322 n=20+19)
HashimotoFullSmall-8      4.90µs ± 1%  4.88µs ± 1%   -0.31%  (p=0.013 n=19+18)

* core, eth, trie: streaming GC for the trie cache (ethereum#16810)

* core, eth, trie: streaming GC for the trie cache

* trie: track memcache statistics

* rpc: set timeouts for http server, see ethereum#16859

* metrics: expvar support for ResettingTimer (ethereum#16878)

* metrics: expvar support for ResettingTimer

* metrics: use integers for percentiles; remove Overall

* metrics: fix edge-case panic for index-out-of-range

* cmd/geth: cap cache allowance

* core: fix typo in comment code

* les: add Skip overflow check to GetBlockHeadersMsg handler (ethereum#16891)

* eth/tracers: fix minor off-by-one error (ethereum#16879)

* tracing: fix minor off-by-one error

* tracers: go generate

* core: concurrent background transaction sender ecrecover

* miner: not call commitNewWork if it's a side block (ethereum#16751)

* cmd/abigen: support for reading solc output from stdin (ethereum#16683)

Allow the --abi flag to be given `-` to indicate that it should read the
ABI information from standard input. It expects to read the solc output
with the --combined-json flag providing bin, abi, userdoc, devdoc, and
metadata, and works very similarly to the internal invocation of solc,
except it allows external invocation of solc.

This facilitates integration with more complex solc invocations, such
as invocations that require path remapping or --allow-paths tweaks.

Simple usage example:

    solc --combined-json bin,abi,userdoc,devdoc,metadata *.sol | abigen --abi -

* params: fix golint warnings (ethereum#16853)

params: fix golint warnings

* vendor: added vendor packages necessary for the swarm-network-rewrite merge (ethereum#16792)

* vendor: added vendor packages necessary for the swarm-network-rewrite merge into ethereum master

* vendor: removed multihash deps

* trie: reduce hasher allocations (ethereum#16896)

* trie: reduce hasher allocations

name    old time/op    new time/op    delta
Hash-8    4.05µs ±12%    3.56µs ± 9%  -12.13%  (p=0.000 n=20+19)

name    old alloc/op   new alloc/op   delta
Hash-8    1.30kB ± 0%    0.66kB ± 0%  -49.15%  (p=0.000 n=20+20)

name    old allocs/op  new allocs/op  delta
Hash-8      11.0 ± 0%       8.0 ± 0%  -27.27%  (p=0.000 n=20+20)

* trie: bump initial buffer cap in hasher

* whisper: re-insert ethereum#16757 that has been lost during a merge (ethereum#16889)

* cmd/puppeth: fixed a typo in a wizard input query (ethereum#16910)

* core: relax type requirement for bc in ApplyTransaction (ethereum#16901)

* trie: avoid unnecessary slicing on shortnode decoding (ethereum#16917)

optimization code

* cmd/ethkey: add command to change key passphrase (ethereum#16516)

This change introduces 

    ethkey changepassphrase <keyfile>

to change the passphrase of a key file.

* metrics: return an empty snapshot for NilResettingTimer (ethereum#16930)

* light: new CHTs for mainnet and ropsten (ethereum#16926)

* ethclient: fix RPC parse error of Parity response (ethereum#16924)

The error produced when using a Parity RPC was the following:

ERROR: transaction did not get mined: failed to get tx for txid 0xbdeb094b3278019383c8da148ff1cb5b5dbd61bf8731bc2310ac1b8ed0235226: json: cannot unmarshal non-string into Go struct field txExtraInfo.blockHash of type common.Hash

* core: improve getBadBlocks to return full block rlp (ethereum#16902)

* core: improve getBadBlocks to return full block rlp

* core, eth, ethapi: changes to getBadBlocks formatting

* ethapi: address review concerns

* rpc: fix a comment typo (ethereum#16929)

* rpc: support returning nil pointer big.Ints (null)

* trie: don't report the root flushlist as an alloc

* metrics: removed repetitive calculations (ethereum#16944)

* core/rawdb: wrap db key creations (ethereum#16914)

* core/rawdb: use wrappered helper to assemble key

* core/rawdb: wrappered helper to assemble key

* core/rawdb: rewrite the wrapper, pass common.Hash

* ethdb: gracefully handle quit channel (ethereum#16794)

* ethdb: gracefully handle quit channel

* ethdb: minor polish

* internal/ethapi: reduce pendingTransactions to O(txs+accs) from O(txs*accs)

* les: pass server pool to protocol manager (ethereum#16947)

* metrics: fix gofmt linter warnings

* crypto: replace ToECDSAPub with error-checking func UnmarshalPubkey (ethereum#16932)

ToECDSAPub was unsafe because it returned a non-nil key with nil X, Y in
case of invalid input. This change replaces ToECDSAPub with
UnmarshalPubkey across the codebase.

* core, eth, les: more efficient hash-based header chain retrieval (ethereum#16946)

* les: fix retriever logic (ethereum#16776)

This PR fixes a retriever logic bug. When a peer had a soft timeout
and then a response arrived, it always assumed it was the same peer
even though it could have been a later requested one that did not time
out at all yet. In this case the logic went to an illegal state and
deadlocked, causing a goroutine leak.

Fixes ethereum#16243 and replaces ethereum#16359.
Thanks to @riceke for finding the bug in the logic.

* params: release go-ethereum v1.8.11

* VERSION, params: begin v1.8.12 release cycle

* core: change comment to match code more closely (ethereum#16963)

* internal/web3ext: fix method name for enabling mutex profiling (ethereum#16964)

* eth/fetcher: fix annotation (ethereum#16969)

* core/asm: correct comments typo (ethereum#16975)

core/asm/lexer: correct comments typo

* console: correct some comments typo (ethereum#16971)

console/console: correct some comments typo

*  ethereum#15685 made peer_test.go more portable by using random free port instead of hardcoded port 30303 (ethereum#15687)

Improves test portability by resolving 127.0.0.1:0
to get a random free port instead of the hard coded one. Now
the test works if you have a running node on the same
interface already.

Fixes ethereum#15685

* all: library changes for swarm-network-rewrite (ethereum#16898)

This commit adds all changes needed for the merge of swarm-network-rewrite.
The changes:

- build: increase linter timeout
- contracts/ens: export ensNode
- log: add Output method and enable fractional seconds in format
- metrics: relax test timeout
- p2p: reduced some log levels, updates to simulation packages
- rpc: increased maxClientSubscriptionBuffer to 20000

* core/vm: optimize MSTORE and SLOAD (ethereum#16939)

* vm/test: add tests+benchmarks for mstore

* core/vm: less alloc and copying for mstore

* core/vm: less allocs in sload

* vm: check for errors more correctly

* eth/filters: make filterLogs func more readable (ethereum#16920)

* cmd/utils: fix NetworkId default when -dev is set (ethereum#16833)

Prior to this change, when geth was started with `geth -dev -rpc`,
it would report a network id of `1` in response to the `net_version` RPC
request. But the actual network id it used to verify transactions
was `1337`.

This change causes geth to instead respond with `1337` to the `net_version`
RPC when geth is started with `geth -dev -rpc`.

* travis, appveyor: update to Go 1.10.3

* common: all golint warnings removed (ethereum#16852)

* common: all golint warnings removed

* common: fixups

* eth: conform better to the golint standards (ethereum#16783)

* eth: made changes to conform better to the golint standards

* eth: fix comment nit

* core: reduce nesting in transaction pool code (ethereum#16980)

* bmt: fix package documentation comment (ethereum#16909)

* common/number: delete unused package (ethereum#16983)

This package was meant to hold an improved 256 bit integer library, but
the effort was abandoned in 2015. AFAIK nothing ever used this package.
Time to say goodbye.

* core/asm: correct comments typo (ethereum#16974)

* core/asm/compiler: correct comments typo

core/asm/compiler: correct comments typo

* Correct comments typo

* internal/debug: use pprof goroutine writer for debug_stacks (ethereum#16892)

* debug: Use pprof goroutine writer in debug.Stacks() to ensure all goroutines are captured.

* Up to 64MB limit, previous code only captured first 1MB of goroutines.

* internal/debug: simplify stacks handler

* fix typo

* fix pointer receiver
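
For readers unfamiliar with the pprof goroutine writer mentioned above, here is a minimal sketch of the technique using only the standard library. The helper name `allStacks` is hypothetical and this is not the geth handler itself. The commit log does not show the old code; a fixed-size buffer passed to `runtime.Stack` is one common way a dump ends up truncated.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// allStacks returns a text dump of all goroutine stacks.
// Hypothetical helper; geth's debug API handler may differ.
func allStacks() (string, error) {
	var buf bytes.Buffer
	// debug=2 prints stacks in the same form as an unrecovered panic.
	// The buffer is sized by the runtime, not by the caller.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := allStacks()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("captured %d bytes of goroutine stacks\n", len(dump))
}
```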

* accounts/keystore: assign schema as const instead of var (ethereum#16985)

* cmd: remove faucet/puppeth dead code (ethereum#16991)

* cmd/faucet: authGitHub is not used anymore

* cmd/puppeth: remove unused code

* mobile: correct comment typo in geth.go (ethereum#17021)

* accounts/usbwallet: correct comment typo (ethereum#17008)

* core: remove dead code, limit test code scope (ethereum#17006)

* core: move test util var/func to test file

* core: remove useless func

* accounts/usbwallet: correct comment typo (ethereum#16998)

* signer: remove useless errorWrapper (ethereum#17003)

* travis: use NDK 17b for Android archives (ethereum#17029)

* tracers: fix err in 4byte, add some opcode analysis tools

* accounts: remove deadcode isSigned (ethereum#16990)

* mobile: correct comment typo in ethereum.go (ethereum#17040)

* cmd/geth: remove the tail "," from genesis config (ethereum#17028)

Remove the trailing "," from the genesis config, which would otherwise cause a genesis config parse error.

* trie: cache collapsed tries node, not rlp blobs (ethereum#16876)

The current trie memory database/cache that we do pruning on stores
trie nodes as binary rlp encoded blobs, and also stores the node
relationships/references for GC purposes. However, most of the trie
nodes (everything apart from value nodes) are in essence just
collections of references.

This PR switches out the RLP encoded trie blobs with the
collapsed-but-not-serialized trie nodes. This permits most of the
references to be recovered from within the node data structure,
avoiding the need to track them a second time (expensive memory wise).

* swarm: network rewrite merge

* les: handle conn/disc/reg logic in the eventloop (ethereum#16981)

* les: handle conn/disc/reg logic in the eventloop

* les: try to dial before start eventloop

* les: handle disconnect logic more safely

* les: grammar fix

* log: Change time format

- Keep the trailing zeros.
- Limit precision to milliseconds.
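
The two bullet points above map onto how Go's `time` layouts treat fractional seconds: a run of `9`s drops trailing zeros, while a run of `0`s always prints that many digits. The sketch below shows the difference. The layout strings and the `render` helper are illustrative assumptions, not the exact constants used in the `log` package.

```go
package main

import (
	"fmt"
	"time"
)

const (
	trimmedLayout = "15:04:05.999999999" // drops trailing zeros, up to nanoseconds
	fixedLayout   = "15:04:05.000"       // always exactly three fractional digits
)

// render formats a fixed sample timestamp with the given layout.
// nsec is the nanosecond part of the timestamp (hypothetical helper).
func render(layout string, nsec int) string {
	ts := time.Date(2018, time.July, 4, 15, 6, 23, nsec, time.UTC)
	return ts.Format(layout)
}

func main() {
	fmt.Println(render(trimmedLayout, 500000000)) // 15:06:23.5
	fmt.Println(render(fixedLayout, 500000000))   // 15:06:23.500
	fmt.Println(render(trimmedLayout, 0))         // 15:06:23
	fmt.Println(render(fixedLayout, 123456789))   // 15:06:23.123
}
```

A fixed-width fraction keeps log columns aligned, which is presumably the motivation for keeping the zeros.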

* swarm/fuse: Disable fuse tests, they are flaky (ethereum#17072)

* swarm/pss: Hide big network tests under longrunning flag (ethereum#17074)

* whisper: Reduce message loop log from Warn to Info (ethereum#17055)

* core/vm: clear linter warnings (ethereum#17057)

* core/vm: clear linter warnings

* core/vm: review input

* core/vm.go: revert lint in noop as per request

* build: make build/goimports.sh more portable

* node: remove formatting from ResettingTimer metrics if requested in raw

* ethstats: comment minor correction (ethereum#17102)

spell correction from `repors` to `reports`

* ethdb, core: implement delete for db batch (ethereum#17101)

* vendor: update docker/docker/pkg/reexec so that it compiles on OpenBSD (ethereum#17084)

* trie: fix a temporary memory leak in the memcache

* cmd/geth: export metrics to InfluxDB (ethereum#16979)

* cmd/geth: add flags for metrics export

* cmd/geth: update usage fields for metrics flags

* metrics/influxdb: update reporter logger to adhere to geth logging convention

* node: documentation typo fix (ethereum#17113)

* core/vm: reuse bigint pools across transactions (ethereum#17070)

* core/vm: A pool for int pools

* core/vm: fix rebase issue

* core/vm: push leftover stack items after execution, not before

* cmd/p2psim: add exit error output and exit code (ethereum#17116)

* p2p/discover: move bond logic from table to transport (ethereum#17048)

* p2p/discover: move bond logic from table to transport

This commit moves node endpoint verification (bonding) from the table to
the UDP transport implementation. Previously, adding a node to the table
entailed pinging the node if needed. With this change, the ping-back
logic is embedded in the packet handler at a lower level.

It is easy to verify that the basic protocol is unchanged: we still
require a valid pong reply from the node before findnode is accepted.

The node database tracked the time of last ping sent to the node and
time of last valid pong received from the node. Node endpoints are
considered verified when a valid pong is received and the time of last
pong was called 'bond time'. The time of last ping sent was unused. In
this commit, the last ping database entry is repurposed to mean last
ping _received_. This entry is now used to track whether the node needs
to be pinged back.

The other big change is how nodes are added to the table. We used to add
nodes in Table.bond, which ran when a remote node pinged us or when we
encountered the node in a neighbors reply. The transport now adds to the
table directly after the endpoint is verified through ping. To ensure
that the Table can't be filled just by pinging the node repeatedly, we
retain the isInitDone check. During init, only nodes from neighbors
replies are added.
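
The bonding flow described above can be modelled with a toy state machine. This is a sketch under stated assumptions, not the `p2p/discover` implementation: all type and method names are hypothetical, packets are reduced to method calls, and the 24-hour `bondExpiration` window is assumed for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// bondExpiration is an assumed validity window for a verified endpoint.
const bondExpiration = 24 * time.Hour

// nodeRecord holds the two timestamps the commit message talks about.
type nodeRecord struct {
	lastPingReceived time.Time // repurposed entry: last ping *received*
	lastPongReceived time.Time // "bond time"
}

// transport is a toy stand-in for the UDP transport (hypothetical type).
type transport struct {
	db       map[string]*nodeRecord
	table    map[string]bool
	initDone bool
}

func newTransport() *transport {
	return &transport{db: map[string]*nodeRecord{}, table: map[string]bool{}}
}

func (t *transport) record(id string) *nodeRecord {
	if t.db[id] == nil {
		t.db[id] = &nodeRecord{}
	}
	return t.db[id]
}

// handlePing records the ping and reports whether the sender must be
// pinged back because we hold no recent valid pong from it.
func (t *transport) handlePing(id string) (pingBack bool) {
	r := t.record(id)
	now := time.Now()
	pingBack = now.Sub(r.lastPongReceived) > bondExpiration
	r.lastPingReceived = now
	return pingBack
}

// handlePong marks the endpoint as verified. The node only enters the
// table once init is done, so pings alone cannot fill it during init.
func (t *transport) handlePong(id string) {
	t.record(id).lastPongReceived = time.Now()
	if t.initDone {
		t.table[id] = true
	}
}

// acceptFindnode requires a valid, unexpired pong from the sender.
func (t *transport) acceptFindnode(id string) bool {
	r, ok := t.db[id]
	return ok && time.Since(r.lastPongReceived) <= bondExpiration
}

func main() {
	tr := newTransport()
	fmt.Println(tr.acceptFindnode("a")) // false: no pong seen yet
	fmt.Println(tr.handlePing("a"))     // true: ping-back required
	tr.handlePong("a")
	fmt.Println(tr.acceptFindnode("a")) // true: endpoint verified
}
```

The sketch follows the first commit in this series; a later commit in the same list changes when neighbors reply nodes are added.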

* p2p/discover: reduce findnode failure counter on success

* p2p/discover: remove unused parameter of loadSeedNodes

* p2p/discover: improve ping-back check and comments

* p2p/discover: add neighbors reply nodes always, not just during init

* consensus/ethash: fixed documentation typo (ethereum#17121)

"proot-of-work" to "proof-of-work"

* light: new CHTs (ethereum#17124)

* les: add announcement safety check to light fetcher (ethereum#17034)

* params: v1.8.12 stable

* 1.8.12