ship stopped writing new blocks #9483
Comments
The node carried on with the head block, so it's difficult to reproduce the problem or test a bugfix. What it should have done is stop rather than continue after the failure. |
Have you been able to do this successfully before? Just trying to determine if something may have changed recently that caused this. |
No, it's the first time I've seen the state history data corrupted. The systematic problem, though, is that nodeos prints a warning and carries on instead of aborting when the data is corrupted. |
Just experienced it with 2.0.6 on WAX on two independent servers within a couple of days. It just stops updating state history silently. The network has been experiencing intensive forks. This behavior is quite random. I'm maintaining 4 ship servers on WAX, and only one falls out at a time. |
It looks like it's related to the increased fork frequency, so #10080 might be related |
We have seen this several times on EOS recently where we get this warning with every block:
And we have to stop the node and restore from backup to fix it. This is on 2.0.9. |
in my case, logging was limited to error level, so no logs here |
Do you have a full log available? I'm interested in whether there was a warn/error before it started warning on every block. Once in this state it really can't recover without a restore from backup. |
Would recommend you run with at least |
Looking closer, it actually happened with a restart (we do auto stop + zfs snap + start). On stop it put this in the log:
And then when it started it immediately had the missed block in state history issue. |
See "crash on exit with error 'corrupted size vs. prev_size'" #8450 |
Yep, I thought that one was fixed long ago, but it doesn't look like it, or maybe this is a recent regression. |
Whatever the cause is, nodeos should stop if it cannot keep writing the ship data. Currently this failure is difficult to detect because it keeps sending empty ship data via websocket. |
Had a WAX SHiP node start missing blocks in the SHiP logs, and saw the following just before it started missing blocks: fc: exeption: 8 out_of_range_exception: Out of Range write datastream of length 11327 over by -51 |
Had this error occurring again few days ago, and that was the only node where I forgot to put
Could also be related to #9972 |
there's at least one occurrence of this error with |
One more occurrence, and this time it's not at the head block: I was resyncing a Telos ship node from genesis, and it stopped about halfway through the history (last block in blocks log: 103875765, current head block is above 180M) |
Attached is the log from the Telos node (log level is set to error). The node started from near genesis on Nov 12, then stopped on Nov 14 12:06:20. Later, I tried to start it on Nov 17. |
The error is almost certainly related to how busy the storage holding the ship files is. It seems the write operation times out too early. |
eosio-2.0.7, Ubuntu 18.04. The node has resynced Telos from genesis. All of a sudden, it stopped writing state history logs, with no messages logged at the moment it stopped (log level is set to error).
config.ini (producer_api_plugin was added several hours after this happened):
After restarting nodeos, it prints the blocks range in state history, and doesn't collect any new data. The websocket is responding, but obviously not delivering any traces.
In the attached log, I restarted nodeos with default logging at Sep 10 07:32:54. It started printing "missed a block in trace_history.log"
telos.nodeos.log
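Since nodeos keeps running in this state, one way to notice the problem from outside is to scan the node's log for the warning quoted above. Below is a minimal sketch in Python, not part of nodeos. The function name `first_missed_block` is hypothetical, and the marker substring is taken from the "missed a block in trace_history.log" message in this report; the exact wording may differ between versions, and the warning only appears if the log level allows it (it would not show up with logging limited to error level).

```python
from typing import Iterable, Optional, Tuple

# Substring taken from the warning quoted in this report.
# Assumption: other nodeos versions may word this message differently.
MISSED_BLOCK_MARKER = "missed a block in"


def first_missed_block(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, text) of the first matching log line, or None."""
    for number, line in enumerate(lines, start=1):
        if MISSED_BLOCK_MARKER in line:
            return number, line.rstrip("\n")
    return None
```

It could be used on a log file by passing the open file object, for example `first_missed_block(open("telos.nodeos.log", errors="replace"))`, and alerting or stopping the node when the result is not `None`.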