This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

ship stopped writing new blocks #9483

Open
cc32d9 opened this issue Sep 10, 2020 · 19 comments

Comments

@cc32d9
Contributor

cc32d9 commented Sep 10, 2020

eosio-2.0.7, Ubuntu 18.04. The node has resynced Telos from genesis. All of a sudden, it stopped writing state history logs, without any messages around the moment when it stopped (log level is set to error).

config.ini (producer_api_plugin was added several hours after this happened):

chain-state-db-size-mb = 65536
reversible-blocks-db-size-mb = 2048

wasm-runtime = eos-vm-jit
eos-vm-oc-enable = true

p2p-accept-transactions = false
api-accept-transactions = false

http-server-address = 0.0.0.0:8802
p2p-listen-endpoint = 0.0.0.0:9802
p2p-server-address = :9857

access-control-allow-origin = *

#verbose-http-errors = true
contracts-console = true

max-clients = 100

plugin = eosio::chain_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::db_size_api_plugin

plugin = eosio::producer_api_plugin

plugin = eosio::state_history_plugin
trace-history = true
chain-state-history = true
trace-history-debug-mode = true
state-history-endpoint = 0.0.0.0:8082

sync-fetch-span = 10

p2p-peer-address = p2p.telosarabia.net:9876
p2p-peer-address = telos.cryptolions.io:9871
p2p-peer-address = telos.eosdublin.io:9876
root@eosio:/srv/telos# ls -al data/state-history/
total 291419499
drwxr-xr-x 2 root root            6 Sep  6 13:31 .
drwxr-xr-x 6 root root            6 Sep  6 13:31 ..
-rw-r--r-- 1 root root    831008728 Sep  9 08:12 chain_state_history.index
-rw-r--r-- 1 root root 117728914750 Sep  9 08:12 chain_state_history.log
-rw-r--r-- 1 root root    831008728 Sep  9 08:12 trace_history.index
-rw-r--r-- 1 root root 178821446038 Sep  9 08:12 trace_history.log
root@eosio:/srv/telos# ls -al data/blocks/
total 47089027
drwxr-xr-x 3 root root            5 Sep  6 13:31 .
drwxr-xr-x 6 root root            6 Sep  6 13:31 ..
-rw-r--r-- 1 root root    877516224 Sep 10 07:29 blocks.index
-rw-r--r-- 1 root root 128877141011 Sep 10 07:29 blocks.log
drwxr-xr-x 2 root root            3 Sep  6 13:31 reversible

After restarting nodeos, it prints the block range available in state history, but doesn't collect any new data. The websocket is responding, but obviously not delivering any traces.
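
A quick way to see that symptom from a client's side is to ask the plugin for its status over the websocket and compare the advertised end blocks with the head block. Below is a minimal probe, a sketch rather than the official client; it assumes the 2.0.x SHiP protocol, where the server first sends its ABI as a text frame and get_status_request_v0 serializes to a single 0x00 byte, and it uses the state-history-endpoint from the config above.

# Minimal SHiP status probe (sketch, not the official client). Assumes the
# eosio 2.0.x state history protocol: the first frame from the server is the
# ABI (JSON text); get_status_request_v0 is the request variant at index 0
# with no fields, i.e. a single 0x00 byte; the reply is a binary
# get_status_result_v0 with the layout unpacked below.
import asyncio
import struct
import websockets  # pip install websockets

SHIP_URL = "ws://127.0.0.1:8082"  # state-history-endpoint from config.ini

async def probe():
    async with websockets.connect(SHIP_URL, max_size=None) as ws:
        await ws.recv()          # ABI text frame; not needed for this check
        await ws.send(b"\x00")   # get_status_request_v0
        raw = await ws.recv()    # binary get_status_result_v0

        # Layout: variant index (byte 0), head {uint32 num, 32-byte id},
        # last_irreversible {uint32 num, 32-byte id}, then trace_begin/end
        # and chain_state_begin/end as little-endian uint32 each.
        assert raw[0] == 0, "unexpected result variant"
        head_num, = struct.unpack_from("<I", raw, 1)
        lib_num, = struct.unpack_from("<I", raw, 37)
        trace_begin, trace_end, state_begin, state_end = struct.unpack_from("<4I", raw, 73)

        print(f"head={head_num} lib={lib_num}")
        print(f"trace_history blocks:       [{trace_begin}, {trace_end})")
        print(f"chain_state_history blocks: [{state_begin}, {state_end})")
        if trace_end < head_num or state_end < head_num:
            print("WARNING: state history is behind the head block")

asyncio.run(probe())

A healthy node reports trace/chain-state end blocks that track the head; a node in the state described here keeps advancing its head while those end blocks stay frozen.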

In the attached log, I restarted nodeos with default logging at Sep 10 07:32:54. It then started printing "missed a block in trace_history.log".

telos.nodeos.log

@cc32d9
Contributor Author

cc32d9 commented Sep 10, 2020

The node went on with the head block, so it's difficult to reproduce or test a bugfix. But what it should have done is stop, rather than continue, after the failure.

@aclark-b1

Have you been able to do this successfully before? Just trying to determine if something may have changed recently that caused this.

@cc32d9
Contributor Author

cc32d9 commented Sep 16, 2020

No, it's the first time I've seen the state history data corrupted. But the systematic problem is that nodeos prints a warning and carries on, instead of aborting when the data is corrupted.

@cc32d9
Contributor Author

cc32d9 commented Feb 22, 2021

Just experienced it with 2.0.6 on WAX on two independent servers within a couple of days. It just stops updating state history silently. The network has been experiencing frequent forks. This behavior is quite random: I'm maintaining 4 ship servers on WAX, and only one falls out at a time.

@cc32d9
Contributor Author

cc32d9 commented Feb 22, 2021

It looks like it's related to the increased fork frequency, so #10080 might be related.

@n8d

n8d commented Feb 22, 2021

We have seen this several times on EOS recently where we get this warning with every block:

missed a block in chain_state_history.log
    {"name":"chain_state_history"}
    nodeos  state_history_log.hpp:92 write_entry

And we have to stop the node and restore from backup to fix it. This is on 2.0.9.

@cc32d9
Contributor Author

cc32d9 commented Feb 22, 2021

in my case, logging was limited to error level, so no logs here

@heifner
Contributor

heifner commented Feb 23, 2021

We have seen this several times on EOS recently where we get this warning with every block:

Do you have a full log available? I'm interested in whether there was a warn/error before it started warning on every block. Once in this state, it really can't recover without a restore from backup.

@heifner
Contributor

heifner commented Feb 23, 2021

in my case, logging was limited to error level, so no logs here

Would recommend you run with at least warn level logging.

@n8d

n8d commented Feb 23, 2021

We have seen this several times on EOS recently where we get this warning with every block:

Do you have a full log available? I'm interested in whether there was a warn/error before it started warning on every block. Once in this state, it really can't recover without a restore from backup.

Looking closer, it actually happened with a restart (we do auto stop + zfs snap + start). On stop it put this in the log:

corrupted size vs. prev_size

And then when it started it immediately had the missed block in state history issue.

@matthewdarwin

see "crash on exit with error "corrupted size vs. prev_size" #8450

@n8d

n8d commented Feb 23, 2021

see "crash on exit with error "corrupted size vs. prev_size" #8450

Yep, I thought that one was fixed long ago, but it doesn't look like it, or maybe it's a recent regression.

@cc32d9
Contributor Author

cc32d9 commented Feb 24, 2021

Whatever the cause is, nodeos should stop if it cannot keep writing the ship data. Currently this failure is difficult to detect, because it keeps sending empty ship data via the websocket.
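
Until that happens, one crude external check is to alert when the state history logs stop being touched while blocks.log keeps advancing, which is exactly the pattern in the directory listing earlier in this issue. This is a sketch under the assumption of that layout, not anything built into nodeos:

# External staleness check (sketch). Paths assume the layout shown earlier
# in this issue (data/blocks/blocks.log, data/state-history/*.log); adjust
# DATA_DIR and the lag threshold for your deployment.
import os
import sys

DATA_DIR = "/srv/telos/data"   # assumed path, taken from this report
MAX_LAG_SECONDS = 60           # how far SHiP files may trail blocks.log

blocks_mtime = os.path.getmtime(os.path.join(DATA_DIR, "blocks", "blocks.log"))

stale = []
for name in ("trace_history.log", "chain_state_history.log"):
    path = os.path.join(DATA_DIR, "state-history", name)
    if blocks_mtime - os.path.getmtime(path) > MAX_LAG_SECONDS:
        stale.append(name)

if stale:
    print("state history stalled:", ", ".join(stale))
    sys.exit(1)
print("state history looks current")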

@eosusa

eosusa commented Feb 25, 2021

Had a WAX SHiP node start missing blocks in the SHiP logs, and saw the following just before it started missing blocks:

fc: exception: 8 out_of_range_exception: Out of Range write datastream of length 11327 over by -51
{"method":"write","len":"11327","over":"-51"}
nodeos datastream.cpp:6 throw_datastream_range_error

@cc32d9
Contributor Author

cc32d9 commented Sep 14, 2021

Had this error occur again a few days ago, and that was the only node where I forgot to set read-mode=head. It was isolated from speculative transactions, though:

p2p-accept-transactions = false
api-accept-transactions = false

Could also be related to #9972
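
For reference, the option in question is the chain_plugin read-mode setting in config.ini; in head mode the database only reflects transactions that made it into blocks:

read-mode = head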

@cc32d9
Contributor Author

cc32d9 commented Oct 21, 2021

There's at least one occurrence of this error with read-mode=head: 2.0.13 (WAX) could not write to state history and stopped.

@cc32d9
Contributor Author

cc32d9 commented Nov 17, 2021

One more occurrence, and this time it's not at the head block: I was resyncing a Telos ship node from genesis, and it stopped about halfway through the history (last block in the blocks log: 103875765, while the current head block is above 180M).

@cc32d9
Contributor Author

cc32d9 commented Nov 17, 2021

Attached is the log from the Telos node (log level is set to error). The node started from near genesis on Nov 12, then stopped on Nov 14 12:06:20. Later on, I tried to start it again on Nov 17.
telos_ship.log

@cc32d9
Contributor Author

cc32d9 commented Dec 3, 2021

The error is quite definitely related to how busy the storage holding the ship files is. It seems like the write operation times out too early.
