Block propagation delay under load #525

Closed
heifner opened this issue Nov 30, 2022 · 3 comments · Fixed by #649
Labels: actionable, enhancement (New feature or request), 👍 lgtm, OCI (Work exclusive to OCI team)

Comments

@heifner
Member

heifner commented Nov 30, 2022

During start_block a deadline is set so that trx processing does not exceed the configured block production window, as determined by produce-time-offset-us and cpu-effort-percent. Pending transactions are processed in a tight loop on the main thread until all pending transactions are processed or the deadline is hit.

For a production block, a block that comes in during the block production window is dropped.

For a speculative block, the incoming block is not processed until the main thread is released for processing the next task. Currently this means that the incoming block is not processed until after the completion of start_block. This can delay the incoming block processing by as much as 500ms.

Block propagation under high load can be improved by exiting start_block for a speculative block as soon as a block comes into the node, so that the incoming block can be processed immediately after the current trx finishes executing.
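A minimal sketch of the idea, not the nodeos implementation: the pending-transaction loop checks a flag between transactions and exits early when a block has arrived, but only in speculative mode. All names here (`process_pending`, `block_mode`, `exit_reason`, the `block_received` flag) are hypothetical, and keeping the deadline check in both modes is an assumption made for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical names throughout; this only illustrates the proposed control flow.
enum class block_mode { producing, speculating };
enum class exit_reason { exhausted, deadline, interrupted };

struct loop_result {
    exit_reason reason;     // why the loop stopped
    std::size_t processed;  // how many trxs ran before it stopped
};

using clock_type = std::chrono::steady_clock;

// block_received is assumed to be set by whichever thread receives a block
// (for example a network thread) and read here on the main thread.
loop_result process_pending(std::deque<std::function<void()>>& pending,
                            block_mode mode,
                            clock_type::time_point deadline,
                            const std::atomic<bool>& block_received) {
    std::size_t processed = 0;
    while (!pending.empty()) {
        // Checked between trxs, so the trx currently executing always finishes.
        if (mode == block_mode::speculating &&
            block_received.load(std::memory_order_acquire)) {
            return {exit_reason::interrupted, processed};
        }
        if (clock_type::now() >= deadline) {
            return {exit_reason::deadline, processed};
        }
        auto trx = std::move(pending.front());
        pending.pop_front();
        trx();
        ++processed;
    }
    return {exit_reason::exhausted, processed};
}
```

With this shape the worst-case wait for an incoming block in speculative mode is the execution time of one trx rather than the remainder of the window. A caller that sees `exit_reason::interrupted` would apply the incoming block and then start a new block, leaving the unprocessed trxs in the queue.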

@enf-ci-bot enf-ci-bot moved this to Todo in Team Backlog Nov 30, 2022
@heifner heifner added enhancement New feature or request actionable and removed triage labels Nov 30, 2022
@heifner
Member Author

heifner commented Nov 30, 2022

I marked this as an enhancement, but I do think it should be back-ported to 3.1 & 3.2.

@heifner
Member Author

heifner commented Nov 30, 2022

Without this change, eosnetworkfoundation/product#127 cannot propagate a block as quickly as it otherwise could. If 127 is implemented so that it does not need the main thread for block header validation, then the block can be propagated quickly even without this change.

However, even with 127 implemented so it does not need the main thread, this solution is still needed because you do want to start applying the state changes from the incoming block as soon as possible. For a speculative block you want to abort it as soon as you have a new block to apply.

@heifner heifner self-assigned this Dec 6, 2022
@heifner heifner added the OCI Work exclusive to OCI team label Dec 6, 2022
@heifner heifner moved this from Todo to In Progress in Team Backlog Dec 6, 2022
@heifner
Member Author

heifner commented Dec 8, 2022

Example where this is needed:

Dec  7 07:45:26 xxx nodeos[676220]: info  2022-12-07T07:45:26.419 nodeos    producer_plugin.cpp:502       on_incoming_block    ] Received block 7e7eaf8bed3b3731... #217868155 @ 2022-12-07T07:45:26.500 signed by eosauthority [trxs: 32, lib: 217867829, confirmed: 0, net: 4336, cpu: 5498, elapsed: 4955, time: 5589, latency: -80 ms]
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.419 nodeos    producer_plugin.cpp:1685      start_block          ] Starting block #217868156 at 2022-12-07T07:45:26.419 producer eosauthority
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.419 nodeos    producer_plugin.cpp:1842      remove_expired_trxs  ] Processed 2 expired transactions of the 3192 transactions in the unapplied queue, Persistent expired 0, Other expired 2
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.420 nodeos    subjective_billing.hpp:204    remove_expired       ] Processed 37910 subjective billed transactions, Expired 41
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.420 nodeos    producer_plugin.cpp:2281      process_incoming_trx ] Processing 2926 pending transactions
Dec  7 07:45:26 xxx nodeos[676220]: warn  2022-12-07T07:45:26.503 net-9     net_plugin.cpp:2689           process_next_trx_mes ] ["eosn-wax-seed54:9876 - 594c2df" - 42 10.88.141.115:9876] Dropping trx, too many trx in progress 104858348 bytes
Dec  7 07:45:26 xxx nodeos[676220]: warn  2022-12-07T07:45:26.600 net-3     net_plugin.cpp:2689           process_next_trx_mes ] ["405c2c85f3b4:14999 - 6809b41" - 20 51.222.156.169:14998] Dropping trx, too many trx in progress 104857946 bytes
Dec  7 07:45:26 xxx nodeos[676220]: warn  2022-12-07T07:45:26.800 net-8     net_plugin.cpp:2689           process_next_trx_mes ] ["ec1183b43025:9876 - ee0ed8e" - 10 51.222.153.167:35777] Dropping trx, too many trx in progress 104858072 bytes
Dec  7 07:45:26 xxx nodeos[676220]: warn  2022-12-07T07:45:26.815 net-4     net_plugin.cpp:2689           process_next_trx_mes ] ["us-api01.eosams.xeos.me:9101 - 663d936" - 26 207.244.100.59:9101] Dropping trx, too many trx in progress 104857986 bytes
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.900 nodeos    producer_plugin.cpp:2306      process_incoming_trx ] Processed 948 pending transactions, 1978 left
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.900 nodeos    producer_plugin.cpp:2368      schedule_production_ ] Speculative Block Created
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.903 nodeos    producer_plugin.cpp:438       on_incoming_block    ] received incoming block 217868156 0cfc677c8a77f63239c21150ca47e72ad61770af45e5cf1b37c98002d1ad68b8
Dec  7 07:45:26 xxx nodeos[676220]: debug 2022-12-07T07:45:26.903 nodeos    producer_plugin.cpp:187       report               ] Block trx idle: 3457us out of 483979us, success: 6, 1481us, fail: 942, 301972us, other: 177069us

It is very likely that block 217868156 came in before 26.900, but had to wait until then because of the speculative block processing in start_block.
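To make the timing in the log above concrete, here is a small helper for working out the gaps between log timestamps. The function name `to_ms` is hypothetical, and it assumes the time-of-day part of the log timestamp always carries exactly three fractional digits, as it does in the lines above.

```cpp
#include <cstdio>
#include <string>

// Convert a time of day "HH:MM:SS.mmm" to milliseconds since midnight.
// Returns -1 if the string does not match that shape.
// Assumes exactly three fractional digits, as in the log lines above.
long to_ms(const std::string& t) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(t.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}
```

Using the timestamps from the log, `to_ms("07:45:26.900") - to_ms("07:45:26.420")` gives 480, so the main thread spent about 480 ms in pending trx processing, and `to_ms("07:45:26.903") - to_ms("07:45:26.900")` gives 3, so the incoming block was picked up 3 ms after that loop ended. This is consistent with the 483979us total in the report line.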

@heifner heifner moved this from In Progress to Awaiting Review in Team Backlog Dec 8, 2022
heifner added a commit that referenced this issue Dec 8, 2022
…ative mode. Use deadline when in production mode.
heifner added a commit that referenced this issue Dec 8, 2022
heifner added a commit that referenced this issue Dec 9, 2022
heifner added a commit that referenced this issue Dec 12, 2022
heifner added a commit that referenced this issue Dec 13, 2022
heifner added a commit that referenced this issue Dec 14, 2022
heifner added a commit that referenced this issue Dec 14, 2022
heifner added a commit that referenced this issue Dec 14, 2022
…uld restart block because speculative block may have been interrupted.
heifner added a commit that referenced this issue Dec 15, 2022
heifner added a commit that referenced this issue Dec 15, 2022
heifner added a commit that referenced this issue Dec 15, 2022
…uld restart block because speculative block may have been interrupted.
heifner added a commit that referenced this issue Dec 15, 2022
heifner added a commit that referenced this issue Dec 16, 2022
heifner added a commit that referenced this issue Jan 18, 2023
[3.1] Interrupt speculative start_block when a block is received
heifner added a commit that referenced this issue Jan 18, 2023
heifner added a commit that referenced this issue Jan 18, 2023
…merge

[3.1 -> 3.2] Interrupt speculative start_block when a block is received
heifner added a commit that referenced this issue Jan 18, 2023
heifner added a commit that referenced this issue Jan 18, 2023
…-merge

[3.2 -> main] Interrupt speculative start_block when a block is received
@github-project-automation github-project-automation bot moved this from Awaiting Review to Done in Team Backlog Jan 18, 2023
heifner added a commit that referenced this issue Oct 2, 2024
heifner added a commit that referenced this issue Oct 2, 2024
heifner added a commit that referenced this issue Oct 2, 2024
heifner added a commit that referenced this issue Oct 2, 2024
Projects
Archived in project
3 participants