Smarter block deadlines for speculative blocks #1481
Conversation
What is the rationale for these two changes?

                                        _producer_watermarks);
      _pending_block_deadline = wake_time ? *wake_time : fc::time_point::maximum();
   } else {
      _pending_block_deadline = fc::time_point::maximum();
I wonder if it should be:
_pending_block_deadline = chain.head_block_time() + fc::seconds(5);
To be consistent with line 1883:
if (in_speculating_mode()) {
   auto head_block_age = now - chain.head_block_time();
   if (head_block_age > fc::seconds(5))
      return start_block_result::waiting_for_block;
}
Excellent point. I wrote up a paragraph on why it shouldn't correspond to the other 5 seconds and, in doing so, convinced myself of the opposite. Thanks.
I do wonder about the wisdom of the current 5 second limit on speculative block production. The idea is that you don't want to speculatively execute on state that is too old. However, if a block producer misses a round then this node will stop processing trxs and wait for a block. Since everyone is doing that, all trxs are held up until a block producer produces a block and sends it along.
I wonder if the new IF provides us with a different liveness signal we could use to make a better determination. If we know that a series of blocks were not produced, we could safely speculatively execute trxs on older state.
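As a rough illustration of that idea only (a minimal standalone sketch, not nodeos code; the 500 ms block interval and 12-block round are the chain's configured values, while missed_slots, the 14-slot example, and all other names are assumptions for illustration):

#include <chrono>
#include <cstdint>
#include <iostream>

int main() {
   using namespace std::chrono;
   constexpr auto block_interval = milliseconds(500); // chain's block interval
   constexpr auto base_allowance = seconds(5);        // current stale-state limit

   // Assume the schedule tells us 14 consecutive slots went unproduced,
   // e.g. a producer missed its whole 12-block round plus two more slots.
   const uint32_t missed_slots   = 14;
   const auto     head_block_age = seconds(7);        // observed staleness

   // Extend the allowance by the slots we know were simply skipped.
   const auto allowance = base_allowance + missed_slots * block_interval;

   if (head_block_age > allowance)
      std::cout << "state too stale; wait for a block\n";
   else
      std::cout << "missed slots explain the staleness; keep speculating\n";
}

With a signal like this, staleness caused purely by unproduced slots would not stall speculative execution, while staleness caused by the node itself falling behind would still be treated as stale.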
If the node is connected to multiple peers, shouldn't we assume that its state will be up to date, since it will receive blocks as they are produced? So I don't see the rationale for assuming that our state is old relative to the whole chain (again, provided that we are connected to multiple peers).
See updated PR description.
Before this PR, the block deadline for speculative blocks was set the same as for production blocks, based on block production time. Because of when blocks are processed, this could result in very small speculative block windows: when a block came in close to the deadline it would cause us to repeatedly start a block with a very small deadline, abort the block because of that deadline, start another block, and immediately abort again until the block time window had passed.
Please update the comment at line 47:
BOOST_CHECK_EQUAL(calculate_producer_wake_up_time(full_cpu_effort, 2, block_timestamp, producers, active_schedule, empty_watermarks), block_time);
   }
}
{ // We have all producers in active_schedule in active_schedule of 21 (plus a couple of extra producers configured), we should produce every block
Very minor: change "in active_schedule in active_schedule" to "in active_schedule".
Change: Start speculative blocks every block interval, but not more often. This fixes a corner case of rapid start-block/abort-block bursts for speculative blocks. Also create speculative blocks even when state is stale (over 5 seconds old).
Before this PR, the block deadline for speculative blocks was set the same as for production blocks, based on block production time. Because of when blocks are processed, this could result in very small speculative block windows: when a block came in close to the deadline it would cause us to repeatedly start a block with a very small deadline, abort the block because of that deadline, start another block, and immediately abort again until the block time window had passed.
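As a rough illustration of the new behavior (a minimal standalone sketch, not the producer_plugin implementation; the variable names and the use of std::chrono in place of fc::time_point are assumptions for illustration), the speculative deadline can be pinned to the next 500 ms block-interval boundary, so an aborted speculative block is restarted at most once per interval:

#include <chrono>
#include <cstdint>
#include <iostream>

int main() {
   using namespace std::chrono;
   constexpr int64_t block_interval_us = 500'000; // 500 ms block interval

   // "now" in microseconds since epoch, standing in for fc::time_point::now().
   const int64_t now_us =
      duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();

   // Round up to the next block-interval boundary. Using this as the
   // speculative block deadline means a block arriving just before the
   // boundary cannot trigger repeated start-block/abort-block cycles
   // within the same interval.
   const int64_t speculative_deadline_us =
      ((now_us / block_interval_us) + 1) * block_interval_us;

   std::cout << "speculative deadline in "
             << (speculative_deadline_us - now_us) << " us\n";
}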
In addition, speculative blocks were not started on "stale" state (head block older than 5 seconds). Note this would be the case whenever a block producer was down and missed a production round (12 blocks at 500 ms each, i.e. 6 seconds).
The conditions under which the stale-state check would most likely be triggered (a block producer being down, or the node being disconnected from the network for some time) are exactly the situations in which it most likely causes undesirable behavior for both read-only transactions and non-read-only transactions. It unnecessarily delays read-only transactions (and dry-run transactions) whose callers may not care that they are potentially getting back stale data; if they do care, they can protect themselves by setting the expiration appropriately.
It also causes similar problems for write transactions. If the stale condition is due to a block producer being down rather than the node being disconnected from the network, then with this PR's changes a successfully executed write transaction can propagate through the network and reach a live block producer, which queues up the transaction for when it is its turn to produce a block. Before this PR, the write transaction was simply queued up to be speculatively executed once a new block arrived; if the transaction's expiration time was set low, there was a good chance it would fail due to expiration by that point.
But it isn't all beneficial. This PR has a downside for write transactions when the stale condition is due to a temporary network disruption (say, for 10 seconds). In that case, queuing the write transaction until the network is restored makes it more likely for that transaction to make it into a block (assuming the normal send_transaction endpoint is used, without retries). With this PR, the network disruption instead leads to the write transaction being silently dropped despite a successful trace being returned to the client: the transaction is speculatively executed and the node then attempts to relay it, but that relay is dropped if there are no currently connected peers.

Includes a refactor of producer_watermarks, calculate_producer_wake_up_time, and calculate_next_block_slot into block_timing_util. No change to functionality. Created new tests around the moved functions.

Resolves #1432