Cluster Design
A cluster consists of multiple processes, potentially running on multiple machines: a master owning the root chain and slaves each owning one or more shard chains. Reaching a consistent cluster state when adding new blocks into the cluster therefore requires multiple RPCs among them. The order of those RPCs is also important so that a consistent cluster state can be recovered no matter when the cluster is stopped and restarted.
Whenever a new block is received by the cluster, either actively downloaded by the synchronizers or passively broadcast from peers, it is first validated and added to a specific blockchain by one of the servers in the cluster; that server then forwards the relevant information to all the other servers in the cluster to let everyone know about the new block. A consistent cluster state is reached when all the servers in the cluster have persisted the relevant information regarding the new block. (Note that when the neighbor rule comes into play, the consistent state only considers neighbor shards.)
The current implementation of pyquarkchain holds a strong assumption of consistent state. Best efforts are made to crash the cluster (through check) if an inconsistency is detected: it is better to crash on-site and leave a useful stack trace for debugging than to allow inconsistent data to cause eventual failures. In the current cluster implementation NO attempt has been made to keep the cluster functioning in the event of component failures, e.g., a crashing server or an internal RPC timeout; we simply kill all the servers in the cluster when any of them fails. Since blockchain data is naturally replicated across all the peer clusters, there is no need to worry about data loss. Nevertheless, data consistency remains a design challenge in building a functional cluster.
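For illustration, a minimal sketch of such a fail-fast helper is below. The function body and the usage are assumptions based on the description above, not the exact pyquarkchain code:

```python
# A minimal fail-fast check: crash immediately and leave a stack trace
# instead of continuing on inconsistent state. A sketch only; the real
# check in pyquarkchain may differ in detail.
def check(condition: bool, msg: str = "inconsistent cluster state") -> None:
    if not condition:
        raise AssertionError(msg)


# Hypothetical usage: verify an invariant while adding a block.
check(1 + 1 == 2)  # passes; a False condition would kill the process
```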
The following explains the specific ordering of events and provides code pointers regarding some of the important cluster operations.
There are three ways to add shard blocks:

- Synchronizer (in shard.py), which downloads blocks starting from a given header and going backward to the latest block known to the local db
- Shard.handle_new_block, which handles blocks broadcast from peers. This path is mainly for propagating newly mined blocks across the network. (CommandOp.NEW_BLOCK_MINOR)
- SlaveServer.handle_sync_minor_block_list_request, which is called by the master to download and add missing shard blocks when syncing root blocks
Ultimately, all three follow the same path.
Each step starts after the previous one completes (i.e. all the RPCs returned).
- The shard validates the block, updates the local shard chain and EVM state, and puts the block in the db. (Shard.add_block)
- The shard broadcasts the block's (header hash, [xshard tx]) (CrossShardTransactionDeposits) to all its neighbor shards in parallel. (SlaveServer.broadcast_xshard_tx_list)
- The shard sends the block header to the root chain, and the root chain persists the header in its db. (SlaveServer.send_minor_block_header_to_master)
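A simplified sketch of this ordering is below. Only the sequence of awaits mirrors the description above; the object model (shard.state, shard.neighbors, shard.master_conn) and the stub method names are assumptions:

```python
import asyncio


async def add_block(shard, block) -> None:
    # Step 1: validate the block, update the local shard chain and EVM
    # state, and persist the block to the db.
    shard.state.add_block(block)

    # Step 2: broadcast (header hash, xshard tx deposits) to all neighbor
    # shards in parallel; gather returns only after every neighbor replied.
    await asyncio.gather(
        *(peer.add_xshard_tx_list(block) for peer in shard.neighbors)
    )

    # Step 3: only now report the header to the master, so a header on the
    # root chain implies every neighbor shard already has the block.
    await shard.master_conn.send_minor_block_header(block.header)
```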
This process guarantees that if the root chain has a shard block, then this exact shard block must be known to all the shards, i.e., the cluster has reached a consistent state in which every process has the shard block. This guarantee is required for adding root blocks (as described below) to succeed.
Adding different shard blocks concurrently will not affect the consistency of one another (consider how blocks are stored in the db). However, special care is needed when adding the same block concurrently. The add-block call should only return after the cluster has reached consistency on the block being added, i.e., after completing all three steps above. You may see in Shard.add_block that later calls adding the same block will wait on a future in Shard.add_block_futures if a previous call has not finished.
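A minimal model of that future-based deduplication is sketched below, assuming a hypothetical ShardSketch class and a block.header.get_hash() accessor (the real Shard carries much more state):

```python
import asyncio


class ShardSketch:
    def __init__(self):
        # Maps block hash -> future that resolves once the cluster has
        # reached consistency on that block.
        self.add_block_futures: dict[bytes, asyncio.Future] = {}

    async def _do_add_block(self, block) -> None:
        ...  # the three steps described above: validate, broadcast, report

    async def add_block(self, block) -> None:
        block_hash = block.header.get_hash()
        existing = self.add_block_futures.get(block_hash)
        if existing is not None:
            # The same block is already being added; wait for that call to
            # finish instead of adding the block a second time.
            await existing
            return

        future = asyncio.get_running_loop().create_future()
        self.add_block_futures[block_hash] = future
        try:
            await self._do_add_block(block)
        finally:
            # Wake every duplicate caller and drop the in-flight marker.
            future.set_result(None)
            del self.add_block_futures[block_hash]
```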
Simpler than adding shard blocks, all root blocks are added through a single entry point: Synchronizer (in master.py), which only allows adding one root block at a time. Adding root blocks concurrently could easily lead to inconsistent views on the shard servers and is thus forbidden in the current implementation.
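One way to picture this serialization is the sketch below, which stands in for the Synchronizer's one-at-a-time behavior with a simple lock; the master.add_root_block coroutine is an assumption:

```python
import asyncio


class RootBlockSynchronizerSketch:
    def __init__(self, master):
        self.master = master
        self._lock = asyncio.Lock()  # at most one root block in flight

    async def add_root_block(self, root_block) -> None:
        async with self._lock:
            # The next root block cannot start until this one has fully
            # propagated, so shards never see two concurrent root updates.
            await self.master.add_root_block(root_block)
```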
Each step starts after the previous one completes (i.e. all the RPCs returned).
- If the root block contains shard blocks that are unknown to the root chain, it asks the shards to download and add those missing shard blocks from peer clusters. (SyncMinorBlockListRequest)
- The root chain validates the block, updates the local root chain, and puts the block in the db. (Master.add_root_block)
- The root chain broadcasts the root block to all the shards, making the CrossShardTransactionDeposits available for the destination shards to confirm. (AddRootBlockRequest)
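The ordering can be sketched as follows. The helper names (get_unknown_minor_block_hashes, sync_minor_block_list, master.slaves) are assumptions; only the three-step sequence mirrors the description above:

```python
import asyncio


async def add_root_block(master, root_block) -> None:
    # Step 1: have the shards download any shard blocks this root block
    # confirms that the cluster has not seen yet (SyncMinorBlockListRequest).
    missing = master.get_unknown_minor_block_hashes(root_block)
    if missing:
        await master.sync_minor_block_list(missing)

    # Step 2: validate the root block, update the local root chain, and
    # persist the block to the db (Master.add_root_block).
    master.root_state.add_block(root_block)

    # Step 3: broadcast to every shard (AddRootBlockRequest) so the
    # CrossShardTransactionDeposits it confirms become available on the
    # destination shards.
    await asyncio.gather(
        *(slave.add_root_block(root_block) for slave in master.slaves)
    )
```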
It takes the following steps to initialize a cluster. All the connections are bidirectional.
- Start the master and slave servers (implemented in cluster.py)

The master server handles the rest in MasterServer.__init_cluster():

- The master initiates connections to all the slave servers following ClusterConfig.SLAVE_LIST (MasterServer.__connect_to_slaves)
- The master directs the slaves to connect to each other (MasterServer.__setup_slave_to_slave_connections)

All the servers are connected with each other after the above steps.

- Recover the cluster state as described below (MasterServer.__init_shards)
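The order of these steps can be sketched as below; the method names on master are simplified stand-ins for the private helpers referenced above:

```python
async def init_cluster(master) -> None:
    # 1. Connect the master to every slave in ClusterConfig.SLAVE_LIST.
    await master.connect_to_slaves()

    # 2. Direct each slave to connect to every other slave, forming a
    #    fully connected, bidirectional mesh among all the servers.
    await master.setup_slave_to_slave_connections()

    # 3. With all links up, recover the persisted cluster state.
    await master.init_shards()
```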
Recovery means reconstructing the necessary in-memory data structures from the db to allow the cluster to function.
- The root chain first identifies the head of the chain by looking up the tipHash from the db. However, there is no guarantee that the cluster has reached a consistent state regarding this block; its parent, on the other hand, is guaranteed to be consistent in the cluster. So the parent block of tipHash becomes the new head, and recent blocks are loaded into memory following the hash_prev_block hash pointer. (RootDb.__recover_from_db)
- The root chain broadcasts the recovered new head to all the shards. (MasterServer.__init_shards)
- Each shard finds the latest shard block confirmed by the root blocks up to the recovered new head and uses it as the new shard head. Then it recovers the shard chain from the new shard head following the prev-hash pointer. Only recent blocks are loaded into memory. (ShardState.init_from_root_block)
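A sketch of the root chain walk is below. The db accessors (get, get_block) and the in-memory cap are assumptions; only the "start from the parent of tipHash and follow prev hashes" logic mirrors the text above:

```python
RECENT_BLOCK_COUNT = 128  # hypothetical cap on blocks kept in memory


def recover_from_db(db) -> list:
    tip_hash = db.get(b"tipHash")
    tip = db.get_block(tip_hash)

    # The tip itself may not be cluster-consistent, but its parent is
    # guaranteed to be, so the parent becomes the recovered head.
    head = db.get_block(tip.header.hash_prev_block)

    # Load only recent blocks into memory, following prev-hash pointers.
    chain = [head]
    while len(chain) < RECENT_BLOCK_COUNT and chain[-1].header.height > 0:
        chain.append(db.get_block(chain[-1].header.hash_prev_block))
    return list(reversed(chain))  # oldest first, recovered head last
```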
Shard chains can be created at certain root heights (specified by ShardGenesis.ROOT_HEIGHT). Whenever a slave server receives a root block from the master, it checks the config to see if the height of the root block has reached the ShardGenesis.ROOT_HEIGHT of any shard (SlaveServer.create_shards). If such a shard is found, it is created and its genesis block is created referring to the root block (Shard.__init_genesis_state). If another root block with the same height arrives later (due to a fork), a new genesis shard block is created referring to the new root block. Eventually, the genesis block referring to the root block on the longest root chain wins, just like any other shard block following the same root-chain-first consensus.
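The check a slave performs on each incoming root block can be sketched as below; the config layout (shard_config.GENESIS.ROOT_HEIGHT) and helper names are assumptions modeled on ShardGenesis.ROOT_HEIGHT:

```python
def create_shards(slave, root_block) -> None:
    for shard_id, shard_config in slave.config.shards.items():
        if shard_id in slave.shards:
            continue  # shard already running
        if root_block.header.height >= shard_config.GENESIS.ROOT_HEIGHT:
            # The genesis block refers to this root block. A fork at the
            # same root height would later yield a competing genesis block;
            # the one on the longest root chain eventually wins.
            shard = slave.new_shard(shard_id)
            shard.init_genesis_state(root_block)
            slave.shards[shard_id] = shard
```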
Official Website: https://www.quarkchain.io/
Discord Developer Community: https://discord.me/quarkchain