
Cluster Design


QuarkChain Cluster

A cluster consists of multiple processes, potentially running on multiple machines - a master owning the root chain and slaves each owning one or more shard chains. Adding a new block therefore requires multiple RPCs among them to reach a consistent cluster state. The order of these RPCs also matters, so that a consistent cluster state can be recovered no matter when the cluster is stopped and restarted.

Whenever a new block is received by the cluster, either actively downloaded by the synchronizers or passively broadcast from peers, it is first validated and added to a specific blockchain by one of the servers in the cluster. That server then forwards the relevant information to all the other servers in the cluster so that everyone knows about the new block. A consistent cluster state is reached once all the servers in the cluster have persisted the relevant information about the new block. (Note that when the neighbor rule comes into play, the consistent state only considers neighbor shards.)

The current implementation of pyquarkchain holds a strong assumption of consistent state. Best efforts are made to crash the cluster (through check) if an inconsistency is detected: it is better to crash on the spot and leave a useful stack trace for debugging than to let inconsistent data lead to eventual failures. The current cluster implementation makes NO attempt to keep the cluster functioning in the event of component failures, e.g., a crashing server or an internal RPC timeout. We simply kill all the servers in the cluster when any of them fails. Since blockchain data is naturally replicated across all the peer clusters, there is no need to worry about data loss. Nevertheless, data consistency remains a design challenge in building a functional cluster.
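
As a rough illustration of this fail-fast approach, below is a minimal sketch (not the actual pyquarkchain code) of a check-style helper that kills the process as soon as an invariant is violated; the db lookup in the usage example is hypothetical.

```python
import sys
import traceback


def check(condition, message="cluster state inconsistency"):
    """Crash the whole process immediately if the invariant does not hold."""
    if not condition:
        traceback.print_stack()
        print("FATAL: " + message, file=sys.stderr)
        sys.exit(1)


# Hypothetical usage: a server verifying that a block's parent is already in
# its db before accepting the block.
db = {"genesis_hash": "genesis block"}
prev_hash = "genesis_hash"
check(prev_hash in db, "parent block missing from db")
```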

The following explains the specific ordering of events and provides code pointers regarding some of the important cluster operations.

Adding Shard Blocks

There are 3 ways to add shard blocks.

  1. The Synchronizer (in shard.py), which downloads blocks starting from a given header and walks backward until it reaches the latest known block in the local db
  2. Shard.handle_new_block, which handles blocks broadcast from peers. This path mainly propagates newly mined blocks across the network. (CommandOp.NEW_BLOCK_MINOR)
  3. SlaveServer.handle_sync_minor_block_list_request, which is called by the master to download and add missing shard blocks when syncing root blocks

Ultimately they follow the same path.

Each step starts only after the previous one has completed (i.e., all of its RPCs have returned), as sketched below.

  1. The shard validates the block, updates the local shard chain and EVM state, and puts the block in the db. (Shard.add_block)
  2. The shard broadcasts the block's (header hash, [xshard tx]) data (CrossShardTransactionDeposits) to all its neighbor shards in parallel. (SlaveServer.broadcast_xshard_tx_list)
  3. The shard sends the block header to the root chain, and the root chain persists the header in its db. (SlaveServer.send_minor_block_header_to_master)

This process guarantees that if the root chain has a shard block, then that exact shard block is already known to all the shards, i.e., the cluster has reached a consistent state on it - every process in the cluster has this shard block. This guarantee is required for adding a root block (described below) to succeed.
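
The following is a simplified, runnable sketch of this ordering; NeighborShardStub, MasterStub and add_shard_block are hypothetical stand-ins for the real Shard / SlaveServer / master connections. It only shows that step 3 never starts before every RPC of step 2 has returned.

```python
import asyncio


class NeighborShardStub:
    """Hypothetical stand-in for an RPC connection to a neighbor shard."""

    def __init__(self, name):
        self.name = name
        self.deposits = {}

    async def add_xshard_tx_list(self, header_hash, xshard_txs):
        # Persisting the cross-shard deposits is what makes the neighbor
        # "know" about the new shard block.
        self.deposits[header_hash] = xshard_txs
        print(f"{self.name}: stored deposits for {header_hash}")


class MasterStub:
    """Hypothetical stand-in for the RPC connection to the master."""

    async def add_minor_block_header(self, header_hash):
        print(f"master: persisted header {header_hash}")


async def add_shard_block(header_hash, xshard_txs, neighbors, master):
    # Step 1: validate the block, update the local chain / EVM state and
    # write it to the db (elided in this sketch).
    # Step 2: broadcast the cross-shard deposits to all neighbor shards in
    # parallel and wait until every neighbor has persisted them.
    await asyncio.gather(
        *[n.add_xshard_tx_list(header_hash, xshard_txs) for n in neighbors]
    )
    # Step 3: only after all neighbors replied, report the header to the
    # master so the root chain never learns about the block first.
    await master.add_minor_block_header(header_hash)


asyncio.run(add_shard_block(
    "block_abc",
    ["deposit_1"],
    [NeighborShardStub("shard_2"), NeighborShardStub("shard_3")],
    MasterStub(),
))
```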

Note on Concurrency

Adding different shard blocks concurrently does not affect their respective consistent states (think about how blocks are stored in the db). However, special care is needed when the same block is added concurrently. The add-block call should only return after the cluster has reached consistency on the block being added, i.e., after completing all three steps above. In Shard.add_block you can see that later calls adding the same block will wait on a future in Shard.add_block_futures if a previous call has not yet finished.
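
Below is a minimal sketch of that de-duplication pattern, assuming a hypothetical _do_add_block coroutine standing in for the three-step protocol; the real logic lives in Shard.add_block and Shard.add_block_futures.

```python
import asyncio


class ShardSketch:
    def __init__(self):
        self.add_block_futures = {}  # block hash -> future of an in-flight add

    async def _do_add_block(self, block_hash):
        # Hypothetical stand-in for the three-step protocol described above.
        await asyncio.sleep(0.1)
        print(f"added {block_hash}")

    async def add_block(self, block_hash):
        existing = self.add_block_futures.get(block_hash)
        if existing is not None:
            # Another call is already adding this block; wait until the
            # cluster has reached consistency on it, then return.
            await existing
            return
        future = asyncio.get_running_loop().create_future()
        self.add_block_futures[block_hash] = future
        try:
            await self._do_add_block(block_hash)
        finally:
            future.set_result(None)
            del self.add_block_futures[block_hash]


async def main():
    shard = ShardSketch()
    # Two concurrent calls for the same block: the second waits on the first,
    # and the block is only added once.
    await asyncio.gather(shard.add_block("block_abc"), shard.add_block("block_abc"))


asyncio.run(main())
```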

Adding Root Blocks

Adding root blocks is simpler than adding shard blocks: all root blocks are added through a single entry point - the Synchronizer (in master.py) - which only allows adding one root block at a time. Adding root blocks concurrently could easily lead to inconsistent views on the shard servers and is therefore forbidden in the current implementation.

Each step starts only after the previous one has completed (i.e., all of its RPCs have returned); a sketch of this ordering follows the list below.

  1. If the root block contains shard blocks that are unknown to the root chain, it asks the shards to download and add those missing shard blocks from peer clusters. (SyncMinorBlockListRequest)
  2. The root chain validates the block, updates the local root chain, and puts the block in the db. (Master.add_root_block)
  3. The root chain broadcasts the root block to all the shards, making the CrossShardTransactionDeposits available for the destination shards to confirm. (AddRootBlockRequest)
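
As with shard blocks, here is a simplified, runnable sketch of this ordering; ShardStub, sync_minor_block_list and the other names are hypothetical stand-ins for the real master/slave RPC interfaces.

```python
import asyncio


class ShardStub:
    """Hypothetical stand-in for a shard reachable over slave RPC."""

    def __init__(self, name, known_blocks):
        self.name = name
        self.known_blocks = set(known_blocks)

    async def sync_minor_block_list(self, block_hashes):
        # Download and add the missing shard blocks from a peer cluster
        # (elided in this sketch).
        self.known_blocks.update(block_hashes)
        print(f"{self.name}: synced {block_hashes}")

    async def add_root_block(self, root_block):
        # Makes the CrossShardTransactionDeposits confirmed by this root
        # block available to the destination shard (elided).
        print(f"{self.name}: added root block {root_block}")


async def add_root_block(root_block, confirmed_shard_blocks, known_to_master, shards):
    # Step 1: make sure every shard block confirmed by this root block is
    # known to the cluster before anything else happens.
    missing = [h for h in confirmed_shard_blocks if h not in known_to_master]
    if missing:
        await asyncio.gather(*[s.sync_minor_block_list(missing) for s in shards])
        known_to_master.update(missing)
    # Step 2: validate the root block, update the local root chain and write
    # it to the db (elided).
    # Step 3: broadcast the root block to every shard.
    await asyncio.gather(*[s.add_root_block(root_block) for s in shards])


asyncio.run(add_root_block(
    "root_42",
    ["mblock_1", "mblock_2"],
    {"mblock_1"},
    [ShardStub("shard_0", ["mblock_1"]), ShardStub("shard_1", [])],
))
```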

Cluster Initialization

It takes the following steps to initialize a cluster. All the connections are bidirectional.

  1. Start the master and slave servers (implemented in cluster.py)

The master server handles the rest in MasterServer.__init_cluster():

  2. The master initiates connections to all the slave servers following ClusterConfig.SLAVE_LIST (MasterServer.__connect_to_slaves)
  3. The master directs the slaves to connect to each other (MasterServer.__setup_slave_to_slave_connections)

All the servers are connected with each other after the above steps.

  4. Recover the cluster state as described below (MasterServer.__init_shards); a sketch of this initialization order is given after this list
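
A high-level sketch of this initialization order, with hypothetical helper names standing in for cluster.py and MasterServer.__init_cluster:

```python
class MasterSketch:
    """Hypothetical sketch of the initialization order, not the real MasterServer."""

    def __init__(self, slave_list):
        self.slave_list = slave_list  # stand-in for ClusterConfig.SLAVE_LIST

    def init_cluster(self):
        # Step 2: connect the master to every slave.
        for slave in self.slave_list:
            print(f"master <-> {slave}")
        # Step 3: ask every slave to connect to every other slave so the
        # cluster is fully connected.
        for slave in self.slave_list:
            print(f"{slave} <-> all other slaves")
        # Step 4: recover the cluster state from the db (next section).
        print("recovering cluster state")


MasterSketch(["slave_0", "slave_1"]).init_cluster()
```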

Recovering Cluster State

Recovery means reconstructing the necessary in-memory data structures from the db to allow the cluster to function.

  1. The root chain first identifies the head of the chain by looking up tipHash in the db. There is no guarantee that the cluster has reached a consistent state on that block, but its parent is guaranteed to be consistent across the cluster. So the parent block of tipHash becomes the new head, and recent blocks are loaded into memory by following the hash_prev_block hash pointers (see the sketch after this list). (RootDb.__recover_from_db)
  2. The root chain broadcasts the recovered new head to all the shards. (MasterServer.__init_shards)
  3. Each shard finds the latest shard block confirmed by the root blocks up to the recovered new head and uses it as the new shard head. It then recovers the shard chain from the new shard head by following the prev-hash pointers; only recent blocks are loaded into memory. (ShardState.init_from_root_block)
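
A runnable sketch of the head-selection rule in step 1, assuming a greatly simplified block store (plain dicts instead of RootDb):

```python
def recover_root_head(db, tip_hash, max_recent_blocks=3):
    """Pick the parent of tipHash as the new head and walk back recent blocks."""
    # The block at tipHash may not have reached a consistent cluster state,
    # but its parent is guaranteed to have; use the parent as the new head.
    head_hash = db[tip_hash]["hash_prev_block"]
    # Load recent blocks into memory by following the prev-hash pointers.
    recent, cursor = [], head_hash
    while cursor is not None and len(recent) < max_recent_blocks:
        recent.append(cursor)
        cursor = db[cursor]["hash_prev_block"]
    return head_hash, recent


db = {
    "genesis": {"hash_prev_block": None},
    "block_1": {"hash_prev_block": "genesis"},
    "block_2": {"hash_prev_block": "block_1"},  # tipHash points here
}
print(recover_root_head(db, "block_2"))  # -> ('block_1', ['block_1', 'genesis'])
```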

Adding Shards Dynamically

Shard chains can be created at certain root heights (specified by ShardGenesis.ROOT_HEIGHT). Whenever a slave server receives a root block from the master, it checks the config to see whether the height of the root block has reached the ShardGenesis.ROOT_HEIGHT of any shard (SlaveServer.create_shards). If such a shard is found, it is created and its genesis block is created referring to that root block (Shard.__init_genesis_state). If another root block with the same height arrives later (due to a fork), a new genesis shard block is created referring to the new root block. Eventually, the genesis block referring to the root block on the longest root chain wins, just like any other shard block following the same root-chain-first consensus.
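
A minimal sketch of this height check, with hypothetical names and a plain dict standing in for the shard configuration:

```python
def shards_to_create(shard_genesis_root_heights, existing_shard_ids, root_block_height):
    """Return the ids of shards whose genesis root height has been reached."""
    new_shards = []
    for shard_id, genesis_root_height in shard_genesis_root_heights.items():
        if shard_id in existing_shard_ids:
            continue
        if root_block_height >= genesis_root_height:
            # The genesis shard block for this shard will be created referring
            # to the root block at this height; a forked root block at the same
            # height would produce a competing genesis, and the one on the
            # longest root chain eventually wins.
            new_shards.append(shard_id)
    return new_shards


# shard id -> the root height at which the shard's genesis block is created
configs = {0: 0, 1: 0, 2: 10}
print(shards_to_create(configs, existing_shard_ids={0, 1}, root_block_height=10))  # -> [2]
```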