Example of a 4-Broker Cluster (Linux doc):
- The first broker of the cluster to start becomes the controller.
- It is just a normal broker with additional responsibilities.
- There is only one controller in the cluster at any time.
- All brokers keep an eye on the controller node in ZooKeeper; when the controller dies, another broker is chosen to become the next controller. (This ensures there is always exactly one controller; a hedged watch sketch follows this list.)
- ZooKeeper maintains the list of active brokers.
- The controller monitors the list of active brokers.
- The controller is also responsible for assigning work between brokers.
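A minimal sketch of that watch, assuming the standard Apache ZooKeeper Java client, a local ZooKeeper at localhost:2181, and Kafka's /controller znode; the election step is only hinted at in a comment, and this is not Kafka's internal code:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ControllerWatch {
    public static void main(String[] args) throws Exception {
        // Address is an assumption for this sketch.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // Kafka (in ZooKeeper mode) stores the current controller in /controller.
        Watcher watcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDeleted) {
                    // Controller died: in real Kafka every broker now races to
                    // re-create /controller; the winner is the next controller.
                    System.out.println("Controller gone; election would start here");
                }
            }
        };
        byte[] data = zk.getData("/controller", watcher, null);
        System.out.println("Current controller: " + new String(data));
        Thread.sleep(Long.MAX_VALUE); // keep the session (and the watch) alive
    }
}
```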
In the diagram above we saw Leaders and Followers. The job of the follower partitions is to copy messages from the Leader and keep themselves up to date. This runs as an infinite loop between followers and leaders (a toy sketch of the loop follows below). When the leader fails/dies, one of the followers replaces it, so keeping in sync with the data is important.
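A toy, in-memory model of that loop; the Leader/Follower classes here are invented for illustration and are not Kafka's actual replica fetcher:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the follower-to-leader copy loop; illustration only.
class Leader {
    private final List<String> log = new ArrayList<>();
    synchronized void append(String msg) { log.add(msg); }
    // A follower asks: "give me everything from my current offset onwards".
    synchronized List<String> fetchFrom(int offset) {
        return new ArrayList<>(log.subList(offset, log.size()));
    }
}

class Follower implements Runnable {
    private final Leader leader;
    final List<String> log = new ArrayList<>();
    Follower(Leader leader) { this.leader = leader; }

    @Override
    public void run() {
        // The "infinite loop": keep fetching and appending so this replica
        // stays up to date and can take over if the leader dies.
        while (!Thread.currentThread().isInterrupted()) {
            log.addAll(leader.fetchFrom(log.size()));
            try { Thread.sleep(100); } catch (InterruptedException e) { return; }
        }
    }
}
```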
The Leader also maintains the list of its underlying followers (as the ISR list: In-Sync Replicas) and persists it in ZooKeeper.
The ISR list is dynamic. The logic below explains the dynamics:
- The ISR list could be empty when all the followers are dead. For that case there is another concept called 'Minimum ISR' (the min.insync.replicas config). It enforces a minimum number of in-sync replicas: assume there are no followers, and the Leader keeps getting data, commits it within itself, and keeps sending success acknowledgements to the Producer. What if the Leader dies? The data is lost, as it had no followers. With min.insync.replicas (and acks=all on the producer), such writes are rejected instead (see the topic-config sketch below).
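A sketch of setting that minimum on a topic via the Kafka AdminClient; the topic name 'orders' and the broker address are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class MinIsrTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, 3 replicas; at least 2 replicas must be in the ISR
            // for an acks=all write to succeed, so a lone leader cannot keep
            // acknowledging data that would die with it.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```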
- The Producer connects to any broker to fetch the metadata. The metadata contains all the partition leaders.
- Kafka uses that list to write/produce data to the right leader (see the metadata sketch below).
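A small sketch of pulling that metadata through the producer API; the broker address and the topic name 'demo-topic' are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringSerializer;

public class ShowLeaders {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any one reachable broker is enough to bootstrap; the producer pulls
        // the full cluster metadata (including every partition leader) from it.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (PartitionInfo p : producer.partitionsFor("demo-topic")) {
                System.out.println("partition " + p.partition() + " -> leader " + p.leader());
            }
        }
    }
}
```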
- Data is appended to a partition as follows:
Fun experiment example: add multiple messages with the key "abc"; say the resulting hash points to Partition2. Add n records to the topic, and on searching you will find all n of them landed on Partition2 (a runnable version of this experiment follows below).
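A runnable version of the experiment, assuming a local broker and a hypothetical multi-partition topic 'fun-topic'; every record should print the same partition number:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class SameKeyExperiment {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Same key => same hash => same partition every time.
                RecordMetadata md = producer
                        .send(new ProducerRecord<>("fun-topic", "abc", "msg-" + i))
                        .get();
                System.out.println("msg-" + i + " -> partition " + md.partition());
            }
        }
    }
}
```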
- Only Topic and Message Value are mandatory parameters of the ProducerRecord object; the rest below are optional (see the constructor sketch after this list).
- Key: if you provide a key, a hash of the key is calculated (murmur2 in the default partitioner) and the result decides the partition the data resides in. When you don't provide a key, the data is spread across the partitions (classically round-robin; newer Kafka versions use a sticky partitioner).
- Timestamp: for Create time (the time the message is produced) the type value is 0; for Log append time (the time the message is received by the broker) the value is 1.
- Partition: you can manually specify the partition you want to send data to (not an optimal approach).
- Serializer: serializes the Key and Message Value into bytes.
- Partitioner: Kafka's default partitioner works by first checking the 'Partition' value of the ProducerRecord; only when it is absent does it fall back to the key hash.
- Buffers: the producer buffers and batches records, which gives async data transfer plus network optimisation.
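The ProducerRecord constructors mirror this list; a sketch of the variants, with placeholder topic and values:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordVariants {
    public static void main(String[] args) {
        // Only topic and value are mandatory:
        ProducerRecord<String, String> r1 =
                new ProducerRecord<>("demo-topic", "just a value");

        // With a key: the key hash picks the partition.
        ProducerRecord<String, String> r2 =
                new ProducerRecord<>("demo-topic", "abc", "keyed value");

        // Explicit partition (bypasses the partitioner; usually not optimal):
        ProducerRecord<String, String> r3 =
                new ProducerRecord<>("demo-topic", 2, "abc", "pinned value");

        // Explicit partition plus an explicit timestamp:
        ProducerRecord<String, String> r4 =
                new ProducerRecord<>("demo-topic", 2, System.currentTimeMillis(),
                        "abc", "timestamped value");
    }
}
```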