Kafka Basics

Kafka Cluster:


A Kafka cluster is a distributed architecture. However, it does not follow a master-slave architecture; instead, a Kafka cluster is managed by ZooKeeper.

Example: a cluster of 4 brokers.

Controller:

  • The first broker of the cluster becomes the controller.
  • It is just a normal broker with additional responsibilities.
  • There is only one controller in the cluster.
  • All brokers watch the controller node in ZooKeeper; when the controller dies, another broker is chosen as the next controller. (This ensures there is always exactly one controller.)
  • ZooKeeper maintains the list of active brokers.
  • The controller monitors the list of active brokers.
  • The controller is also responsible for assigning tasks between brokers.
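The election steps above can be sketched as a small simulation. This is a hypothetical illustration of the "first broker to register wins" idea, not the real ZooKeeper API; the class and method names are made up for the sketch.

```python
# Simplified simulation of Kafka controller election: the first broker
# to register the ephemeral "/controller" node becomes the controller;
# when it dies, a surviving broker re-registers and takes over.
# (Hypothetical sketch -- not the real ZooKeeper client API.)

class ZooKeeperSim:
    def __init__(self):
        self.controller = None  # stands in for the ephemeral "/controller" node

    def try_become_controller(self, broker_id):
        # Registration only succeeds if no controller node exists yet.
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def controller_died(self):
        # An ephemeral node disappears when its owner's session ends.
        self.controller = None


zk = ZooKeeperSim()
brokers = [1, 2, 3, 4]

# Broker 1 starts first and becomes the controller; the others fail to register.
for b in brokers:
    zk.try_become_controller(b)
print(zk.controller)  # 1

# The controller dies; the remaining brokers race to re-register.
zk.controller_died()
for b in brokers[1:]:
    zk.try_become_controller(b)
print(zk.controller)  # 2
```

This mirrors the bullet above: because only one registration can succeed at a time, there is always at most one controller.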

5 Brokers | 2 Racks

1 Topic | 10 Partitions | Replication Factor 3

Broker selection sequence


Partition & Replication Distribution

If Rack#1 goes down, we still have copies of every partition in Rack#2.
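The distribution above can be sketched as a round-robin replica assignment over a rack-alternating broker list. This is an illustration of the idea, not Kafka's exact assignment algorithm (the real one also uses a random starting broker and shift):

```python
# Simplified sketch of rack-aware replica placement for
# 5 brokers | 2 racks | 10 partitions | replication factor 3.
# Brokers are interleaved by rack so consecutive replicas of a
# partition land on different racks.
racks = {1: "R1", 2: "R1", 3: "R1", 4: "R2", 5: "R2"}
rack_alternating = [1, 4, 2, 5, 3]  # R1, R2, R1, R2, R1

NUM_PARTITIONS, REPLICATION = 10, 3

assignment = {}
for p in range(NUM_PARTITIONS):
    # The first replica is the leader; followers go on the next brokers
    # in the rack-alternating order.
    assignment[p] = [rack_alternating[(p + r) % len(rack_alternating)]
                     for r in range(REPLICATION)]

print(assignment[0])  # [1, 4, 2] -- replicas span both racks
```

Because each replica list walks the rack-alternating order, every partition ends up with at least one copy in each rack, which is why losing Rack#1 still leaves a copy of every partition in Rack#2.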

  • In the diagram above we saw Leaders and Followers. The job of a follower partition is to copy messages from the Leader and keep itself up to date. This runs as an infinite loop between the followers and the Leader. When the Leader fails/dies, one of the followers replaces it (which is why staying in sync with the data is important).

  • The Leader also maintains the list of its followers (the ISR List: In-Sync Replicas) and persists it in ZooKeeper.

  • The ISR List is dynamic: a follower stays in the list only while it is 'not too far' behind the Leader.

'Not too far' is defined as the time a follower takes to ask for data, receive it, persist it, and ask again. By default this must be under 10 seconds, and the threshold is configurable.

  • There could be a situation where none of the followers is in sync (the ISR List is empty) and the Leader dies. To handle this there is the concept of 'committed' and 'uncommitted' messages. When the Leader receives a message from the Producer, it is considered 'committed' only once it has been copied to all followers. Suppose none of the followers could keep itself updated, the ISR List became empty, and the Leader died: all the followers will still have the last committed data. Since the Leader died, no success acknowledgement was sent back to the Producer's I/O threads for the uncommitted messages, so the Producer will retry sending them. By the time the retry succeeds, one of the followers has taken charge as Leader and receives the remaining messages from the Producer, its fellow partitions are now its followers, and the process continues.
  • The ISR List could also be empty when all the followers are dead. For that there is another concept called 'Minimum ISR'. This configuration enforces a minimum number of in-sync replicas/followers: assume there are no followers and the Leader keeps receiving data, commits it within itself, and keeps sending success acknowledgements to the Producer. What if the Leader dies? The data is lost, because it had no followers.
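The committed/uncommitted and Minimum ISR behaviour above maps onto standard Kafka settings; a minimal sketch (values are examples, not recommendations):

```properties
# Topic/broker-level: require at least 2 in-sync replicas (leader + 1
# follower) for a write to be accepted when the producer uses acks=all.
min.insync.replicas=2

# Producer-level: acks=all makes the producer wait until the message is
# committed to all in-sync replicas before treating the send as successful;
# retries lets the producer re-send uncommitted messages after a failure.
acks=all
retries=3
```

With this combination, a write that cannot reach the minimum number of in-sync replicas fails fast at the producer instead of being silently lost when a follower-less Leader dies.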

Producer

  • The Producer connects to any broker to get the metadata. The metadata includes the leaders of all partitions.
  • Kafka uses this list to write/produce data to the leader.
  • The data is appended to the partition as below:
Fun experiment example: add multiple messages with the key "abc", and say the resulting hash points to Partition 2.
Add n data items to the topic, and on searching you will find all n data items land on Partition 2.
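The experiment above can be sketched with a simplified partitioner. Real Kafka hashes the serialized key with murmur2; the MD5-based hash below is just a deterministic stand-in for illustration:

```python
# Simplified sketch of keyed partitioning: hash the key and take it
# modulo the partition count, so the same key always lands on the same
# partition. (Real Kafka uses murmur2 over the serialized key bytes;
# hashlib.md5 here is a deterministic stand-in.)
import hashlib

NUM_PARTITIONS = 10

def choose_partition(key: str) -> int:
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every message with the same key lands on the same partition.
partitions = {choose_partition("abc") for _ in range(100)}
print(len(partitions))  # 1
```

This is why all n messages with the key "abc" end up on a single partition: the hash is a pure function of the key.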

Internal working of Producer


  • Only Topic and Message Value are mandatory parameters of the ProducerRecord object.
  • Key: if you provide a key, its hash is calculated and the partition where the data will reside is chosen according to the result. When you don't provide a key, the data is distributed in round-robin fashion among the partitions.
  • Timestamp: for create time (the time when the message is produced) the value is 0. For log-append time (the time when the message is received) the value is 1.
  • Partition: you can manually specify the partition you want to send data to (not an optimal approach).
  • Serializer: serializes the Key & Message Value.
  • Partitioner: Kafka's default partitioner works by first checking the 'Partition' value of the ProducerRecord.
  • Buffers offer asynchronous data transfer and network optimisation.
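The Partitioner bullets above can be sketched as a decision chain. This is a simplified stand-in for the default partitioner's logic, not Kafka's actual implementation (which uses murmur2 and sticky batching for keyless records):

```python
# Simplified sketch of the default partitioner's decision order:
#   1. an explicit partition set on the record wins;
#   2. otherwise a keyed record is hashed to a partition;
#   3. otherwise records are spread round-robin.
import itertools

NUM_PARTITIONS = 10
_round_robin = itertools.count()

def pick_partition(partition=None, key=None):
    if partition is not None:
        return partition                          # explicit partition wins
    if key is not None:
        return hash(key) % NUM_PARTITIONS         # keyed: deterministic within a run
    return next(_round_robin) % NUM_PARTITIONS    # no key: round-robin

print(pick_partition(partition=7))                             # 7
print(pick_partition(key="abc") == pick_partition(key="abc"))  # True
```

The order matters: the 'Partition' field short-circuits everything else, which is why manually setting it bypasses both the key hash and the round-robin spread.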

Consumers
