Kafka Quick Start (11) - RdKafka Source Analysis

1. RdKafka C Source Code Analysis
2. RdKafka C++ Source Code Analysis
3. RdKafka Multithreaded Design

1. RdKafka C Source Code Analysis

1. Kafka OP Queue

RdKafka encapsulates each interaction with the Kafka Broker, and each internally implemented operation, into an Operator structure, which is then placed on an OP queue for unified processing. Kafka OP queues are the pipelines for inter-thread communication in RdKafka.
The RdKafka queue is defined in the rdkafka_queue.h file; the underlying list operations are encapsulated in the rdsysqueue.h file.
(1) Kafka OP Queue

typedef struct rd_kafka_q_s rd_kafka_q_t;
struct rd_kafka_q_s {
        mtx_t rkq_lock;                 // Lock serializing queue operations
        cnd_t rkq_cond;                 // Condition variable used to wake waiting threads when a new element is enqueued
        struct rd_kafka_q_s *rkq_fwdq;  // Forwarded/routed queue
        struct rd_kafka_op_tailq rkq_q; // TAILQ holding the queued operations
        int rkq_qlen;                   /* Number of entries in queue */
        int64_t rkq_qsize;              /* Size of all entries in queue */
        int rkq_refcnt;                 // Reference count
        int rkq_flags;                  // Current queue state flags
        rd_kafka_t *rkq_rk;             // Kafka handle the queue is associated with
        struct rd_kafka_q_io *rkq_qio;  // Writes to an fd to wake waiting threads when a new element is enqueued
        rd_kafka_q_serve_cb_t *rkq_serve; // Callback invoked when an operation in the queue is served
        void *rkq_opaque;
        const char *rkq_name;           // Queue name
};

// Kafka operator queue, external interface
typedef struct rd_kafka_queue_s rd_kafka_queue_t;
struct rd_kafka_queue_s {
        rd_kafka_q_t *rkqu_q;   // Kafka OP queue
        rd_kafka_t *rkqu_rk;    // Kafka handle the queue is associated with
        int rkqu_is_owner;
};

rd_kafka_queue_t *rd_kafka_queue_new (rd_kafka_t *rk) {
        rd_kafka_q_t *rkq;
        rd_kafka_queue_t *rkqu;
        rkq = rd_kafka_q_new(rk);
        rkqu = rd_kafka_queue_new0(rk, rkq);
        rd_kafka_q_destroy(rkq);
        return rkqu;
}

Create OP Queue
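The core of the design above is a mutex, a condition variable, and a tail queue of operations that wakes consumers on enqueue. The idea can be sketched as a minimal standalone FIFO; `mini_op`/`mini_q` are invented names for this illustration and are not librdkafka code:

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative sketch only: a minimal FIFO modeled on rd_kafka_q_s's
 * mutex + condition-variable design. */
typedef struct mini_op { struct mini_op *next; int type; } mini_op_t;

typedef struct mini_q {
    pthread_mutex_t lock;   /* like rkq_lock: serializes queue access      */
    pthread_cond_t  cond;   /* like rkq_cond: wakes a waiting pop() thread */
    mini_op_t      *head, *tail;
    int             qlen;   /* like rkq_qlen: number of queued ops         */
} mini_q_t;

void mini_q_init(mini_q_t *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = NULL;
    q->qlen = 0;
}

/* Enqueue at the tail and signal any thread blocked in mini_q_pop(). */
void mini_q_enq(mini_q_t *q, mini_op_t *op) {
    op->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = op; else q->head = op;
    q->tail = op;
    q->qlen++;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Block until an op is available, then dequeue it from the head. */
mini_op_t *mini_q_pop(mini_q_t *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->cond, &q->lock);
    mini_op_t *op = q->head;
    q->head = op->next;
    if (!q->head) q->tail = NULL;
    q->qlen--;
    pthread_mutex_unlock(&q->lock);
    return op;
}
```

A producer thread calls `mini_q_enq()` while a consumer blocks in `mini_q_pop()`; this is the same wake-on-enqueue pattern rd_kafka_q_s uses between RdKafka's internal threads.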

rd_kafka_queue_t *rd_kafka_queue_get_main (rd_kafka_t *rk) { return rd_kafka_queue_new0(rk, rk->rk_rep); }

Get the OP queue (rk_rep) through which RdKafka interacts with the application

rd_kafka_queue_t *rd_kafka_queue_get_consumer (rd_kafka_t *rk) {
        if (!rk->rk_cgrp)
                return NULL;
        return rd_kafka_queue_new0(rk, rk->rk_cgrp->rkcg_q);
}

Get the OP Queue of Consumers

rd_kafka_queue_t *rd_kafka_queue_get_partition (rd_kafka_t *rk,
                                                const char *topic,
                                                int32_t partition) {
        shptr_rd_kafka_toppar_t *s_rktp;
        rd_kafka_toppar_t *rktp;
        rd_kafka_queue_t *result;

        if (rk->rk_type == RD_KAFKA_PRODUCER)
                return NULL;

        s_rktp = rd_kafka_toppar_get2(rk, topic, partition,
                                      0, /* no ua_on_miss */
                                      1  /* create_on_miss */);
        if (!s_rktp)
                return NULL;

        rktp = rd_kafka_toppar_s2i(s_rktp);
        result = rd_kafka_queue_new0(rk, rktp->rktp_fetchq);
        rd_kafka_toppar_destroy(s_rktp);

        return result;
}

Get Topic's partitioned OP queue

rd_kafka_op_t *rd_kafka_q_pop_serve (rd_kafka_q_t *rkq, rd_ts_t timeout_us, int32_t version, rd_kafka_q_cb_type_t cb_type, rd_kafka_q_serve_cb_t *callback, void *opaque);

Serve one OP from the OP queue, using version to filter out outdated OPs; waits if none is available and returns when the timeout expires.
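The version filter works as a barrier: an OP carrying an older version than the one the caller supplies is considered outdated and is discarded instead of served. A minimal sketch of that predicate (an invented helper; librdkafka implements this internally as part of op serving):

```c
#include <stdint.h>

/* Sketch of the version filter used when serving an OP queue:
 * an op whose rko_version is older than the caller's barrier version
 * is outdated and should be skipped/destroyed.
 * Illustrative helper, not a librdkafka function. */
static int op_version_is_outdated(int32_t rko_version, int32_t barrier) {
    /* version 0 means "unversioned": never filtered */
    return rko_version != 0 && barrier != 0 && rko_version < barrier;
}
```

For example, after a consumer seek bumps the partition's version, fetched-message OPs queued under the old version fail this check and are dropped rather than delivered.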

int rd_kafka_q_serve (rd_kafka_q_t *rkq, int timeout_ms, int max_cnt, rd_kafka_q_cb_type_t cb_type, rd_kafka_q_serve_cb_t *callback, void *opaque);

Serve OPs from the OP queue in batches, up to max_cnt at a time

int rd_kafka_q_serve_rkmessages (rd_kafka_q_t *rkq, int timeout_ms, rd_kafka_message_t **rkmessages, size_t rkmessages_size);

Process RD_KAFKA_OP_FETCH OPs, extracting up to rkmessages_size messages into the rkmessages array

int rd_kafka_q_purge0 (rd_kafka_q_t *rkq, int do_lock);
#define rd_kafka_q_purge(rkq) rd_kafka_q_purge0(rkq, 1/*lock*/)

Clear all OP operations from the OP queue
rd_kafka_queue_t *rd_kafka_queue_get_background (rd_kafka_t *rk);
Get Background OP Queue for Kafka Handle

2. Kafka OP Operation

RdKafka OP operations are encapsulated in the rdkafka_op.h file.

typedef enum {
        RD_KAFKA_OP_NONE,           // No type specified
        RD_KAFKA_OP_FETCH,          // Kafka thread -> Application
        RD_KAFKA_OP_ERR,            // Kafka thread -> Application
        RD_KAFKA_OP_CONSUMER_ERR,   // Kafka thread -> Application
        RD_KAFKA_OP_DR,             // Kafka thread -> Application: Produce message delivery report
        RD_KAFKA_OP_STATS,          // Kafka thread -> Application
        RD_KAFKA_OP_OFFSET_COMMIT,  // any -> toppar's Broker thread
        RD_KAFKA_OP_NODE_UPDATE,    // any -> Broker thread: node update
        RD_KAFKA_OP_XMIT_BUF,       // transmit buffer: any -> broker thread
        RD_KAFKA_OP_RECV_BUF,       // received response buffer: broker thr -> any
        RD_KAFKA_OP_XMIT_RETRY,     // retry buffer xmit: any -> broker thread
        RD_KAFKA_OP_FETCH_START,    // Application -> toppar's handler thread
        RD_KAFKA_OP_FETCH_STOP,     // Application -> toppar's handler thread
        RD_KAFKA_OP_SEEK,           // Application -> toppar's handler thread
        RD_KAFKA_OP_PAUSE,          // Application -> toppar's handler thread
        RD_KAFKA_OP_OFFSET_FETCH,   // Broker -> broker thread: fetch offsets for topic
        RD_KAFKA_OP_PARTITION_JOIN, // cgrp op: add toppar to cgrp; broker op: add toppar to broker
        RD_KAFKA_OP_PARTITION_LEAVE,// cgrp op: remove toppar from cgrp; broker op: remove toppar from rkb
        RD_KAFKA_OP_REBALANCE,      // broker thread -> app: group rebalance
        RD_KAFKA_OP_TERMINATE,      // For generic use
        RD_KAFKA_OP_COORD_QUERY,    // Query for coordinator
        RD_KAFKA_OP_SUBSCRIBE,      // New subscription
        RD_KAFKA_OP_ASSIGN,         // New assignment
        RD_KAFKA_OP_GET_SUBSCRIPTION, // Get current subscription; reuses u.subscribe
        RD_KAFKA_OP_GET_ASSIGNMENT, // Get current assignment; reuses u.assign
        RD_KAFKA_OP_THROTTLE,       // Throttle info
        RD_KAFKA_OP_NAME,           // Request name
        RD_KAFKA_OP_OFFSET_RESET,   // Offset reset
        RD_KAFKA_OP_METADATA,       // Metadata response
        RD_KAFKA_OP_LOG,            // Log
        RD_KAFKA_OP_WAKEUP,         // Wake-up signaling
        RD_KAFKA_OP_CREATETOPICS,   // Admin: CreateTopics: u.admin_request
        RD_KAFKA_OP_DELETETOPICS,   // Admin: DeleteTopics: u.admin_request
        RD_KAFKA_OP_CREATEPARTITIONS, // Admin: CreatePartitions: u.admin_request
        RD_KAFKA_OP_ALTERCONFIGS,   // Admin: AlterConfigs: u.admin_request
        RD_KAFKA_OP_DESCRIBECONFIGS,// Admin: DescribeConfigs: u.admin_request
        RD_KAFKA_OP_ADMIN_RESULT,   // Admin API .._result_t
        RD_KAFKA_OP_PURGE,          // Purge queues
        RD_KAFKA_OP_CONNECT,        // Connect (to broker)
        RD_KAFKA_OP_OAUTHBEARER_REFRESH, // Refresh OAUTHBEARER token
        RD_KAFKA_OP_MOCK,           // Mock cluster command
        RD_KAFKA_OP_BROKER_MONITOR, // Broker state change
        RD_KAFKA_OP_TXN,            // Transaction command
        RD_KAFKA_OP__END            // End marker
} rd_kafka_op_type_t;

The rd_kafka_op_type_t enumeration defines all RdKafka OP operation types.

typedef enum {
        RD_KAFKA_PRIO_NORMAL = 0, // Normal priority
        RD_KAFKA_PRIO_MEDIUM,     // Medium priority
        RD_KAFKA_PRIO_HIGH,       // High priority
        RD_KAFKA_PRIO_FLASH       // Top priority: immediate
} rd_kafka_prio_t;

The rd_kafka_prio_t enumeration defines the priorities of Kafka OP operations.

typedef enum {
        RD_KAFKA_OP_RES_PASS,    // Not handled, pass to caller
        RD_KAFKA_OP_RES_HANDLED, // Op was handled (through callbacks)
        RD_KAFKA_OP_RES_KEEP,    // Op was handled by the callback, which keeps ownership, so op_handle() must not destroy it
        RD_KAFKA_OP_RES_YIELD    // Callback called yield
} rd_kafka_op_res_t;

The rd_kafka_op_res_t enumeration defines the possible results of processing an OP.
If RD_KAFKA_OP_RES_YIELD is returned, the handler function must decide whether the OP needs to be re-enqueued or destroyed.
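How a caller reacts to each result value (this mirrors the tail of rd_kafka_op_handle(), shown later in this section) can be sketched with a small self-contained dispatcher; the `res_t`/`outcome_t` names are invented for the illustration:

```c
/* Sketch of how a caller disposes of an op based on the result code
 * (illustrative mirror of rd_kafka_op_res_t; not librdkafka code). */
typedef enum { RES_PASS, RES_HANDLED, RES_KEEP, RES_YIELD } res_t;
typedef enum { OUT_RETURN_TO_CALLER, OUT_DESTROY, OUT_KEEP, OUT_REQUEUE } outcome_t;

static outcome_t dispatch_result(res_t res) {
    switch (res) {
    case RES_HANDLED: return OUT_DESTROY;          /* op fully consumed        */
    case RES_KEEP:    return OUT_KEEP;             /* callback kept ownership  */
    case RES_YIELD:   return OUT_REQUEUE;          /* caller re-enqueues or destroys */
    default:          return OUT_RETURN_TO_CALLER; /* RES_PASS: not handled    */
    }
}
```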

typedef enum {
        RD_KAFKA_Q_CB_INVALID,      // Invalid, unused
        RD_KAFKA_Q_CB_CALLBACK,     // Trigger a callback based on the OP
        RD_KAFKA_Q_CB_RETURN,       // Return the OP instead of triggering a callback
        RD_KAFKA_Q_CB_FORCE_RETURN, // Return the OP even if it would trigger a callback
        RD_KAFKA_Q_CB_EVENT         // Return an event OP instead of triggering a callback
} rd_kafka_q_cb_type_t;

The rd_kafka_q_cb_type_t enumeration defines the callback dispositions for OPs served from an OP queue.
The OP queue serve callback type is defined as follows:

typedef rd_kafka_op_res_t (rd_kafka_q_serve_cb_t) (rd_kafka_t *rk, struct rd_kafka_q_s *rkq, struct rd_kafka_op_s *rko, rd_kafka_q_cb_type_t cb_type, void *opaque);

The OP callback function is defined as follows:

typedef rd_kafka_op_res_t (rd_kafka_op_cb_t) (rd_kafka_t *rk, rd_kafka_q_t *rkq, struct rd_kafka_op_s *rko);

The OP execution result data structure is defined as follows:

typedef struct rd_kafka_replyq_s {
        rd_kafka_q_t *q; // Queue the OP's result is replied to
        int32_t version; // Version
} rd_kafka_replyq_t;

The Kafka OP data structure is defined as follows:

struct rd_kafka_op_s {
        TAILQ_ENTRY(rd_kafka_op_s) rko_link;   // TAILQ linkage
        rd_kafka_op_type_t rko_type;           // OP type
        rd_kafka_event_type_t rko_evtype;      // Event type
        int rko_flags;                         // OP flags
        int32_t rko_version;                   // Version
        rd_kafka_resp_err_t rko_err;
        int32_t rko_len;
        rd_kafka_prio_t rko_prio;              // OP priority
        shptr_rd_kafka_toppar_t *rko_rktp;     // Associated TopicPartition
        rd_kafka_replyq_t rko_replyq;
        rd_kafka_q_serve_cb_t *rko_serve;      // OP queue serve callback
        void *rko_serve_opaque;                // OP queue serve callback argument
        rd_kafka_t *rko_rk;                    // Kafka handle
        rd_kafka_op_cb_t *rko_op_cb;           // OP callback
        union {
                struct {
                        rd_kafka_buf_t *rkbuf;
                        rd_kafka_msg_t rkm;
                        int evidx;
                } fetch;
                struct {
                        rd_kafka_topic_partition_list_t *partitions;
                        int do_free; // free .partitions on destroy()
                } offset_fetch;
                struct {
                        rd_kafka_topic_partition_list_t *partitions;
                        void (*cb) (rd_kafka_t *rk, rd_kafka_resp_err_t err,
                                    rd_kafka_topic_partition_list_t *offsets,
                                    void *opaque);
                        void *opaque;
                        int silent_empty; // Fail silently if there are no offsets to commit.
                        rd_ts_t ts_timeout;
                        char *reason;
                } offset_commit;
                struct {
                        rd_kafka_topic_partition_list_t *topics;
                } subscribe;
                struct {
                        rd_kafka_topic_partition_list_t *partitions;
                } assign;
                struct {
                        rd_kafka_topic_partition_list_t *partitions;
                } rebalance;
                struct {
                        char *str;
                } name;
                struct {
                        int64_t offset;
                        char *errstr;
                        rd_kafka_msg_t rkm;
                        int fatal;
                } err;
                struct {
                        int throttle_time;
                        int32_t nodeid;
                        char *nodename;
                } throttle;
                struct {
                        char *json;
                        size_t json_len;
                } stats;
                struct {
                        rd_kafka_buf_t *rkbuf;
                } xbuf;
                // RD_KAFKA_OP_METADATA
                struct {
                        rd_kafka_metadata_t *md;
                        int force; // force request regardless of outstanding metadata requests.
                } metadata;
                struct {
                        shptr_rd_kafka_itopic_t *s_rkt;
                        rd_kafka_msgq_t msgq;
                        rd_kafka_msgq_t msgq2;
                        int do_purge2;
                } dr;
                struct {
                        int32_t nodeid;
                        char nodename[RD_KAFKA_NODENAME_SIZE];
                } node;
                struct {
                        int64_t offset;
                        char *reason;
                } offset_reset;
                struct {
                        int64_t offset;
                        struct rd_kafka_cgrp_s *rkcg;
                } fetch_start; // reused for SEEK
                struct {
                        int pause;
                        int flag;
                } pause;
                struct {
                        char fac[64];
                        int level;
                        char *str;
                } log;
                struct {
                        rd_kafka_AdminOptions_t options;
                        rd_ts_t abs_timeout;               // Absolute timeout
                        rd_kafka_timer_t tmr;              // Timeout timer
                        struct rd_kafka_enq_once_s *eonce; // Enqueue the OP only once; used to (re)trigger the request on Broker state changes
                        rd_list_t args;                    // Type depends on request, e.g. rd_kafka_NewTopic_t for CreateTopics
                        rd_kafka_buf_t *reply_buf;         // Protocol reply
                        struct rd_kafka_admin_worker_cbs *cbs;
                        // Worker state
                        enum {
                                RD_KAFKA_ADMIN_STATE_INIT,
                                RD_KAFKA_ADMIN_STATE_WAIT_BROKER,
                                RD_KAFKA_ADMIN_STATE_WAIT_CONTROLLER,
                                RD_KAFKA_ADMIN_STATE_CONSTRUCT_REQUEST,
                                RD_KAFKA_ADMIN_STATE_WAIT_RESPONSE,
                        } state;
                        int32_t broker_id;                 // Requested broker id to communicate with.
                        rd_kafka_replyq_t replyq;          // Application's reply queue
                        rd_kafka_event_type_t reply_event_type;
                } admin_request;
                struct {
                        rd_kafka_op_type_t reqtype; // Request op type
                        char *errstr;               // Error message
                        rd_list_t results;          // Type depends on request type
                        void *opaque;               // Application's opaque as set by rd_kafka_AdminOptions_set_opaque
                } admin_result;
                struct {
                        int flags; // purge_flags from rd_kafka_purge()
                } purge;
                // Mock cluster command
                struct {
                        enum {
                                RD_KAFKA_MOCK_CMD_TOPIC_SET_ERROR,
                                RD_KAFKA_MOCK_CMD_TOPIC_CREATE,
                                RD_KAFKA_MOCK_CMD_PART_SET_LEADER,
                                RD_KAFKA_MOCK_CMD_PART_SET_FOLLOWER,
                                RD_KAFKA_MOCK_CMD_PART_SET_FOLLOWER_WMARKS,
                                RD_KAFKA_MOCK_CMD_BROKER_SET_UPDOWN,
                                RD_KAFKA_MOCK_CMD_BROKER_SET_RACK,
                                RD_KAFKA_MOCK_CMD_COORD_SET,
                                RD_KAFKA_MOCK_CMD_APIVERSION_SET,
                        } cmd;
                        rd_kafka_resp_err_t err; // Error for: TOPIC_SET_ERROR
                        char *name;        // For: TOPIC_SET_ERROR, TOPIC_CREATE, PART_SET_FOLLOWER, PART_SET_FOLLOWER_WMARKS, BROKER_SET_RACK, COORD_SET (key_type)
                        char *str;         // For: COORD_SET (key)
                        int32_t partition; // For: PART_SET_FOLLOWER, PART_SET_FOLLOWER_WMARKS, PART_SET_LEADER, APIVERSION_SET (ApiKey)
                        int32_t broker_id; // For: PART_SET_FOLLOWER, PART_SET_LEADER, BROKER_SET_UPDOWN, BROKER_SET_RACK, COORD_SET
                        int64_t lo;        // Low offset, for: TOPIC_CREATE (part cnt), PART_SET_FOLLOWER_WMARKS, BROKER_SET_UPDOWN, APIVERSION_SET (minver)
                        int64_t hi;        // High offset, for: TOPIC_CREATE (repl fact), PART_SET_FOLLOWER_WMARKS, APIVERSION_SET (maxver)
                } mock;
                struct {
                        struct rd_kafka_broker_s *rkb; // Broker whose state changed
                        void (*cb) (struct rd_kafka_broker_s *rkb); // Callback triggered from the OP-handling thread
                } broker_monitor;
                struct {
                        rd_kafka_error_t *error;     // Error object
                        char *group_id;              // Consumer group to commit offsets for
                        int timeout_ms;              /**< Operation timeout */
                        rd_ts_t abs_timeout;         /**< Absolute time */
                        rd_kafka_topic_partition_list_t *offsets; // Offsets to commit
                } txn;
        } rko_u;
};
typedef struct rd_kafka_op_s rd_kafka_event_t;

const char *rd_kafka_op2str (rd_kafka_op_type_t type);
Returns the corresponding string of OP type
void rd_kafka_op_destroy (rd_kafka_op_t *rko);
Destroy OP Object

rd_kafka_op_t *rd_kafka_op_new0 (const char *source, rd_kafka_op_type_t type);
#define rd_kafka_op_new(type) rd_kafka_op_new0(NULL, type)

Generate OP Object

rd_kafka_op_res_t rd_kafka_op_call (rd_kafka_t *rk, rd_kafka_q_t *rkq, rd_kafka_op_t *rko);

Invoke the OP's callback function

rd_kafka_op_res_t rd_kafka_op_handle_std (rd_kafka_t *rk, rd_kafka_q_t *rkq,
                                          rd_kafka_op_t *rko, int cb_type) {
        if (cb_type == RD_KAFKA_Q_CB_FORCE_RETURN)
                return RD_KAFKA_OP_RES_PASS;
        else if (unlikely(rd_kafka_op_is_ctrl_msg(rko))) {
                /* Control messages are not passed on; only their offsets
                 * are stored. */
                rd_kafka_op_offset_store(rk, rko);
                return RD_KAFKA_OP_RES_HANDLED;
        } else if (cb_type != RD_KAFKA_Q_CB_EVENT &&
                   rko->rko_type & RD_KAFKA_OP_CB)
                return rd_kafka_op_call(rk, rkq, rko);
        else if (rko->rko_type == RD_KAFKA_OP_RECV_BUF)
                rd_kafka_buf_handle_op(rko, rko->rko_err);
        else if (cb_type != RD_KAFKA_Q_CB_RETURN &&
                 rko->rko_type & RD_KAFKA_OP_REPLY &&
                 rko->rko_err == RD_KAFKA_RESP_ERR__DESTROY)
                return RD_KAFKA_OP_RES_HANDLED;
        else
                return RD_KAFKA_OP_RES_PASS;

        return RD_KAFKA_OP_RES_HANDLED;
}

Standard OP handling

rd_kafka_op_res_t rd_kafka_op_handle (rd_kafka_t *rk, rd_kafka_q_t *rkq,
                                      rd_kafka_op_t *rko,
                                      rd_kafka_q_cb_type_t cb_type,
                                      void *opaque,
                                      rd_kafka_q_serve_cb_t *callback) {
        rd_kafka_op_res_t res;

        /* An op-specific serve callback takes precedence */
        if (rko->rko_serve) {
                callback = rko->rko_serve;
                opaque = rko->rko_serve_opaque;
                rko->rko_serve = NULL;
                rko->rko_serve_opaque = NULL;
        }

        res = rd_kafka_op_handle_std(rk, rkq, rko, cb_type);
        if (res == RD_KAFKA_OP_RES_KEEP) {
                /* Op was handled but must not be destroyed. */
                return res;
        }
        if (res == RD_KAFKA_OP_RES_HANDLED) {
                rd_kafka_op_destroy(rko);
                return res;
        } else if (unlikely(res == RD_KAFKA_OP_RES_YIELD))
                return res;

        if (callback)
                res = callback(rk, rkq, rko, cb_type, opaque);

        return res;
}

Processing OP

3. Kafka Message

rd_kafka_message_t is defined in the rdkafka.h file:

typedef struct rd_kafka_message_s {
        rd_kafka_resp_err_t err; // Non-zero indicates an error message
        rd_kafka_topic_t *rkt;   // Associated Topic
        int32_t partition;       // Partition
        void *payload;           // Message data if err == 0, error string if err != 0
        size_t len;              // Length of the message data if err == 0, of the error string otherwise
        void *key;               // Message key if err == 0
        size_t key_len;          // Length of the message key if err == 0
        int64_t offset;          // Offset
        void *_private;          // For a Consumer, RdKafka's private pointer; for a Producer, the msg_opaque passed to dr_msg_cb
} rd_kafka_message_t;

Data produced through the application-layer Producer interface is ultimately wrapped in an rd_kafka_message_t structure; likewise, data consumed from the Broker is wrapped in an rd_kafka_message_t before being delivered back to the application layer.
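The err-field convention, where the same payload/len pair carries either message data or an error string, can be illustrated with a small self-contained helper. `msg_t` mirrors only the relevant fields of rd_kafka_message_t, and `msg_is_error`/`msg_text` are invented names:

```c
#include <stddef.h>

/* Illustration of the rd_kafka_message_t err/payload convention:
 * when err == 0, payload/len hold message data; when err != 0,
 * payload holds a human-readable error string instead.
 * msg_t is a local mirror struct, not the real librdkafka type. */
typedef struct {
    int     err;      /* 0 = proper message, non-zero = error event  */
    void   *payload;  /* message data, or error string if err != 0   */
    size_t  len;      /* length of whichever payload currently holds */
} msg_t;

static int msg_is_error(const msg_t *m) { return m->err != 0; }

/* Returns the raw bytes; the caller must check msg_is_error() first
 * to know whether they are data or an error description. */
static const char *msg_text(const msg_t *m, size_t *lenp) {
    *lenp = m->len;
    return (const char *)m->payload;
}
```

A consumer poll loop follows the same pattern: check `err` first, then interpret `payload`/`len` accordingly.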
rd_kafka_msg_t and rd_kafka_msgq_t are defined in the rdkafka_msg.h file:

typedef struct rd_kafka_msg_s {
        rd_kafka_message_t rkm_rkmessage;     // Kafka message; must remain the first field
        TAILQ_ENTRY(rd_kafka_msg_s) rkm_link; // TAILQ linkage
        int rkm_flags;                        // Message type flags
        rd_kafka_timestamp_type_t rkm_tstype; // Message timestamp type
        int64_t rkm_timestamp;                // Timestamp for the V1 message format
        rd_kafka_headers_t *rkm_headers;
        rd_kafka_msg_status_t rkm_status;     // Message persistence status
        union {
                struct {
                        rd_ts_t ts_timeout;  // Message timeout
                        rd_ts_t ts_enq;      // Enqueue/produce timestamp
                        rd_ts_t ts_backoff;
                        uint64_t msgid;      // Message ID for ordering, starting from 1
                        uint64_t last_msgid;
                        int retries;         // Retry count
                } producer;
#define rkm_ts_timeout rkm_u.producer.ts_timeout
#define rkm_ts_enq     rkm_u.producer.ts_enq
#define rkm_msgid      rkm_u.producer.msgid
                struct {
                        rd_kafkap_bytes_t binhdrs;
                } consumer;
        } rkm_u;
} rd_kafka_msg_t;

TAILQ_HEAD(rd_kafka_msgs_head_s, rd_kafka_msg_s);

// Kafka message queue
typedef struct rd_kafka_msgq_s {
        struct rd_kafka_msgs_head_s rkmq_msgs; /* TAILQ_HEAD */
        int32_t rkmq_msg_cnt;
        int64_t rkmq_msg_bytes;
} rd_kafka_msgq_t;

static rd_kafka_msg_t *rd_kafka_message2msg (rd_kafka_message_t *rkmessage) {
        return (rd_kafka_msg_t *)rkmessage;
}

Converts an rd_kafka_message_t message to an rd_kafka_msg_t message (valid because rkm_rkmessage is the first field)

int rd_kafka_msg_new (rd_kafka_itopic_t *rkt, int32_t force_partition, int msgflags, char *payload, size_t len, const void *keydata, size_t keylen, void *msg_opaque);

Create a new Kafka message and queue it to the appropriate partition.
static void rd_kafka_msgq_concat (rd_kafka_msgq_t *dst,rd_kafka_msgq_t *src);
Merge all messages from the src message queue to the end of the dst message queue and the src will be emptied.
static void rd_kafka_msgq_move (rd_kafka_msgq_t *dst,rd_kafka_msgq_t *src);
Move all elements of the src message queue to the dst message queue and the src will be emptied
static void rd_kafka_msgq_purge (rd_kafka_t *rk, rd_kafka_msgq_t *rkmq);
Empty the message queue for Kafka Handle

static rd_kafka_msg_t *rd_kafka_msgq_deq (rd_kafka_msgq_t *rkmq, rd_kafka_msg_t *rkm, int do_count);

Remove the message rkm from the message queue rkmq
static rd_kafka_msg_t *rd_kafka_msgq_pop (rd_kafka_msgq_t *rkmq);
Pop the first message off the message queue rkmq

int rd_kafka_msgq_enq_sorted (const rd_kafka_itopic_t *rkt, rd_kafka_msgq_t *rkmq, rd_kafka_msg_t *rkm);

Insert the message rkm into the message queue rkmq, sorted by message ID
static void rd_kafka_msgq_insert (rd_kafka_msgq_t *rkmq,rd_kafka_msg_t *rkm);
Insert rkm message into message queue rkmq header
static int rd_kafka_msgq_enq (rd_kafka_msgq_t *rkmq,rd_kafka_msg_t *rkm);
Append rkm message to rkmq message queue

int rd_kafka_msgq_age_scan (struct rd_kafka_toppar_s *rktp, rd_kafka_msgq_t *rkmq, rd_kafka_msgq_t *timedout, rd_ts_t now, rd_ts_t *abs_next_timeout);

Scan the rkmq message queue, add timeout messages to the timeout message queue, and delete them from the rkmq message queue.

int rd_kafka_msg_partitioner (rd_kafka_itopic_t *rkt, rd_kafka_msg_t *rkm, rd_dolock_t do_lock);

Run the partitioner to assign the message rkm, written to topic rkt, to a partition
rd_kafka_message_t *rd_kafka_message_get (struct rd_kafka_op_s *rko);
Extract message from OP operation
rd_kafka_message_t *rd_kafka_message_new (void);
Create empty Kafka message

4. Kafka Topic

Kafka Topic-related code is encapsulated in the rdkafka_topic.h file.

struct rd_kafka_itopic_s {
        TAILQ_ENTRY(rd_kafka_itopic_s) rkt_link;
        rd_refcnt_t rkt_refcnt;           // Reference count
        rwlock_t rkt_lock;
        rd_kafkap_str_t *rkt_topic;       // Topic name
        shptr_rd_kafka_toppar_t *rkt_ua;  // Unassigned partition
        shptr_rd_kafka_toppar_t **rkt_p;  // Array of TopicPartition pointers
        int32_t rkt_partition_cnt;        // Partition count
        rd_list_t rkt_desp;
        rd_ts_t rkt_ts_metadata;          // Timestamp of the latest metadata update
        mtx_t rkt_app_lock;
        rd_kafka_topic_t *rkt_app_rkt;    // Topic pointer handed to the application layer
        int rkt_app_refcnt;
        enum {
                RD_KAFKA_TOPIC_S_UNKNOWN,
                RD_KAFKA_TOPIC_S_EXISTS,
                RD_KAFKA_TOPIC_S_NOTEXISTS,
        } rkt_state;                      // Topic state
        int rkt_flags;
        rd_kafka_t *rkt_rk;               // Kafka handle
        rd_avg_t rkt_avg_batchsize;
        rd_avg_t rkt_avg_batchcnt;
        shptr_rd_kafka_itopic_t *rkt_shptr_app;
        rd_kafka_topic_conf_t rkt_conf;   // Topic configuration
};
shptr_rd_kafka_itopic_t *rd_kafka_topic_new0 (rd_kafka_t *rk, const char *topic, rd_kafka_topic_conf_t *conf, int *existing, int do_lock);

Create rd_kafka_itopic_s object
void rd_kafka_local_topics_to_list (rd_kafka_t *rk, rd_list_t *topics);
Get all topic names held by the current rd_kafka_t object, saved into an rd_list
void rd_kafka_topic_scan_all (rd_kafka_t *rk, rd_ts_t now);
Scan all partitions of all Topics held by the Kafka Handle: time out messages on unassigned partitions, and pick out Topics that need to be created on the Broker, Topics whose metadata is too old and needs refreshing, and Topics with an unknown Leader.

static int rd_kafka_topic_partition_cnt_update (rd_kafka_itopic_t *rkt, int32_t partition_cnt);

Update the partition count of the topic; returns 1 if the number of partitions changed, 0 otherwise.

rd_kafka_topic_t *rd_kafka_topic_new (rd_kafka_t *rk, const char *topic, rd_kafka_topic_conf_t *conf);

Create Topic Object

static void rd_kafka_topic_assign_uas (rd_kafka_itopic_t *rkt, rd_kafka_resp_err_t err);

Allocate messages on unassigned partitions to available partitions

int rd_kafka_topic_partition_available (const rd_kafka_topic_t *app_rkt, int32_t partition);

Query whether the given partition of the Topic is available, that is, whether it currently has a Leader
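rd_kafka_topic_partition_available() is intended for use from a custom partitioner callback, which must only return partitions that have a Leader. The hashing step of such a partitioner can be sketched standalone; here the availability check is a caller-supplied function pointer so the example compiles without librdkafka, and `djb2_hash`/`pick_partition`/`always_available` are invented names:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a custom partitioner's core logic. A real
 * rd_kafka_topic_conf_set_partitioner_cb() callback receives a
 * rd_kafka_topic_t* and would call rd_kafka_topic_partition_available()
 * instead of the injected 'available' function used here. */
static uint32_t djb2_hash(const void *key, size_t keylen) {
    const unsigned char *p = key;
    uint32_t h = 5381;
    while (keylen--)
        h = h * 33 + *p++;   /* classic djb2 step */
    return h;
}

static int always_available(int32_t p) { (void)p; return 1; }

static int32_t pick_partition(const void *key, size_t keylen,
                              int32_t partition_cnt,
                              int (*available)(int32_t partition)) {
    int32_t start = (int32_t)(djb2_hash(key, keylen) % (uint32_t)partition_cnt);
    /* Linear probe to the next available (Leader-having) partition */
    for (int32_t i = 0; i < partition_cnt; i++) {
        int32_t p = (start + i) % partition_cnt;
        if (available(p))
            return p;
    }
    return -1; /* no partition available */
}
```

Probing to the next available partition keeps produce requests flowing during a short leaderless window, at the cost of temporarily breaking strict key-to-partition affinity.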

5. Kafka TopicPartition

rd_kafka_topic_partition_t is defined in the rdkafka.h file.

typedef struct rd_kafka_topic_partition_s {
        char *topic;          // Topic name
        int32_t partition;    // Partition
        int64_t offset;       // Offset
        void *metadata;       // Metadata
        size_t metadata_size;
        void *opaque;
        rd_kafka_resp_err_t err;
        void *_private;
} rd_kafka_topic_partition_t;
typedef struct rd_kafka_topic_partition_list_s {
        int cnt;   // Current number of elements
        int size;  // Allocated array size
        rd_kafka_topic_partition_t *elems; // Array
} rd_kafka_topic_partition_list_t;

struct rd_kafka_toppar_s {
        TAILQ_ENTRY(rd_kafka_toppar_s) rktp_rklink;
        TAILQ_ENTRY(rd_kafka_toppar_s) rktp_rkblink;
        CIRCLEQ_ENTRY(rd_kafka_toppar_s) rktp_activelink;
        TAILQ_ENTRY(rd_kafka_toppar_s) rktp_rktlink;
        TAILQ_ENTRY(rd_kafka_toppar_s) rktp_cgrplink;
        TAILQ_ENTRY(rd_kafka_toppar_s) rktp_txnlink;
        rd_kafka_itopic_t *rktp_rkt;
        shptr_rd_kafka_itopic_t *rktp_s_rkt;   // Pointer to the Topic object
        int32_t rktp_partition;                // Partition
        int32_t rktp_leader_id;                // Current Leader ID
        int32_t rktp_broker_id;                // Current Broker ID
        rd_kafka_broker_t *rktp_leader;        // Current Leader Broker
        rd_kafka_broker_t *rktp_broker;        // Current preferred Broker
        rd_kafka_broker_t *rktp_next_broker;   // Next preferred Broker
        rd_refcnt_t rktp_refcnt;               // Reference count
        mtx_t rktp_lock;
        rd_kafka_q_t *rktp_msgq_wakeup_q;      // Wake-up message queue
        rd_kafka_msgq_t rktp_msgq;
        rd_kafka_msgq_t rktp_xmit_msgq;
        int rktp_fetch;
        rd_kafka_q_t *rktp_fetchq;             // Queue of messages fetched from the Broker
        rd_kafka_q_t *rktp_ops;                // Main thread OP queue
        rd_atomic32_t rktp_msgs_inflight;
        uint64_t rktp_msgid;                   // Current/latest message ID
        struct {
                rd_kafka_pid_t pid;
                uint64_t acked_msgid;
                uint64_t epoch_base_msgid;
                int32_t next_ack_seq;
                int32_t next_err_seq;
                rd_bool_t wait_drain;
        } rktp_eos;
        rd_atomic32_t rktp_version;            // Latest OP version
        int32_t rktp_op_version;               // OP version of the current command received from the Broker
        int32_t rktp_fetch_version;            // OP version of the current Fetch
        enum {
                RD_KAFKA_TOPPAR_FETCH_NONE = 0,
                RD_KAFKA_TOPPAR_FETCH_STOPPING,
                RD_KAFKA_TOPPAR_FETCH_STOPPED,
                RD_KAFKA_TOPPAR_FETCH_OFFSET_QUERY,
                RD_KAFKA_TOPPAR_FETCH_OFFSET_WAIT,
                RD_KAFKA_TOPPAR_FETCH_ACTIVE,
        } rktp_fetch_state;
        int32_t rktp_fetch_msg_max_bytes;
        rd_ts_t rktp_ts_fetch_backoff;
        int64_t rktp_query_offset;
        int64_t rktp_next_offset;
        int64_t rktp_last_next_offset;
        int64_t rktp_app_offset;
        int64_t rktp_stored_offset;            // Most recently stored offset, possibly not yet committed
        int64_t rktp_committing_offset;        // Offset currently being committed
        int64_t rktp_committed_offset;         // Latest committed offset
        rd_ts_t rktp_ts_committed_offset;      // Timestamp of the latest committed offset
        struct offset_stats rktp_offsets;
        struct offset_stats rktp_offsets_fin;
        int64_t rktp_ls_offset;                // Current last stable offset
        int64_t rktp_hi_offset;                // Current high watermark
        int64_t rktp_lo_offset;                // Current low watermark
        rd_ts_t rktp_ts_offset_lag;
        char *rktp_offset_path;                // Offset file path
        FILE *rktp_offset_fp;                  // Offset file descriptor
        rd_kafka_cgrp_t *rktp_cgrp;
        int rktp_assigned;
        rd_kafka_replyq_t rktp_replyq;
        int rktp_flags;                        // Partition state flags
        shptr_rd_kafka_toppar_t *rktp_s_for_desp; // Pointer for the rkt_desp list
        shptr_rd_kafka_toppar_t *rktp_s_for_cgrp; // Pointer for the rkcg_toppars list
        shptr_rd_kafka_toppar_t *rktp_s_for_rkb;  // Pointer for the rkb_toppars list
        rd_kafka_timer_t rktp_offset_query_tmr;   // Offset query timer
        rd_kafka_timer_t rktp_offset_commit_tmr;  // Offset commit timer
        rd_kafka_timer_t rktp_offset_sync_tmr;    // Offset file sync timer
        rd_kafka_timer_t rktp_consumer_lag_tmr;   // Consumer lag monitoring timer
        rd_interval_t rktp_lease_intvl;           // Preferred-replica lease
        rd_interval_t rktp_new_lease_intvl;       // Interval for creating a new preferred-replica lease
        rd_interval_t rktp_new_lease_log_intvl;
        rd_interval_t rktp_metadata_intvl;        // Maximum frequency of metadata requests for preferred replicas
        int rktp_wait_consumer_lag_resp;
        struct rd_kafka_toppar_err rktp_last_err;
        struct {
                rd_atomic64_t tx_msgs;           // Number of messages sent by the producer
                rd_atomic64_t tx_msg_bytes;      // Number of bytes sent by the producer
                rd_atomic64_t rx_msgs;           // Number of messages received by the consumer
                rd_atomic64_t rx_msg_bytes;      // Number of bytes consumed by the consumer
                rd_atomic64_t producer_enq_msgs; // Number of messages enqueued by the producer
                rd_atomic64_t rx_ver_drops;      // Number of outdated messages dropped by the consumer
        } rktp_c;
};

6. Kafka Transport

Communication between RdKafka and the Brokers does not need to support high concurrency, so RdKafka uses the poll IO model for its transport layer.
The connection between RdKafka and a Kafka Broker is a TCP connection, so messages must be framed according to the Kafka protocol: each frame starts with a 4-byte payload length, followed by the payload itself (header and body). The receiver first reads the 4-byte length, then reads that many payload bytes.
rd_kafka_transport_s is defined in the rdkafka_transport_int.h file:

struct rd_kafka_transport_s {
        rd_socket_t rktrans_s;            // Socket fd used to communicate with the Broker
        rd_kafka_broker_t *rktrans_rkb;   // Connected Broker
        struct {
                void *state;
                int complete;
                struct msghdr msg;
                struct iovec iov[2];
                char *recv_buf;
                int recv_of;
                int recv_len;
        } rktrans_sasl;                   // SASL authentication state
        rd_kafka_buf_t *rktrans_recv_buf; // Receive buffer
        rd_pollfd_t rktrans_pfd[2];       // Polled fds: the TCP socket and the wake-up fd
        int rktrans_pfd_cnt;
        size_t rktrans_rcvbuf_size;       // Socket receive buffer size
        size_t rktrans_sndbuf_size;       // Socket send buffer size
};
typedef struct rd_kafka_transport_s rd_kafka_transport_t;

rd_kafka_transport_t *rd_kafka_transport_connect (rd_kafka_broker_t *rkb,
                                                  const rd_sockaddr_inx_t *sinx,
                                                  char *errstr,
                                                  size_t errstr_size) {
        rd_kafka_transport_t *rktrans;
        int s = -1;
        int r;

        rkb->rkb_addr_last = sinx;

        s = rkb->rkb_rk->rk_conf.socket_cb(sinx->in.sin_family, SOCK_STREAM,
                                           IPPROTO_TCP,
                                           rkb->rkb_rk->rk_conf.opaque);
        if (s == -1) {
                rd_snprintf(errstr, errstr_size, "Failed to create socket: %s",
                            rd_socket_strerror(rd_socket_errno));
                return NULL;
        }

        rktrans = rd_kafka_transport_new(rkb, s, errstr, errstr_size);
        if (!rktrans)
                goto err;

        rd_rkb_dbg(rkb, BROKER, "CONNECT", "Connecting to %s (%s) "
                   "with socket %i",
                   rd_sockaddr2str(sinx, RD_SOCKADDR2STR_F_FAMILY |
                                   RD_SOCKADDR2STR_F_PORT),
                   rd_kafka_secproto_names[rkb->rkb_proto], s);

        /* Connect to broker */
        if (rkb->rkb_rk->rk_conf.connect_cb) {
                rd_kafka_broker_lock(rkb); /* for rkb_nodename */
                r = rkb->rkb_rk->rk_conf.connect_cb(
                        s, (struct sockaddr *)sinx, RD_SOCKADDR_INX_LEN(sinx),
                        rkb->rkb_nodename, rkb->rkb_rk->rk_conf.opaque);
                rd_kafka_broker_unlock(rkb);
        } else {
                if (connect(s, (struct sockaddr *)sinx,
                            RD_SOCKADDR_INX_LEN(sinx)) == RD_SOCKET_ERROR &&
                    (rd_socket_errno != EINPROGRESS))
                        r = rd_socket_errno;
                else
                        r = 0;
        }

        if (r != 0) {
                rd_rkb_dbg(rkb, BROKER, "CONNECT",
                           "couldn't connect to %s: %s (%i)",
                           rd_sockaddr2str(sinx, RD_SOCKADDR2STR_F_PORT |
                                           RD_SOCKADDR2STR_F_FAMILY),
                           rd_socket_strerror(r), r);
                rd_snprintf(errstr, errstr_size,
                            "Failed to connect to broker at %s: %s",
                            rd_sockaddr2str(sinx, RD_SOCKADDR2STR_F_NICE),
                            rd_socket_strerror(r));
                goto err;
        }

        /* Set up transport handle */
        rktrans->rktrans_pfd[rktrans->rktrans_pfd_cnt++].fd = s;
        if (rkb->rkb_wakeup_fd[0] != -1) {
                rktrans->rktrans_pfd[rktrans->rktrans_pfd_cnt].events = POLLIN;
                rktrans->rktrans_pfd[rktrans->rktrans_pfd_cnt++].fd =
                        rkb->rkb_wakeup_fd[0];
        }

        /* Poll writability to trigger on connection success/failure. */
        rd_kafka_transport_poll_set(rktrans, POLLOUT);

        return rktrans;

 err:
        if (s != -1)
                rd_kafka_transport_close0(rkb->rkb_rk, s);
        if (rktrans)
                rd_kafka_transport_close(rktrans);
        return NULL;
}

Establishes a TCP connection with the Broker, initializes an rd_kafka_transport_s object, and returns it
int rd_kafka_transport_io_serve (rd_kafka_transport_t *rktrans, int timeout_ms);
Poll and handle IO operations
void rd_kafka_transport_io_event (rd_kafka_transport_t *rktrans, int events);
Processing IO operations

ssize_t rd_kafka_transport_socket_sendmsg (rd_kafka_transport_t *rktrans, rd_slice_t *slice, char *errstr, size_t errstr_size);

Wrapper around the sendmsg system call

ssize_t rd_kafka_transport_send (rd_kafka_transport_t *rktrans, rd_slice_t *slice, char *errstr, size_t errstr_size);

Wrapper around the send system call

ssize_t rd_kafka_transport_recv (rd_kafka_transport_t *rktrans, rd_buf_t *rbuf, char *errstr, size_t errstr_size);

Wrapper around the recv system call

rd_kafka_transport_t *rd_kafka_transport_new (rd_kafka_broker_t *rkb, rd_socket_t s, char *errstr, size_t errstr_size);

Create an rd_kafka_transport_t object from an existing socket
int rd_kafka_transport_poll(rd_kafka_transport_t *rktrans, int tmout);
Wrapper around the poll system call

ssize_t rd_kafka_transport_socket_recv (rd_kafka_transport_t *rktrans,
                                        rd_buf_t *buf,
                                        char *errstr, size_t errstr_size) {
#ifndef _MSC_VER
        /* On POSIX systems, use the recvmsg-based implementation */
        return rd_kafka_transport_socket_recvmsg(rktrans, buf, errstr,
                                                 errstr_size);
#endif
        /* On Windows, fall back to the plain recv-based implementation */
        return rd_kafka_transport_socket_recv0(rktrans, buf, errstr,
                                               errstr_size);
}

7. Kafka Meta

The Meta Data of a Kafka cluster includes: information about all Brokers (IP and port);
and information about all Topics: Topic name, number of partitions, and for each partition its Leader, ISR, replica set, and so on.
Every Broker in a Kafka cluster caches the Meta Data of the entire cluster. When the Meta Data of a Broker or a Topic changes, the Controller of the Kafka cluster senses the state transition and broadcasts the new Meta Data to all Brokers.
RdKafka encapsulates Meta Data handling, covering acquisition, periodic refresh, and the events that depend on it, such as Partition Leader migration, changes in the number of partitions, and Brokers going offline.
Meta Data is split into Broker, Topic, and Partition information, defined in rdkafka.h.

typedef struct rd_kafka_metadata_broker {
        int32_t id;  // Broker ID
        char *host;  // Broker host name
        int port;    // Broker listening port
} rd_kafka_metadata_broker_t;

typedef struct rd_kafka_metadata_partition {
        int32_t id;              // Partition ID
        rd_kafka_resp_err_t err; // Partition error as reported by the Broker
        int32_t leader;          // Partition Leader Broker
        int replica_cnt;         // Number of Brokers in the replica list
        int32_t *replicas;       // Replica Broker list
        int isr_cnt;             // Number of Brokers in the ISR list
        int32_t *isrs;           // ISR Broker list
} rd_kafka_metadata_partition_t;

/**
 * @brief Topic information
 */
typedef struct rd_kafka_metadata_topic {
        char *topic;             // Topic name
        int partition_cnt;       // Number of partitions
        struct rd_kafka_metadata_partition *partitions;
        rd_kafka_resp_err_t err; // Topic error as reported by the Broker
} rd_kafka_metadata_topic_t;

typedef struct rd_kafka_metadata {
        int broker_cnt;          // Number of Brokers
        struct rd_kafka_metadata_broker *brokers; // Broker Meta
        int topic_cnt;           // Number of Topics
        struct rd_kafka_metadata_topic *topics;   // Topic Meta
        int32_t orig_broker_id;  // Broker ID
        char *orig_broker_name;  // Broker name
} rd_kafka_metadata_t;
rd_kafka_resp_err_t rd_kafka_metadata (rd_kafka_t *rk, int all_topics, rd_kafka_topic_t *only_rkt, const struct rd_kafka_metadata **metadatap, int timeout_ms);

Requests Meta Data from the cluster; a blocking operation.

struct rd_kafka_metadata * rd_kafka_metadata_copy (const struct rd_kafka_metadata *md, size_t size);

Performs a deep copy of Meta Data.

rd_kafka_resp_err_t rd_kafka_parse_Metadata (rd_kafka_broker_t *rkb, rd_kafka_buf_t *request, rd_kafka_buf_t *rkbuf, struct rd_kafka_metadata **mdp);

Processes a Meta Data request response.

size_t rd_kafka_metadata_topic_match (rd_kafka_t *rk, rd_list_t *tinfos, const rd_kafka_topic_partition_list_t *match);

Finds matching Topics in the currently cached Meta Data and adds them to tinfos.

size_t rd_kafka_metadata_topic_filter (rd_kafka_t *rk, rd_list_t *tinfos, const rd_kafka_topic_partition_list_t *match);

Adds all cached Topics in the Meta Data that match to tinfos.

rd_kafka_resp_err_t rd_kafka_metadata_refresh_topics (rd_kafka_t *rk, rd_kafka_broker_t *rkb, const rd_list_t *topics, int force, const char *reason);

Refreshes Meta Data for the specified topics.

rd_kafka_resp_err_t rd_kafka_metadata_refresh_known_topics (rd_kafka_t *rk, rd_kafka_broker_t *rkb, int force, const char *reason);

Refreshes Meta Data for all known Topics.

rd_kafka_resp_err_t rd_kafka_metadata_refresh_brokers (rd_kafka_t *rk, rd_kafka_broker_t *rkb, const char *reason);

Refreshes Broker Meta Data.

rd_kafka_resp_err_t rd_kafka_metadata_refresh_all (rd_kafka_t *rk, rd_kafka_broker_t *rkb, const char *reason);

Refreshes Meta Data for all Topics in the cluster.

rd_kafka_resp_err_t rd_kafka_metadata_request (rd_kafka_t *rk, rd_kafka_broker_t *rkb, const rd_list_t *topics, const char *reason, rd_kafka_op_t *rko);

Sends a Meta Data request.

void rd_kafka_metadata_fast_leader_query (rd_kafka_t *rk);

Quickly refreshes Meta Data for Partition Leaders.

8. Kafka Handle Object Creation

Kafka Producer and Consumer client objects are created via the rd_kafka_new function. The rd_kafka_new source is as follows:

rd_kafka_t *rd_kafka_new (rd_kafka_type_t type, rd_kafka_conf_t *app_conf,
                          char *errstr, size_t errstr_size) {
        ...
        // Create or use the supplied conf
        if (!app_conf)
                conf = rd_kafka_conf_new();
        else
                conf = app_conf;
        ...
        /* Call on_new() interceptors */
        rd_kafka_interceptors_on_new(rk, &rk->rk_conf);
        ...
        // Create queues
        rk->rk_rep = rd_kafka_q_new(rk);
        rk->rk_ops = rd_kafka_q_new(rk);
        rk->rk_ops->rkq_serve = rd_kafka_poll_cb;
        rk->rk_ops->rkq_opaque = rk;
        ...
        if (rk->rk_conf.dr_cb || rk->rk_conf.dr_msg_cb)
                rk->rk_drmode = RD_KAFKA_DR_MODE_CB;
        else if (rk->rk_conf.enabled_events & RD_KAFKA_EVENT_DR)
                rk->rk_drmode = RD_KAFKA_DR_MODE_EVENT;
        else
                rk->rk_drmode = RD_KAFKA_DR_MODE_NONE;
        if (rk->rk_drmode != RD_KAFKA_DR_MODE_NONE)
                rk->rk_conf.enabled_events |= RD_KAFKA_EVENT_DR;
        if (rk->rk_conf.rebalance_cb)
                rk->rk_conf.enabled_events |= RD_KAFKA_EVENT_REBALANCE;
        if (rk->rk_conf.offset_commit_cb)
                rk->rk_conf.enabled_events |= RD_KAFKA_EVENT_OFFSET_COMMIT;
        if (rk->rk_conf.error_cb)
                rk->rk_conf.enabled_events |= RD_KAFKA_EVENT_ERROR;
        rk->rk_controllerid = -1;
        ...
        if (type == RD_KAFKA_CONSUMER &&
            RD_KAFKAP_STR_LEN(rk->rk_group_id) > 0)
                rk->rk_cgrp = rd_kafka_cgrp_new(rk, rk->rk_group_id,
                                                rk->rk_client_id);
        ...
        // Background thread and background event queue creation
        if (rk->rk_conf.background_event_cb) {
                /* Hold off background thread until thrd_create() is done. */
                rd_kafka_wrlock(rk);
                rk->rk_background.q = rd_kafka_q_new(rk);
                rk->rk_init_wait_cnt++;
                if ((thrd_create(&rk->rk_background.thread,
                                 rd_kafka_background_thread_main, rk)) !=
                    thrd_success)
                        ...
        }

        /* Create handler thread */
        rk->rk_init_wait_cnt++;
        if ((thrd_create(&rk->rk_thread, rd_kafka_thread_main, rk)) !=
            thrd_success) {
                ...
        }

        // Start the internal (logical) Broker thread
        rk->rk_internal_rkb = rd_kafka_broker_add(rk, RD_KAFKA_INTERNAL,
                                                  RD_KAFKA_PROTO_PLAINTEXT,
                                                  "", 0, RD_KAFKA_NODEID_UA);
        // Add the Brokers from the configuration
        if (rk->rk_conf.brokerlist) {
                if (rd_kafka_brokers_add0(rk, rk->rk_conf.brokerlist) == 0)
                        rd_kafka_op_err(rk,
                                        RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN,
                                        "No brokers configured");
        }
        ...
}

rd_kafka_new's main work is as follows:
(1) Set properties according to the configuration;
(2) Create the OP queues for the Kafka Handle object;
(3) Create the background thread and background event queue;
(4) Create the RdKafka main thread executing the rd_kafka_thread_main function; the main thread name is rdk:main;
(5) Create the internal Broker thread;
(6) Create Broker threads based on the configuration (one per Broker).

int rd_kafka_brokers_add (rd_kafka_t *rk, const char *brokerlist) {
        return rd_kafka_brokers_add0(rk, brokerlist);
}

int rd_kafka_brokers_add0 (rd_kafka_t *rk, const char *brokerlist) {
        ...
        if ((rkb = rd_kafka_broker_find(rk, proto, host, port)) &&
            rkb->rkb_source == RD_KAFKA_CONFIGURED) {
                cnt++;
        } else if (rd_kafka_broker_add(rk, RD_KAFKA_CONFIGURED, proto,
                                       host, port, RD_KAFKA_NODEID_UA) != NULL)
                cnt++;
        ...
}

rd_kafka_broker_t *rd_kafka_broker_add (rd_kafka_t *rk,
                                        rd_kafka_confsource_t source,
                                        rd_kafka_secproto_t proto,
                                        const char *name, uint16_t port,
                                        int32_t nodeid) {
        ...
        thrd_create(&rkb->rkb_thread, rd_kafka_broker_thread_main, rkb);
        ...
}

static int rd_kafka_broker_thread_main (void *arg) {
        rd_kafka_set_thread_name("%s", rkb->rkb_name);
        rd_kafka_set_thread_sysname("rdk:broker%"PRId32, rkb->rkb_nodeid);
        ...
        rd_kafka_broker_serve(rkb, rd_kafka_max_block_ms);
        ...
        rd_kafka_broker_ops_serve(rkb, RD_POLL_NOWAIT);
        ...
}

9. Producer production message process

(1)rd_kafka_produce
The rd_kafka_produce function is located in the rdkafka_msg.c file:

int rd_kafka_produce (rd_kafka_topic_t *rkt, int32_t partition,
                      int msgflags,
                      void *payload, size_t len,
                      const void *key, size_t keylen,
                      void *msg_opaque) {
        return rd_kafka_msg_new(rd_kafka_topic_a2i(rkt), partition,
                                msgflags, payload, len,
                                key, keylen, msg_opaque);
}

(2)rd_kafka_msg_new
The rd_kafka_msg_new function is located in the rdkafka_msg.c file:

int rd_kafka_msg_new (rd_kafka_itopic_t *rkt, int32_t force_partition,
                      int msgflags,
                      char *payload, size_t len,
                      const void *key, size_t keylen,
                      void *msg_opaque) {
        ...
        // Create the rd_kafka_msg_t message
        rkm = rd_kafka_msg_new0(rkt, force_partition, msgflags,
                                payload, len, key, keylen, msg_opaque,
                                &err, &errnox, NULL, 0, rd_clock());
        ...
        // Partition the message
        err = rd_kafka_msg_partitioner(rkt, rkm, 1);
        ...
}

rd_kafka_msg_new internally creates a Kafka message via rd_kafka_msg_new0 and assigns it to a partition via rd_kafka_msg_partitioner.
(3)rd_kafka_msg_partitioner
The rd_kafka_msg_partitioner function is located in the rdkafka_msg.c file:

int rd_kafka_msg_partitioner (rd_kafka_itopic_t *rkt, rd_kafka_msg_t *rkm,
                              rd_dolock_t do_lock) {
        // Determine the partition number
        ...
        // Get the partition
        s_rktp_new = rd_kafka_toppar_get(rkt, partition, 0);
        ...
        rktp_new = rd_kafka_toppar_s2i(s_rktp_new);
        rd_atomic64_add(&rktp_new->rktp_c.producer_enq_msgs, 1);

        /* Update message partition */
        if (rkm->rkm_partition == RD_KAFKA_PARTITION_UA)
                rkm->rkm_partition = partition;

        // Enqueue the message on the partition's queue
        rd_kafka_toppar_enq_msg(rktp_new, rkm);
        ...
}

rd_kafka_msg_partitioner internally enqueues the message to the partition's queue via rd_kafka_toppar_enq_msg.
(4)rd_kafka_toppar_enq_msg

void rd_kafka_toppar_enq_msg (rd_kafka_toppar_t *rktp, rd_kafka_msg_t *rkm) {
        ...
        // Enqueue
        if (rktp->rktp_partition == RD_KAFKA_PARTITION_UA ||
            rktp->rktp_rkt->rkt_conf.queuing_strategy == RD_KAFKA_QUEUE_FIFO) {
                queue_len = rd_kafka_msgq_enq(&rktp->rktp_msgq, rkm);
        } else {
                queue_len = rd_kafka_msgq_enq_sorted(rktp->rktp_rkt,
                                                     &rktp->rktp_msgq, rkm);
        }
        ...
}

(5)rd_kafka_msgq_enq

static RD_INLINE RD_UNUSED int rd_kafka_msgq_enq (rd_kafka_msgq_t *rkmq,
                                                  rd_kafka_msg_t *rkm) {
        TAILQ_INSERT_TAIL(&rkmq->rkmq_msgs, rkm, rkm_link);
        rkmq->rkmq_msg_bytes += rkm->rkm_len + rkm->rkm_key_len;
        return (int)++rkmq->rkmq_msg_cnt;
}

(6)rd_kafka_msgq_enq_sorted
The rd_kafka_msgq_enq_sorted function is located in the rdkafka_msg.c file:

int rd_kafka_msgq_enq_sorted (const rd_kafka_itopic_t *rkt,
                              rd_kafka_msgq_t *rkmq,
                              rd_kafka_msg_t *rkm) {
        rd_dassert(rkm->rkm_u.producer.msgid != 0);
        return rd_kafka_msgq_enq_sorted0(rkmq, rkm,
                                         rkt->rkt_conf.msg_order_cmp);
}

int rd_kafka_msgq_enq_sorted0 (rd_kafka_msgq_t *rkmq,
                               rd_kafka_msg_t *rkm,
                               int (*order_cmp) (const void *, const void *)) {
        TAILQ_INSERT_SORTED(&rkmq->rkmq_msgs, rkm, rd_kafka_msg_t *,
                            rkm_link, order_cmp);
        rkmq->rkmq_msg_bytes += rkm->rkm_len + rkm->rkm_key_len;
        return ++rkmq->rkmq_msg_cnt;
}

The queue operations are in the rdsysqueue.h file.
The rd_kafka_broker_add function is located in the rdkafka_broker.c file:

rd_kafka_broker_t *rd_kafka_broker_add (rd_kafka_t *rk,
                                        rd_kafka_confsource_t source,
                                        rd_kafka_secproto_t proto,
                                        const char *name, uint16_t port,
                                        int32_t nodeid) {
        rd_kafka_broker_t *rkb;
        rkb = rd_calloc(1, sizeof(*rkb));
        // Set the rd_kafka_broker_t object's properties
        ...
        if (thrd_create(&rkb->rkb_thread,
                        rd_kafka_broker_thread_main, rkb) != thrd_success) {
                ...
        }
}

rd_kafka_broker_add creates a Broker thread that starts executing the rd_kafka_broker_thread_main function.

static int rd_kafka_broker_thread_main (void *arg) {
        ...
        rd_kafka_set_thread_name("%s", rkb->rkb_name);
        rd_kafka_set_thread_sysname("rdk:broker%"PRId32, rkb->rkb_nodeid);
        ...
        rd_kafka_broker_serve(rkb, ...);
        ...
}

static void rd_kafka_broker_serve (rd_kafka_broker_t *rkb, int timeout_ms) {
        ...
        if (rkb->rkb_source == RD_KAFKA_INTERNAL)
                rd_kafka_broker_internal_serve(rkb, abs_timeout);
        else if (rkb->rkb_rk->rk_type == RD_KAFKA_PRODUCER)
                rd_kafka_broker_producer_serve(rkb, abs_timeout);
        else if (rkb->rkb_rk->rk_type == RD_KAFKA_CONSUMER)
                rd_kafka_broker_consumer_serve(rkb, abs_timeout);
}

static void rd_kafka_broker_producer_serve (rd_kafka_broker_t *rkb,
                                            rd_ts_t abs_timeout) {
        ...
        rd_kafka_broker_produce_toppars(rkb, now, &next_wakeup,
                                        do_timeout_scan);
        rd_kafka_broker_ops_io_serve(rkb, next_wakeup);
}

static void rd_kafka_broker_ops_io_serve (rd_kafka_broker_t *rkb,
                                          rd_ts_t abs_timeout) {
        ...
        rd_kafka_broker_ops_serve(rkb, rd_timeout_remains_us(abs_timeout));
        ...
}

static int rd_kafka_broker_ops_serve (rd_kafka_broker_t *rkb,
                                      rd_ts_t timeout_us) {
        rd_kafka_op_t *rko;
        int cnt = 0;
        while ((rko = rd_kafka_q_pop(rkb->rkb_ops, timeout_us, 0)) &&
               (cnt++, rd_kafka_broker_op_serve(rkb, rko)))
                timeout_us = RD_POLL_NOWAIT;
        return cnt;
}

In the rdkafka_broker.c file:

static ssize_t rd_kafka_broker_send (rd_kafka_broker_t *rkb,
                                     rd_slice_t *slice) {
        ...
        r = rd_kafka_transport_send(rkb->rkb_transport, slice,
                                    errstr, sizeof(errstr));
        ...
}

In the rdkafka_transport.c file:

ssize_t rd_kafka_transport_send (rd_kafka_transport_t *rktrans,
                                 rd_slice_t *slice,
                                 char *errstr, size_t errstr_size) {
        ...
        r = rd_kafka_transport_socket_send(rktrans, slice,
                                           errstr, errstr_size);
        ...
}

static ssize_t rd_kafka_transport_socket_send (rd_kafka_transport_t *rktrans,
                                               rd_slice_t *slice,
                                               char *errstr,
                                               size_t errstr_size) {
#ifndef _MSC_VER
        /* FIXME: Use sendmsg() with iovecs if there's more than one segment
         *        remaining, otherwise (or if platform does not have sendmsg)
         *        use plain send(). */
        return rd_kafka_transport_socket_sendmsg(rktrans, slice,
                                                 errstr, errstr_size);
#endif
        return rd_kafka_transport_socket_send0(rktrans, slice,
                                               errstr, errstr_size);
}

static ssize_t rd_kafka_transport_socket_sendmsg (rd_kafka_transport_t *rktrans,
                                                  rd_slice_t *slice,
                                                  char *errstr,
                                                  size_t errstr_size) {
        ...
        r = sendmsg(rktrans->rktrans_s, &msg, MSG_DONTWAIT ...
        ...
}

10. Consumer consumption message process

(1) Opening Message Consumption
RdKafka provides the rd_kafka_consume_start, rd_kafka_consume, rd_kafka_consume_start_queue, and rd_kafka_consume_queue interfaces for message consumption.

int rd_kafka_consume_start0 (rd_kafka_itopic_t *rkt, int32_t partition,
                             int64_t offset, rd_kafka_q_t *rkq) {
        shptr_rd_kafka_toppar_t *s_rktp;

        if (partition < 0) {
                rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__UNKNOWN_PARTITION,
                                        ESRCH);
                return -1;
        }

        if (!rd_kafka_simple_consumer_add(rkt->rkt_rk)) {
                rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__INVALID_ARG, EINVAL);
                return -1;
        }

        rd_kafka_topic_wrlock(rkt);
        s_rktp = rd_kafka_toppar_desired_add(rkt, partition);
        rd_kafka_topic_wrunlock(rkt);

        /* Verify offset */
        if (offset == RD_KAFKA_OFFSET_BEGINNING ||
            offset == RD_KAFKA_OFFSET_END ||
            offset <= RD_KAFKA_OFFSET_TAIL_BASE) {
                /* logical offsets */
        } else if (offset == RD_KAFKA_OFFSET_STORED) {
                /* offset manager */
                if (rkt->rkt_conf.offset_store_method ==
                    RD_KAFKA_OFFSET_METHOD_BROKER &&
                    RD_KAFKAP_STR_IS_NULL(rkt->rkt_rk->rk_group_id)) {
                        /* Broker based offsets require a group id. */
                        rd_kafka_toppar_destroy(s_rktp);
                        rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__INVALID_ARG,
                                                EINVAL);
                        return -1;
                }
        } else if (offset < 0) {
                rd_kafka_toppar_destroy(s_rktp);
                rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__INVALID_ARG, EINVAL);
                return -1;
        }

        rd_kafka_toppar_op_fetch_start(rd_kafka_toppar_s2i(s_rktp), offset,
                                       rkq, RD_KAFKA_NO_REPLYQ);
        rd_kafka_toppar_destroy(s_rktp);
        rd_kafka_set_last_error(0, 0);
        return 0;
}

int rd_kafka_consume_start (rd_kafka_topic_t *app_rkt, int32_t partition,
                            int64_t offset) {
        rd_kafka_itopic_t *rkt = rd_kafka_topic_a2i(app_rkt);
        rd_kafka_dbg(rkt->rkt_rk, TOPIC, "START",
                     "Start consuming partition %"PRId32, partition);
        return rd_kafka_consume_start0(rkt, partition, offset, NULL);
}

int rd_kafka_consume_start_queue (rd_kafka_topic_t *app_rkt, int32_t partition,
                                  int64_t offset, rd_kafka_queue_t *rkqu) {
        rd_kafka_itopic_t *rkt = rd_kafka_topic_a2i(app_rkt);
        return rd_kafka_consume_start0(rkt, partition, offset, rkqu->rkqu_q);
}

static rd_kafka_message_t *rd_kafka_consume0 (rd_kafka_t *rk,
                                              rd_kafka_q_t *rkq,
                                              int timeout_ms) {
        rd_kafka_op_t *rko;
        rd_kafka_message_t *rkmessage = NULL;
        rd_ts_t abs_timeout = rd_timeout_init(timeout_ms);

        if (timeout_ms)
                rd_kafka_app_poll_blocking(rk);

        rd_kafka_yield_thread = 0;
        while ((rko = rd_kafka_q_pop(rkq,
                                     rd_timeout_remains_us(abs_timeout), 0))) {
                rd_kafka_op_res_t res;

                res = rd_kafka_poll_cb(rk, rkq, rko,
                                       RD_KAFKA_Q_CB_RETURN, NULL);

                if (res == RD_KAFKA_OP_RES_PASS)
                        break;

                if (unlikely(res == RD_KAFKA_OP_RES_YIELD ||
                             rd_kafka_yield_thread)) {
                        /* Callback called rd_kafka_yield(), we must
                         * stop dispatching the queue and return. */
                        rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__INTR,
                                                EINTR);
                        rd_kafka_app_polled(rk);
                        return NULL;
                }

                /* Message was handled by callback. */
                continue;
        }

        if (!rko) {
                /* Timeout reached with no op returned. */
                rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__TIMED_OUT,
                                        ETIMEDOUT);
                rd_kafka_app_polled(rk);
                return NULL;
        }

        rd_kafka_assert(rk, rko->rko_type == RD_KAFKA_OP_FETCH ||
                        rko->rko_type == RD_KAFKA_OP_CONSUMER_ERR);

        /* Get rkmessage from rko */
        rkmessage = rd_kafka_message_get(rko);

        /* Store offset */
        rd_kafka_op_offset_store(rk, rko);

        rd_kafka_set_last_error(0, 0);
        rd_kafka_app_polled(rk);

        return rkmessage;
}

rd_kafka_message_t *rd_kafka_consume (rd_kafka_topic_t *app_rkt,
                                      int32_t partition, int timeout_ms) {
        rd_kafka_itopic_t *rkt = rd_kafka_topic_a2i(app_rkt);
        shptr_rd_kafka_toppar_t *s_rktp;
        rd_kafka_toppar_t *rktp;
        rd_kafka_message_t *rkmessage;

        rd_kafka_topic_rdlock(rkt);
        s_rktp = rd_kafka_toppar_get(rkt, partition, 0/*no ua on miss*/);
        if (unlikely(!s_rktp))
                s_rktp = rd_kafka_toppar_desired_get(rkt, partition);
        rd_kafka_topic_rdunlock(rkt);

        if (unlikely(!s_rktp)) {
                /* No such toppar known */
                rd_kafka_set_last_error(RD_KAFKA_RESP_ERR__UNKNOWN_PARTITION,
                                        ESRCH);
                return NULL;
        }

        rktp = rd_kafka_toppar_s2i(s_rktp);
        rkmessage = rd_kafka_consume0(rkt->rkt_rk, rktp->rktp_fetchq,
                                      timeout_ms);

        rd_kafka_toppar_destroy(s_rktp); /* refcnt from .._get() */

        return rkmessage;
}

rd_kafka_message_t *rd_kafka_consume_queue (rd_kafka_queue_t *rkqu,
                                            int timeout_ms) {
        return rd_kafka_consume0(rkqu->rkqu_rk, rkqu->rkqu_q, timeout_ms);
}

(2) Poll Polling Message Queue

int rd_kafka_poll (rd_kafka_t *rk, int timeout_ms) {
        int r;
        if (timeout_ms)
                rd_kafka_app_poll_blocking(rk);
        r = rd_kafka_q_serve(rk->rk_rep, timeout_ms, 0,
                             RD_KAFKA_Q_CB_CALLBACK, rd_kafka_poll_cb, NULL);
        rd_kafka_app_polled(rk);
        return r;
}

rd_kafka_message_t *rd_kafka_consumer_poll (rd_kafka_t *rk, int timeout_ms) {
        rd_kafka_cgrp_t *rkcg;
        if (unlikely(!(rkcg = rd_kafka_cgrp_get(rk)))) {
                rd_kafka_message_t *rkmessage = rd_kafka_message_new();
                rkmessage->err = RD_KAFKA_RESP_ERR__UNKNOWN_GROUP;
                return rkmessage;
        }
        return rd_kafka_consume0(rk, rkcg->rkcg_q, timeout_ms);
}

rd_kafka_event_t *rd_kafka_queue_poll (rd_kafka_queue_t *rkqu, int timeout_ms) {
        rd_kafka_op_t *rko;
        if (timeout_ms)
                rd_kafka_app_poll_blocking(rkqu->rkqu_rk);
        rko = rd_kafka_q_pop_serve(rkqu->rkqu_q, rd_timeout_us(timeout_ms), 0,
                                   RD_KAFKA_Q_CB_EVENT, rd_kafka_poll_cb, NULL);
        rd_kafka_app_polled(rkqu->rkqu_rk);
        if (!rko)
                return NULL;
        return rko;
}

2. Source Code Analysis of RdKafka C++.

1. Packaging of C API by C++ API

The C++ API is mainly an encapsulation of the RdKafka C API, wrapped into different classes by functional module. The classes are defined in the rdkafkacpp.h file and qualified with the RdKafka namespace; the main classes are Conf, Handle, TopicPartition, Topic, Message, Queue, KafkaConsumer, Consumer, Producer, BrokerMetadata, PartitionMetadata, TopicMetadata, Metadata, DeliveryReportCb, PartitionerCb, PartitionerKeyPointerCb, EventCb, Event, ConsumeCb, RebalanceCb, OffsetCommitCb, SocketCb, and OpenCb.

2. Consumer and KafkaConsumer

Consumer gives the application full control over partitions and offsets; KafkaConsumer provides a Topic subscription interface, consumes from the latest offset by default, and can start consumption from a specified partition and offset via the assign method.

3. Producer production message process

(1) Producer creation

RdKafka::Producer *RdKafka::Producer::create (RdKafka::Conf *conf,
                                              std::string &errstr) {
  char errbuf[512];
  RdKafka::ConfImpl *confimpl = dynamic_cast<RdKafka::ConfImpl *>(conf);
  RdKafka::ProducerImpl *rkp = new RdKafka::ProducerImpl();
  rd_kafka_conf_t *rk_conf = NULL;

  if (confimpl) {
    if (!confimpl->rk_conf_) {
      errstr = "Requires RdKafka::Conf::CONF_GLOBAL object";
      delete rkp;
      return NULL;
    }
    rkp->set_common_config(confimpl);
    rk_conf = rd_kafka_conf_dup(confimpl->rk_conf_);
    if (confimpl->dr_cb_) {
      rd_kafka_conf_set_dr_msg_cb(rk_conf, dr_msg_cb_trampoline);
      rkp->dr_cb_ = confimpl->dr_cb_;
    }
  }

  rd_kafka_t *rk;
  if (!(rk = rd_kafka_new(RD_KAFKA_PRODUCER, rk_conf,
                          errbuf, sizeof(errbuf)))) {
    errstr = errbuf;
    // rd_kafka_new() takes ownership only if it succeeds
    if (rk_conf)
      rd_kafka_conf_destroy(rk_conf);
    delete rkp;
    return NULL;
  }

  rkp->rk_ = rk;
  return rkp;
}

Conf objects need to be prepared when creating a Producer.
(2) Production messages

RdKafka::ErrorCode RdKafka::ProducerImpl::produce (RdKafka::Topic *topic,
                                                   int32_t partition,
                                                   int msgflags,
                                                   void *payload, size_t len,
                                                   const std::string *key,
                                                   void *msg_opaque) {
  RdKafka::TopicImpl *topicimpl = dynamic_cast<RdKafka::TopicImpl *>(topic);

  if (rd_kafka_produce(topicimpl->rkt_, partition, msgflags, payload, len,
                       key ? key->c_str() : NULL, key ? key->size() : 0,
                       msg_opaque) == -1)
    return static_cast<RdKafka::ErrorCode>(rd_kafka_last_error());

  return RdKafka::ERR_NO_ERROR;
}

Topic objects need to be specified when producing messages.
(3) Poll polling

int RdKafka::HandleImpl::poll(int timeout_ms) { return rd_kafka_poll(rk_, timeout_ms); }

Producing a message is asynchronous: produce returns immediately after the message is placed in an internal queue, so the final delivery result must be retrieved via poll. Delivery is best-effort: RdKafka retries until message.timeout.ms is exceeded, after which a failure is reported.

4. Consumer consumption message process

(1) Create KafkaConsumer

RdKafka::KafkaConsumer *RdKafka::KafkaConsumer::create (RdKafka::Conf *conf,
                                                        std::string &errstr) {
  char errbuf[512];
  RdKafka::ConfImpl *confimpl = dynamic_cast<RdKafka::ConfImpl *>(conf);
  RdKafka::KafkaConsumerImpl *rkc = new RdKafka::KafkaConsumerImpl();
  rd_kafka_conf_t *rk_conf = NULL;
  size_t grlen;

  if (!confimpl || !confimpl->rk_conf_) {
    errstr = "Requires RdKafka::Conf::CONF_GLOBAL object";
    delete rkc;
    return NULL;
  }

  if (rd_kafka_conf_get(confimpl->rk_conf_, "group.id",
                        NULL, &grlen) != RD_KAFKA_CONF_OK ||
      grlen <= 1 /* terminating null only */) {
    errstr = "\"group.id\" must be configured";
    delete rkc;
    return NULL;
  }

  rkc->set_common_config(confimpl);
  rk_conf = rd_kafka_conf_dup(confimpl->rk_conf_);

  rd_kafka_t *rk;
  if (!(rk = rd_kafka_new(RD_KAFKA_CONSUMER, rk_conf,
                          errbuf, sizeof(errbuf)))) {
    errstr = errbuf;
    // rd_kafka_new() takes ownership only if it succeeds
    rd_kafka_conf_destroy(rk_conf);
    delete rkc;
    return NULL;
  }

  rkc->rk_ = rk;

  /* Redirect handle queue to cgrp's queue to provide a single queue point */
  rd_kafka_poll_set_consumer(rk);

  return rkc;
}

(2) Subscribe to Topic

RdKafka::ErrorCode RdKafka::KafkaConsumerImpl::subscribe (
    const std::vector<std::string> &topics) {
  rd_kafka_topic_partition_list_t *c_topics;
  rd_kafka_resp_err_t err;

  c_topics = rd_kafka_topic_partition_list_new((int)topics.size());
  for (unsigned int i = 0 ; i < topics.size() ; i++)
    rd_kafka_topic_partition_list_add(c_topics, topics[i].c_str(),
                                      RD_KAFKA_PARTITION_UA);

  err = rd_kafka_subscribe(rk_, c_topics);
  rd_kafka_topic_partition_list_destroy(c_topics);
  return static_cast<RdKafka::ErrorCode>(err);
}

(3) Consumer News

RdKafka::Message *RdKafka::KafkaConsumerImpl::consume (int timeout_ms) {
  rd_kafka_message_t *rkmessage;

  rkmessage = rd_kafka_consumer_poll(this->rk_, timeout_ms);
  if (!rkmessage)
    return new RdKafka::MessageImpl(NULL, RdKafka::ERR__TIMED_OUT);

  return new RdKafka::MessageImpl(rkmessage);
}

3. RdKafka Multithreaded Design

1. Producer/Consumer multi-threaded design

Internally, RdKafka makes full use of hardware resources using multiple threads. The RdKafka API is thread-safe, and applications can call any API function within their threads at any time.
Each Producer/Consumer instance creates the following threads:
(1) Application threads: handle specific application business logic.
(2) Kafka Handler thread: each time a Producer/Consumer is created, a Handler thread is created, i.e., the RdKafka main thread, with the name rdk:main and the thread execution function rd_kafka_thread_main.
(3) Kafka Broker threads: for each Broker added to the Producer/Consumer, a thread is created that is responsible for communicating with that Broker; the thread execution function is rd_kafka_broker_thread_main and the thread name is rdk:brokerxxx.
(4) Internal Broker thread: serves the OP operation queues of partitions that have not been assigned.
(5) Background thread:
If the configuration object has background_event_cb set, the Kafka Handler creates a corresponding background thread and queue when it is created; the thread execution function is rd_kafka_background_thread_main.

2. Thread Viewing

To view the threads of the KafkaConsumer process on Linux:

ps -T -p pid
top -H -p pid

The Consumer process's threads:

The Producer process's threads:

14 June 2020, 20:23 | Views: 1927
