Want to make sure I'm using rd_kafka_poll correctly #71

Closed
winbatch opened this issue Feb 3, 2014 · 8 comments

winbatch commented Feb 3, 2014

Hi,

I know we've discussed this previously, but I want to make sure I'm using poll correctly. As I understand it, it returns when a delivery report or an error is available. In my use case, I don't want to publish the next message until I've received an acknowledgement for the prior one. I'm in the process of testing this by bringing down one of the brokers while publishing. I'm fairly sure there are cases where poll returns >=1 simply because it invoked my error callback with a non-fatal error (one of the brokers being unreachable). That means poll can return even though I haven't received an acknowledgement for my last message. This would seem to imply that I need to keep polling in a loop until I specifically get either a fatal error or an acknowledgement, since poll returning at least one event doesn't necessarily mean that what I just produce()d succeeded or failed (the callbacks could all be merely informational). Is this correct?
I'm also seeing that multiple messages are 'outstanding', waiting for delivery callbacks, when in my case there should only ever be one outstanding.

edenhill commented Feb 3, 2014

Oh, you want a sync interface. You do know that is the ultimate performance killer of all times, yes?

Each message produced will take at least the round-trip time to the broker plus the broker disk flush or propagation to other brokers. E.g.: RTT (2 ms) + required.acks > 1 (3 ms) = 5 ms per message = at most 1000/5 = 200 messages/second.

You typically don't need a sync interface for reliability, since rdkafka will make sure the message is either acked by the broker or errored back to the application properly.
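
For reference, the normal async pattern is just to produce and serve callbacks opportunistically; a minimal sketch (produce_one() and its arguments are illustrative, not part of the library):

void produce_one (rd_kafka_t *rk, rd_kafka_topic_t *rkt,
                  void *payload, size_t len) {
   /* Enqueue the message; delivery status arrives later via dr_cb. */
   rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                    payload, len, NULL, 0, NULL /* msg_opaque */);

   /* Non-blocking poll: serves any pending delivery/error callbacks. */
   rd_kafka_poll(rk, 0);
}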

Having said all that, this is how you implement a sync interface:


/* Delivery report callback, registered with rd_kafka_conf_set_dr_cb(). */
void dr_cb (rd_kafka_t *rk, void *payload, size_t len,
            rd_kafka_resp_err_t err, void *opaque, void *msg_opaque) {
     int *produce_statusp = (int *)msg_opaque;

     /* Set sync_produce()'s produce_status to the error code
      * (which may be RD_KAFKA_RESP_ERR_NO_ERROR). */
     *produce_statusp = err;
}

/* Magic value that is not a proper value in rd_kafka_resp_err_t. */
#define PRODUCE_PENDING -100000

/* Returns 0 on successful delivery, -1 on failure. */
int sync_produce (rd_kafka_t *rk, rd_kafka_topic_t *rkt,
                  void *payload, size_t len) {
   int produce_status = PRODUCE_PENDING;

   if (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                        payload, len, NULL, 0,
                        &produce_status /* msg_opaque */) == -1)
      return -1; /* enqueue failed (e.g. local queue full) */

   do {
      /* Serve delivery report and error callbacks. */
      rd_kafka_poll(rk, 1000);
      /* Loop until dr_cb has run and set produce_status. */
   } while (produce_status == PRODUCE_PENDING);

   return produce_status == RD_KAFKA_RESP_ERR_NO_ERROR ? 0 : -1;
}
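
And for completeness, a sketch of the wiring using the standard librdkafka conf API ("mytopic" and acks=-1 are placeholder choices, not from this issue):

char errstr[512];
rd_kafka_conf_t *conf = rd_kafka_conf_new();
rd_kafka_topic_conf_t *tconf = rd_kafka_topic_conf_new();

/* Register the delivery report callback before creating the handle. */
rd_kafka_conf_set_dr_cb(conf, dr_cb);

/* Wait for acks from all in-sync replicas (placeholder choice). */
rd_kafka_topic_conf_set(tconf, "request.required.acks", "-1",
                        errstr, sizeof(errstr));

rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "mytopic", tconf);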

winbatch commented Feb 3, 2014

Thanks. I've effectively done the above. Note that I'm sending an MB at a time, so the number of messages per second is less important to me than the amount of data transferred per time interval.

edenhill commented Feb 3, 2014

Okay, you mentioned earlier this data was compressed. Are you letting the producer compress it or is it already compressed when you hand it to the producer?

winbatch commented Feb 3, 2014

Letting the producer compress. This is streaming log data, sent as the files are written.

edenhill commented Feb 3, 2014

Okay, you might want to compress the message before handing it over to the producer for performance reasons on the broker.

See, when a Kafka producer compresses a message set (one or more messages), it compresses the message headers and the message payloads. When the broker receives this compressed message set, it uncompresses it, assigns message offsets, and then recompresses it.
This makes sense when there are multiple messages in a message set, but in your case there will only ever be one message, and a pretty large one, in each message set, so the broker will uncompress and recompress it for no real purpose.
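
If you do pre-compress, a minimal sketch using zlib might look like this (produce_precompressed() is an illustrative helper; the topic would be configured with compression.codec=none, and, as noted below, the consumer then has to decompress the zlib-format payload itself):

#include <stdlib.h>
#include <zlib.h>
#include <librdkafka/rdkafka.h>

int produce_precompressed (rd_kafka_topic_t *rkt,
                           const void *buf, size_t len, void *msg_opaque) {
   uLongf clen = compressBound(len);
   Bytef *cbuf = malloc(clen);

   if (!cbuf || compress(cbuf, &clen, (const Bytef *)buf, len) != Z_OK) {
      free(cbuf);
      return -1;
   }

   /* F_FREE: librdkafka takes ownership of cbuf and frees it after the
    * delivery report. */
   if (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_FREE,
                        cbuf, clen, NULL, 0, msg_opaque) == -1) {
      free(cbuf); /* ownership is not transferred on enqueue failure */
      return -1;
   }
   return 0;
}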

winbatch commented Feb 3, 2014

If I do that, then the consumer would need to explicitly uncompress it, right? If so, I don't want to impose that.

edenhill commented Feb 3, 2014

Yep, that's correct.

edenhill commented Feb 3, 2014

All good? Reopen if not.
