Caching rd_kafka_topic_t #345
This is your lucky day: that is the exact behaviour of rd_kafka_topic_new().
It might be hard for an application to rely on the refcount to retain a cached copy, since there is no indication from topic_new() whether it returned a new object or a cached one. One way of solving this would be to add a …
This could be my lucky day too if you decide to integrate it; I was about to use an intermediate rkt "cache" in order to integrate this functionality 😃
I think for my purposes (varnishkafka …
Will you be needing the …
@eugpermar :)
Not for mine. My application needs to send messages to random (per-message, user-provided) topics, so the refcount may drop to 0 at one moment, and for the next message I need to initialize the topic again. If it's easy to develop and you give me a few hints, I can do it myself and send you a PR.
I'm also trying to use user-provided topic names. Why do you need the refcount? It should be fine to call …
Hi @ottomata, sorry for not introducing myself :). For a first approach, yes. The problem is performance: tearing down and setting up sockets, connections, threads, queues, etc. In that scenario, I think I would waste more resources setting up the toppar than sending the actual messages. With a little timeout, I can avoid this whole disconnection/reconnection process and let the application get on with processing messages.
Hm, so I'm worried about this too, but according to @edenhill, if you have already called … Am I correct?
The cost of setting up a new topic object is a metadata refresh from the brokers, which might not seem like much, but if you start pushing thousands of messages per second it just won't work with a metadata round trip for each message sent. @ottomata: "most likely it will not" is absolutely correct for development systems, but as soon as you roll that stuff out in production it is guaranteed to happen.
Yes, I'm talking about the other side: imagine that the application finishes flushing all the messages, and I've already deleted my topic handle since I don't need it anymore. So librdkafka shuts down all the resources for that topic, and then... another message for that topic! It needs to … Maybe if I receive messages fast enough this does not happen, but I think it's too risky :) My first (theoretical) approach was to put a layer on top of … Magnus was faster this time :) and since he knows a lot more about this Kafka stuff, let's trust his criteria hehe
Hm, @edenhill, for varnishkafka this should be ok though, no? It is very clear when _new() is called, and in fact _destroy() is never called by it, since it expects to keep producing to its topics. Will rdkafka call _destroy() in the background?
_new() and _destroy() will need to be symmetric from the application side. The problem is this:
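The numbered steps were lost in extraction; a minimal sketch of the problematic per-message pattern might look like the following (read_a_logline(), run, rk and topic_name are illustrative, not from the original):

```c
/* Hypothetical per-message loop; helper names are illustrative. */
while (run) {
        char *line = read_a_logline();   /* the time between step 3 and step 1 */

        /* 1. create a topic object (new, or cached if one still exists) */
        rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, topic_name, NULL);

        /* 2. produce the message */
        rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA, RD_KAFKA_MSG_F_COPY,
                         line, strlen(line), NULL, 0, NULL);

        /* 3. drop the application's reference; the refcount may hit zero here */
        rd_kafka_topic_destroy(rkt);
}
```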
When the time between 3 and 1 (i.e., the time read_a_logline() takes to return) is longer than it takes for librdkafka to decommission and destroy a topic internally (which is an arbitrary number no one really knows), a new topic object will be created for each produced message, which is very costly. On the other hand, if you already know at application start all the topics it might produce to, you can seed the cache and create the topic objects up front, like so:
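A hedged sketch of that seeding pattern, assuming the topic names are known at startup (names, array size and error handling are illustrative):

```c
/* Seed librdkafka's topic cache up front; subsequent rd_kafka_topic_new()
 * calls for the same names will return the cached objects. */
const char *topics[] = { "topic_a", "topic_b", "topic_c" };  /* illustrative */
rd_kafka_topic_t *seeds[3];

for (int i = 0; i < 3; i++)
        seeds[i] = rd_kafka_topic_new(rk, topics[i], NULL);

/* ... produce for as long as needed; each topic stays alive because the
 * application holds one reference per topic ... */

/* On termination, release the seed references symmetrically. */
for (int i = 0; i < 3; i++)
        rd_kafka_topic_destroy(seeds[i]);
```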
Ah, hm. But varnishkafka has no calls to rd_kafka_topic_destroy(). In my current WIP, I just call …
Everything that is learned must be forgotten :) if you do not call destroy, …
I'd say not calling topic_destroy() is a termination bug, i.e., varnishkafka will not terminate cleanly, but that is not always a big concern.
@eugpermar actually there is no memory allocation in calling topic_new() on an existing topic object. As long as the number of topic names is finite, that should be fine.
Aye, makes sense :) So, the functionality that is really needed is a way to tell rdkafka that … ?
I meant for creation of different topics. As you said, if you do not control the …
Aye. @edenhill, would everything explode if I did …
@ottomata yes, don't ever mess with refcnts! :)
I think so too.
The new rd_kafka_producev() API can take a topic name rather than topic_t object, thus the application will no longer need to create and hold on to a topic_t. |
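For illustration, a produce call through that API might look like this sketch (the topic name, payload and payload_len are assumptions, not from the original thread):

```c
/* Produce by topic name; librdkafka creates and caches the topic
 * object internally, so no rd_kafka_topic_t is held by the app. */
rd_kafka_resp_err_t err = rd_kafka_producev(
        rk,
        RD_KAFKA_V_TOPIC("mytopic"),               /* a name, not a topic_t */
        RD_KAFKA_V_VALUE(payload, payload_len),
        RD_KAFKA_V_MSGFLAGS(RD_KAFKA_MSG_F_COPY),
        RD_KAFKA_V_END);
if (err)
        fprintf(stderr, "producev failed: %s\n", rd_kafka_err2str(err));
```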
@edenhill, the use case we have would require us to have a large number of topics. I am concerned that we might have too many of them in memory. How are the topics created from rd_kafka_producev managed? Is it still based on refcount internally? |
Yes, they are automatically created under the hood and refcounted. |
It would be handy if librdkafka had the ability to cache initialized rd_kafka_topic_t topics so that (lazy) users wouldn't have to implement this themselves. That is, when producing to multiple topics, each unique rd_kafka_topic_t could be stored by librdkafka.
Something like …
This could either initialize the topic and store it (if a topic conf is also given?), or it could return something indicating the topic doesn't exist. I suppose a rd_kafka_topic_store() function of some kind would also be needed. (This doesn't already exist, does it?)