SimpleClient (do not deprecate - reasons explained within); aka fix ConsumerMetadataRequestPayload #633

Closed
grimwm opened this issue Apr 6, 2016 · 6 comments


grimwm commented Apr 6, 2016

Hi, at my place of employment we do a lot of work with Kafka. One of our tasks is monitoring the state of the data within Kafka, such as how large each topic is and how much lag each topic has. Because of the way Kafka handles group coordination, and because of our use case, each of our topics has a separate consumer group. Thus, when we want to gather lag information for all of these topics, we have to spawn a new KafkaConsumer with a separate group id for each one. This takes a very long time, even when we use KafkaConsumer.poll(0) and assignment instead of subscription.

With SimpleClient, however, I have direct access to much of the low-level framework and can create Metadata requests to get things like the list of all topics and partitions and the highwater mark for each of them in a single, high-performance call. Unfortunately, there are two problems:

  1. I think you're planning to deprecate SimpleClient, but we really need it. It's the only way to do low-level communication quickly and efficiently, since this use case is not the standard poll/consume.
  2. ConsumerMetadataRequestPayload was merged in June or July of last year, but it currently fails to work. I can come back with more information once I put back the code in my project that generates those payloads. I'd really appreciate having it work, since then I can create "committed offset" requests for several different topics with different group ids and get them all back efficiently, as I do right now with highwater marks.

Thank you!!!
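
For readers following along, the lag described above is the highwater mark of a partition minus the offset the consumer group has committed for it. The sketch below only illustrates that arithmetic on plain dictionaries. The function names and the `(topic, partition)` tuple keys are hypothetical and not part of any Kafka client library, and treating a missing commit as offset 0 is an assumption made for the example.

```python
from typing import Dict, Optional, Tuple

# (topic name, partition id); a plain tuple stands in for a client library's
# topic-partition type so that the sketch has no dependencies.
Partition = Tuple[str, int]


def compute_lag(
    highwater: Dict[Partition, int],
    committed: Dict[Partition, Optional[int]],
) -> Dict[Partition, int]:
    """Return per-partition lag as highwater mark minus committed offset."""
    lag: Dict[Partition, int] = {}
    for partition, end_offset in highwater.items():
        offset = committed.get(partition)
        if offset is None:
            # Assumption: a group with no commit is treated as starting at 0.
            offset = 0
        # Clamp at zero in case the two snapshots were taken at different times.
        lag[partition] = max(end_offset - offset, 0)
    return lag


def lag_by_topic(lag: Dict[Partition, int]) -> Dict[str, int]:
    """Sum per-partition lag into one number per topic."""
    totals: Dict[str, int] = {}
    for (topic, _partition_id), value in lag.items():
        totals[topic] = totals.get(topic, 0) + value
    return totals
```

The two input dictionaries correspond to the two batches of data discussed in this issue: highwater marks for every partition, and committed offsets for each group.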

Owner

dpkp commented Apr 6, 2016

Thanks for the feedback. Have you tried using the "new" low-level client, KafkaClient? The reason SimpleClient would be deprecated is not to remove support for low-level clients, but rather to switch to the new protocol stack and the async client interface. SimpleClient will stay until the new client can do everything the old client could (and better).

Also worth noting is that Kafka 0.10 is going to include several new Admin APIs (create topic, delete topic, etc.), and I think at that point we would want a more full-featured AdminClient that does all of the standard low-level work that you've mentioned here.

Author

grimwm commented Apr 6, 2016

The new client doesn't support ConsumerMetadataRequestPayload from what I can see. I actually don't mind moving forward, but I would like to get all the features from SimpleClient ported before we do that.

We already do admin work here for deletion by writing a znode that Kafka then notices and uses to delete the topic. I can't remember where I write that znode, but I had to look at the Kafka API to figure it out.

Owner

dpkp commented Apr 6, 2016

Maintaining the 'Payload' structures is very difficult. The original design assumed that "payloads" were associated with a topic+partition. ConsumerMetadata requests, now called GroupCoordinator requests, are not associated with a topic+partition, so they break this original design. Moving to the new protocol stack is intended to fix this problem. The basic idea would be:

cli = KafkaClient()
...
request = GroupMetadataRequest('group_foo')
future = cli.send(0, request)
cli.poll(future=future)
assert future.succeeded()
response = future.value

There's some more work needed to make the new Request/Response objects easier to use, but this is the idea.

Author

grimwm commented Apr 6, 2016

That looks acceptable to me. I mainly just want some sort of low-level interface, since it's the only way to get large amounts of metadata efficiently. The current way we're doing it is to build a KafkaConsumer connection with the group_id we want, call poll(0) (to update position info), and then ask for position(). Then we tear down this connection, move to the next topic, and repeat. This takes a few minutes for the number of topics we currently have. For highwater marks, however, I can currently use SimpleClient to ask for the highwater mark on every topic, and it's very "convenient". Hehe, I put that in quotes because, while it's a bit of a pain to learn how Kafka expects payloads to be structured, it's very fast once written.
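
As a rough illustration of the loop described above (one consumer built and torn down per group), here is a minimal sketch. It is written against a duck-typed consumer so that it runs without a broker. `positions_per_group` and `make_consumer` are hypothetical names. The consumer is only assumed to offer `assign()`, `poll()`, `position()`, and `close()`, which is how this comment describes using KafkaConsumer.

```python
from typing import Any, Callable, Dict, Hashable, Iterable


def positions_per_group(
    make_consumer: Callable[[str], Any],
    assignments: Dict[str, Iterable[Hashable]],
) -> Dict[str, Dict[Hashable, int]]:
    """Collect the current position of every partition, one group at a time.

    make_consumer(group_id) is a hypothetical factory returning a consumer
    bound to that group; assignments maps each group id to its partitions.
    """
    positions: Dict[str, Dict[Hashable, int]] = {}
    for group_id, partitions in assignments.items():
        partitions = list(partitions)
        # A fresh connection per group: this setup and teardown is the
        # slow part when there are many groups.
        consumer = make_consumer(group_id)
        try:
            consumer.assign(partitions)  # manual assignment, no subscription
            consumer.poll(0)             # refresh position info without blocking
            positions[group_id] = {tp: consumer.position(tp) for tp in partitions}
        finally:
            consumer.close()
    return positions
```

With a real client, `make_consumer` would wrap the consumer constructor with the broker address and the given group id. The point of the sketch is that the cost grows with the number of groups, because nothing is shared between iterations.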

@jeffwidman
Collaborator

Long term, this will probably be solved by #935

@jeffwidman
Collaborator

Related:

  1. Add list consumer groups offsets #1643 which adds KafkaAdmin.list_consumer_group_offsets()...
  2. Fix describe groups #1642, after which KafkaAdmin.describe_consumer_groups() should work correctly...

Once those are both merged, I think this can be closed.
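
To show how the admin call mentioned above could replace the per-group consumer loop, here is a small sketch. Only the method name `list_consumer_group_offsets()` comes from this thread. The helper `committed_offsets_for_groups` is hypothetical. The return shape is also an assumption: a mapping from topic-partition objects with `topic` and `partition` attributes to values carrying an `offset` attribute. The admin object is passed in, so the sketch does not depend on a particular class name or library version.

```python
from typing import Any, Dict, Iterable, Tuple


def committed_offsets_for_groups(
    admin: Any,
    group_ids: Iterable[str],
) -> Dict[str, Dict[Tuple[str, int], int]]:
    """Fetch committed offsets for several groups through one admin object.

    Assumes admin.list_consumer_group_offsets(group_id) returns a mapping of
    topic-partition objects to metadata objects with an `offset` attribute.
    """
    result: Dict[str, Dict[Tuple[str, int], int]] = {}
    for group_id in group_ids:
        offsets = admin.list_consumer_group_offsets(group_id)
        # Flatten to plain tuples and ints so callers need no library types.
        result[group_id] = {
            (tp.topic, tp.partition): meta.offset
            for tp, meta in offsets.items()
        }
    return result
```

Unlike the consumer-per-group approach, a single admin object is reused for every group, which is the efficiency gain the original request was after.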
