[kafka] RdKafkaProducer has no way of error handling #749
Comments
I don't have an immediate answer. Will think about it.
When callbacks are set up like so:

```php
echo microtime(true), '<br />';

$this->conf->setErrorCb(function ($kafka, $err, $reason) {
    echo nl2br(sprintf("Kafka error: %s (reason: %s)\n", rd_kafka_err2str($err), $reason));
    echo microtime(true), '<br />';
});

$this->conf->setDrMsgCb(function ($kafka, $message) {
    if ($message->err) {
        // message permanently failed to be delivered
        echo nl2br(sprintf("Kafka error: %s\n", rd_kafka_err2str($message->err)));
        echo microtime(true), '<br />';
    } else {
        // message successfully delivered
    }
});
```

They are called once and result in:
However, due to
when the Conf instance was built, I can see in stderr:
which indicates that librdkafka is still trying to send the message.
If Kafka is brought back up after the web server has sent its response (and the callbacks have been called, and so on), I can see in Prometheus that the message was successfully delivered:
Cross-referencing an issue that probably describes this problem in the php-librdkafka wrapper:
@makasim I've looked into this further and found two issues regarding topic configuration: one in the .NET client library for librdkafka and one in librdkafka itself (confluentinc/librdkafka#2202, confluentinc/confluent-kafka-dotnet#322). However, it seems to me that enqueue-kafka should provide some way of waiting for a message to be delivered, especially if the process sending the message is short-lived and would probably not otherwise wait for potential message errors (by calling
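One topic-level librdkafka setting relevant to "how long is the message retried" is `message.timeout.ms`: it bounds how long librdkafka keeps trying to deliver a message before reporting it as failed to the delivery report callback (the librdkafka default is 300000 ms, i.e. five minutes). A minimal sketch of lowering it, assuming the `global`/`topic` configuration array accepted by enqueue's `RdKafkaConnectionFactory`; the broker address and the 10-second value are illustrative only:

```php
<?php
// Sketch only: configuration in the shape accepted by
// Enqueue\RdKafka\RdKafkaConnectionFactory ('global' and 'topic' sections).
// The broker address and timeout value are illustrative assumptions.
$config = [
    'global' => [
        'metadata.broker.list' => 'localhost:9092',
    ],
    'topic' => [
        // Give up on a message after 10 s instead of the 5-minute default,
        // so a failure is reported to the delivery report callback sooner.
        'message.timeout.ms' => '10000',
    ],
];

// With enqueue/rdkafka installed:
// $context = (new \Enqueue\RdKafka\RdKafkaConnectionFactory($config))->createContext();
```

A shorter timeout only makes failures surface sooner; the callback is still invoked only while the producer is being polled.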
I am not completely sold on it. A message queue is asynchronous by nature and should not wait for anything. Also, waiting for confirmation might drastically decrease the performance of the producer process. The wait operation is blocking, and the script does nothing valuable while waiting for the confirmation.
I agree to an extent. However, our Kafka integration currently ignores all Producer errors silently, unless someone notices that messages are missing and/or looks into syslog. The PHP process itself is unaware of any issues and behaves as if everything is OK. librdkafka itself suggests calling
From librdkafka:
From the PHP wrapper (php-rdkafka):
It's the only way for error callbacks to even be fired. From what I've established
That's why I've suggested providing a method in the Kafka client to call
Also, I would make a PR adding some common configuration options, with explanations, to the current docs (Kafka part), along with the potential pitfalls if they are not configured (like
@Steveb-p thanks for your very detailed answer. I agree that the issue should be addressed. Would you be able to provide a PR for this issue?
@makasim Yes, will do. I just wanted to clarify which approach I should take before doing anything :)
@makasim I've looked WDYT? To reiterate my previous comments:
I can think of only one solution: https://github.com/laravel/framework/blob/8ab09fb6377eb5b6dab45ab38b0a62446b08e8ae/src/Illuminate/Queue/Worker.php#L139-L148 Not sure it would work out.
I think it's best to have 2 modes available:
Doing this automagically would be hard to control. For example, we need flow control for errors: what do we do if there was an error when sending the message (that's the reason for all of this, isn't it?). If we got an exception some time later (and we cannot even predict when), we would not be able to handle that exact situation, for example by saving (some) messages to disk or a database for later processing if they are really important.
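To illustrate the flow control described above: with a blocking send, a failure surfaces at the call site, so the caller can decide what to do with that specific message. A minimal sketch, where `DeliveryFailedException`, `sendWithFallback()` and the closures are hypothetical names for illustration, not part of enqueue or queue-interop:

```php
<?php
declare(strict_types=1);

// Hypothetical exception a blocking ("synchronous") send could throw
// when the broker does not acknowledge the message in time.
final class DeliveryFailedException extends RuntimeException
{
}

/**
 * Tries to send a payload; if delivery fails, hands it to a fallback
 * store (disk, database, ...) so it can be reprocessed later.
 *
 * @param callable(string): void $send  blocking send, throws on failure
 * @param callable(string): void $store fallback persistence
 * @return bool true if delivered, false if stored for later
 */
function sendWithFallback(callable $send, callable $store, string $payload): bool
{
    try {
        $send($payload);
        return true;
    } catch (DeliveryFailedException $e) {
        $store($payload);
        return false;
    }
}

// Usage with stand-in closures:
$stored = [];
$failingSend = function (string $payload): void {
    throw new DeliveryFailedException('Broker unreachable');
};
$store = function (string $payload) use (&$stored): void {
    $stored[] = $payload;
};

$delivered = sendWithFallback($failingSend, $store, '{"order": 42}');
```

In a purely asynchronous mode the same failure would only arrive later in a callback, detached from this call, which is why the fallback could no longer be applied per message at this point.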
Not really. A lot of drivers support asynchronous execution. Producing a message (which is done in a subthread, by the way) can take from a couple of milliseconds to a few seconds at worst, and there are legitimate reasons to let the program continue and, possibly, produce more messages. In my opinion, the choice of whether to wait for message production results should be left to the user. We've mostly talked about which approach would be best, aiming to keep classes within their interface declaration ( My current approach would be to either attach some property to the message instance, causing it to block (which would be transport-specific behavior and would need to be documented), and/or extend
@Steveb-p or it could be an approach similar to the one I took for the subscription consumer. A context could create a producer responsible for this kind of message publishing. If the transport does not support it, a NotSupportedException is thrown. That way we do not pollute the context with extra methods, which is good. WDYT? That producer could be implemented for Kafka and RabbitMQ (with publisher confirms mode).
@makasim In other words you want to introduce I'm not really familiar with |
I am thinking of

```php
namespace Interop\Queue;

interface AsyncProducer extends Producer
{
    // Async methods
}
```

created from
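A possible shape for the context-created producer, sketched in the spirit of how a context can refuse to create a subscription consumer when the transport lacks support. Everything here is hypothetical and simplified for illustration: the `flush()` method, `createAsyncProducer()`, `AsyncProducerNotSupportedException`, and the two example contexts are not existing queue-interop or enqueue APIs.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified stand-ins for the types discussed above; the
// real Interop\Queue\Producer and Context interfaces have more methods.
interface Producer
{
    public function send(string $destination, string $body): void;
}

interface AsyncProducer extends Producer
{
    // Hypothetical: blocks until pending messages are acknowledged or
    // $timeoutMs elapses; returns how many are still unacknowledged.
    public function flush(int $timeoutMs): int;
}

final class AsyncProducerNotSupportedException extends LogicException
{
}

interface Context
{
    /** @throws AsyncProducerNotSupportedException */
    public function createAsyncProducer(): AsyncProducer;
}

// A transport without delivery confirmations refuses to create one.
final class NoConfirmContext implements Context
{
    public function createAsyncProducer(): AsyncProducer
    {
        throw new AsyncProducerNotSupportedException('This transport cannot confirm deliveries.');
    }
}

// A transport with confirmations (e.g. Kafka, RabbitMQ publisher confirms)
// returns its own implementation; this in-memory one acknowledges instantly.
final class InMemoryAsyncProducer implements AsyncProducer
{
    /** @var list<string> messages sent but not yet acknowledged */
    private array $pending = [];

    public function send(string $destination, string $body): void
    {
        $this->pending[] = $destination . ':' . $body;
    }

    public function flush(int $timeoutMs): int
    {
        $this->pending = [];
        return count($this->pending);
    }
}

final class InMemoryContext implements Context
{
    public function createAsyncProducer(): AsyncProducer
    {
        return new InMemoryAsyncProducer();
    }
}
```

The base `Producer` contract stays untouched; only callers that explicitly ask for confirmations deal with the extra interface or the exception.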
Any news on this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Be silent, bot! I'll work on it. I promise! :)
Just wanted to mention that phprdkafka 4.0 is being prepared, where producer error handling will be moved to userland code. |
@Steveb-p is a fix for this dependent on phprdkafka 4? Also, do you have recommendations on how to achieve guaranteed delivery through enqueue-kafka in the short term? For now I am basically just making the change you described in RdKafkaProducer->send(). It's probably not something that would work as a production change for the library, but as a temporary patch for us it seems to do the trick (once I implemented both error AND delivery report handling). Lastly, is there any input or help I can give to move this forward?
@mlively any solution definitely would not depend on phprdkafka 4.0. It's just a note for my future self to keep in mind. If anything, phprdkafka will force error handling in user code, since it will no longer retry sending messages in a thread that keeps the PHP process alive during shutdown.
I would like to share our experience with the code provided in the description of this PR. The code with
The only thing I noticed is that sometimes this loop
Regarding Symfony Messenger integration: we have already integrated this approach via a custom middleware that allows us to send all dispatched messages that failed (due to Kafka's unavailability) to the Failure Transport. For more details and code examples, see symfony/symfony#35521
@Steveb-p what are the current blockers to having this code inside
I've also benchmarked the overhead of calling
which is really not noticeable. Can I help with anything to get this code merged into the master branch, so other developers don't have to spend their time trying to figure out where their messages went? :)
@maks-rafalko Time, really. And some design choices. Ideally we would like to have something that will either call EDIT: You can find me on both
RdKafkaProducer might introduce a transport-specific method. That's fine. So people can do something like this (for example, in the extension):
I would introduce a method to get errors and leave it up to developers to use it or not.
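A minimal sketch of such a transport-specific "get errors" method, assuming a helper that records what librdkafka reports through the error and delivery report callbacks. `DeliveryErrorCollector`, `onError()`, `onDeliveryReport()` and `getErrors()` are hypothetical names; with php-rdkafka installed, the two handlers would be registered via `RdKafka\Conf::setErrorCb()` and `RdKafka\Conf::setDrMsgCb()`, and they only fire while the producer is being polled.

```php
<?php
declare(strict_types=1);

// Hypothetical helper collecting errors reported by librdkafka callbacks.
final class DeliveryErrorCollector
{
    /** @var list<string> */
    private array $errors = [];

    // Same parameters as php-rdkafka's error callback: ($kafka, $err, $reason).
    public function onError($kafka, int $err, string $reason): void
    {
        $this->errors[] = sprintf('error %d: %s', $err, $reason);
    }

    // Same parameters as php-rdkafka's delivery report callback: ($kafka, $message).
    // A non-zero $message->err means the message was not delivered.
    public function onDeliveryReport($kafka, object $message): void
    {
        if ($message->err !== 0) {
            $this->errors[] = sprintf('delivery failed with code %d', $message->err);
        }
    }

    /** Returns and clears the errors collected so far. */
    public function getErrors(): array
    {
        $errors = $this->errors;
        $this->errors = [];
        return $errors;
    }
}

$collector = new DeliveryErrorCollector();
// With php-rdkafka: $conf->setErrorCb([$collector, 'onError']);
//                   $conf->setDrMsgCb([$collector, 'onDeliveryReport']);

// Simulated reports: one delivered message, one that timed out
// (-192 is RD_KAFKA_RESP_ERR__MSG_TIMED_OUT).
$collector->onDeliveryReport(null, (object) ['err' => 0]);
$collector->onDeliveryReport(null, (object) ['err' => -192]);
$errors = $collector->getErrors(); // ['delivery failed with code -192']
```

User code that knows it is talking to Kafka could then check for the concrete producer type and inspect the errors, while code written against the generic interface is unaffected.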
Understood. I'll have to refresh my memory then 😄
@Steveb-p my thoughts on this, at least for
Hi! I'm struggling to get an error when producing a message to a stopped Kafka instance. From what I've read, I guess it's related to the asynchronous nature of the rdkafka extension. Is this solved or configurable in any way?
When an error occurs in `RdKafkaProducer`, it is silently ignored; or, more accurately, since it occurs outside the main thread of php/php-fpm, it is not reported back. A simple test shows the issue: trying to send a message to a non-existent Kafka server results in nothing (see point 2, though).
This results in two things:
While this particular part is not really possible to fix inside enqueue, it's important since the message might actually be delivered later on.
I'd like to ask for opinions on how `RdKafkaProducer` should handle this situation. IMO it is worthwhile to add a configuration option that makes sending messages synchronous for this particular Producer, or at least waits a specified amount of time for the message to be acknowledged.
This can be done by introducing this code at the end of the `send` method:

This has the side effect of actually calling the `dr_msg_cb` & `error_cb` callbacks, which are otherwise ignored (or at least that's what my testing indicated).

Thoughts?
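The end-of-`send` wait described above can be sketched as a polling loop, assuming php-rdkafka's `RdKafka\Producer::poll()` and `getOutQLen()` methods: keep polling (which is what triggers `dr_msg_cb` and `error_cb`) until the outbound queue is empty or a time budget runs out. The function name `waitForDelivery()` and the duck-typed `$producer` parameter are illustrative assumptions, chosen so the logic can be exercised without the extension.

```php
<?php
declare(strict_types=1);

/**
 * Polls a librdkafka producer until every queued message has been
 * acknowledged (or has failed) or $timeoutMs has elapsed.
 *
 * $producer is expected to expose poll(int) and getOutQLen(), as
 * RdKafka\Producer does. Returns the number of messages still queued.
 */
function waitForDelivery(object $producer, int $timeoutMs, int $pollMs = 50): int
{
    $deadline = microtime(true) + $timeoutMs / 1000;

    while ($producer->getOutQLen() > 0 && microtime(true) < $deadline) {
        // poll() serves queued callbacks (dr_msg_cb, error_cb) and
        // blocks for at most $pollMs milliseconds.
        $producer->poll($pollMs);
    }

    return $producer->getOutQLen();
}

// Stand-in for RdKafka\Producer: each poll() "acknowledges" one message.
$fake = new class {
    public int $queued = 3;
    public int $polls = 0;

    public function poll(int $timeoutMs): int
    {
        $this->polls++;
        $this->queued = max(0, $this->queued - 1);
        return 1;
    }

    public function getOutQLen(): int
    {
        return $this->queued;
    }
};

$remaining = waitForDelivery($fake, 1000); // 0, after three polls
```

A non-zero return value means messages are still in flight when the budget runs out, which a "synchronous" mode could turn into an exception. With php-rdkafka 4.x, `RdKafka\Producer::flush(int $timeout_ms)` performs a similar wait in one call and returns `RD_KAFKA_RESP_ERR__TIMED_OUT` if messages are still outstanding.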