Queue level duplication: support for quorum and mirror queues #37
This seems like a pretty important issue to figure out, as I observed the same behavior. @noxdafox Please let me know if you need further reproduction steps. |
Hello, I tried reproducing your issue as instructed.
Yet only one message at a time was in the queue. What versions of RMQ, Erlang and the plugin are you using? Did you restart the broker after installing the plugin? We have sometimes observed cases in which adding the plugin to a running broker looked OK but the plugin was not operational. |
Hi @noxdafox, RabbitMQ version: 3.7.18. I'm using a cluster of four RabbitMQ nodes, and it looks like this is the problem. This is my rabbitmq.conf file:
And definitions.json:
The plugin works perfectly but, as mentioned before, when I use the web UI to get messages I can then publish them again. I have also been stopping and starting nodes: once a node rejoins the cluster, I can publish a message with the same deduplication header and it is added to the queue. I think it may be related to #19, so it looks like a problem with cluster syncing. |
Unfortunately, RMQ 3.7.18 introduced backward-incompatible changes which break the plugin (see compatibility notes). I have already pushed changes to master which add support for the new RMQ APIs, but they still need to undergo testing before I make a new release. If you look at the README.md, the maximum supported version is 3.7.17. I hope I can release a new forward-compatible version of the plugin by the end of this week, but I can't make promises. |
Thanks for the quick reply. I've just downgraded my local environment to 3.7.17 and the results are exactly the same as for 3.7.18. With a single RabbitMQ node it works perfectly: I can get messages and the deduplication is working. With more nodes, when I get messages (with Nack, requeue = True) and publish a new message with the same header, it is added to the queue, and the same goes for stopping and starting nodes. RabbitMQ version: 3.7.17 |
I just spun up a 3-node cluster. I used the rabbitmq:3.7.17-management image with the plugin downloaded from the release list. Is there something I am doing differently from your setup? If you try the above-mentioned Docker image, do you get the same result? |
I've also used that image in Docker, with the plugin downloaded from the release list. Have you set the policy so that the queue is mirrored to all the nodes?
And queue
Just like in this comment: #37 (comment). Maybe I'm doing something wrong? |
Any updates here, @noxdafox? |
I did not notice the use of HA queues at first. I will try to combine dedup and HA over the weekend. My gut feeling is that queue-level deduplication and HA are not compatible. The reason is that the HA implementation bypasses some of the queue behaviours the plugin relies upon in order to check for duplicates. If there is no workaround, I guess the only way to go is relying on exchange-level deduplication. |
I spent some time investigating the issue. Surprisingly enough, it seems duplicates show up only if HA is enabled via policy. If HA is set on queue creation, de-duplication seems to work from the management console. This is not a trivial problem. Core issue is, RMQ HA sits on top of the This means that we would need to submit quite a few changes within RMQ to make this plugin and mirrored queues work. Considering RMQ is introducing a new HA model in For the time being, I'll update the README to underline the issue. You can always use exchange-level deduplication, which is the recommended one anyway as it's much simpler to use and understand. During the week, I'll approach the RMQ devs to see if there's a way we can tackle this problem. |
Thank you for your diligence @noxdafox. Curious to hear what the RMQ devs will say. |
Thanks @noxdafox for the feedback.
But correct me if I'm wrong: exchange-level deduplication doesn't really guarantee that messages will not get duplicated; it will just ensure that, for example, if I set |
If I understand correctly, by "queue worker will crash" you mean the queue consumer, right? In this case a new consumer will receive the same message again, as the one which crashed did not acknowledge it. If you want to avoid message redelivery in case of error, you can disable acknowledgment. The de-duplication plugin does not prevent message duplicates due to network or broker errors. It is designed to prevent publishers from publishing duplicate messages. The exchange-level de-duplication allows you to select for how long (virtually forever) a message should not be published again. As it works at the exchange level, it allows you to de-duplicate multiple queues at once. The queue-level de-duplication prevents a message from being published to a queue if another one with the same de-duplication header is already present. Usually people prefer queue-level de-duplication but fail to see how much more unpredictable de-duplication becomes. What if the queue is empty most of the time, for example? The queue-level de-duplication is effective if the user wants to prevent two identical messages from being consumed at the same time but still wants identical messages to be consumed in different time frames. |
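To make the "what if the queue is empty most of the time" point concrete, here is a minimal Go sketch of the queue-level rule as described above. It is a toy in-memory model, not the plugin's implementation (the plugin runs inside the broker), and every name in it (`dedupQueue`, `publish`, `consume`, `queueLevelDemo`) is hypothetical.

```go
package main

// dedupQueue is a toy in-memory model of queue-level de-duplication.
// It only illustrates the rule "reject a message while another one with
// the same de-duplication key is still in the queue". All names are
// hypothetical and unrelated to the plugin's internals.
type dedupQueue struct {
	messages []string            // de-duplication keys of queued messages, FIFO order
	present  map[string]struct{} // keys currently sitting in the queue
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{present: make(map[string]struct{})}
}

// publish enqueues key unless a message with the same key is still queued.
// It reports whether the message was accepted.
func (q *dedupQueue) publish(key string) bool {
	if _, dup := q.present[key]; dup {
		return false
	}
	q.present[key] = struct{}{}
	q.messages = append(q.messages, key)
	return true
}

// consume removes the oldest message and frees its key for later publishes.
func (q *dedupQueue) consume() (string, bool) {
	if len(q.messages) == 0 {
		return "", false
	}
	key := q.messages[0]
	q.messages = q.messages[1:]
	delete(q.present, key)
	return key, true
}

// queueLevelDemo publishes the same key three times with one consume in between.
func queueLevelDemo() []bool {
	q := newDedupQueue()
	first := q.publish("job-42")  // accepted: the queue is empty
	second := q.publish("job-42") // rejected: a duplicate is still queued
	q.consume()                   // a fast consumer drains the queue
	third := q.publish("job-42")  // accepted again: nothing left to compare against
	return []bool{first, second, third}
}
```

In this model the third publish goes through because the consumer has already emptied the queue, which is the unpredictability mentioned above: whether a duplicate is caught depends on how quickly messages are consumed.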
@noxdafox Our use case is as follows, for example. We would like to use a queue for jobs that would update various caches when blog posts are updated. The goal is to remove the delay associated with dumping such caches, which could take multiple seconds, by making it async, so that the post screen in WordPress comes back faster and annoys our writers less. There will be other uses, but let's take this one as a hypothetical. The simple goal here is this:
We went with queue-level deduplication and ran into issues with deduping not working and duplicate jobs still being inserted. Now that we're considering exchange-level deduplication with these TTLs, what is the downside of using it vs queue level? If the job is already gone and processed but the TTL is set to 10 min, does exchange-level deduplication still add the job, or will it simply look at the previous TTL and drop the new duplicate job, even though it's not even a dupe anymore (because the original job was consumed and processed)? The goal is so simple: if there's a job with the same params, don't re-add it. If there isn't, add it. I don't get why RMQ doesn't support it natively, and why they're making it so difficult for a plugin to support. |
Did RMQ devs tell you anything useful to get this to work btw? |
@archon810, your use case is pretty legitimate. The queue-level de-duplication seems the best approach. The exchange-level de-duplication will de-duplicate for the entire duration of the TTL. In other words, a TTL of 10 minutes would mean no other message with the same header would be routed for the next 10 minutes since the first one. The reason why RMQ does not support de-duplication is that it's not part of the AMQP standard. RMQ provides a I used such behaviour to implement de-duplication over existing queues. Unfortunately, not all features are using the above behaviour. The reason for that is unknown to me. My gut feeling is the above behaviour was not fully intended for extending RMQ queue capabilities. As some of the features do not use such behaviour, they will "step over" the behaviour implementations. Among the problematic features/queues, Mirrored and Quorum queues seem to be the offending ones. I have not yet reached out to the RMQ community as I did not have time to dig deeper into the issue. I'd like to be able to pinpoint where the problem is before bothering the core developers. At the moment I'm very busy, so you will need to be patient for a while as I have very little free time. If de-duplication is critical for you, I suggest you try disabling mirroring in the meantime. Of course, you need to consider the risk of losing messages in case of network or broker errors. |
Thank you for clarifying things @noxdafox. I, of course, understand the free time issue, guilty of that myself. May I just ask that once you get some free time, you don't let this issue get forgotten? |
Hi @noxdafox |
Hello, sorry for the delay. The current desire of the RMQ devs is to slowly retire generic multi-feature queues and provide more specialised queue implementations which better suit specific use cases. Quorum Queues are the first example of the direction RMQ developers are taking in this matter. So far, this plugin does not support Mirrored and Quorum queues, and it won't in the near future (I am updating the README). My plan is to reach out to the RMQ devs and try to understand if we can find a way to provide generic hooks to implement features such as queue-level deduplication on the new queue architecture. This will of course take some time, considering there is no clear understanding of what the new architecture will look like. My suggestion for the moment is to rely on exchange-level deduplication if queue HA is needed. I know it is a sub-optimal solution for certain cases, but there is no simple alternative at the moment. |
Hi @noxdafox thanks |
Unfortunately, no updates yet. As I said, this requires significant work on RabbitMQ rather than on the plugin itself. I will resume this track in September, when my schedule frees up. |
Hi @noxdafox |
No sorry, I am currently focusing on other issues affecting the plugin. |
Just a clarification. Something happens and the queue starts to accept duplicates until I have to recreate the queue from scratch. After recreation it works OK, until it starts to fail at some point and then keeps failing until the queue is recreated. |
@noxdafox Hey there! Sorry to bother, but is there any news on supporting Quorum queues? I am trying to scale RMQ, and this is a real blocker 😢 A lot of our queues are using the plugin, and it seems I won't be able to move them to the cluster. |
Hi, I'm having the same issue of having multiple items with the same
Erlang 24.2, RabbitMQ 3.9.12
Publish code:
// The de-duplication key travels in the message headers.
headers := make(amqp.Table)
headers["x-deduplication-header"] = id
b, err := json.Marshal(payload)
if err != nil {
	return err
}
// The two false arguments are the mandatory and immediate flags.
err = c.amqpChan.Publish(exchange, routingKey, false, false, amqp.Publishing{
	ContentType: "application/json",
	Body:        b,
	Headers:     headers,
}) |
I'm experiencing this issue. Is this plugin functional? |
This ticket is for supporting deduplication on mirror and quorum queues, not for issues related to de-duplication itself. If you are having issues with de-duplication, please open a ticket where you clearly state:
|
I am also looking for quorum deduplication functionality. Was disappointed to see the plugin only supports the deprecated classic queues. |
I've solved this issue for my project by creating a little Golang service that listens to the queues and does in-memory deduplication based on a unique id field / deduplication field before publishing back into a deduped exchange. It is ripped out from my project, so there's some stuff missing, but FYI in case this is helpful. You get the idea: https://gist.github.com/dvcrn/fbfceeb1cd253cb4e4c81ec3b4d5d70f#file-main-go |
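For readers who want the shape of that workaround without opening the gist, the following is a minimal sketch of consumer-side in-memory de-duplication. It is not the code from the gist: Go channels stand in for the consumed queue and the deduped exchange, and the `message` type and the `dedupe` and `dedupeDemo` functions are hypothetical.

```go
package main

// message is a hypothetical stand-in for a consumed delivery.
type message struct {
	ID   string // unique id / de-duplication field
	Body string
}

// dedupe forwards each message from in to out unless a message with the
// same ID has already been forwarded. It closes out once in is closed.
// In a real service, in would be fed by a queue consumer and out would be
// drained by a publisher to the deduped exchange (both assumed here).
func dedupe(in <-chan message, out chan<- message) {
	seen := make(map[string]struct{})
	for m := range in {
		if _, dup := seen[m.ID]; dup {
			continue // drop the duplicate
		}
		seen[m.ID] = struct{}{}
		out <- m
	}
	close(out)
}

// dedupeDemo feeds three messages, two of which share an ID, through dedupe
// and returns the bodies that were forwarded.
func dedupeDemo() []string {
	in := make(chan message)
	out := make(chan message)
	go dedupe(in, out)
	go func() {
		for _, m := range []message{
			{ID: "a", Body: "1"},
			{ID: "b", Body: "2"},
			{ID: "a", Body: "3"}, // same ID as the first message
		} {
			in <- m
		}
		close(in)
	}()
	var bodies []string
	for m := range out {
		bodies = append(bodies, m.Body)
	}
	return bodies
}
```

In this sketch the `seen` map grows without bound and is lost when the process restarts, so a long-running service would need some form of expiry or persistence.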
I reached out to the RMQ community today regarding this issue: I cannot promise that there will be a solution in the foreseeable future but, if interested, you can follow the discussion. |
Hello!
Thank you for this plugin, it is exactly what we need and it works great!
But I've found an issue or maybe I'm doing something wrong.
There is a case where tasks with the same header end up in the queue, and it looks like it has something to do with requeued messages.
Steps to reproduce:
The x-deduplication-header is just an MD5 hash of the message body.
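For reference, deriving such a header value from the message body could look like the sketch below. It uses only the Go standard library; the function name `dedupKey` is hypothetical, and the hex encoding is an assumption, since the post does not say how the hash is encoded.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// dedupKey returns the hex-encoded MD5 digest of body, suitable for use as
// the value of the x-deduplication-header. Hypothetical helper; the hex
// encoding is an assumption.
func dedupKey(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}
```

Two messages with byte-identical bodies get the same key, so any difference in serialization (field order, whitespace) produces a different key.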
So am I doing something wrong, or is this an issue?
Thanks in advance!
Maciej