Incorrect Recover commands during shutdown with spring-rabbit >= 3.0.11 #2941
Comments
I'm not very good at reading TCP dumps, but could you please try upgrading to the latest Spring Boot? Is there any chance of a simple project where we can reproduce this? I don't mean something TCP dump-related, but a Java application that I can run in debug mode to catch that |
OK. So, here is a JavaDoc of the command:
Apparently that really is the correct thing to do when we close consumers and have in-flight un-acked messages.
That is the only place in Spring AMQP and the AMQP Java Client where this command is explicitly called. |
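
To make the recover semantics discussed here concrete, below is a small in-memory sketch, not RabbitMQ or Spring AMQP code. `ToyChannel` and all of its methods are hypothetical names invented for illustration. It assumes that a recover with requeue puts every un-acked message back on the queue, and that a later ack for one of those delivery tags is rejected as unknown. Requeueing at the tail of the queue is a simplification of what a real broker does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy, in-memory model of one channel's delivery bookkeeping (hypothetical, for illustration only). */
final class ToyChannel {

    // Messages waiting in the queue.
    private final Deque<String> ready = new ArrayDeque<>();

    // Delivery tag -> message that was delivered but not yet acked.
    private final Map<Long, String> unacked = new LinkedHashMap<>();

    private long nextTag = 1;

    void publish(String body) {
        ready.addLast(body);
    }

    /** Delivers the next ready message and returns its delivery tag, or -1 if the queue is empty. */
    long deliver() {
        String body = ready.pollFirst();
        if (body == null) {
            return -1;
        }
        long tag = nextTag++;
        unacked.put(tag, body);
        return tag;
    }

    /** Models a recover with requeue: every un-acked message goes back to the queue. */
    void recoverWithRequeue() {
        // Simplification: requeue at the tail, in delivery order.
        ready.addAll(unacked.values());
        unacked.clear();
    }

    /** Models an ack: acking a tag the channel no longer tracks is treated as an error. */
    void ack(long tag) {
        if (unacked.remove(tag) == null) {
            throw new IllegalStateException("unknown delivery tag " + tag);
        }
    }

    int readyCount() {
        return ready.size();
    }

    int unackedCount() {
        return unacked.size();
    }

    public static void main(String[] args) {
        ToyChannel channel = new ToyChannel();
        channel.publish("m1");
        channel.publish("m2");
        long tag = channel.deliver();   // "m1" is now in flight
        channel.recoverWithRequeue();   // consumer is closing: "m1" goes back to the queue
        try {
            channel.ack(tag);           // late ack for a tag that was already requeued
        } catch (IllegalStateException e) {
            System.out.println("late ack rejected: " + e.getMessage());
        }
    }
}
```

In this model the late `ack` fails because the tag was dropped by the recover, which is the kind of ordering problem a slow or concurrent shutdown could expose.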
I think you have found the root cause. It appears that the "issue" was introduced by this commit:
This change corresponds exactly to when it stopped working for us, as shown in the difference between v3.1.1 and v3.1.2: After we introduced this change, our process no longer stopped cleanly. The supervisor killed the process after 10 seconds of waiting for the shutdown to complete. We will try to reproduce the problem in an isolated environment, or try to understand why shutdown takes so long with this change. |
Hold on. I think it is totally OK that this recovery of un-acked messages may take some time. Therefore I don't understand why your team sees this change in the shutdown behavior as a problem. Please advise. |
No, I'm not saying that |
Hi, While shutting down our process, we observed several error logs of this kind:
Similarly repeated among RabbitMQ server logs:
Therefore, we cannot use this version at the moment. We have not yet been able to create a demo project with the latest versions to reproduce the issue, but perhaps you could help us understand the cause of these errors? Thank you in advance! |
That is strange. You might mitigate the problem if you don't use transactions or don't use manual ack. A sample to reproduce it would be great. Thanks |
Yes, the behavior is strange. Our configuration parameters are as follows:
Given the high traffic, could it be a concurrency issue? To be able to reproduce and debug, we are currently developing a sample project, but it will take us a few days. |
Hi, This project mirrors the configurations of our production environment (e.g. prefetch count, consumer concurrency, batch size, etc.). Could you please run this demo on your end and confirm if you observe the same behavior? Thanks a lot! |
I have a lot on my plate, so please help me recollect what exactly you'd like me to look into. I guess we don't talk about Now your concern that there are some Correct me if I'm wrong. Thanks |
I believe the two things are related. |
Fixes: #2941

Issue link: #2941

Now `BlockingQueueConsumer.basicCancel()` calls `RabbitUtils.closeMessageConsumer()` to initiate `basicRecover` on the transactional consumer and re-queue all the un-acked messages. However, there is a race condition: one in-flight message may still be delivered to the listener, after which a TX commit is initiated and `basicAck()` is called. Such a delivery tag might already have been discarded by the previous `basicRecover`.

Therefore, adjust `BlockingQueueConsumer.commitIfNecessary()` to skip `basicAck()` if locally transacted and already cancelled. This may lead to duplicate delivery, but in an abnormal shutdown situation we cannot guarantee that the message to commit has been processed properly.

Also, adjust `BlockingQueueConsumer.nextMessage()` to roll back a message if the consumer is cancelled, instead of going through the loop via the listener.

* Increase `replyTimeout` in the `EnableRabbitReturnTypesTests` for resource-sensitive builds
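
The guard described in this commit message can be sketched roughly as follows. This is a simplified, broker-free illustration and not the actual `BlockingQueueConsumer` code: `CommitGuardSketch`, the `AckChannel` interface, and the per-tag ack loop are assumptions made so that the example runs on its own. The sketch is single-threaded and only shows the decision made at commit time.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a consumer's commit path (hypothetical names, not Spring AMQP's code). */
final class CommitGuardSketch {

    /** Minimal channel abstraction, assumed here so the sketch runs without a broker. */
    interface AckChannel {
        void basicAck(long deliveryTag);
        void txCommit();
    }

    private final AckChannel channel;
    private final boolean locallyTransacted;
    private final List<Long> pendingTags = new ArrayList<>();
    private volatile boolean cancelled;

    CommitGuardSketch(AckChannel channel, boolean locallyTransacted) {
        this.channel = channel;
        this.locallyTransacted = locallyTransacted;
    }

    /** Records a delivery whose ack is still pending. */
    void delivered(long deliveryTag) {
        pendingTags.add(deliveryTag);
    }

    /** Marks the consumer as cancelled; un-acked tags are assumed to be re-queued from here on. */
    void cancel() {
        cancelled = true;
    }

    /** Returns true if the pending tags were acked, false if nothing was pending or the ack was skipped. */
    boolean commitIfNecessary() {
        if (pendingTags.isEmpty()) {
            return false;
        }
        try {
            if (locallyTransacted && cancelled) {
                // The tags may no longer be valid after the recover on cancel,
                // so skip the ack and accept a possible duplicate delivery.
                return false;
            }
            for (long tag : pendingTags) {
                channel.basicAck(tag);
            }
            if (locallyTransacted) {
                channel.txCommit();
            }
            return true;
        } finally {
            pendingTags.clear();
        }
    }
}
```

The trade-off is the one stated in the commit message: a message that raced with the cancel is left to be redelivered rather than acked with a tag the broker may no longer recognize.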
Hey, @marcogramy ! Sorry for some delay: was busy with migration to JSpecify in So, today I had a chance to look into your application and investigate the problem. I have pushed this fix into Would be great if you take a look over there and much better if you can run your solution against Thank you for your time and all the help around this issue! |
Hi, Thanks. |
The SNAPSHOT is here: https://repo.spring.io/ui/native/snapshot/org/springframework/amqp/spring-rabbit/3.2.3-SNAPSHOT/ So, you need to add https://repo.spring.io/snapshot into your repositories management:
And use
If you use Spring Boot dependency management, similar to what https://start.spring.io/ does for us when we generate Maven project:
Then you can use special properties to override versions managed by Spring Boot: https://docs.spring.io/spring-boot/maven-plugin/using.html#using.parent-pom. Please share your feedback before next week, when we are planning to release the next version, which should include the fix for this problem as well. Thanks |
I'm using Spring Boot 3.2.12 (which includes `spring-rabbit 3.1.8` and `spring-amqp 3.1.8`) and I'm using several `SimpleMessageListenerContainer` consumers.

During consumer shutdown, specifically during connection closure, I noticed unusual Recover commands that I hadn't observed before. By analyzing the network traffic (using tcpdump) on port 5673, I captured the following message exchange, which is unexpected:
![Image](https://private-user-images.githubusercontent.com/1673847/403834601-63e2a0f3-d8ff-40f3-a6ca-57be12f35bbb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MDUwOTQsIm5iZiI6MTczOTQwNDc5NCwicGF0aCI6Ii8xNjczODQ3LzQwMzgzNDYwMS02M2UyYTBmMy1kOGZmLTQwZjMtYTZjYS01N2JlMTJmMzViYmIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTJUMjM1OTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWQ4NTNjYTM2MDcxNDkzM2I2ZDZiMmM2NTM0Njc5OTE0OTgzNWE2MzU0NzE1Y2Y1YTI1NmIzMTg0MmNhZDM0NiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.SpgP2ZuF76EGqv6E-JIxejHfkMygo9Is2fppcrk9OLE)
After conducting several tests with previous versions of the spring-rabbit and spring-amqp libraries, I found that downgrading to `spring-rabbit-3.0.10` resolves the issue. The network traffic in this scenario shows the expected behavior:
![Image](https://private-user-images.githubusercontent.com/1673847/403834992-f524852a-aaf1-4140-a953-b5f8434db13f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MDUwOTQsIm5iZiI6MTczOTQwNDc5NCwicGF0aCI6Ii8xNjczODQ3LzQwMzgzNDk5Mi1mNTI0ODUyYS1hYWYxLTQxNDAtYTk1My1iNWY4NDM0ZGIxM2YucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTJUMjM1OTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Zjc5NzFkYjJhNzY4NWU5YjUyNDU5MGVhZTQ3NmRiZWZkNjE3MDhjZjY0ZTc4NzAwNzE0ZWU5MzcxZmYxZTgwYiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.JoGSORCr8Q5-jCWwO2IeUIVKYHH_2-REERYV8LO2V-E)
Starting from spring-rabbit-3.0.11 onwards, I observe the problematic Recover commands again.
My RabbitMQ server version is 4.0.5.
I suspect this behavior is related to changes introduced in spring-rabbit-3.0.11 concerning shutdown procedures, as documented in issue #2594 and the corresponding commit: 02db80b.
Can you help me understand why this behavior occurs?
Thank you in advance for your assistance.