consumer stops consuming events and throws KinesisMessageDrivenChannelAdapter : The lock for key 'xxxxxxxx:shardId-00000000000X' was not renewed in time #186
I think there must be some other logs related to the lock processing.
Would you mind sharing that one with us as well? Also: is there a chance to use the really latest version?
Yeah, sorry, I was trying with 2.1.0 to check if that one doesn't have the issue. No error before that or after; just that message, endlessly, on both instances.
OK. Any chance you can share a simple project to reproduce?
Yeah, I did put a quick project together, and it seems like on LocalStack you can easily replicate the issue once you get the shards being consumed not by one instance but by two.
Hi @malvine! Thank you for the sample. Will look into the code today.
Well, I'm not sure, but it looks like the problem is here:
Which we don't set on our side.
So, if another instance is the holder of the lock, we don't return immediately, as the documentation states. Does it make sense? I'll try a fix in my local Spring Integration AWS copy and re-run your solution to verify.
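For reference, the contract being discussed here is the standard `java.util.concurrent.locks.Lock#tryLock(long, TimeUnit)` behavior. A tiny self-contained demo (plain JDK, unrelated to the AWS lock client itself) of what a zero timeout is expected to do when another holder owns the lock:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockContractDemo {

    public static void main(String[] args) throws InterruptedException {
        Lock lock = new ReentrantLock();

        // Simulate "another instance" holding the lock for a while.
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                Thread.sleep(5_000);
            }
            catch (InterruptedException ignored) {
            }
            finally {
                lock.unlock();
            }
        });
        holder.start();
        Thread.sleep(100); // let the holder acquire the lock first

        long start = System.currentTimeMillis();
        boolean acquired = lock.tryLock(0, TimeUnit.MILLISECONDS);
        long elapsed = System.currentTimeMillis() - start;

        // Expected: acquired=false after ~0 ms - tryLock(0) must not block and wait.
        System.out.println("acquired=" + acquired + " after " + elapsed + " ms");

        holder.join();
    }
}
```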
So, yeah... My assumption and investigation are correct. I'm going to push the fix today, but if you need a release for Spring Integration AWS... I'm moving this to the Spring Integration AWS project, though.
@artembilan thank you very much!
Did this get resolved? I'm running into this exact issue, even when upgrading Spring Integration AWS. I was even still able to reproduce it with the quick project that @malvine made, after upgrading Spring Integration AWS in it.
Can you please share an updated sample project so we can see what is going on?
@artembilan thanks for the quick reply! Here's a fork of his project, upgraded: https://github.com/postalservice14/kinesis-lock-was-not-renewed-in-time
Any chance you can upgrade your solution (not that sample app) to the latest Spring Boot? I'll try to reproduce it over the weekend in a simple unit test with Testcontainers. Well, I'd prefer to have a new GH issue with fresh, actual and relevant info instead of trying to resurrect this issue against outdated versions.
OK. I was able to reproduce it locally as a parallel unit test against Testcontainers with a LocalStack image.
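For anyone who wants to try the same locally, a rough sketch of such a Testcontainers/LocalStack setup could look like this (image tag, services and the printed details are just examples, not the exact test from the repo):

```java
import java.net.URI;

import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.containers.localstack.LocalStackContainer.Service;
import org.testcontainers.utility.DockerImageName;

public class LocalStackSketch {

    public static void main(String[] args) {
        try (LocalStackContainer localstack =
                     new LocalStackContainer(DockerImageName.parse("localstack/localstack:2.3"))
                             .withServices(Service.KINESIS, Service.DYNAMODB)) {
            localstack.start();

            // Point the Kinesis binder / AWS SDK clients at these endpoints instead of real AWS.
            URI kinesisEndpoint = localstack.getEndpointOverride(Service.KINESIS);
            URI dynamoDbEndpoint = localstack.getEndpointOverride(Service.DYNAMODB);

            System.out.println("Kinesis:  " + kinesisEndpoint);
            System.out.println("DynamoDB: " + dynamoDbEndpoint);
            System.out.println("Credentials: " + localstack.getAccessKey() + " / " + localstack.getSecretKey());
        }
    }
}
```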
Found the problem. Thank you for your patience!
So, the code in that Lock client looks like this:
I really doubt that it is OK to block for that long a period. Some related discussion is here: spring-projects/spring-integration-aws#219, which leads us to the conclusion that we cannot use that call as-is. I don't know yet what the proper fix must be, perhaps a combination of options.
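In other words (a simplified, illustrative sketch of the behavior being described, not the actual lock client source): the acquire loop keeps retrying for the requested wait time plus a full lease duration, so even a zero wait ends up blocking:

```java
import java.util.concurrent.TimeUnit;

class BlockingAcquireSketch {

    private final long leaseDurationMs = 20_000; // hypothetical configured lease duration

    // The caller of tryLock(0) expects an immediate answer...
    boolean tryLock(long waitTime, TimeUnit unit) throws InterruptedException {
        long additionalTimeToWaitMs = unit.toMillis(waitTime);
        // ...but the acquire loop runs until waitTime + leaseDuration has elapsed,
        // so with a zero timeout it still spins for the whole lease duration.
        long deadline = System.currentTimeMillis() + additionalTimeToWaitMs + leaseDurationMs;
        while (System.currentTimeMillis() < deadline) {
            if (tryAcquireInDynamoDb()) {
                return true;
            }
            Thread.sleep(1_000); // hypothetical refresh period between attempts
        }
        return false;
    }

    private boolean tryAcquireInDynamoDb() {
        return false; // placeholder: pretend another instance currently holds the lock
    }
}
```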
Right. See here for a bug on the AWS SDK side: awslabs/amazon-dynamodb-lock-client#44
Fixes spring-cloud/spring-cloud-stream-binder-aws-kinesis#186

The `DynamoDbLockClient` waits an extra `leaseDuration` time in a loop, breaking the `tryLock()` contract.

* Fix `DynamoDbLockRegistry.tryLock()` to decrease the actual `additionalTimeToWait` by `leaseDuration`, so that when the target `DynamoDbLockClient` adds this `leaseDuration` back, it waits the actual timeout requested by the `tryLock()` contract. This way a `tryLock(0)` will definitely return immediately since we really are not interested in blocking.

**cherry-pick to 2.5.x**

# Conflicts:
#	build.gradle
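The arithmetic implied by that commit message, as a sketch (simplified names, not the literal `DynamoDbLockRegistry` code): the registry subtracts the lease duration from the requested wait time before delegating, so the client's own `+ leaseDuration` brings the total back to what `tryLock()` asked for:

```java
import java.util.concurrent.TimeUnit;

class TryLockTimeoutAdjustment {

    static long additionalTimeToWait(long requestedWait, TimeUnit unit, long leaseDurationMs) {
        // Pass a reduced wait time down to the lock client; when the client adds its
        // leaseDuration back, the total blocking time equals the requested timeout.
        return unit.toMillis(requestedWait) - leaseDurationMs;
    }

    public static void main(String[] args) {
        long leaseDurationMs = 20_000;

        // tryLock(60, SECONDS): the client waits (60_000 - 20_000) + 20_000 = 60_000 ms.
        System.out.println(additionalTimeToWait(60, TimeUnit.SECONDS, leaseDurationMs) + leaseDurationMs);

        // tryLock(0): the adjusted value is negative, so the client has no time left
        // to block with and returns right away.
        System.out.println(additionalTimeToWait(0, TimeUnit.MILLISECONDS, leaseDurationMs) + leaseDurationMs);
    }
}
```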
So, I have just made a workaround fix on our side.
@artembilan I'm having the same issue. Can you provide the URL of the repository that contains 2.5.5-SNAPSHOT?
@artembilan version 2.5.5 doesn't have those messages about the locks, but I now have another issue. If my app does not release the locks properly (like the JVM dies or during a k8s rolling restart), old locks stay in the lock table, and those locks are not released on the next app startup. I've tested with the older version and old locks are not an issue there; it just waits for some time and the locks get released automatically.
Yeah... I think we are facing this one: awslabs/amazon-dynamodb-lock-client#79. It's probably worth trying with a much shorter leaseDuration.
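The lease semantics behind both observations, as a rough illustration (not the lock client's actual code): a lock left behind by a dead JVM only becomes stealable after other instances have watched it stay unchanged for a full lease duration, which is why a shorter lease recovers abandoned locks sooner:

```java
class LeaseExpiryCheck {

    /**
     * A lock record looks abandoned once nobody has heartbeated it for a full
     * lease duration; only then may another instance safely take it over.
     */
    static boolean looksAbandoned(long lastHeartbeatMillis, long leaseDurationMillis) {
        return System.currentTimeMillis() - lastHeartbeatMillis >= leaseDurationMillis;
    }

    public static void main(String[] args) {
        long shortLeaseMs = 20_000;
        long longLeaseMs = 5 * 60_000;
        long heartbeatFromDeadJvm = System.currentTimeMillis() - 60_000; // owner died a minute ago

        System.out.println(looksAbandoned(heartbeatFromDeadJvm, shortLeaseMs)); // true  - recovered quickly
        System.out.println(looksAbandoned(heartbeatFromDeadJvm, longLeaseMs));  // false - still stuck
    }
}
```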
You know, I have just implemented our own lock client replacement. I still have some changes to be done on the binder level to start using the new one.
That sounds awesome. I've tried to play with leaseDuration; however, I got contradictory results.
So, here is a Kinesis Binder 4.0.0-SNAPSHOT to try.
What are the expected spring-cloud-stream and spring-boot versions for spring-cloud-stream-binder-kinesis/4.0.0-SNAPSHOT?
With 4.0.0-SNAPSHOT I'm getting an error; it works fine with 3.0.0.
I guess you have to remove your ...
Well, that is in... However, the... Oh! I see. There is a bug 😄 I do this over there:
where a new ...
Please try now with the latest snapshot. Thank you for assisting in testing!
Thanks for the quick feedback! I still see that locks do not live for more than a second.
Right. The number of records really must be equal to the number of active shard consumers. Any advice on what tool I can use locally on Windows to browse DynamoDB in a Docker container?
You may try to use the AWS CLI, but you will need to set it up somehow to look into your Docker DynamoDB.
We may have a pair session if it helps. FYI, I'm running the Java service locally using real AWS Kinesis/DynamoDB.
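If the CLI route is awkward, the same lookup can be done with a few lines of the AWS SDK for Java v2 pointed at the local container; the endpoint, credentials and lock table name below are placeholders, so adjust them to your own setup:

```java
import java.net.URI;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;

public class LockTableScan {

    public static void main(String[] args) {
        try (DynamoDbClient dynamoDb = DynamoDbClient.builder()
                .endpointOverride(URI.create("http://localhost:4566"))   // local container endpoint (adjust)
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("test", "test")))     // dummy credentials for local use
                .build()) {

            dynamoDb.scan(ScanRequest.builder()
                            .tableName("SpringIntegrationLockRegistry")  // assumed lock table name - check yours
                            .build())
                    .items()
                    .forEach(System.out::println);
        }
    }
}
```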
So, I have this test:
It confirms that... And yes, the... And no: I don't have any AWS accounts to test against a real one. 😄
OK. Something is still wrong on my side. Looking further into the renew problem...
So, I made some tweaks and did some testing. Note: I still could not figure out why the AWS CLI in the LocalStack container fails with that error.
It looks like records are expiring right after creation, instantly.
Hm. So, it sounds like a TTL problem. Would you mind double-checking in your AWS env how our TTL value looks? You probably can disable TTL on the table for now, though.
Here is the value that I see right now: ... So it is some issue with the timezone.
Another observation: if I set the JVM timezone to GMT, it starts working as expected.
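One way such a timezone-dependent skew can show up (an illustration of the pitfall, not the actual binder code): DynamoDB TTL compares the attribute against the current epoch seconds in UTC, so a value derived from the local wall clock but interpreted as UTC is shifted by the zone offset, and on one side of GMT it lands in the past so items look expired immediately:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TtlEpochDemo {

    public static void main(String[] args) {
        long ttlSeconds = 60;

        // Correct: epoch seconds are timezone-independent, so this expires in ~60s everywhere.
        long correctTtl = Instant.now().getEpochSecond() + ttlSeconds;

        // Skewed: local wall-clock time treated as if it were UTC is off by the zone offset;
        // with a negative offset the value is already in the past and the item expires instantly.
        long skewedTtl = LocalDateTime.now().toEpochSecond(ZoneOffset.UTC) + ttlSeconds;

        System.out.println("correct ttl: " + correctTtl);
        System.out.println("skewed  ttl: " + skewedTtl);
        System.out.println("offset  (s): " + (skewedTtl - correctTtl));
    }
}
```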
OK. Have just pushed a fix.
Looks like it works fine now, thank you!
Hi,
I've got an interesting use case and it's very simple to replicate.
I am using the latest binder version.
My setup: one Kinesis stream with two shards, consumed by three app instances (instance1, instance2, instance3).
To replicate:
1. Start the consumers. Usually one instance gets both shards as owner; let's say instance1 gets all the shards. So far so good, and that instance is consuming events.
2. Restart the actively consuming instance, e.g. instance1. Instance2 will now get shard-0 and instance3 gets shard-1 as owner. (This step might take a couple of attempts, as sometimes instance2 or instance3 gets all the shards. Keep trying until you get one shard per instance.)
3. Once you have instance1 doing nothing, instance2 as owner of shard-0 and instance3 as owner of shard-1, that's when the instances stop consuming any events and both of them log this every 30 seconds:
The only way to stop this is to restart one of the consuming instances so all the shards are back to one instance.
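For context, the consuming side in such a setup is just a plain Spring Cloud Stream functional consumer along these lines (the function name and any binding/group values are made up here; the real wiring lives in the application properties):

```java
import java.util.function.Consumer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class KinesisConsumerApplication {

    public static void main(String[] args) {
        SpringApplication.run(KinesisConsumerApplication.class, args);
    }

    // Bound by Spring Cloud Stream (e.g. as "consumeEvent-in-0") with a shared consumer
    // group, so all running instances compete for the shard locks in DynamoDB.
    @Bean
    public Consumer<String> consumeEvent() {
        return payload -> System.out.println("Received: " + payload);
    }
}
```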
The binder version 2.0.1.RELEASE doesn't have this issue.
Pretty sure it's not even a binder issue - it's probably a spring-integration-aws lib issue.
Thanks.