harden ReliableDeliveryShardingSpecs #6750

Aaronontheweb
Member

Changes

Fixes the following test race condition caused by forcing the test to wait until a probe gracefully stops, which can cause a re-delivery.

Akka.Cluster.Sharding.Tests.Delivery.ReliableDeliveryShardingSpec.ReliableDelivery_with_Sharding_must_deliver_unconfirmed_if_ShardingConsumerController_is_terminated
Expected delivery4b.Message to be Job(msg-4), but found Job(msg-3).

With configuration:
- Use declared types and members
- Compare enums by value
- Match member by name (or throw)
- Without automatic conversion.
- Be strict about the order of items in byte arrays

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) May 10, 2023 01:34
@@ -413,7 +413,7 @@ public async Task
 }

 // redeliver also when no more messages are sent to the entity
-await consumerProbes[1].GracefulStop(RemainingOrDefault);
+Sys.Stop(consumerProbes[1]); // don't wait for termination
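
The change above swaps an awaited `GracefulStop` for a plain `Stop`. As a rough illustration of the difference, here is a minimal sketch assuming a stock Akka.NET setup. `NoopActor` and the actor names are hypothetical stand-ins, not the probes used in the spec.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

public static class StopSemanticsSketch
{
    // Hypothetical stand-in for a consumer probe; it swallows every message.
    private sealed class NoopActor : ReceiveActor
    {
        public NoopActor() => ReceiveAny(_ => { });
    }

    public static async Task Main()
    {
        var system = ActorSystem.Create("sketch");

        var waited = system.ActorOf(Props.Create(() => new NoopActor()), "waited");
        var fireAndForget = system.ActorOf(Props.Create(() => new NoopActor()), "fire-and-forget");

        // GracefulStop asks the actor to stop and the returned task completes
        // only after the actor has terminated, so the caller is held up here.
        bool stopped = await waited.GracefulStop(TimeSpan.FromSeconds(3));
        Console.WriteLine($"GracefulStop completed, terminated = {stopped}");

        // Stop only requests termination and returns immediately; the caller
        // carries on while the actor shuts down in the background.
        system.Stop(fireAndForget);
        Console.WriteLine("Stop returned without waiting for termination");

        await system.Terminate();
    }
}
```

In the spec, the time spent awaiting termination is the window in which a re-delivery can arrive, which is why the non-blocking call is used instead.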
Member Author

This might not be the only fix this test needs, FYI. There's no way to know for certain other than gathering more data.

@Aaronontheweb
Member Author

@Arkatufus so we have a hard bug in the async TestKit here:

var endMessages = await consumerEndProbe.ReceiveNAsync(3, TimeSpan.FromSeconds(25)).ToListAsync();

25 seconds is what we allot - it used to be 15 seconds prior to my last PR to harden this test (issue is: actors weren't being given enough time to complete the messaging routine.)

Looking at the logs:

First

[INFO][05/10/2023 01:41:12.789Z][Thread 0014][Cluster (akka://test)] Cluster Node [15.0.0] - Node [akka.tcp://[email protected]:64654] is JOINING itself (with roles [], version [15.0.0]) and forming a new cluster

Last

[INFO][05/10/2023 01:41:18.679Z][Thread 0079][akka.tcp://[email protected]:64654/system/sharding/TestConsumer-1Coordinator] Singleton manager stopping singleton actor [akka://test/system/sharding/TestConsumer-1Coordinator/singleton]

That's a grand total of about 6 seconds (01:41:12.789 to 01:41:18.679), far short of the 25 seconds we allot, and there are several other asserts that happen between the 25-second one and the one prior.
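
To double-check the arithmetic, the two log timestamps quoted above can be subtracted directly. This is a small sketch; `ElapsedSketch` and `Between` are hypothetical helper names, and the format string assumes the log prefix is month/day/year in UTC.

```csharp
using System;
using System.Globalization;

public static class ElapsedSketch
{
    // Parses two log timestamps of the form "05/10/2023 01:41:12.789Z"
    // and returns the time elapsed between them.
    public static TimeSpan Between(string first, string last)
    {
        const string format = "MM/dd/yyyy HH:mm:ss.fff'Z'";
        const DateTimeStyles style =
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal;

        var start = DateTime.ParseExact(first, format, CultureInfo.InvariantCulture, style);
        var end = DateTime.ParseExact(last, format, CultureInfo.InvariantCulture, style);
        return end - start;
    }

    public static void Main()
    {
        var elapsed = Between("05/10/2023 01:41:12.789Z", "05/10/2023 01:41:18.679Z");
        Console.WriteLine(elapsed.TotalMilliseconds); // 5890
    }
}
```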

{
    _log.Debug("End at [{0}]", job.SeqNr);
    EndReplyTo.Tell(new Collected(_processed.Select(c => c.Item1).ToImmutableHashSet(), _messageCount + 1));
    Context.Stop(Self);
}
else if (EndCondition(job))
{
Member Author

This is the fix: if the EndCondition is satisfied before we've processed any messages, we're probably being recreated via the ProducerController<T> + ShardRegion, so we want to ignore the message without sending any more Collected responses.
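
The branching described in this comment can be modelled without any actor machinery. The sketch below is a framework-free approximation. `ConsumerSketch`, `Outcome`, and `OnJob` are hypothetical names, and the "processed at least one message" check is an assumption about how the first branch is guarded, since the diff excerpt does not show that condition.

```csharp
using System;
using System.Collections.Generic;

// What the consumer decided to do with an incoming job.
public enum Outcome { Processed, ReportedAndStopped, IgnoredAfterRestart }

// Hypothetical model of the test consumer's decision logic only.
public sealed class ConsumerSketch
{
    private readonly Func<string, bool> _endCondition;
    private readonly List<string> _processed = new List<string>();

    public ConsumerSketch(Func<string, bool> endCondition) => _endCondition = endCondition;

    public Outcome OnJob(string job)
    {
        if (_endCondition(job) && _processed.Count > 0)
        {
            // Normal end of a run: this is where the real consumer would
            // report Collected and stop itself.
            return Outcome.ReportedAndStopped;
        }
        else if (_endCondition(job))
        {
            // The end message arrived before anything was processed, so this
            // instance was most likely recreated and is seeing a re-delivery.
            // Ignore it rather than reporting Collected a second time.
            return Outcome.IgnoredAfterRestart;
        }

        _processed.Add(job);
        return Outcome.Processed;
    }
}

public static class GuardDemo
{
    public static void Main()
    {
        Func<string, bool> isEnd = job => job == "msg-3";

        var normal = new ConsumerSketch(isEnd);
        Console.WriteLine(normal.OnJob("msg-1")); // Processed
        Console.WriteLine(normal.OnJob("msg-2")); // Processed
        Console.WriteLine(normal.OnJob("msg-3")); // ReportedAndStopped

        var recreated = new ConsumerSketch(isEnd);
        Console.WriteLine(recreated.OnJob("msg-3")); // IgnoredAfterRestart
    }
}
```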

@Aaronontheweb Aaronontheweb merged commit 952d0ff into akkadotnet:dev May 10, 2023
@Aaronontheweb Aaronontheweb deleted the harden-ReliableDeliverySharding-unconfirmed-spec branch May 10, 2023 22:06