Agent 10.27.0 kills application with SQS queue dependency #2645

Closed
peterkiss1 opened this issue Jul 25, 2024 · 3 comments · Fixed by #2646
Labels
bug Something isn't working community To tag external issues and PRs

Comments

@peterkiss1

With agent 10.27.0 our application starts failing after a while; the issue does not appear immediately after startup.

Description
The following exceptions are observed:

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at Amazon.SQS.Model.Internal.MarshallTransformations.ReceiveMessageRequestMarshaller.Marshall(ReceiveMessageRequest publicRequest)
   at Amazon.Runtime.Internal.Marshaller.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.Marshaller.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
   at CompanyName.ApplicationIntegration.Queues.Amazon.SQS.Buffered.BufferedSqsQueue`1.GetBatchAsync(CancellationToken cancellationToken)
   at CompanyName.ApplicationIntegration.Queues.BatchMessageConsumer`1.ConsumeAsync(CancellationToken cancellationToken)
   at CompanyName.AppName.Api.Hosting.OrderExchangeBackgroundService.CallbackAsync(OrderExchangeBackgroundServiceConfiguration configuration, CancellationToken cancellationToken) in /src/Api/Hosting/OrderExchangeBackgroundService.cs:line 26
   at CompanyName.Extensions.Hosting.Continuous.ContinuousHostedService`2.CallbackAsync(CancellationToken stoppingToken)

System.IO.IOException: Unable to write data to the transport connection: Broken pipe.
 ---> System.Net.Sockets.SocketException (32): Broken pipe
 --- End of inner exception stack trace ---
 at Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
 at Amazon.Runtime.Internal.HttpHandler`1.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.Unmarshaller.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.SQS.Internal.ValidationResponseHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.ErrorHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.Signer.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.EndpointDiscoveryHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
 at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)
 at CompanyName.ApplicationIntegration.Queues.Amazon.SQS.Buffered.BufferedSqsQueue`1.GetBatchAsync(CancellationToken cancellationToken)
 at CompanyName.Extensions.Commands.CommandProcessor`1.ProcessAsync(CancellationToken cancellationToken)
 at CompanyName.AppName.Api.Hosting.CommandProcessorBackgroundService.CallbackAsync(HostedServiceConfiguration value, CancellationToken cancellationToken) in /src/Api/Hosting/CommandProcessorBackgroundService.cs:line 20
 at CompanyName.Extensions.Hosting.Continuous.ContinuousHostedService`2.CallbackAsync(CancellationToken stoppingToken)

When the second issue appears, the application is unable to process anything.
Based on the errors, the first one is probably caused by multi-threaded access to something that doesn't support it (I'm not sure whether the AWS SDK has an extensibility point that would let you inject your headers); the second one could come from a socket leak.
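
To illustrate the first exception, here is a minimal sketch of how `List<T>` reacts when it is modified while being enumerated. It runs on a single thread, with the `Add` inside the loop standing in for a second thread appending to the same list. The class name, method name and attribute values are hypothetical and not taken from the AWS SDK or the agent.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Returns true if modifying the list during enumeration throws
    // InvalidOperationException ("Collection was modified; ...").
    public static bool ThrowsWhenModifiedDuringEnumeration()
    {
        // Hypothetical stand-in for a shared attribute-name list.
        var attributeNames = new List<string> { "All" };
        try
        {
            foreach (var name in attributeNames)
            {
                // Stands in for another thread appending a header name
                // while this loop is still iterating.
                attributeNames.Add("traceparent");
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }
}
```

With real threads the failure is timing-dependent, which would fit the observation that the problem does not show up immediately after start.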

image

The memory allocation and GC collection charts also clearly reflect the period when the bad agent was deployed with our application.

Response time reporting for SQS as an external service is also broken:

image

Besides, I don't understand why your implementation is using dynamic in 2024, when we all know it is not being maintained and it leaks memory. I assume more questionable stuff could be found in the change or the code base.

Expected Behavior
Instrumentation should be (almost) invisible.

Steps to Reproduce
Basic testing should show the issue, but you might need to run an application for an extended time for it to become visible.

Your Environment
.NET 8 with current AWS SDK version
Rolling back to agent 10.26.0 solves all our problems (we also don't need this instrumentation at all, and the passed trace headers are possibly wrong as well).

@peterkiss1 peterkiss1 added the bug Something isn't working label Jul 25, 2024
@github-actions github-actions bot added the community To tag external issues and PRs label Jul 25, 2024
@tippmar-nr
Member

Hi @peterkiss1 and thanks for reporting this issue. We're actively investigating and will advise when we have an update.

@tippmar-nr
Member

Hi again @peterkiss1 - we think we know what the issue was and are working on a fix.

The clue we needed was in one of your replies on the issue you posted on the awssdk.net repo where you indicated that portions of the ReceiveMessageRequest properties are shared between instances of that object.

Our code is adding distributed trace headers to the ReceiveMessageRequest.MessageAttributeNames collection and assumed (incorrectly) that the collection would be a unique instance each time. Because we were adding the headers on every call, the shared collection kept growing, which is why you saw memory growth that looked like a leak. For the same reason, there was a race condition: one request might be in the middle of marshalling and iterating that collection while our code was adding headers to it on another thread.
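
The pattern described above can be sketched as follows. This is an illustration only, not the agent's code and not the fix in #2646: the class, the method names and the header names are assumptions made for the example. The first method shows why a list shared between requests grows on every call; the second shows one possible way to avoid mutating the shared list, by working on a copy and adding each header at most once.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderInjectionSketch
{
    // Hypothetical header names; the agent's real list is not given in this thread.
    private static readonly string[] TraceHeaders = { "traceparent", "tracestate" };

    // Problematic pattern: appends on every call, so a list that is
    // shared between requests grows without bound.
    public static void AddHeadersUnconditionally(List<string> sharedNames)
    {
        sharedNames.AddRange(TraceHeaders);
    }

    // One possible mitigation: leave the caller's list untouched and
    // return a fresh copy that contains each header at most once.
    public static List<string> WithHeaders(IReadOnlyList<string> sharedNames)
    {
        var copy = new List<string>(sharedNames);
        foreach (var header in TraceHeaders)
        {
            if (!copy.Contains(header))
            {
                copy.Add(header);
            }
        }
        return copy;
    }
}
```

Calling `AddHeadersUnconditionally` three times on a list that starts with one entry leaves it with seven entries, while `WithHeaders` always returns three entries and leaves the input at one. The copy is only safe if nothing else writes to the shared list while it is being copied.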

The second issue you report related to the broken pipe is unlikely to be caused by anything the .NET agent is doing -- we intercept the RuntimePipeline.InvokeAsync() method prior to and after it has executed and our code is generally passive - only extracting data that we need for our instrumentation.

We should have a fix for this issue in our next release. Please test it if you're able and let us know if you still see the issue.
