
Fault injector: fails with OOM because it buffers content #7463

Closed
lmolkova opened this issue Dec 14, 2023 · 3 comments · Fixed by #7466
lmolkova (Member) commented Dec 14, 2023

The Storage team uses the HTTP fault injector in its stress tests.
The fault injector is extremely helpful in finding tricky download issues related to connection-level problems.

Given the nature of stress tests, they run with relatively high concurrency (hundreds of requests per second).
This allows testing race conditions, threading issues, unnecessary content buffering, retry behavior, etc.

However, when testing large content (~50 MB, 100 concurrent operations), the fault injector OOMs:

[09:47:26.803] Upstream Request
  [09:47:26.803] URL: https://s8f777.blob.core.windows.net/stress-c4ca375f-ceb8-4ca0-beac-0d25abece642/blob-bae15f8d-2fc3-41ba-b999-6a9676f36993
  [09:47:26.803] Headers:
  [09:47:26.803]   Accept:application/xml
  [09:47:26.803]   User-Agent:azsdk-java-azure-storage-blob/12.24.1 (11.0.21; Linux; 5.15.133.1-1.cm2)
  [09:47:26.803]   Authorization:SharedKey s8f777:m2KkWBKZJNoQSo5G8WDhY4xDhjyjl0Zql968HxR0Rws=
  [09:47:26.803]   Date:Thu, 14 Dec 2023 21:47:26 GMT
> [09:47:26.803]   traceparent:00-613722d5151ce0920e1a18205bf8b9a5-1009444b2ed4b655-01
  [09:47:26.803]   x-ms-version:2023-08-03
  [09:47:26.803]   x-ms-client-request-id:24f5db88-1e2b-43c9-924c-e385b98fab1c
  [09:47:26.803] Sending request to upstream server...
  System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
     at System.IO.MemoryStream..ctor(Int32 capacity)
     at System.Net.Http.HttpContent.CreateMemoryStream(Int64 maxBufferSize, Exception& error)
     at System.Net.Http.HttpContent.CreateTemporaryBuffer(Int64 maxBufferSize, MemoryStream& tempBuffer, Exception& error)
     at System.Net.Http.HttpContent.LoadIntoBufferAsync(Int64 maxBufferSize, CancellationToken cancellationToken)
     at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts,
CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
     at Azure.Sdk.Tools.HttpFaultInjector.Program.SendUpstreamRequest(HttpRequest request) in /src/Azure.Sdk.Tools.HttpFaultInjector/Program.cs:line 215

presumably because it buffers all the content here:

Content = await upstreamResponseMessage.Content.ReadAsByteArrayAsync()

Fault-injector memory consumption (blue in the chart below) is roughly 3x the test code's memory consumption (red).
[image: memory-consumption chart comparing the fault injector (blue) and the test code (red)]

The fault injector does not need to buffer the content; it can stream the source response back to the client (or use a proxy stream that retains nothing in memory).
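
A minimal sketch of what that streaming approach could look like (a hypothetical helper, not the actual Program.cs code; header forwarding is simplified):

// Hypothetical sketch; assumes an ASP.NET Core HttpContext and an
// HttpRequestMessage already built for the upstream call.
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

static async Task ProxyUpstreamResponse(
    HttpClient upstreamClient,
    HttpRequestMessage upstreamRequest,
    HttpContext context,
    CancellationToken ct)
{
    // ResponseHeadersRead makes SendAsync return as soon as the headers
    // arrive, skipping the LoadIntoBufferAsync/MemoryStream path that
    // throws OutOfMemoryException in the stack trace above.
    using HttpResponseMessage upstreamResponse = await upstreamClient.SendAsync(
        upstreamRequest, HttpCompletionOption.ResponseHeadersRead, ct);

    context.Response.StatusCode = (int)upstreamResponse.StatusCode;
    foreach (var header in upstreamResponse.Headers.Concat(upstreamResponse.Content.Headers))
    {
        context.Response.Headers[header.Key] = header.Value.ToArray();
    }

    // Copy the body through in chunks; only the copy buffer is held in memory.
    await using Stream upstreamBody = await upstreamResponse.Content.ReadAsStreamAsync(ct);
    await upstreamBody.CopyToAsync(context.Response.Body, ct);
}

With this shape, memory use per request is bounded by the copy buffer rather than by the response size.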

github-actions bot added the needs-triage label (Workflow: This is a new issue that needs to be triaged to the appropriate team.) on Dec 14, 2023
mikeharder (Member) commented:

Streaming the upstream response might be tricky, because you need to keep the upstream connection alive until streaming to downstream is complete. If the fault injector's client blocks for too long, upstream might force-close the connection due to inactivity.

If we do add streaming, I would add a command-line option to enable it but keep buffering as the default; streaming might work for some scenarios but not others.
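
A rough sketch of such an opt-in (the --streaming flag and names are hypothetical; the fault injector's real argument parsing may differ):

// Hypothetical flag; buffering remains the default behavior.
bool streamingEnabled = args.Contains("--streaming");

if (streamingEnabled)
{
    // Forward the upstream body as it arrives (see the streaming sketch above).
    await ProxyUpstreamResponse(upstreamClient, upstreamRequest, context, ct);
}
else
{
    // Current behavior: buffer the entire response in memory first
    // (assumes upstreamResponse from a plain SendAsync is in scope).
    byte[] content = await upstreamResponse.Content.ReadAsByteArrayAsync(ct);
    await context.Response.Body.WriteAsync(content, ct);
}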

mikeharder (Member) commented:

Another option that might be simpler would be to buffer the upstream response in a temporary file on disk and delete the temporary file after the downstream response is finished. However, disk performance might be too slow for your high-throughput scenarios.
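
A sketch of that temp-file variant (same hypothetical proxy context as the streaming sketch above; ct is a CancellationToken):

// Hypothetical sketch: buffer the upstream body on disk instead of in memory.
// FileOptions.DeleteOnClose deletes the file automatically when the stream
// is disposed, i.e. once the downstream copy has finished.
string tempPath = Path.GetTempFileName();
await using var tempFile = new FileStream(
    tempPath, FileMode.Create, FileAccess.ReadWrite, FileShare.None,
    bufferSize: 81920, FileOptions.DeleteOnClose | FileOptions.Asynchronous);

await using (Stream upstreamBody = await upstreamResponse.Content.ReadAsStreamAsync(ct))
{
    await upstreamBody.CopyToAsync(tempFile, ct);
}

// Replay the buffered body to the downstream client.
tempFile.Position = 0;
await tempFile.CopyToAsync(context.Response.Body, ct);

This decouples upstream reads from downstream writes (addressing the keep-alive concern above) at the cost of disk I/O.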

lmolkova (Member, Author) commented Dec 15, 2023

The connection will stay alive as long as the client is reading from the stream (and reading frequently enough to avoid a timeout). For fault-injection test cases it's OK if either of the connections goes idle: the client will just retry. So I don't see how streaming could be worse than buffering, which is guaranteed to cause issues.
