Upload manager does not re-use buffer pools across requests #2036
Comments
Hello @Rsm10, thank you for reaching out to us. I believe there are a couple of reasons why a lot of memory is being used. The first is that the marshalers take a lot of memory due to the use of reflection, and the second is the lack of pooling, as you stated. I'll go ahead and mark this as a feature request. If you have any additional questions, please let us know!
I think there might be something else at play here. I updated the benchmark below. @Rsm10, could you provide more information about how the Uploader is being used, e.g. the type of value used for the UploadInput.Body parameter? Is the upload type an os.File, bytes.Buffer, bytes.Reader, etc.? If you're able to profile the issue you're experiencing in more detail, it would significantly help investigate this further.

I used the following to benchmark the Upload (I realized the previous version of my test had a bug causing it to have an empty body for all but the first request, but after fixing that there is still no noticeable difference):

// go test -bench=BenchmarkUpload -benchmem -benchtime 10s -run NONE -cpu 1,2,4 ./service/s3/s3manager/
// This benchmark is assumed to live alongside the uploader tests in
// service/s3/s3manager, with roughly the following imports.
package s3manager_test

import (
	"bytes"
	"io"
	"io/ioutil"
	"net/http"
	"strconv"
	"strings"
	"testing"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/corehandlers"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/awstesting/unit"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func BenchmarkUpload(b *testing.B) {
	createUploadBody := []byte(`
<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>Bucket</Bucket>
  <Key>Key</Key>
  <UploadId>uploadid</UploadId>
</InitiateMultipartUploadResult>
`)
	completeUploadBody := []byte(`
<CompleteMultipartUpload>
  <Part>
  </Part>
</CompleteMultipartUpload>
`)

	svc := s3.New(unit.Session)
	// Replace the real send handler with a mock that drains the request body
	// and returns canned create/complete multipart upload responses.
	svc.Handlers.Send.Swap(corehandlers.SendHandler.Name, request.NamedHandler{
		Name: "mock.send",
		Fn: func(r *request.Request) {
			io.CopyN(ioutil.Discard, r.HTTPRequest.Body, r.HTTPRequest.ContentLength)

			req := r.HTTPRequest
			var body []byte
			switch {
			case strings.Contains(req.URL.RawQuery, "uploads") && req.Method == "POST":
				body = createUploadBody
			case strings.Contains(req.URL.RawQuery, "uploadId") && req.Method == "POST":
				body = completeUploadBody
			}
			req.Body.Close()

			respHeader := http.Header{}
			respHeader.Set("Content-Length", strconv.Itoa(len(body)))
			r.HTTPResponse = &http.Response{
				Status:        "OK",
				StatusCode:    200,
				Header:        respHeader,
				ContentLength: int64(len(body)),
				Body:          ioutil.NopCloser(bytes.NewReader(body)),
			}
		},
	})

	u := s3manager.NewUploaderWithClient(svc)
	bodyBuf := make([]byte, 30*1024)

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_, err := u.Upload(&s3manager.UploadInput{
				Bucket: aws.String("Bucket"),
				Key:    aws.String("Key"),
				// Use a bytes.Buffer since it is not seekable.
				Body: bytes.NewBuffer(bodyBuf),
			})
			if err != nil {
				b.Fatalf("expect no error, got %v", err)
			}
		}
	})
}
UploadInput.Body is being supplied the body from a GET response (GetObjectOutput.Body). Here is the pprof output (-alloc_space) after running one of my tests (200,000 30 KB objects):

Showing top 10 nodes out of 14

and without alloc_space:

(pprof) top1
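For context, a minimal sketch of that usage pattern, assuming hypothetical bucket and key names and standard SDK setup (the actual application code is not part of this issue): the GetObject response body, a non-seekable stream, is passed straight through to Upload.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)
	uploader := s3manager.NewUploaderWithClient(svc)

	// Fetch the source object; GetObjectOutput.Body is a streaming,
	// non-seekable io.ReadCloser.
	getOut, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("src-bucket"), // hypothetical
		Key:    aws.String("src-key"),    // hypothetical
	})
	if err != nil {
		log.Fatal(err)
	}
	defer getOut.Body.Close()

	// Pass the GET response body directly to the upload manager.
	_, err = uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("dst-bucket"), // hypothetical
		Key:    aws.String("dst-key"),    // hypothetical
		Body:   getOut.Body,
	})
	if err != nil {
		log.Fatal(err)
	}
}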
So I noticed this while profiling memory usage when using the upload manager for small objects.
I have a part size of 64 MB and concurrency set to 2 for my single instance of the upload manager, since I don't want to use multipart for objects smaller than this size. However, I'd like to use a single interface for all my S3 requests (big and small). The problem is that this instance of the upload manager uses a very large amount of memory. For instance, when I perform 200,000 30 KB PUTs using this single upload manager (only doing 2 concurrent manager.Upload() calls at a time), I end up consuming 3.5 GB of memory while these requests are in flight.
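For illustration, a minimal sketch of that setup, with hypothetical bucket and key names: a single shared Uploader configured with a 64 MB part size and concurrency of 2, reused for every PUT regardless of object size.

package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	// One Uploader instance shared by all requests, big and small.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 64 * 1024 * 1024 // 64 MB parts
		u.Concurrency = 2             // at most 2 concurrent part uploads
	})

	// Each small object (e.g. 30 KB) goes through the same instance.
	body := bytes.NewReader(make([]byte, 30*1024))
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("bucket"),     // hypothetical
		Key:    aws.String("object-key"), // hypothetical
		Body:   body,
	})
	if err != nil {
		log.Fatal(err)
	}
}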
Looking into it further, I see that in upload.go the "func (u *uploader) init() {" function reassigns u.bufferPool, and init() is called per request. Hence there is no sharing of buffer pools across requests. I am wondering if this is intentional. It seems like it would make sense to have a shared buffer pool per upload manager, used by all requests; currently the buffer pool is only used within a single request. A rough sketch of that idea follows below.
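To make the suggestion concrete, here is a rough sketch of the idea, not the SDK's actual implementation (the type and method names below are hypothetical): the pool lives on the uploader value itself and is reused across Upload() calls, instead of being recreated in each per-request init().

package uploaderpool

import (
	"io"
	"sync"
)

// sharedPoolUploader is a hypothetical type illustrating a buffer pool that is
// owned by the uploader and shared across all requests made through it.
type sharedPoolUploader struct {
	partSize int64
	bufPool  sync.Pool
}

func newSharedPoolUploader(partSize int64) *sharedPoolUploader {
	u := &sharedPoolUploader{partSize: partSize}
	u.bufPool = sync.Pool{
		// New is only called when the pool is empty, so steady-state traffic
		// reuses previously allocated part buffers.
		New: func() interface{} { return make([]byte, partSize) },
	}
	return u
}

// readPart fills one part-sized chunk from the request body using a pooled buffer.
func (u *sharedPoolUploader) readPart(r io.Reader) ([]byte, int, error) {
	buf := u.bufPool.Get().([]byte)
	n, err := io.ReadFull(r, buf)
	return buf, n, err
}

// releasePart returns a buffer to the shared pool once the part has been sent.
func (u *sharedPoolUploader) releasePart(buf []byte) {
	u.bufPool.Put(buf)
}

With a design like this, a long run of small uploads would recycle a handful of part buffers rather than allocating fresh ones for every request.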
I guess in general, I am wondering what the recommended use of s3manager.Uploader is. Is it meant for this sort of workload?
Cheers,
Robin M