Chunker should be able to use temp files instead of memory #12
I think it would ideally be good to make this configurable, with at least 3 options:
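For example, whichever concrete options are chosen could be carried by a pluggable buffer strategy. A minimal sketch, using hypothetical names that are not part of this library's API:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a pluggable buffering strategy; these names are
// illustrative and not part of this library's actual API.
interface PartBuffer extends AutoCloseable {
    void write(byte[] data, int offset, int length) throws IOException;

    long size();

    // Hands the buffered bytes back so the chunker can upload the part.
    InputStream finishAndRead() throws IOException;

    // Releases whatever backs the buffer (heap array, temp file, ...).
    @Override
    void close() throws IOException;
}

// The chunker would be handed one of these at construction time, so the
// caller decides whether parts are buffered on the heap or on disk.
interface PartBufferFactory {
    PartBuffer newBuffer() throws IOException;
}
```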
Interesting -- I hadn't thought about scaling to this extent. What sort of use case are we talking about? I'm picturing someone forking off a bunch of streams, leaving them open, and pushing data into them.
We are building a (huge) document storage system, potentially saving many documents concurrently. Some of them are small, some of them up to several hundred MB. I expect the operations to be mostly I/O bound, and hence it makes sense to leave many upload streams to S3 open simultaneously, at least up to the point where we're saturating our upload bandwidth from EC2.
This allows a (much) greater number of upload streams to run in parallel.
The chunker at the moment requires (at least) 5 MB of memory for every ongoing upload stream. With 100 concurrent connections, that is at least 500 MB of buffers, which will easily eat a Java heap with nothing left over.
Buffering to temp files instead should not add considerable overhead as long as the data stays within the disk cache, but it would let the system as a whole scale much further, if one can live with (max S3 upload rate) = (max disk read speed).
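For illustration, a minimal sketch of what per-part temp-file buffering could look like, fitting the strategy interface sketched above (all names here are hypothetical, not taken from this library):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical temp-file buffer, e.g. one implementation behind the
// sketched PartBuffer interface above; not actual library code.
class TempFilePartBuffer implements AutoCloseable {
    private final Path file;
    private final OutputStream out;
    private long size = 0;

    TempFilePartBuffer() throws IOException {
        // One temp file per in-flight part, created under java.io.tmpdir.
        this.file = Files.createTempFile("s3-part-", ".tmp");
        this.out = new BufferedOutputStream(Files.newOutputStream(file));
    }

    void write(byte[] data, int offset, int length) throws IOException {
        out.write(data, offset, length);
        size += length;
    }

    long size() {
        return size;
    }

    // Flushes the file and reopens it so the part can be streamed to S3.
    InputStream finishAndRead() throws IOException {
        out.close();
        return Files.newInputStream(file);
    }

    // Deletes the temp file once the part upload has completed.
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

With something like this, resident heap per stream shrinks to the small BufferedOutputStream buffer (8 KB by default), while the ~5 MB per part moves to disk, where a write-then-read-once pattern would mostly be served out of the OS page cache.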