Chunker should be able to use temp files instead of memory #12

Open
jypma opened this issue Sep 21, 2016 · 3 comments

Comments

@jypma
Contributor

jypma commented Sep 21, 2016

The chunker currently requires at least 5 MB of memory for every ongoing upload stream. With 100 concurrent connections, that is at least 500 MB of buffers, which can easily consume an entire Java heap.
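To make the arithmetic concrete, here is a minimal sketch. `ChunkMemoryEstimate` and `bufferBytes` are hypothetical names used only for illustration, not part of the library, and the 5 MiB figure is simply the per-stream minimum mentioned above.

```java
// Hypothetical helper (not part of the library): estimates the memory held
// by in-memory chunk buffers when every upload stream keeps one chunk.
final class ChunkMemoryEstimate {
    static final long MIB = 1024L * 1024L;

    // Total bytes buffered = concurrent uploads * bytes buffered per upload.
    static long bufferBytes(int concurrentUploads, long chunkSizeBytes) {
        return Math.multiplyExact((long) concurrentUploads, chunkSizeBytes);
    }

    public static void main(String[] args) {
        long total = bufferBytes(100, 5 * MIB);
        System.out.println(total / MIB + " MiB"); // prints "500 MiB"
    }
}
```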

Buffering to temp files instead should not add considerable overhead as long as the data stays within the disk cache, and it would allow the overall system to scale much further, provided one can live with the S3 upload rate being capped at the disk read speed.
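A rough sketch of what temp-file buffering could look like, using only the standard `java.nio.file` API. The class `TempFileChunk`, its methods, and the `s3-chunk-` file prefix are assumptions made for illustration; this is not the chunker's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class: spools one chunk to a temp file instead of a heap buffer.
final class TempFileChunk implements AutoCloseable {
    private final Path file;
    private final OutputStream out;
    private long size;

    TempFileChunk() throws IOException {
        // Created in the default temp directory (java.io.tmpdir).
        file = Files.createTempFile("s3-chunk-", ".tmp");
        out = Files.newOutputStream(file);
    }

    // Appends incoming bytes to the file; only `data` itself lives on the heap.
    void write(byte[] data) throws IOException {
        out.write(data);
        size += data.length;
    }

    long size() {
        return size;
    }

    // Opens the spooled bytes for reading, e.g. as the body of an upload request.
    InputStream openForRead() throws IOException {
        out.flush();
        return Files.newInputStream(file);
    }

    // Closes the writer and removes the temp file.
    @Override
    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(file);
    }
}
```

With this shape, the per-stream heap cost is roughly the size of the byte arrays in flight rather than a full chunk, and reads of recently written data will usually be served from the OS page cache.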

@filosganga

I think it would be good for this to be configurable, ideally with at least 3 options:

  • in memory (on heap)
  • in memory (off heap)
  • on disk
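The three options above could be sketched roughly as follows. `ChunkBuffers` and its `Strategy` enum are hypothetical names, and backing the disk option with a memory-mapped temp file is an assumption chosen so that all three options return the same `ByteBuffer` type; a stream-based temp file would be an equally valid reading of "on disk".

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical factory: picks where a chunk buffer lives based on configuration.
final class ChunkBuffers {
    enum Strategy { HEAP, OFF_HEAP, DISK }

    static ByteBuffer allocate(Strategy strategy, int capacity) throws IOException {
        switch (strategy) {
            case HEAP:
                // Ordinary array-backed buffer, counted against the Java heap.
                return ByteBuffer.allocate(capacity);
            case OFF_HEAP:
                // Direct buffer, allocated outside the garbage-collected heap.
                return ByteBuffer.allocateDirect(capacity);
            case DISK: {
                // Assumption: a memory-mapped temp file stands in for "on disk".
                Path file = Files.createTempFile("s3-chunk-", ".tmp");
                file.toFile().deleteOnExit();
                try (FileChannel channel = FileChannel.open(
                        file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                    // The mapping stays valid after the channel is closed.
                    return channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
                }
            }
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

Callers would then write into the returned buffer the same way regardless of the configured strategy, so the choice could be exposed as a single setting.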

@joearasin
Contributor

Interesting -- I hadn't thought about scaling to this extent. What sort of use case are we talking about? I'm picturing someone forking off a bunch of streams, leaving them open, and pushing data into them.

@jypma
Contributor Author

jypma commented Sep 21, 2016

We are building a (huge) document storage system, potentially saving many documents concurrently. Some of them are small, some are up to several hundred MB. I expect the operations to be mostly I/O bound, so it makes sense to keep many upload streams to S3 open simultaneously, at least up to the point where we saturate our upload bandwidth from EC2.

jypma added a commit to jypma/s3-stream that referenced this issue Sep 26, 2016
This allows a (much) greater number of upload streams to run in parallel.