AWS S3 Integration #1
The one I started on but haven't really had time to move on with, not based on the AWS Java SDK but built on top of the Akka HTTP client: https://github.com/johanandren/awsync Feel free to take any bits and pieces that are useful from it.
I'm on it.
Could Apache jclouds be under consideration? The idea would be to build the connector on top of the blobstore abstraction designed there. Thus, we could save a lot of time, as some integrations, e.g. AWS S3, Google Cloud Storage or Azure Blobs, would be solved at once. I have some code in this direction that I can share if you like.
Another argument that would support the use of jclouds is that the local file system could be used as just another blobstore provider. That can be very useful for testing purposes, since you wouldn't be charged extra by cloud providers while keeping the code unaltered. Changing environment would be a matter of configuration. Just my two cents.
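To illustrate the point about the local file system as just another provider, here is a sketch against the jclouds `BlobStore` API (container, path, and blob names are made up for the example; only the provider string and credentials would change when switching to `"aws-s3"`):

```java
import java.util.Properties;
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;

// Sketch only: use the jclouds "filesystem" provider so tests run against
// a local directory instead of a paid cloud backend.
Properties overrides = new Properties();
overrides.setProperty("jclouds.filesystem.basedir", "/tmp/blobstore");

BlobStoreContext context = ContextBuilder.newBuilder("filesystem")
    .overrides(overrides)
    .buildView(BlobStoreContext.class);

BlobStore store = context.getBlobStore();
store.createContainerInLocation(null, "test-bucket");
Blob blob = store.blobBuilder("hello.txt").payload("hello").build();
store.putBlob("test-bucket", blob);

// Switching environments is configuration, e.g.:
// ContextBuilder.newBuilder("aws-s3").credentials(accessKey, secretKey)...
```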
jclouds uses the AWS S3 REST API with javax.ws.rs and a synchronous approach.
The AWS Java API is also a synchronous client using the AWS XML HTTP APIs, no? (even if they have some "async" way of running a request and periodically checking on its status, AFAIR)
Yeah, going to look closely at https://github.com/aws/aws-sdk-java
The main purpose is to choose the fastest API.
Working with blobstore cloud providers always introduces some inherent latencies. I only say that it might be worth it to leverage the job done by jclouds and try not to reinvent the wheel. Anyway, taking into account other blobstore providers might bring a better perspective on how to deal with the problem in a more generic way.
Did some digging into the AWS Java API.
My vote goes for akka-http. It would be great validation for the current Akka HTTP client. It could also be a driving force to implement missing features in the Akka HTTP client.
I don't think we should re-invent the wheel. If there are good client libraries we should integrate with them instead of developing the same thing again. It might not always be the optimal solution, but I think time to market and maintenance cost are more important at this stage of the project. I have no opinion (experience) of jclouds vs the AWS Java API.
I think pure async would be nice but probably a lot of work. I don't see what jclouds improves over the AWS Java client except for an abundance of abstraction layers, so my vote is for building it on top of the Java client for now.
The blobstore abstraction carried out in jclouds comes from the fact that there are a lot of similarities among different cloud providers, so it's feasible, and maybe convenient, to create this abstraction layer. Something similar happens with traversing file systems, and that's the reason the Camel guys created an abstraction layer on which they built some similar connectors, e.g. File or FTP. I plan to apply this same approach to the FTP connector after the first usable version is ready. My vote is for thinking in time to market and taking advantage of previous efforts carried out by the community. Scala came with the promise of leveraging existing Java libraries and previous investments. IMHO, that's what the users expect. Having a bunch of connectors almost "for free" is something that deserves at least a closer look and evaluation.
I agree that using an existing API is a good first step. Then we have a working integration, and a more "reactive native" one can follow soon after if someone has time to do it.
We have an akka-http based implementation that works, but is a bit rough around the edges. This includes a library for signing Akka HTTP requests for AWS. We'd love to contribute it to the project.
As far as question marks go, one (nontrivial) bit in S3-land is credentials. It'd be really sweet to implement a Source that produces a stream of AWS credentials, because it's a nice abstraction around S3 credential refreshing on EC2 boxes.
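Independent of how it is eventually modeled as an Akka Streams Source, the refreshing-credentials idea boils down to a provider that caches credentials and re-fetches them shortly before they expire. A minimal JDK-only sketch of that logic (all names hypothetical, not the s3-stream API; the `fetch` supplier stands in for a call to the EC2 instance-metadata endpoint):

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical credentials value: access key, secret key, and expiry time.
class Creds {
    final String accessKey;
    final String secretKey;
    final Instant expiresAt;
    Creds(String accessKey, String secretKey, Instant expiresAt) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.expiresAt = expiresAt;
    }
}

// Caches the last credentials and refreshes them `margin` before expiry,
// the way EC2 instance-profile credentials rotate. A reactive Source of
// credentials would wrap this same refresh decision.
class RefreshingCredentials {
    private final Supplier<Creds> fetch; // e.g. the EC2 metadata service call
    private final Clock clock;
    private final Duration margin;       // refresh this long before expiry
    private volatile Creds cached;

    RefreshingCredentials(Supplier<Creds> fetch, Clock clock, Duration margin) {
        this.fetch = fetch;
        this.clock = clock;
        this.margin = margin;
    }

    synchronized Creds current() {
        // Refresh on first use, or once we are inside the safety margin.
        if (cached == null || !clock.instant().isBefore(cached.expiresAt.minus(margin))) {
            cached = fetch.get();
        }
        return cached;
    }
}
```

A Source built around this would simply emit `current()` whenever downstream demands a signed request.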
So... looking forward to PRs. We're happy to accept either, I think; if one's more tested and used in the real world we'd prefer that. Your impl is @joearasin's, I assume, right? The signing, AFAIR, is general for just credentials + requests, so it would be reusable for other AWS APIs as well? Please coordinate here with others who wanted to contribute.
The library is bluelabsio/s3-stream, and yeah, the signing is reusable (and I have the signing code split off in a separate module).
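For context on why the signing is reusable across AWS APIs: AWS Signature Version 4 derives a service-scoped signing key purely from the secret key, date, region, and service name, so the same derivation serves S3 and any other service. A sketch of that key derivation as published in the AWS SigV4 documentation (class and method names here are illustrative, not the s3-stream module's API):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// AWS Signature Version 4 signing-key derivation (per the AWS docs):
// kSigning = HMAC(HMAC(HMAC(HMAC("AWS4" + secret, date), region), service), "aws4_request")
class Sigv4 {
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // date is the 8-digit YYYYMMDD stamp, e.g. "20150830";
    // service is e.g. "s3" or "iam".
    static byte[] signingKey(String secret, String date, String region, String service) {
        byte[] kDate    = hmacSha256(("AWS4" + secret).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion  = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }
}
```

The resulting key then signs the canonical request's string-to-sign; only the canonical-request construction differs per request.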
Hmm, so the question is who will do the initial PR and when. Migrating should be faster. @joearasin, are you going to do the PR? Or I can do the PR (probably during the weekend) and you'll review it then?
Is anyone doing anything here? I've actually put together a small proof of concept with jclouds.
I'm putting something together. One thing I'd like to figure out before a PR: there was one issue we were having a bit of a debate over, and I wanted to open it up here, along with whether we should merge the PR over there before bringing things here. Here's the issue at hand: bluelabsio/s3-stream#12. In particular, is caching incomplete upload chunks to the file system something that should be handled as part of this, or is it a piece of logic that should be pushed outside the S3 upload flow?
I think we should skip this for now and add it afterwards.
@joearasin have you considered some application that mocks the S3 service locally? I mean, I did not find tests for the S3 source itself.
That's another question worth asking. I absolutely want to test that code. Is there going to be a preferred testing approach in this repo for external services?
We would like to use Travis, at least initially. Anything you can run on Travis is fine; we can start things from the Travis startup script. Lightweight testing is of course preferred to keep build times short, and in the end there might be a limit on how many things we can run on (free) Travis.
I have used s3ninja (http://s3ninja.net/) successfully for testing. However, it is complex to embed, as the main class is defined in the root package. I have written a Docker-based test that starts and stops the s3ninja Docker container using the Spotify docker client. I am quite happy with that. s3ninja does not have all the S3 features, but it is enough in general.
It doesn't look like s3ninja does multipart uploads, which is the main use case for s3stream. I had on my own list to try out https://github.com/ianbytchek/docker-riak-cs , which is supposed to be a much more complete S3 implementation. As long as that can be brought up on Travis.
Or one could just use WireMock perhaps; the I/O isn't going to be that much.
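Whichever mock is chosen, the stubbing pattern is the same: serve canned responses on the S3 REST paths the connector hits. To illustrate without pulling in WireMock, here is the idea with only the JDK's built-in HTTP server (bucket, key, and body are made up; a real test would stub the multipart-upload endpoints too):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal stand-in for an S3 mock: answer GET /test-bucket/hello.txt with a
// canned body, the way WireMock or s3ninja would. Illustrative only.
class S3Stub {
    static HttpServer start() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.createContext("/test-bucket/hello.txt", exchange -> {
                byte[] body = "hello from fake S3".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("ETag", "\"fake-etag\"");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Convenience GET so a test can hit the stub and inspect the body.
    static String get(String url) {
        try {
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());
            return resp.body();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A real S3 connector test would point its endpoint configuration at `http://localhost:<port>` instead of s3.amazonaws.com.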
Just started #24 to get this rolling. |
Cool. Next questions:
|
@agolubev Feel free to branch off mine and add what you feel could do with additional tests. I can put them on top of my branch in the PR then.
@jypma I would rather move some settings to a config file, if you are OK with that.
@agolubev Makes total sense. Maybe even model them as a nice
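For illustration, the settings moved to configuration might take a shape like this in a `reference.conf` (all key names here are hypothetical, just to show the idea, not the final API):

```hocon
# Hypothetical shape only; keys are illustrative.
akka.stream.alpakka.s3 {
  aws {
    region = "us-east-1"
  }
  # Whether to buffer multipart-upload chunks in memory or on disk
  # (relates to the incomplete-chunk caching question above).
  buffer = "memory"
  disk-buffer-path = "/tmp/s3-buffer"   # only used when buffer = "disk"
}
```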
Isn't it easier to collaborate if we just work on master? We can add something to the build to avoid releasing this module until it's ready.
Sure, master works for me. I'll see if I can unbreak the build tomorrow :)
@jypma s3ninja supports multipart upload; I am using it in another project. But if docker-riak-cs is more complete, I am happy with that as well.
S3 support landed with #24 |
Continued from akka/akka-stream-contrib#75