
AWS S3 Integration #1

Closed
2m opened this issue Oct 20, 2016 · 39 comments

@2m
Member

2m commented Oct 20, 2016

Continued from akka/akka-stream-contrib#75

@johanandren
Member

The one I started on but haven't really had time to move on with, not based on the aws java sdk but on top of akka http client: https://github.com/johanandren/awsync

Feel free to take any bits and pieces that are useful from it.

@agolubev
Contributor

I'm on it.
My intention is to implement an API similar to the one the HTTP client has (with vocabulary from AWS), something like S3().fromObject and S3().toObject.
I'm planning to use the AWS Java API, so please let me know if you strongly recommend REST.
I'll also take the FileIO API into account for file-based sinks/sources.

@juanjovazquez

Could Apache jclouds be under consideration? The idea would be to build the connector on top of the blobstore abstraction designed there. That way we could save a lot of time, as several integrations, e.g. AWS S3, Google Cloud Storage or Azure Blobs, would be solved at once. I have some code in this direction that I can share if you like.

@juanjovazquez

Another argument in favor of jclouds is that the local file system could be used as just another blobstore provider. That can be very useful for testing purposes, since you wouldn't incur cloud provider charges while keeping the code unaltered. Changing environments would be a matter of configuration. Just my two cents.

@agolubev
Contributor

jclouds uses the AWS S3 REST API with javax.ws.rs and a synchronous approach.
It looks good and is a wrapper that can be mocked for testing purposes. However, the several layers of abstraction can add extra latency.

@johanandren
Member

johanandren commented Oct 23, 2016

The AWS Java API is also a synchronous client using the AWS XML HTTP APIs, no? (Even if they have some "async" way of running a request and periodically checking on its status, AFAIR.)

@agolubev
Contributor

Yeah, going to look closely at https://github.com/aws/aws-sdk-java

@agolubev
Contributor

The main goal is to choose the fastest API.

@juanjovazquez

Working with blobstore cloud providers always introduces some inherent latency. I'm only saying that it might be worth it to leverage the work done by jclouds and avoid reinventing the wheel. Anyway, taking other blobstore providers into account might bring a better perspective on how to handle the problem in a more generic way.

@agolubev
Contributor

Did some digging into the AWS Java API.
It implements an async API with TransferManager for uploading/downloading objects, based on java.util.concurrent.Future. The minimum task is transferring the whole object (for a single download) or a portion of it in the case of multipart download/upload.
It uses the REST API underneath.
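As an aside on that Future-based model: a java.util.concurrent.Future can be bridged to a non-blocking CompletableFuture by polling on a scheduler instead of blocking in get(). A minimal stdlib sketch (the names here are illustrative, not part of the AWS SDK):

```java
import java.util.concurrent.*;

public class FutureBridge {
    // Adapt a blocking j.u.c.Future into a CompletableFuture by re-checking
    // isDone() on a scheduler, so no thread ever blocks inside get().
    public static <T> CompletableFuture<T> poll(
            Future<T> f, ScheduledExecutorService scheduler, long periodMillis) {
        CompletableFuture<T> cf = new CompletableFuture<>();
        Runnable check = new Runnable() {
            @Override public void run() {
                if (f.isDone()) {
                    try {
                        cf.complete(f.get()); // returns immediately: f is done
                    } catch (Exception e) {
                        cf.completeExceptionally(e);
                    }
                } else {
                    scheduler.schedule(this, periodMillis, TimeUnit.MILLISECONDS);
                }
            }
        };
        scheduler.schedule(check, periodMillis, TimeUnit.MILLISECONDS);
        return cf;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Future<String> slow = pool.submit(() -> { Thread.sleep(100); return "done"; });
        System.out.println(poll(slow, scheduler, 10).get(5, TimeUnit.SECONDS)); // prints "done"
        pool.shutdown();
        scheduler.shutdown();
    }
}
```

Polling trades a small completion delay (up to one period) for not tying up a thread per in-flight transfer, which is roughly the trade-off the discussion above is about.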

@agolubev
Contributor

Camel AWS uses the AWS Java API.
jclouds uses the AWS REST API directly.
So, a question to the core team (@ktoso @patriknw), do you want it:

  • soon and universal, with jclouds or the AWS Java API,
  • or fast-functioning, with akka-http and the AWS REST API?

@2m
Member Author

2m commented Oct 25, 2016

My vote goes for akka-http. It would be a great validation of the current Akka HTTP client, and it could also be a driving force for implementing missing features in it.

@patriknw
Member

I don't think we should re-invent the wheel. If there are good client libraries we should integrate with them instead of developing the same thing again. It might not always be the optimal solution, but I think time to market and maintenance cost are more important at this stage of the project.

I have no opinion (experience) of jclouds vs aws java api.

@johanandren
Member

I think pure async would be nice, but probably a lot of work. I don't see what jclouds improves over the AWS Java client except for an abundance of abstraction layers, so my vote is for building on top of the Java client for now.

@juanjovazquez

The blobstore abstraction carried out in jclouds comes from the fact that there are a lot of similarities among different cloud providers, so it's feasible, and maybe convenient, to create this abstraction layer. Something similar happens with traversing file systems, and that's the reason the Camel folks created an abstraction layer on which they built similar connectors, e.g. File or FTP. I plan to apply this same approach to the FTP connector after the first usable version is ready.

My vote is for thinking in time to market and taking advantage of previous efforts carried out by the community. Scala came with the promise of leveraging existing Java libraries and previous investments. IMHO, that's what the users expect. Having a bunch of connectors almost "for free" is something that deserves at least a closer look and evaluation.

@ktoso
Member

ktoso commented Oct 25, 2016

I agree that using an existing API is good as a first step. Then we have a working integration, and a more "reactive native" one can follow soon after if someone has time to do it.

@joearasin

We have an akka-http based implementation that works, but is a bit rough around the edges. This includes a library for signing Akka-HTTP requests for AWS. We'd love to contribute it to the project.

@joearasin

As far as question marks go, one (nontrivial) bit in S3-land is credentials. It'd be really sweet to implement a Source that produces a stream of AWS credentials, because it's a nice abstraction around S3 credential refreshing on EC2 boxes.
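The refresh-on-expiry logic underneath such a Source can be sketched independently of streams: a supplier that re-fetches credentials when they get close to expiring. A minimal stdlib sketch, assuming a hypothetical Credentials type (on a real EC2 box the fetch would hit the instance metadata endpoint):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class RefreshingCredentials {
    // Hypothetical credentials holder; on EC2 these fields would come from
    // the instance metadata service together with an expiration timestamp.
    public static final class Credentials {
        public final String accessKeyId, secretKey, sessionToken;
        public final Instant expiresAt;
        public Credentials(String id, String secret, String token, Instant exp) {
            this.accessKeyId = id; this.secretKey = secret;
            this.sessionToken = token; this.expiresAt = exp;
        }
    }

    private final Supplier<Credentials> fetch; // e.g. a metadata-service call
    private final Duration refreshAhead;       // refresh this long before expiry
    private volatile Credentials cached;

    public RefreshingCredentials(Supplier<Credentials> fetch, Duration refreshAhead) {
        this.fetch = fetch;
        this.refreshAhead = refreshAhead;
    }

    // Return cached credentials, re-fetching only when close to expiry.
    public synchronized Credentials get() {
        Instant deadline = Instant.now().plus(refreshAhead);
        if (cached == null || cached.expiresAt.isBefore(deadline)) {
            cached = fetch.get();
        }
        return cached;
    }
}
```

A stream-based Source would essentially wrap this get() so that each emitted element carries currently valid credentials.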

@ktoso
Member

ktoso commented Oct 27, 2016

So... looking forward to PRs. We're happy to accept either, I think; if one is more tested and used in the real world, we'd prefer that. The impl is yours, @joearasin, I assume, right?

The signing, AFAIR, is general for just credentials + requests, right? So it would be reusable for other AWS APIs as well?

Please coordinate here with the others who wanted to contribute.

@joearasin

The library is bluelabsio/s3-stream -- and yeah, the signing is reusable (I have the signing code split off into a separate module).
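For context on why the signing module is reusable across AWS services: AWS Signature Version 4 derives a per-request signing key through a fixed HMAC-SHA256 chain that is documented in the AWS spec and depends only on the secret key, date, region, and service name. A stdlib sketch of just that derivation step (the values used with it are examples, not real credentials):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class SigV4 {
    static byte[] hmacSha256(byte[] key, String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    // Signature V4 signing-key derivation, per the AWS spec:
    // kSigning = HMAC(HMAC(HMAC(HMAC("AWS4"+secret, date), region), service),
    //                 "aws4_request")
    public static byte[] signingKey(String secret, String date,
                                    String region, String service) throws Exception {
        byte[] kDate = hmacSha256(("AWS4" + secret).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // Lowercase hex encoding, as SigV4 signatures are transmitted.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

Because nothing here is S3-specific (the service name is just a parameter), the same derivation serves any AWS API, which is what makes splitting it into its own module sensible.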

@agolubev
Contributor

Hmm, so the question is who will do the initial PR and when. Migrating the existing code should be faster. @joearasin, are you going to open the PR? Or I can do it (probably during the weekend) and you can review it then?

@agolubev
Contributor

Is anyone doing anything here? I've actually put together a small proof of concept with jclouds.
I'm still willing to move/enhance s3-stream within Alpakka (so we won't be starting from scratch here).

@joearasin

joearasin commented Oct 31, 2016

I'm putting something together. One thing I'd like to figure out before the PR: there was one issue we were having a bit of a debate over, and I wanted to open it up here, namely whether we should merge that PR over there before bringing things here.

Here's the issue at hand: bluelabsio/s3-stream#12. In particular, is caching incomplete upload chunks to the file system something that should be handled as part of this, or is it a piece of logic that should be pushed to "outside" the S3 upload flow?
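Wherever the chunk caching ends up living, the core multipart step is the same: splitting the incoming bytes into fixed-size parts (S3 requires every part of a multipart upload except the last to be at least 5 MiB). A minimal sketch of that splitting, using a tiny part size for illustration only:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Chunker {
    // Split a stream into fixed-size parts; every part except possibly the
    // last has exactly partSize bytes. S3 multipart uploads require parts
    // other than the last to be at least 5 MiB; small sizes are for tests.
    public static List<byte[]> chunk(InputStream in, int partSize) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        byte[] buf = new byte[partSize];
        int filled = 0;
        int n;
        while ((n = in.read(buf, filled, partSize - filled)) != -1) {
            filled += n;
            if (filled == partSize) {      // part complete: emit and start fresh
                parts.add(buf);
                buf = new byte[partSize];
                filled = 0;
            }
        }
        if (filled > 0) parts.add(Arrays.copyOf(buf, filled)); // trailing partial part
        return parts;
    }
}
```

The debate above is then about what happens to the trailing, not-yet-full buffer when the stream pauses: keep it in memory inside the flow, spill it to disk, or hand that decision to the caller.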

@agolubev
Contributor

agolubev commented Oct 31, 2016

I think we should skip this for now and add it afterwards.
My opinion is that if this is a global setting, it should live in the config files where the akka config is.
Maybe this is a standard case, so the core team can guide us here.

@agolubev
Contributor

@joearasin have you considered an application that mocks the S3 service locally? I couldn't find tests for the S3 source itself.

@joearasin

That's another question worth asking. I absolutely want to test that code. Is there going to be a preferred testing approach in this repo for external services?

@patriknw
Member

patriknw commented Nov 1, 2016

We would like to use Travis, at least initially. Anything you can run on Travis is fine; we can start things from the Travis startup script. Lightweight testing is of course preferred, to keep build times short, and in the end there might be a limit on how much we can run on (free) Travis.

@filosganga
Contributor

I have used s3ninja (http://s3ninja.net/) successfully for testing. However, it is awkward to embed, as its main class is defined in the root package. I have written a Docker-based test that starts and stops the s3ninja Docker container using the Spotify Docker client, and I am quite happy with that.

s3ninja does not have all the S3 features, but in general it is enough.

@jypma
Member

jypma commented Nov 8, 2016

It doesn't look like s3ninja does multipart uploads, which is the main use case for s3-stream. I had it on my own list to try out https://github.com/ianbytchek/docker-riak-cs, which is supposed to be a much more complete S3 implementation, as long as it can be brought up on Travis.

@jypma
Member

jypma commented Nov 8, 2016

Or one could perhaps just use WireMock; the I/O isn't going to be that much.

@jypma
Member

jypma commented Nov 8, 2016

Just started #24 to get this rolling.

@agolubev
Contributor

agolubev commented Nov 8, 2016

Cool. Next questions:

  1. Are we going to merge this PR to master, or will a branch be used for some time? I'm voting for a branch.
  2. Do we need to support a Java API, with Java tests? We can postpone this, but at least we need a ticket for it.
  3. We probably also need a ticket for testing.
  4. Should we review the PR and create tickets based on the feedback? (Trying to jump in and take some ticket for myself =))

@jypma
Member

jypma commented Nov 8, 2016

@agolubev Feel free to branch off mine and add what you feel could do with additional tests. I can put them on top of my branch in the PR then.

@agolubev
Contributor

agolubev commented Nov 8, 2016

@jypma I would rather move some settings to a config file, if that's OK with you.

@jypma
Member

jypma commented Nov 8, 2016

@agolubev Makes total sense. Maybe even model them as a nice SettingsCompanion thingy that one can then pass along, to override them in code.
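The SettingsCompanion idea above could look like the following stdlib sketch: defaults are read from configuration, and an immutable copy method lets code override individual values before passing the settings along. java.util.Properties stands in here for the Typesafe Config that akka modules actually read, and all names are hypothetical:

```java
import java.util.Properties;

public final class S3Settings {
    // Hypothetical settings holder: defaults come from configuration,
    // individual values can be overridden in code via with* copies.
    public final String proxyHost;
    public final int chunkSize;

    private S3Settings(String proxyHost, int chunkSize) {
        this.proxyHost = proxyHost;
        this.chunkSize = chunkSize;
    }

    // Companion-style factory reading defaults from a config source.
    // The keys and the 5 MiB default are illustrative.
    public static S3Settings apply(Properties config) {
        return new S3Settings(
            config.getProperty("alpakka.s3.proxy-host", ""),
            Integer.parseInt(config.getProperty("alpakka.s3.chunk-size", "5242880")));
    }

    // Immutable copy with one value replaced, so settings can be tweaked
    // in code and handed to an individual stream without mutating defaults.
    public S3Settings withChunkSize(int chunkSize) {
        return new S3Settings(this.proxyHost, chunkSize);
    }
}
```

This mirrors the pattern other Akka modules use: config-file defaults for the common case, programmatic overrides for the exceptions.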

@patriknw
Member

patriknw commented Nov 8, 2016

Isn't it easier to collaborate if we just work on master? We can add something to the build to avoid releasing this module until it's ready.

@jypma
Member

jypma commented Nov 8, 2016

Sure, master works for me. I'll see if I can unbreak the build tomorrow :)

@filosganga
Contributor

@jypma s3ninja supports multipart upload; I am using it in another project. But if docker-riak-cs is more complete, I am happy with that as well.

@2m
Member Author

2m commented Nov 23, 2016

S3 support landed with #24

@2m 2m closed this as completed Nov 23, 2016
@2m 2m added this to the 0.2 milestone Nov 23, 2016
DanieleSassoli referenced this issue in seglo/alpakka Dec 10, 2019
Keeping up to date with alpakka