Support for datasets in authenticated S3-compatible private buckets #507

Open
unidesigner opened this issue Dec 6, 2023 · 14 comments

@unidesigner
Contributor

I am looking into supporting an authenticated S3-compatible file protocol, where one could specify an accessKey and secretKey to view data in a private bucket.

I imagine being able to specify a source like this:

zarr://https://s3.us-west-004.amazonaws.com/bucketname/dataset?s3_access_key_id=accessKey&s3_secret_access_key=secretKey

which would initialize an S3Client from the @aws-sdk/client-s3 SDK and make the appropriate requests to get the info file and data chunks.

Where would I get started to implement this in Neuroglancer?
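A minimal sketch of how the source string proposed above could be split into an endpoint URL and credentials. The function and interface names are hypothetical and not part of Neuroglancer; only the query parameter names come from the example URL.

```typescript
// Hypothetical helper, not part of Neuroglancer: splits the proposed
// "zarr://https://...?s3_access_key_id=...&s3_secret_access_key=..." source
// into the plain endpoint URL and the two credential values.
interface S3SourceSpec {
  endpointUrl: string; // URL with the credential parameters removed
  accessKeyId: string;
  secretAccessKey: string;
}

function parseS3QueryCredentials(source: string): S3SourceSpec {
  const prefix = "zarr://";
  if (!source.startsWith(prefix)) {
    throw new Error(`Expected a ${prefix} source, got: ${source}`);
  }
  const url = new URL(source.slice(prefix.length));
  const accessKeyId = url.searchParams.get("s3_access_key_id");
  const secretAccessKey = url.searchParams.get("s3_secret_access_key");
  if (accessKeyId === null || secretAccessKey === null) {
    throw new Error("Missing s3_access_key_id or s3_secret_access_key");
  }
  // Strip the credentials so they are not forwarded as part of the object URL.
  url.searchParams.delete("s3_access_key_id");
  url.searchParams.delete("s3_secret_access_key");
  return { endpointUrl: url.toString(), accessKeyId, secretAccessKey };
}
```

Note that `URLSearchParams` decodes `+` as a space, so a secret key containing `+` or `/` would need to be percent-encoded in the URL.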

@unidesigner changed the title from "Support for Authenticated S3-Compatible Private Buckets" to "Support for datasets in authenticated S3-compatible private buckets" on Dec 6, 2023

@jbms
Collaborator

jbms commented Dec 7, 2023

The place to add support would be here:
https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts

However, there are a few issues to consider:

  • You will need to know the bucket region in order to generate a correct signature. Figuring it out is a bit tricky; see e.g. the approach we take in tensorstore (https://github.com/google/tensorstore/blob/4e82af4392fe4939875d80f9fa5fd01c55beda64/tensorstore/kvstore/s3/s3_endpoint.cc#L211). Here there is the added challenge that it needs to work within the cross-origin request limitations of the browser. Alternatively, you could require that the bucket region be specified manually as part of the datasource URL, though that is a bit annoying.
  • If you put the access key in the datasource URL itself, then sharing the Neuroglancer URL/state will also share the access key. That may in some cases be desired, but you would need to be careful to use an access key with only the limited privileges that you intend to grant. It would be very easy to accidentally share a more privileged access key. For GCS, I implemented ngauth, which allows neuroglancer users to access private GCS buckets. It requires that you run an ngauth server, which verifies that a given user has access to a bucket and then provides a time-limited, restricted authentication token. For S3, I think there may be a way to accomplish something similar, without even the need for a server, by using AWS Cognito combined with suitable S3 access policies (https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_cognito-bucket.html). I expect that in many cases that would be preferable to embedding the access token directly in the URL. Nonetheless, it could still make sense to support embedding the access token directly in the URL, and that is ultimately simpler.
  • A final issue is that the access token would ideally come at the beginning of the URL, before the bucket name, rather than at the end, so that completion works.
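As a rough illustration of the region point in the list above, the sketch below reads the region from an endpoint hostname of the form `s3.<region>.amazonaws.com` and falls back to a manually specified value. Both function names are hypothetical, and the hostname pattern is an assumption that will not hold for every S3-compatible service; it does not attempt the request-based discovery that tensorstore performs.

```typescript
// Hypothetical helpers for illustration only.
// Assumption: the endpoint looks like "s3.<region>.amazonaws.com" or
// "<bucket>.s3.<region>.amazonaws.com". Other hosts yield undefined.
function regionFromS3Hostname(hostname: string): string | undefined {
  const match = /(?:^|\.)s3\.([a-z0-9-]+)\.amazonaws\.com$/.exec(hostname);
  return match === null ? undefined : match[1];
}

// Prefer a region given explicitly in the datasource URL; otherwise try the
// hostname; otherwise fail loudly, because a wrong region produces an
// invalid request signature.
function resolveRegion(hostname: string, explicitRegion?: string): string {
  const region = explicitRegion ?? regionFromS3Hostname(hostname);
  if (region === undefined) {
    throw new Error(`Cannot determine S3 region for host ${hostname}`);
  }
  return region;
}
```

With the hostname from the example URL earlier in the thread, `resolveRegion("s3.us-west-004.amazonaws.com")` yields `us-west-004` under this pattern.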

@aaronkanzer

aaronkanzer commented Feb 26, 2024

Hi @jbms -- it seems that when I click https://github.com/google/neuroglancer/blob/master/src/neuroglancer/util/special_protocol_request.ts -- I get a 404 -- any chance you know if the code moved?

Also, nice to meet you 👋 I work over at MIT with @kabilar and others on the LINC Project: https://connects.mgh.harvard.edu/. We are hoping to leverage neuroglancer in at least the short-term for viewing private zarrs stored in S3

@jbms
Collaborator

jbms commented Feb 26, 2024

Updated URL is here: https://github.com/google/neuroglancer/blob/master/src/util/special_protocol_request.ts

The src/neuroglancer prefix was renamed to src/.

If you decide on the approach you will use for getting the AWS credentials in Neuroglancer I can offer more advice.

@aaronkanzer

Any advice regarding AWS creds<>auth would be great, as all our assets are hosted via S3 -- thanks in advance

Also, just tagging a few others involved in the project here for visibility @ayendiki @balbasty @MikeSchutzman

@jbms
Collaborator

jbms commented Feb 26, 2024

The options that I've thought of are in my previous comment: #507 (comment)

The simplest thing to implement would be to use the syntax:

s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path

Additionally, you could support Amazon Cognito for credentials. That would probably be preferable in most cases but would be a bit more complicated.

You can probably use this library to handle the actual requests to s3:

https://www.npmjs.com/package/@aws-sdk/client-s3

There is also this example of using Amazon Cognito:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/getting-started-browser.html
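The `s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path` syntax suggested above could be parsed roughly as follows. This is only a sketch: `parseAwsKeyUrl` and `AwsKeyUrl` are hypothetical names, and percent-encoding of the key fields is an assumption added here so that secret keys containing `/`, `+` or `:` can be represented. Putting the credentials before the bucket name matches the earlier point about URL completion.

```typescript
// Hypothetical parser for the proposed "s3+awskey:" scheme.
interface AwsKeyUrl {
  accessKeyId: string;
  secretAccessKey: string;
  bucket: string;
  path: string;
}

function parseAwsKeyUrl(url: string): AwsKeyUrl {
  const schemePrefix = "s3+awskey:";
  const sep = url.indexOf("://");
  if (!url.startsWith(schemePrefix) || sep === -1) {
    throw new Error(`Not an s3+awskey URL: ${url}`);
  }
  // Everything between the scheme prefix and "://" is "<ACCESS_KEY>:<SECRET_KEY>".
  const credentials = url.slice(schemePrefix.length, sep);
  const colon = credentials.indexOf(":");
  if (colon <= 0 || colon === credentials.length - 1) {
    throw new Error("Expected s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path");
  }
  // The remainder is "<bucket>/<path>".
  const rest = url.slice(sep + 3);
  const slash = rest.indexOf("/");
  const bucket = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash + 1);
  if (bucket === "") {
    throw new Error("Missing bucket name");
  }
  return {
    // Assumption: key fields are percent-encoded in the URL.
    accessKeyId: decodeURIComponent(credentials.slice(0, colon)),
    secretAccessKey: decodeURIComponent(credentials.slice(colon + 1)),
    bucket,
    path,
  };
}
```

The parsed values could then be handed to whichever client performs the signed requests, for example the `@aws-sdk/client-s3` package linked above.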

@aaronkanzer

Thanks @jbms -- could you just clarify the s3+awskey:<ACCESS_KEY>:<SECRET_KEY>://bucket/path option? Unless I'm mistaken, I don't think I'd want to present these keys in plaintext -- let me know what you had in mind.

Will look further into Cognito.

@unidesigner did you ever arrive at a solution? I am looking to proof-of-concept something, but curious to understand how you approached this.

Thanks all in advance

@unidesigner
Contributor Author

Hi @aaronkanzer - unfortunately, I have not had time to prioritize work on this. Are you working on a proof-of-concept implementation?

Before adding the complexity of Cognito, I'd suggest supporting the scheme proposed by @jbms, which provides the access_key and secret_key in the URL. The secret_key would indeed be in plain text, relying on security by obscurity.

@aaronkanzer

@unidesigner Yes, we are working on a proof-of-concept. After some research, we are implementing presigned cookies via AWS CloudFront.

We have a CloudFront distribution that sits in front of our S3 bucket, and we serve neuroglancer directly. This allows us to fetch a handful of private chunks of data efficiently (in our case, we work heavily with .ome.zarr).

Once we have a cleaned-up e2e solution, I'm happy to share some diagrams or example code -- would also be curious to get @jbms thoughts too, and see if we can extend support directly into neuroglancer eventually.

Cc @kabilar
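The implementation behind the CloudFront approach described above is not shown in this thread. As a hedged sketch of what the server side of signed cookies generally involves, the code below builds a CloudFront custom policy document and applies CloudFront's URL-safe substitutions to a base64 string. The function names are hypothetical, and the RSA signing step that produces the `CloudFront-Signature` cookie is deliberately left out.

```typescript
// Hypothetical helpers; a sketch of standard CloudFront signed-cookie inputs,
// not the LINC project's actual code.

// Builds a custom policy that grants access to every URL matching
// `resourcePattern` (for example "https://d111111abcdef8.cloudfront.net/*")
// until `expires`.
function buildCustomPolicy(resourcePattern: string, expires: Date): string {
  const policy = {
    Statement: [
      {
        Resource: resourcePattern,
        Condition: {
          // CloudFront expects the expiry as whole seconds since the epoch.
          DateLessThan: { "AWS:EpochTime": Math.floor(expires.getTime() / 1000) },
        },
      },
    ],
  };
  return JSON.stringify(policy);
}

// CloudFront uses its own URL-safe variant of base64:
// "+" becomes "-", "=" becomes "_", and "/" becomes "~".
function toCloudFrontBase64(base64: string): string {
  return base64.replace(/\+/g, "-").replace(/=/g, "_").replace(/\//g, "~");
}
```

The encoded policy, its signature, and the key pair ID would be sent to the browser as the `CloudFront-Policy`, `CloudFront-Signature`, and `CloudFront-Key-Pair-Id` cookies, after which chunk requests to the distribution carry them automatically.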

@aaronkanzer

aaronkanzer commented Oct 31, 2024

@unidesigner Realizing I never followed up here -- did you come to an implementation here?

We've been using our CloudFront solution for quite some time now with success -- let me know, happy to transfer any knowledge if helpful.

Cc @kabilar @satra

@unidesigner
Contributor Author

Hi @aaronkanzer - no, not yet unfortunately, but it is still something I'd want to look into given time. I'd be interested in understanding your CloudFront solution! Can you post it here or write me an email [email protected]. Thank you!

@aaronkanzer

@unidesigner -- sorry for the delay here! I've sent you an email via [email protected] to hopefully meet!

@d-v-b
Contributor

d-v-b commented Dec 16, 2024

I'm also very interested in this! @aaronkanzer would you have time to email me at [email protected], or is there code / documentation I could look at?

@joshmoore
Contributor

I'm interested as well.

@kabilar

kabilar commented Dec 16, 2024

Hi all, thanks for your interest. The high-level overview is documented here. We would also be happy to meet. Here is a scheduling link for Aaron and me.
