No documentation on how to actually use the S3 backend #442

Closed
jakebolewski opened this issue Mar 13, 2018 · 22 comments

@jakebolewski
Contributor

We have no easy-to-use guide or docs on how to set up the S3 backend (AWS), connect to a bucket (with correct config settings), and write and read back some data.

@jakebolewski jakebolewski changed the title from "No documentation for how to actually use the S3 backend" to "No documentation on how to actually use the S3 backend" Mar 13, 2018
@jakebolewski jakebolewski added this to the 1.2.2 milestone Mar 13, 2018
@jakebolewski
Contributor Author

jakebolewski commented Mar 13, 2018

We should port over the previous example documentation for the point release, and work on setting up a tutorial style guide for the next release cycle.

@stavrospapadopoulos
Member

We should also add some code examples.

@stavrospapadopoulos stavrospapadopoulos self-assigned this Mar 14, 2018
@deorbit

deorbit commented Mar 15, 2018

Some confusion may arise from the endpoint_override field of the config. The name looks to be derived from the ClientConfiguration class in the AWS C++ SDK, but in the context of TileDB, requiring this field to be set suggests that something other than what should normally be expected is happening by default. What should go in that field in the normal scenario? Is it simply s3.amazonaws.com?

An example combining S3 config setup and bucket write/read would be very helpful.

@tdenniston
Contributor

tdenniston commented Mar 15, 2018

Hi @deorbit, hopefully we will have some better documentation and examples soon, but in the meantime here is a sample configuration that may help you get started.

Let's assume you are running your TileDB program on an EC2 instance, and you are trying to create a new TileDB array in an existing S3 bucket in the us-east-1 region. Using the C API, your configuration would look something like this:

tiledb_config_t *config = NULL;
tiledb_error_t *error = NULL;  // errors are reported through tiledb_error_t

// Create a config object and set the S3 parameters.
tiledb_config_create(&config, &error);
tiledb_config_set(config, "vfs.s3.scheme", "https", &error);
tiledb_config_set(config, "vfs.s3.region", "us-east-1", &error);
tiledb_config_set(config, "vfs.s3.endpoint_override", "", &error);
tiledb_config_set(config, "vfs.s3.use_virtual_addressing", "true", &error);

// Create a context that uses this config.
tiledb_ctx_t *ctx = NULL;
tiledb_ctx_create(&ctx, config);

// Create your array schema as normal, defining dimensions, the
// domain, attributes, etc.

// Create the array inside the existing bucket.
tiledb_array_create(ctx, "s3://my-bucket-name/array-name", array_schema);

// Clean up.
tiledb_config_free(&config);
tiledb_ctx_free(&ctx);

Let me know if this does not work for you, or you have other questions.

@deorbit

deorbit commented Mar 15, 2018

Thanks @tdenniston. I'm using the Python API as follows:

config = tiledb.Config()
config["vfs.s3.scheme"] = "https" 
config["vfs.s3.region"] = "us-east-2"
config["vfs.s3.endpoint_override"] = ""
config["vfs.s3.use_virtual_addressing"] = True
tdb_ctx = tiledb.Ctx(config=config)
array_name = "s3://deorbit.tiledb"
...<blah blah>...
tiledb.DenseArray(tdb_ctx, array_name,...<blah blah>...)

An exception is thrown:

tiledb.libtiledb.TileDBError: [TileDB::Config] Error: Cannot set parameter; Invalid S3 use virtual addressing

@stavrospapadopoulos
Member

Please use:

config["vfs.s3.use_virtual_addressing"] = "true"

Sorry about that :). The Python API uses string key/value pairs for the configs.
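Because config values have to be strings, one way to avoid this error is to normalize values before assigning them. The following is only a minimal sketch: the helper name `to_config_strings` is hypothetical and not part of TileDB, and the keys are the ones used earlier in this thread.

```python
def to_config_strings(settings):
    """Return a copy of `settings` with every value converted to a string.

    Booleans become lowercase "true"/"false", matching the form used
    in this thread; everything else goes through str().
    """
    normalized = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            # str(True) would give "True", so map booleans explicitly.
            normalized[key] = "true" if value else "false"
        else:
            normalized[key] = str(value)
    return normalized


# Hypothetical settings, mirroring the config keys from this thread.
s3_settings = to_config_strings({
    "vfs.s3.scheme": "https",
    "vfs.s3.region": "us-east-2",
    "vfs.s3.endpoint_override": "",
    "vfs.s3.use_virtual_addressing": True,
})
print(s3_settings["vfs.s3.use_virtual_addressing"])  # prints: true
```

Each resulting key/value pair can then be assigned with config[key] = value, the same way as in the snippets above.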

@deorbit

deorbit commented Mar 15, 2018

Ah, so do I need to explicitly create a tiledb.ArraySchema now? I've been working without it when running an array off the local filesystem.

(I'm seeing tiledb.libtiledb.TileDBError: [TileDB::StorageManager] Error: Cannot load array schema; Schema file not found)

@stavrospapadopoulos
Member

It seems that TileDB cannot find the array you attempt to read. Could you please share the code with which you create the array?

@deorbit

deorbit commented Mar 15, 2018

Here's how, in addition to the above config setup. Works locally. What else do I need for S3? Do I need to explicitly configure a schema when working with S3?

tiledb.DenseArray(tdb_ctx, array_name,
                        domain = tdb_domain, 
                        attrs = (attrX,),
                        cell_order = 'row-major',
                        tile_order = 'row-major')

@stavrospapadopoulos
Member

No, it should work exactly as it works locally. Can you please check that

  • you built TileDB with S3 enabled (it is not enabled by default)
  • you properly exported your AWS keys as:
      export AWS_ACCESS_KEY_ID=<your-access-key-id>
      export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>

cc @jakebolewski
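A quick way to confirm that both variables are visible to the Python process is to inspect the environment before creating the context. This is a rough sketch, not a TileDB feature: the function name `missing_aws_vars` is hypothetical, and it only checks the two variables named above, not other credential sources.

```python
import os

# The two variables named in the checklist above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS environment variables:", ", ".join(missing))
else:
    print("AWS key variables are set.")
```

An empty result only shows that the variables exist in this process; it does not verify that the keys are valid or that the build has S3 enabled.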

@deorbit

deorbit commented Mar 15, 2018

Keys are set. Does the Homebrew build have S3 enabled?

@stavrospapadopoulos
Member

stavrospapadopoulos commented Mar 15, 2018

Not by default. You need to run:

brew install tiledb --with-s3

This is documented here but we need to add it to docs.tiledb.io as well.

@deorbit

deorbit commented Mar 15, 2018

That was it. I ran brew install tiledb --with-s3. An additional pip install tiledb got it working in my Python virtual environment. Buckets are being created. Thanks!

@stavrospapadopoulos
Member

Awesome! Massive S3 performance optimizations are coming up. Please stay tuned!

@deorbit

deorbit commented Mar 15, 2018

Great! We will take full advantage of them.

stavrospapadopoulos added a commit that referenced this issue Mar 15, 2018
Added some S3 documentation. Closes #442
@stavrospapadopoulos
Member

Reopening as we are having some issues building the added S3 documentation on RTD. Will address this very soon.

@stavrospapadopoulos
Member

jakebolewski pushed a commit that referenced this issue Mar 17, 2018
PR #458

(cherry picked from commit 5b806cf)
@deeTEEcee

Does this work with aws sso login? (e.g., if I set something like export AWS_PROFILE=<name>)

@deeTEEcee

deeTEEcee commented Dec 5, 2023

It seems to work, I just had to run this on top of the existing aws sso setup: https://github.com/victorskl/yawsso

@ihnorton
Member

ihnorton commented Dec 5, 2023

Hi @deeTEEcee,

We support both session tokens and assumerole, please see: https://docs.tiledb.com/main/how-to/backends/s3#aws-security-credentials

AWS_PROFILE should work via the SDK defaults if no other credential source is specified. You can also override the order with the tiledb config option vfs.s3.config_source, which takes the values:

   - `auto` (TileDB config options are considered first, then SDK-defined precedence: env vars, config files, ec2 metadata)
   - `config_files` (forces SDK to only consider options found in aws config files)
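As an illustration of how this option could be chosen in Python, the sketch below picks a value for vfs.s3.config_source from the environment. The function name `s3_credential_settings` and the rule "use config_files whenever AWS_PROFILE is set" are assumptions made for this example, not TileDB behavior; only the option name and its two values come from the comment above.

```python
import os


def s3_credential_settings(env=None):
    """Pick a vfs.s3.config_source value based on the environment.

    Assumption for illustration: if AWS_PROFILE is set, force the SDK
    to read AWS config files; otherwise keep the default "auto" order.
    """
    env = os.environ if env is None else env
    source = "config_files" if env.get("AWS_PROFILE") else "auto"
    return {"vfs.s3.config_source": source}


# Hypothetical profile name, used only to show the result.
print(s3_credential_settings({"AWS_PROFILE": "my-sso-profile"}))
# prints: {'vfs.s3.config_source': 'config_files'}
```

The returned dictionary holds string values, so it can be merged into the other S3 settings before the config is built.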

@vtrifonov-altos

vtrifonov-altos commented Feb 3, 2024

How do I set up SSO auth for TileDB? I get the error below when I call soma.Experiment.open(s3_path). I am pretty sure the path exists and works: in Python I can access it with s3fs.S3FileSystem(profile=...). Our company setup goes through SSO, and using s3fs opens a browser prompt to authenticate. With TileDB I get an error about redirecting. I tried setting region, key, secret, etc. with tiledb.default_ctx, to no avail.

error message:
TileDBError: [TileDB::GroupDirectory] Error: Error while listing with prefix .... Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

@ihnorton
Member

ihnorton commented Feb 5, 2024

Hi @vtrifonov-altos, could you please email isaiah <at> tiledb.com, we'll have to get some more details about your setup. (we'll update docs and cross-link here if we make any updates to clarify this for others in the future)
