Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-0057: Auto Region #57

Merged
merged 5 commits into from
Feb 24, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
160 changes: 160 additions & 0 deletions docs/rfcs/0057-auto-region.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
- Proposal Name: `auto_region`
- Start Date: 2022-02-24
- RFC PR: [datafuselabs/opendal#57](https://github.com/datafuselabs/opendal/pull/57)
- Tracking Issue: [datafuselabs/opendal#58](https://github.com/datafuselabs/opendal/issues/58)

# Summary

Automatically detecting user's s3 region.

# Motivation

Current behavior for `region` and `endpoint` is buggy. `endpoint=https://s3.amazonaws.com` and `endpoint=""` are expected to be the same, because `endpoint=""` means take the default value `https://s3.amazonaws.com`. However, they aren't.

S3 SDK has a mechanism to construct the correct API endpoint. It works like `format!("s3.{}.amazonaws.com", region)` internally. But if we specify the endpoint to `https://s3.amazonaws.com`, SDK will take this endpoint static.

So users could meet errors like:

```shell
attempting to access must be addressed using the specified endpoint
```

Automatically detecting the user's s3 region will help resolve this problem. Users don't need to care about the region anymore, `OpenDAL` will figure it out. Everything works regardless of whether the input is `s3.amazonaws.com` or `s3.us-east-1.amazonaws.com`.

# Guide-level explanation

`OpenDAL` will remove `region` option, and users only need to set the `endpoint` now.

Valid input including:

- `https://s3.amazonaws.com`
- `https://s3.us-east-1.amazonaws.com`
- `https://oss-ap-northeast-1.aliyuncs.com`
- `http://127.0.0.1:9000`

`OpenDAL` will handle the `region` internally and automatically.

# Reference-level explanation

S3 services support mechanism to indicate the correct region on itself.

Sending a `HEAD` request to `<endpoint>/<bucket>` will get a response like:

```shell
:) curl -I https://s3.amazonaws.com/databend-shared
HTTP/1.1 301 Moved Permanently
x-amz-bucket-region: us-east-2
x-amz-request-id: NPYSWK7WXJD1KQG7
x-amz-id-2: 3FJSJ5HACKqLbeeXBUUE3GoPL1IGDjLl6SZx/fw2MS+k0GND0UwDib5YQXE6CThiQxpYBWZjgxs=
Content-Type: application/xml
Date: Thu, 24 Feb 2022 05:15:13 GMT
Server: AmazonS3
```

`x-amz-bucket-region: us-east-2` will be returned, and we can use this region to construct the correct endpoint for this bucket:

```shell
:) curl -I https://s3.us-east-2.amazonaws.com/databend-shared
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-2
x-amz-request-id: 98CN5MYV3GQ1XMPY
x-amz-id-2: Tdxy36bRRP21Oip18KMQ7FG63MTeXOpXdd5/N3izFH0oalPODVaRlpCkDU3oUN0HIE24/ezX5Dc=
Content-Type: application/xml
Date: Thu, 24 Feb 2022 05:16:57 GMT
Server: AmazonS3
```

It also works for S3 compilable services like minio:

```shell
# Start minio with `MINIO_SITE_REGION` configured
:) MINIO_SITE_REGION=test minio server .
# Sending request to minio bucket
:) curl -I 127.0.0.1:9900/databend
HTTP/1.1 403 Forbidden
Accept-Ranges: bytes
Content-Length: 0
Content-Security-Policy: block-all-mixed-content
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Bucket-Region: test
X-Amz-Request-Id: 16D6A12DCA57E0FA
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Thu, 24 Feb 2022 05:18:51 GMT
```

We can use this mechanism to detect `region` automatically. The algorithm works as follows:

- If `endpoint` is empty, fill it will `https://s3.amazonaws.com` and the corresponding template: `https://s3.{region}.amazonaws.com`.
- Sending a `HEAD` request to `<endpoint>/<bucket>`.
- If got `200` or `403` response, the endpoint works.
- Use this endpoint directly without filling the template.
- Take the header `x-amz-bucket-region` as the region to fill the endpoint.
- Use the fallback value `us-east-1` to make SDK happy if the header not exists.
- If got a `301` response, the endpoint needs construction.
- Take the header `x-amz-bucket-region` as the region to fill the endpoint.
- Return an error to the user if not exist.
- If got `404`, the bucket could not exist, or the endpoint is incorrect.
- Return an error to the user.

# Drawbacks

None.

# Rationale and alternatives

## Use virtual style `<bucket>.<endpoint>`?

The virtual style works too. But not all services support this kind of API endpoint. For example, using `http://testbucket.127.0.0.1` is wrong, and we need to do extra checks.

Using `<endpoint>/<bucket>` makes everything easier.

## Use `ListBuckets` API?

`ListBuckets` requires higher permission than normal bucket read and write operations. It's better to finish the job without requesting more permission.

## Misbehavior S3 Compilable Services

Many services didn't implement S3 API correctly.

Aliyun OSS will return `404` for every bucket:

```shell
:) curl -I https://aliyuncs.com/<my-existing-bucket>
HTTP/2 404
date: Thu, 24 Feb 2022 05:32:57 GMT
content-type: text/html
content-length: 690
ufe-result: A6
set-cookie: thw=cn; Path=/; Domain=.taobao.com; Expires=Fri, 24-Feb-23 05:32:57 GMT;
server: Tengine/Aserver
```

QingStor Object Storage will return `307` with the `Location` header:

```shell
:) curl -I https://s3.qingstor.com/community
HTTP/1.1 301 Moved Permanently
Server: nginx/1.13.6
Date: Thu, 24 Feb 2022 05:33:55 GMT
Connection: keep-alive
Location: https://pek3a.s3.qingstor.com/community
X-Qs-Request-Id: 05b83b615c801a3d
```

In this proposal, we will not figure them out. It's easier for the user to fill the correct endpoint instead of automatically detecting them.

# Prior art

None

# Unresolved questions

None

# Future possibilities

None