[C++] Use custom url for s3 using AWS_ENDPOINT_URL #36770
Comments
Arrow uses the AWS C++ SDK. From this table it appears that the C++ SDK does not yet support this feature. Perhaps the best place to advocate for it would be the AWS C++ SDK repo. If it is added there, we would pick up support automatically once we upgrade to the latest SDK version. That said, it should be possible for us to support this even if the SDK does not, in case someone wants to create a PR.
This may work:
diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
index c57fc7f29..b0c2d973e 100644
--- a/cpp/src/arrow/filesystem/s3fs.cc
+++ b/cpp/src/arrow/filesystem/s3fs.cc
@@ -339,6 +339,7 @@ Result<S3Options> S3Options::FromUri(const Uri& uri, std::string* out_path) {
}
bool region_set = false;
+ bool endpoint_override_set = false;
for (const auto& kv : options_map) {
if (kv.first == "region") {
options.region = kv.second;
@@ -347,6 +348,7 @@ Result<S3Options> S3Options::FromUri(const Uri& uri, std::string* out_path) {
options.scheme = kv.second;
} else if (kv.first == "endpoint_override") {
options.endpoint_override = kv.second;
+ endpoint_override_set = true;
} else if (kv.first == "allow_bucket_creation") {
ARROW_ASSIGN_OR_RAISE(options.allow_bucket_creation,
::arrow::internal::ParseBoolean(kv.second));
@@ -357,6 +359,12 @@ Result<S3Options> S3Options::FromUri(const Uri& uri, std::string* out_path) {
return Status::Invalid("Unexpected query parameter in S3 URI: '", kv.first, "'");
}
}
+ if (!endpoint_override_set) {
+ auto endpoint = std::getenv("AWS_ENDPOINT_URL");
+ if (endpoint) {
+ options.endpoint_override = endpoint;
+ }
+ }
if (!region_set && !bucket.empty() && options.endpoint_override.empty()) {
// XXX Should we use a dedicated resolver with the given credentials?
BTW, the following will work with the current implementation:
file = pq.ParquetFile(f's3://mybucket/my_file.parquet?endpoint_override={os.environ["AWS_ENDPOINT_URL"]}')
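The workaround above can be wrapped in a small helper so the query parameter is only appended when the variable is actually set. This is a minimal sketch, not part of pyarrow: the function name `s3_uri_with_endpoint` and its `env` parameter are hypothetical, and it assumes the endpoint can be passed through largely as-is (keeping `:` and `/` unescaped), as in the comment's f-string.

```python
import os
from urllib.parse import quote


def s3_uri_with_endpoint(bucket_path, env=None):
    """Build an s3:// URI, appending endpoint_override from AWS_ENDPOINT_URL.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    exists only so the behaviour can be checked without touching the process
    environment.
    """
    env = os.environ if env is None else env
    uri = f"s3://{bucket_path}"
    endpoint = env.get("AWS_ENDPOINT_URL")
    if endpoint:
        # Keep ':' and '/' readable; escape anything else unusual.
        uri += "?endpoint_override=" + quote(endpoint, safe=":/")
    return uri
```

The resulting string could then be passed wherever the f-string in the comment above is used. If the variable is unset, the plain `s3://` URI is returned unchanged.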
I think it should just be in the C++ SDK then. I raised an issue there.
… AWS_ENDPOINT_URL (#36791)
### Rationale for this change
we need a way to read custom object storage (such as minio host or other s3-like storage). use environment variable `AWS_ENDPOINT_URL`
### What changes are included in this PR?
set variable endpoint_override according to the environment variable
### Are these changes tested?
unittest and tested on pyarrow
### Are there any user-facing changes?
No
* Closes: #36770
Authored-by: yiwei.wang <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
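The precedence implied by the diff posted earlier in the thread (an explicit `endpoint_override` in the URI wins, otherwise `AWS_ENDPOINT_URL` is used) can be modeled in a few lines of Python. This is only a sketch of that proposed logic, not the merged C++ code; the function name `resolve_endpoint_override` and the `env` parameter are hypothetical.

```python
import os
from urllib.parse import parse_qsl, urlsplit


def resolve_endpoint_override(uri, env=None):
    """Return the endpoint an S3 URI would resolve to under the proposed rule.

    Hypothetical model of the diff's logic: a query parameter named
    endpoint_override takes priority; otherwise fall back to the
    AWS_ENDPOINT_URL environment variable; otherwise an empty string.
    """
    env = os.environ if env is None else env
    # keep_blank_values=True so an explicit empty endpoint_override still
    # counts as "set", mirroring the endpoint_override_set flag in the diff.
    query = dict(parse_qsl(urlsplit(uri).query, keep_blank_values=True))
    if "endpoint_override" in query:
        return query["endpoint_override"]
    return env.get("AWS_ENDPOINT_URL", "")
```

In the diff, an empty result matters because region resolution only runs when `options.endpoint_override` is empty.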
@adbmal Could you add |
take |
I think it would make sense to document this in https://arrow.apache.org/docs/cpp/env_vars.html. Happy to file an issue and submit a patch for that. |
It's a good idea! Please do it! |
@kou I think it is a minor change, so there is no need to file an issue. Here is the PR, please review.
Describe the enhancement requested
AWS_ENDPOINT_URL is now supported by AWS for specifying a custom URL (for example, localhost).
More info is available in the docs and the original GitHub issue. It was merged into botocore in this PR.
What I can do with boto:
What I have to do in pyarrow:
What I would like to do in pyarrow:
This will allow me to use `s3://` instead of creating a file system.
Component(s)
Python