Skip to content

Commit

Permalink
[DSIP-42] Add dolphinscheduler-aws-authentication module (#16043)
Browse files Browse the repository at this point in the history
  • Loading branch information
ruanwenjun authored May 24, 2024
1 parent baa252a commit f754611
Show file tree
Hide file tree
Showing 49 changed files with 899 additions and 592 deletions.
11 changes: 6 additions & 5 deletions deploy/kubernetes/dolphinscheduler/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,12 @@ Please refer to the [Quick Start in Kubernetes](../../../docs/docs/en/guide/inst
| conf.auto | bool | `false` | auto restart, if true, all components will be restarted automatically after the common configuration is updated. if false, you need to restart the components manually. default is false |
| conf.common."alert.rpc.port" | int | `50052` | rpc port |
| conf.common."appId.collect" | string | `"log"` | way to collect applicationId: log, aop |
| conf.common."aws.credentials.provider.type" | string | `"AWSStaticCredentialsProvider"` | |
| conf.common."aws.s3.access.key.id" | string | `"minioadmin"` | The AWS access key. if resource.storage.type=S3, and credentials.provider.type is AWSStaticCredentialsProvider. This configuration is required |
| conf.common."aws.s3.access.key.secret" | string | `"minioadmin"` | The AWS secret access key. if resource.storage.type=S3, and credentials.provider.type is AWSStaticCredentialsProvider. This configuration is required |
| conf.common."aws.s3.bucket.name" | string | `"dolphinscheduler"` | The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name. |
| conf.common."aws.s3.endpoint" | string | `"http://minio:9000"` | You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn |
| conf.common."aws.s3.region" | string | `"ca-central-1"` | The AWS Region to use. if resource.storage.type=S3, This configuration is required |
| conf.common."conda.path" | string | `"/opt/anaconda3/etc/profile.d/conda.sh"` | set path of conda.sh |
| conf.common."data-quality.jar.dir" | string | `nil` | data quality option |
| conf.common."data.basedir.path" | string | `"/tmp/dolphinscheduler"` | user data local directory path, please make sure the directory exists and have read write permissions |
Expand All @@ -138,11 +144,6 @@ Please refer to the [Quick Start in Kubernetes](../../../docs/docs/en/guide/inst
| conf.common."resource.alibaba.cloud.oss.bucket.name" | string | `"dolphinscheduler"` | oss bucket name, required if you set resource.storage.type=OSS |
| conf.common."resource.alibaba.cloud.oss.endpoint" | string | `"https://oss-cn-hangzhou.aliyuncs.com"` | oss bucket endpoint, required if you set resource.storage.type=OSS |
| conf.common."resource.alibaba.cloud.region" | string | `"cn-hangzhou"` | alibaba cloud region, required if you set resource.storage.type=OSS |
| conf.common."resource.aws.access.key.id" | string | `"minioadmin"` | The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required |
| conf.common."resource.aws.region" | string | `"ca-central-1"` | The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required |
| conf.common."resource.aws.s3.bucket.name" | string | `"dolphinscheduler"` | The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name. |
| conf.common."resource.aws.s3.endpoint" | string | `"http://minio:9000"` | You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn |
| conf.common."resource.aws.secret.access.key" | string | `"minioadmin"` | The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required |
| conf.common."resource.azure.client.id" | string | `"minioadmin"` | azure storage account name, required if you set resource.storage.type=ABS |
| conf.common."resource.azure.client.secret" | string | `"minioadmin"` | azure storage account key, required if you set resource.storage.type=ABS |
| conf.common."resource.azure.subId" | string | `"minioadmin"` | azure storage subId, required if you set resource.storage.type=ABS |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ data:
{{- range $key, $value := index .Values.conf "common" }}
{{- if and $.Values.minio.enabled }}
{{- if eq $key "resource.storage.type" }}{{ $value = "S3" }}{{- end }}
{{- if eq $key "resource.aws.s3.endpoint" }}{{ $value = print "http://" (include "dolphinscheduler.minio.fullname" $) ":9000" }}{{- end }}
{{- if eq $key "aws.s3.endpoint" }}{{ $value = print "http://" (include "dolphinscheduler.minio.fullname" $) ":9000" }}{{- end }}
{{- end }}
{{ $key }}={{ $value }}
{{- end }}
Expand Down
21 changes: 13 additions & 8 deletions deploy/kubernetes/dolphinscheduler/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -250,20 +250,25 @@ conf:
# -- resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path: /dolphinscheduler

# -- The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id: minioadmin
# The AWS credentials provider type. support: AWSStaticCredentialsProvider, InstanceProfileCredentialsProvider
# AWSStaticCredentialsProvider: use the access key and secret key to authenticate
# InstanceProfileCredentialsProvider: use the IAM role to authenticate
aws.credentials.provider.type: AWSStaticCredentialsProvider

# -- The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key: minioadmin
# -- The AWS access key. if resource.storage.type=S3, and credentials.provider.type is AWSStaticCredentialsProvider. This configuration is required
aws.s3.access.key.id: minioadmin

# -- The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region: ca-central-1
# -- The AWS secret access key. if resource.storage.type=S3, and credentials.provider.type is AWSStaticCredentialsProvider. This configuration is required
aws.s3.access.key.secret: minioadmin

# -- The AWS Region to use. if resource.storage.type=S3, This configuration is required
aws.s3.region: ca-central-1

# -- The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name: dolphinscheduler
aws.s3.bucket.name: dolphinscheduler

# -- You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint: http://minio:9000
aws.s3.endpoint: http://minio:9000

# -- alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id: <your-access-key-id>
Expand Down
22 changes: 10 additions & 12 deletions docs/docs/en/architecture/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -191,19 +191,21 @@ The default configuration is as follows:

Note that DolphinScheduler also supports zookeeper related configuration through `bin/env/dolphinscheduler_env.sh`.

For ETCD Registry, please see more details on [link](https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-etcd/README.md).
For JDBC Registry, please see more details on [link](https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-jdbc/README.md).
For ETCD Registry, please see more details
on [link](https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-etcd/README.md).
For JDBC Registry, please see more details
on [link](https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-jdbc/README.md).

### common.properties [hadoop、s3、yarn config properties]

Currently, common.properties mainly configures Hadoop,s3a related configurations. Configuration file location:

| Service | Configuration file |
|---------------|----------------------------------------|
| Master Server | `master-server/conf/common.properties` |
| Api Server | `api-server/conf/common.properties` |
| Worker Server | `worker-server/conf/common.properties` |
| Alert Server | `alert-server/conf/common.properties` |
| Service | Configuration file |
|---------------|-----------------------------------------------------------------------|
| Master Server | `master-server/conf/common.properties` |
| Api Server | `api-server/conf/common.properties`, `api-server/conf/aws.yaml` |
| Worker Server | `worker-server/conf/common.properties`, `worker-server/conf/aws.yaml` |
| Alert Server | `alert-server/conf/common.properties` |

The default configuration is as follows:

Expand All @@ -212,10 +214,6 @@ The default configuration is as follows:
| data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files |
| resource.storage.type | NONE | type of resource files: HDFS, S3, OSS, GCS, ABS, NONE |
| resource.upload.path | /dolphinscheduler | storage path of resource files |
| aws.access.key.id | minioadmin | access key id of S3 |
| aws.secret.access.key | minioadmin | secret access key of S3 |
| aws.region | us-east-1 | region of S3 |
| aws.s3.endpoint | http://minio:9000 | endpoint of S3 |
| hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS |
| fs.defaultFS | hdfs://mycluster:8020 | If resource.storage.type=S3, then the request url would be similar to 's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory |
| hadoop.security.authentication.startup.state | false | whether hadoop grant kerberos permission |
Expand Down
Loading

0 comments on commit f754611

Please sign in to comment.