Commit
Merge pull request #1322 from yuvipanda/s3-terraform
Add scratch bucket functionality for AWS
Showing 10 changed files with 241 additions and 24 deletions.
@@ -9,9 +9,9 @@ improving the security posture of our hubs.
 This page lists various features we offer around access to cloud resources,
 and how to enable them.

-## GCP
+## How it works

-### How it works
+### GCP

 On Google Cloud Platform, we use [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity)
 to map a particular [Kubernetes Service Account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/)
@@ -21,8 +21,19 @@ as well as dask worker pods)
 will have the permissions assigned to the Google Cloud Service Account.
 This Google Cloud Service Account is managed via terraform.

-(howto:features:cloud-access:gcp:access-perms)=
-### Enabling specific cloud access permissions
+### AWS
+
+On AWS, we use [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html)
+to map a particular [Kubernetes Service Account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/)
+to a particular [AWS IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).
+All pods using the Kubernetes Service Account (user's jupyter notebook pods
+as well as dask worker pods)
+will have the permissions assigned to the AWS IAM Role.
+This AWS IAM Role is managed via terraform.
+
+
+(howto:features:cloud-access:access-perms)=
+## Enabling specific cloud access permissions

 1. In the `.tfvars` file for the project in which this hub is based off
 create (or modify) the `hub_cloud_permissions` variable. The config is
@@ -44,17 +55,17 @@ This Google Cloud Service Account is managed via terraform.
 and the cluster name together can't be more than 29 characters. `terraform`
 will complain if you go over this limit, so in general just use the name
 of the hub and shorten it only if `terraform` complains.
-2. `requestor_pays` enables permissions for user pods and dask worker
+2. (GCP only) `requestor_pays` enables permissions for user pods and dask worker
 pods to identify as the project while making requests to Google Cloud Storage
 buckets marked as 'requestor pays'. More details [here](topic:features:cloud:gcp:requestor-pays).
 3. `bucket_admin_access` lists bucket names (as specified in `user_buckets`
 terraform variable) all users on this hub should have full read/write
-access to. Used along with the [user_buckets](howto:features:cloud-access:gcp:storage-buckets)
-terraform variable to enable the [scratch buckets](topic:features:cloud:gcp:scratch-buckets)
+access to. Used along with the [user_buckets](howto:features:cloud-access:storage-buckets)
+terraform variable to enable the [scratch buckets](topic:features:cloud:scratch-buckets)
 feature.
-3. `hub_namespace` is the full name of the hub, as hubs are put in Kubernetes
+3. (GCP only) `hub_namespace` is the full name of the hub, as hubs are put in Kubernetes
 Namespaces that are the same as their names. This is explicitly specified here
-because `<hub-name-slug>` could possibly be truncated.
+because `<hub-name-slug>` could possibly be truncated on GCP.

 2. Run `terraform apply -var-file=projects/<cluster-var-file>.tfvars`, and look at the
 plan carefully. It should only be creating or modifying IAM related objects (such as roles
@@ -69,12 +80,24 @@ This Google Cloud Service Account is managed via terraform.
 4. Run `terraform output kubernetes_sa_annotations`, this should
 show you a list of hubs and the annotation required to be set on them:

+```{tabbed} GCP
+<pre>
+$ terraform output kubernetes_sa_annotations
+{
+  "prod" = "iam.gke.io/gcp-service-account: meom-ige-prod@meom-ige-cnrs.iam.gserviceaccount.com"
+  "staging" = "iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"
+}
+</pre>
 ```
+
+```{tabbed} AWS
+<pre>
 $ terraform output kubernetes_sa_annotations
 {
-  "prod" = "iam.gke.io/gcp-service-account: meom-ige-prod@meom-ige-cnrs.iam.gserviceaccount.com"
-  "staging" = "iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com"
+  "prod" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-prod"
+  "staging" = "eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging"
 }
+</pre>
 ```

 This shows all the annotations for all the hubs configured to provide cloud access
@@ -85,10 +108,20 @@ This Google Cloud Service Account is managed via terraform.

 6. Specify the annotation from step 4, nested under `userServiceAccount.annotations`.

-```yaml
+```{tabbed} GCP
+<pre>
 userServiceAccount:
   annotations:
     iam.gke.io/gcp-service-account: meom-ige-staging@meom-ige-cnrs.iam.gserviceaccount.com
+</pre>
 ```
+```{tabbed} AWS
+<pre>
+userServiceAccount:
+  annotations:
+    eks.amazonaws.com/role-arn: arn:aws:iam::740010314650:role/uwhackweeks-staging
+</pre>
+```
 ```{note}
@@ -98,10 +131,10 @@ This Google Cloud Service Account is managed via terraform.
 7. Get this change deployed, and users should now be able to use the requestor pays feature!
 Currently running users might have to restart their pods for the change to take effect.

-(howto:features:cloud-access:gcp:storage-buckets)=
-### Creating storage buckets for use with the hub
+(howto:features:cloud-access:storage-buckets)=
+## Creating storage buckets for use with the hub

-See [the relevant topic page](topic:features:cloud:gcp:scratch-buckets) for more information
+See [the relevant topic page](topic:features:cloud:scratch-buckets) for more information
 on why users want this!

 1. In the `.tfvars` file for the project in which this hub is based off
@@ -128,7 +161,7 @@ on why users want this!
 very helpful for 'scratch' buckets that are temporary. Set to
 `null` to prevent this cleaning up process from happening.

-2. Enable access to these buckets from the hub by [editing `hub_cloud_permissions`](howto:features:cloud-access:gcp:access-perms)
+2. Enable access to these buckets from the hub by [editing `hub_cloud_permissions`](howto:features:cloud-access:access-perms)
 in the same `.tfvars` file. Follow all the steps listed there - this
 should create the storage buckets and provide all users access to them!

@@ -142,9 +175,13 @@ on why users want this!
 jupyterhub:
   singleuser:
     extraEnv:
-      SCRATCH_BUCKET: gcs://<bucket-full-name>/$(JUPYTERHUB_USER)
+      SCRATCH_BUCKET: <s3 or gcs>://<bucket-full-name>/$(JUPYTERHUB_USER)
+      PANGEO_SCRATCH: <s3 or gcs>://<bucket-full-name>/$(JUPYTERHUB_USER)
 ```
+```{note}
+Use s3 on AWS and gcs on GCP for the protocol part
+```
 ```{note}
 If the hub is a `daskhub`, nest the config under a `basehub` key
 ```
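As a worked example of the AWS path through these steps, the `.tfvars` values below would create one scratch bucket per hub and give each hub admin access to its own bucket. This is a hedged sketch, not configuration from this PR: the hub names (`staging`, `prod`), the bucket keys and the 7-day expiry are hypothetical, and the variable shapes are inferred from how `buckets.tf` and `irsa.tf` read them (`each.value.delete_after`, `permissions.bucket_admin_access`). The GCP-only fields are kept on the assumption that the AWS variable definition, which is not part of this diff, declares them as well.

```hcl
# Hypothetical values for an AWS cluster's .tfvars file (all names illustrative)
user_buckets = {
  # Becomes the S3 bucket "<cluster_name>-scratch-staging"
  "scratch-staging" = {
    # Objects expire this many days after creation; null disables expiry
    delete_after = 7
  }
  "scratch" = {
    delete_after = 7
  }
}

hub_cloud_permissions = {
  # irsa.tf uses this key as the Kubernetes namespace in the role's trust
  # policy (system:serviceaccount:<key>:user-sa), so it should match the hub
  "staging" = {
    requestor_pays      = false # GCP only
    bucket_admin_access = ["scratch-staging"]
    hub_namespace       = "staging" # GCP only
  }
  "prod" = {
    requestor_pays      = false # GCP only
    bucket_admin_access = ["scratch"]
    hub_namespace       = "prod" # GCP only
  }
}
```

With these values, `buckets.tf` names the buckets `<cluster_name>-scratch-staging` and `<cluster_name>-scratch`, so on the `staging` hub `SCRATCH_BUCKET` would be set to `s3://<cluster_name>-scratch-staging/$(JUPYTERHUB_USER)`.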
@@ -0,0 +1,62 @@
+resource "aws_s3_bucket" "user_buckets" {
+  for_each = var.user_buckets
+  bucket   = "${var.cluster_name}-${each.key}"
+
+}
+
+resource "aws_s3_bucket_lifecycle_configuration" "user_bucket_expiry" {
+  for_each = var.user_buckets
+  bucket   = "${var.cluster_name}-${each.key}"
+
+  dynamic "rule" {
+    for_each = each.value.delete_after != null ? [1] : []
+
+    content {
+      id     = "delete-after-expiry"
+      status = "Enabled"
+
+      expiration {
+        days = each.value.delete_after
+      }
+    }
+  }
+}
+
+locals {
+  # Nested for loop, thanks to https://www.daveperrett.com/articles/2021/08/19/nested-for-each-with-terraform/
+  bucket_permissions = distinct(flatten([
+    for hub_name, permissions in var.hub_cloud_permissions : [
+      for bucket_name in permissions.bucket_admin_access : {
+        hub_name    = hub_name
+        bucket_name = bucket_name
+      }
+    ]
+  ]))
+}
+
+
+data "aws_iam_policy_document" "bucket_access" {
+  for_each = { for bp in local.bucket_permissions : "${bp.hub_name}.${bp.bucket_name}" => bp }
+  statement {
+    effect  = "Allow"
+    actions = ["s3:*"]
+    principals {
+      type = "AWS"
+      identifiers = [
+        aws_iam_role.irsa_role[each.value.hub_name].arn
+      ]
+    }
+    resources = [
+      # Grant access only to the bucket and its contents
+      aws_s3_bucket.user_buckets[each.value.bucket_name].arn,
+      "${aws_s3_bucket.user_buckets[each.value.bucket_name].arn}/*"
+    ]
+  }
+}
+
+resource "aws_s3_bucket_policy" "user_bucket_access" {
+
+  for_each = { for bp in local.bucket_permissions : "${bp.hub_name}.${bp.bucket_name}" => bp }
+  bucket   = aws_s3_bucket.user_buckets[each.value.bucket_name].id
+  policy   = data.aws_iam_policy_document.bucket_access[each.key].json
+}
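Two details of `user_bucket_expiry` above may be worth a follow-up. The AWS provider documents `rule` as required on `aws_s3_bucket_lifecycle_configuration`, so a bucket whose `delete_after` is `null` would make the `dynamic "rule"` block render zero rules, which is likely to fail at plan time. And because `bucket` is rebuilt from a string instead of referencing `aws_s3_bucket.user_buckets`, Terraform has no dependency telling it to create the bucket first. A possible variant, sketched assuming AWS provider 4.x or later, filters the map and references the bucket resource directly:

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "user_bucket_expiry" {
  # Only buckets with an expiry get a lifecycle configuration, because the
  # resource needs at least one rule
  for_each = {
    for name, bucket in var.user_buckets : name => bucket
    if bucket.delete_after != null
  }

  # Referencing the bucket resource makes Terraform create the bucket first
  bucket = aws_s3_bucket.user_buckets[each.key].id

  rule {
    id     = "delete-after-expiry"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    expiration {
      days = each.value.delete_after
    }
  }
}
```

Buckets with `delete_after = null` then simply get no lifecycle configuration, which matches the documented intent of `null` disabling the clean-up.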
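Similarly, S3 allows only one bucket policy per bucket, while `user_bucket_access` in `buckets.tf` creates one `aws_s3_bucket_policy` per hub and bucket pair. If two hubs ever list the same bucket in `bucket_admin_access`, those resources would keep replacing each other's policy. One hedged way around this, shown below as a replacement for the last two blocks of `buckets.tf`, groups hubs by bucket using the grouping (`...`) form of a Terraform `for` expression and writes a single policy per bucket that names every hub's role. The `bucket_hubs` local is a hypothetical name; `local.bucket_permissions` and `aws_iam_role.irsa_role` are the ones defined in this PR.

```hcl
locals {
  # Hypothetical helper: map each bucket name to the list of hubs that need
  # admin access to it, e.g. { "scratch" = ["prod", "staging"] }
  bucket_hubs = {
    for bp in local.bucket_permissions : bp.bucket_name => bp.hub_name...
  }
}

data "aws_iam_policy_document" "bucket_access" {
  for_each = local.bucket_hubs

  statement {
    effect  = "Allow"
    actions = ["s3:*"]
    principals {
      type = "AWS"
      # One principal per hub that was granted access to this bucket
      identifiers = [for hub_name in each.value : aws_iam_role.irsa_role[hub_name].arn]
    }
    resources = [
      # Grant access only to the bucket and its contents
      aws_s3_bucket.user_buckets[each.key].arn,
      "${aws_s3_bucket.user_buckets[each.key].arn}/*"
    ]
  }
}

resource "aws_s3_bucket_policy" "user_bucket_access" {
  # Exactly one policy resource per bucket
  for_each = local.bucket_hubs
  bucket   = aws_s3_bucket.user_buckets[each.key].id
  policy   = data.aws_iam_policy_document.bucket_access[each.key].json
}
```

Keying both blocks by bucket name also keeps resource addresses stable when a second hub is added to an existing bucket: the policy is updated in place rather than a new policy resource being created.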
@@ -0,0 +1,52 @@
+data "aws_caller_identity" "current" {}
+
+data "aws_partition" "current" {}
+
+resource "aws_iam_role" "irsa_role" {
+  for_each = var.hub_cloud_permissions
+  name     = "${var.cluster_name}-${each.key}"
+
+  assume_role_policy = data.aws_iam_policy_document.irsa_role_assume[each.key].json
+}
+
+data "aws_iam_policy_document" "irsa_role_assume" {
+  for_each = var.hub_cloud_permissions
+  statement {
+
+    effect = "Allow"
+
+    actions = ["sts:AssumeRoleWithWebIdentity"]
+
+    principals {
+      type = "Federated"
+
+      identifiers = [
+        "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}"
+      ]
+    }
+    condition {
+      test     = "StringEquals"
+      variable = "${replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")}:sub"
+      values = [
+        "system:serviceaccount:${each.key}:user-sa"
+      ]
+    }
+  }
+}
+
+output "kubernetes_sa_annotations" {
+  value = {
+    for k, v in var.hub_cloud_permissions :
+    k => "eks.amazonaws.com/role-arn: ${aws_iam_role.irsa_role[k].arn}"
+  }
+  description = <<-EOT
+    Annotations to apply to userServiceAccount in each hub to enable cloud permissions for them.
+    Helm, not terraform, controls namespace creation for us. This makes it quite difficult
+    to create the appropriate kubernetes service account attached to the AWS IAM Role
+    in the appropriate namespace. Instead, this output provides the list of annotations
+    to be applied to the kubernetes service account used by jupyter and dask pods in a given hub.
+    This should be specified under userServiceAccount.annotations (or basehub.userServiceAccount.annotations
+    in case of daskhub) on a values file created specifically for that hub.
+  EOT
+}
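The trust policy in `irsa_role_assume` restricts who may assume each role by checking the web identity token's `sub` claim. The AWS IRSA documentation also checks the `aud` claim; if that extra check is wanted, a second `condition` block can be added to the same statement (all conditions in one statement must match). The sketch below assumes the default `sts.amazonaws.com` audience used by EKS, introduces a hypothetical `oidc_issuer_host` local to avoid repeating the `replace(...)` call, and, like the original, relies on `data.aws_eks_cluster.cluster` being defined elsewhere in the configuration.

```hcl
locals {
  # Hypothetical helper: the cluster's OIDC issuer without the "https://" prefix
  oidc_issuer_host = replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")
}

data "aws_iam_policy_document" "irsa_role_assume" {
  for_each = var.hub_cloud_permissions

  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type = "Federated"
      identifiers = [
        "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.oidc_issuer_host}"
      ]
    }

    # Only the user-sa service account in the hub's namespace may assume the role...
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer_host}:sub"
      values   = ["system:serviceaccount:${each.key}:user-sa"]
    }

    # ...and only with a token whose audience is AWS STS
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer_host}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```

The `sub` check alone already ties each role to one namespace and service account; the `aud` check additionally rejects tokens that were issued for some other audience.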