Setup shared cluster on AWS and deploy 'researchdelight' hub #1967

Closed
wants to merge 105 commits into from

sgibson91
Member

@sgibson91 sgibson91 commented Nov 30, 2022

ref #1949

Since this is a new shared cluster, I will deploy staging and dask-staging hubs alongside the requested research delight hub, mimicking the GCP shared cluster setup.

Hubs added:

@sgibson91 sgibson91 marked this pull request as draft November 30, 2022 11:53
@sgibson91
Member Author

sgibson91 commented Nov 30, 2022

(Updated!) tf plan output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.aws_iam_policy_document.bucket_access["dask-staging.scratch-dask-staging"] will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "bucket_access" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + (known after apply),
              + (known after apply),
            ]

          + principals {
              + identifiers = [
                  + (known after apply),
                ]
              + type        = "AWS"
            }
        }
    }

  # data.aws_iam_policy_document.bucket_access["research-delight.scratch-research-delight"] will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "bucket_access" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + (known after apply),
              + (known after apply),
            ]

          + principals {
              + identifiers = [
                  + (known after apply),
                ]
              + type        = "AWS"
            }
        }
    }

  # data.aws_iam_policy_document.bucket_access["staging.scratch-staging"] will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "bucket_access" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + (known after apply),
              + (known after apply),
            ]

          + principals {
              + identifiers = [
                  + (known after apply),
                ]
              + type        = "AWS"
            }
        }
    }

  # aws_efs_file_system.homedirs will be created
  + resource "aws_efs_file_system" "homedirs" {
      + arn                     = (known after apply)
      + availability_zone_id    = (known after apply)
      + availability_zone_name  = (known after apply)
      + creation_token          = (known after apply)
      + dns_name                = (known after apply)
      + encrypted               = (known after apply)
      + id                      = (known after apply)
      + kms_key_id              = (known after apply)
      + number_of_mount_targets = (known after apply)
      + owner_id                = (known after apply)
      + performance_mode        = (known after apply)
      + size_in_bytes           = (known after apply)
      + tags                    = {
          + "Name" = "hub-homedirs"
        }
      + tags_all                = {
          + "Name" = "hub-homedirs"
        }
      + throughput_mode         = "bursting"
    }

  # aws_efs_mount_target.homedirs will be created
  + resource "aws_efs_mount_target" "homedirs" {
      + availability_zone_id   = (known after apply)
      + availability_zone_name = (known after apply)
      + dns_name               = (known after apply)
      + file_system_arn        = (known after apply)
      + file_system_id         = (known after apply)
      + id                     = (known after apply)
      + ip_address             = (known after apply)
      + mount_target_dns_name  = (known after apply)
      + network_interface_id   = (known after apply)
      + owner_id               = (known after apply)
      + security_groups        = [
          + "sg-000f5f85de16c7792",
        ]
      + subnet_id              = "subnet-03b0128ec2dc8b556"
    }

  # aws_iam_access_key.continuous_deployer will be created
  + resource "aws_iam_access_key" "continuous_deployer" {
      + create_date                    = (known after apply)
      + encrypted_secret               = (known after apply)
      + encrypted_ses_smtp_password_v4 = (known after apply)
      + id                             = (known after apply)
      + key_fingerprint                = (known after apply)
      + secret                         = (sensitive value)
      + ses_smtp_password_v4           = (sensitive value)
      + status                         = "Active"
      + user                           = "hub-continuous-deployer"
    }

  # aws_iam_role.irsa_role["dask-staging"] will be created
  + resource "aws_iam_role" "irsa_role" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRoleWithWebIdentity"
                      + Condition = {
                          + StringEquals = {
                              + "oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8:sub" = "system:serviceaccount:dask-staging:user-sa"
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + Federated = "arn:aws:iam::790657130469:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8"
                        }
                      + Sid       = ""
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "2i2c-aws-us-dask-staging"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)

      + inline_policy {
          + name   = (known after apply)
          + policy = (known after apply)
        }
    }

  # aws_iam_role.irsa_role["research-delight"] will be created
  + resource "aws_iam_role" "irsa_role" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRoleWithWebIdentity"
                      + Condition = {
                          + StringEquals = {
                              + "oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8:sub" = "system:serviceaccount:research-delight:user-sa"
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + Federated = "arn:aws:iam::790657130469:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8"
                        }
                      + Sid       = ""
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "2i2c-aws-us-research-delight"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)

      + inline_policy {
          + name   = (known after apply)
          + policy = (known after apply)
        }
    }

  # aws_iam_role.irsa_role["staging"] will be created
  + resource "aws_iam_role" "irsa_role" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRoleWithWebIdentity"
                      + Condition = {
                          + StringEquals = {
                              + "oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8:sub" = "system:serviceaccount:staging:user-sa"
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + Federated = "arn:aws:iam::790657130469:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8"
                        }
                      + Sid       = ""
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "2i2c-aws-us-staging"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)

      + inline_policy {
          + name   = (known after apply)
          + policy = (known after apply)
        }
    }

  # aws_iam_user.continuous_deployer will be created
  + resource "aws_iam_user" "continuous_deployer" {
      + arn           = (known after apply)
      + force_destroy = false
      + id            = (known after apply)
      + name          = "hub-continuous-deployer"
      + path          = "/"
      + tags_all      = (known after apply)
      + unique_id     = (known after apply)
    }

  # aws_iam_user_policy.continuous_deployer will be created
  + resource "aws_iam_user_policy" "continuous_deployer" {
      + id     = (known after apply)
      + name   = "eks-readonly"
      + policy = jsonencode(
            {
              + Statement = [
                  + {
                      + Action   = "eks:DescribeCluster"
                      + Effect   = "Allow"
                      + Resource = "*"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + user   = "hub-continuous-deployer"
    }

  # aws_s3_bucket.user_buckets["scratch-dask-staging"] will be created
  + resource "aws_s3_bucket" "user_buckets" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "2i2c-aws-us-scratch-dask-staging"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + cors_rule {
          + allowed_headers = (known after apply)
          + allowed_methods = (known after apply)
          + allowed_origins = (known after apply)
          + expose_headers  = (known after apply)
          + max_age_seconds = (known after apply)
        }

      + grant {
          + id          = (known after apply)
          + permissions = (known after apply)
          + type        = (known after apply)
          + uri         = (known after apply)
        }

      + lifecycle_rule {
          + abort_incomplete_multipart_upload_days = (known after apply)
          + enabled                                = (known after apply)
          + id                                     = (known after apply)
          + prefix                                 = (known after apply)
          + tags                                   = (known after apply)

          + expiration {
              + date                         = (known after apply)
              + days                         = (known after apply)
              + expired_object_delete_marker = (known after apply)
            }

          + noncurrent_version_expiration {
              + days = (known after apply)
            }

          + noncurrent_version_transition {
              + days          = (known after apply)
              + storage_class = (known after apply)
            }

          + transition {
              + date          = (known after apply)
              + days          = (known after apply)
              + storage_class = (known after apply)
            }
        }

      + logging {
          + target_bucket = (known after apply)
          + target_prefix = (known after apply)
        }

      + object_lock_configuration {
          + object_lock_enabled = (known after apply)

          + rule {
              + default_retention {
                  + days  = (known after apply)
                  + mode  = (known after apply)
                  + years = (known after apply)
                }
            }
        }

      + replication_configuration {
          + role = (known after apply)

          + rules {
              + delete_marker_replication_status = (known after apply)
              + id                               = (known after apply)
              + prefix                           = (known after apply)
              + priority                         = (known after apply)
              + status                           = (known after apply)

              + destination {
                  + account_id         = (known after apply)
                  + bucket             = (known after apply)
                  + replica_kms_key_id = (known after apply)
                  + storage_class      = (known after apply)

                  + access_control_translation {
                      + owner = (known after apply)
                    }

                  + metrics {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }

                  + replication_time {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }
                }

              + filter {
                  + prefix = (known after apply)
                  + tags   = (known after apply)
                }

              + source_selection_criteria {
                  + sse_kms_encrypted_objects {
                      + enabled = (known after apply)
                    }
                }
            }
        }

      + server_side_encryption_configuration {
          + rule {
              + bucket_key_enabled = (known after apply)

              + apply_server_side_encryption_by_default {
                  + kms_master_key_id = (known after apply)
                  + sse_algorithm     = (known after apply)
                }
            }
        }

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }

      + website {
          + error_document           = (known after apply)
          + index_document           = (known after apply)
          + redirect_all_requests_to = (known after apply)
          + routing_rules            = (known after apply)
        }
    }

  # aws_s3_bucket.user_buckets["scratch-research-delight"] will be created
  + resource "aws_s3_bucket" "user_buckets" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "2i2c-aws-us-scratch-research-delight"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + cors_rule {
          + allowed_headers = (known after apply)
          + allowed_methods = (known after apply)
          + allowed_origins = (known after apply)
          + expose_headers  = (known after apply)
          + max_age_seconds = (known after apply)
        }

      + grant {
          + id          = (known after apply)
          + permissions = (known after apply)
          + type        = (known after apply)
          + uri         = (known after apply)
        }

      + lifecycle_rule {
          + abort_incomplete_multipart_upload_days = (known after apply)
          + enabled                                = (known after apply)
          + id                                     = (known after apply)
          + prefix                                 = (known after apply)
          + tags                                   = (known after apply)

          + expiration {
              + date                         = (known after apply)
              + days                         = (known after apply)
              + expired_object_delete_marker = (known after apply)
            }

          + noncurrent_version_expiration {
              + days = (known after apply)
            }

          + noncurrent_version_transition {
              + days          = (known after apply)
              + storage_class = (known after apply)
            }

          + transition {
              + date          = (known after apply)
              + days          = (known after apply)
              + storage_class = (known after apply)
            }
        }

      + logging {
          + target_bucket = (known after apply)
          + target_prefix = (known after apply)
        }

      + object_lock_configuration {
          + object_lock_enabled = (known after apply)

          + rule {
              + default_retention {
                  + days  = (known after apply)
                  + mode  = (known after apply)
                  + years = (known after apply)
                }
            }
        }

      + replication_configuration {
          + role = (known after apply)

          + rules {
              + delete_marker_replication_status = (known after apply)
              + id                               = (known after apply)
              + prefix                           = (known after apply)
              + priority                         = (known after apply)
              + status                           = (known after apply)

              + destination {
                  + account_id         = (known after apply)
                  + bucket             = (known after apply)
                  + replica_kms_key_id = (known after apply)
                  + storage_class      = (known after apply)

                  + access_control_translation {
                      + owner = (known after apply)
                    }

                  + metrics {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }

                  + replication_time {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }
                }

              + filter {
                  + prefix = (known after apply)
                  + tags   = (known after apply)
                }

              + source_selection_criteria {
                  + sse_kms_encrypted_objects {
                      + enabled = (known after apply)
                    }
                }
            }
        }

      + server_side_encryption_configuration {
          + rule {
              + bucket_key_enabled = (known after apply)

              + apply_server_side_encryption_by_default {
                  + kms_master_key_id = (known after apply)
                  + sse_algorithm     = (known after apply)
                }
            }
        }

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }

      + website {
          + error_document           = (known after apply)
          + index_document           = (known after apply)
          + redirect_all_requests_to = (known after apply)
          + routing_rules            = (known after apply)
        }
    }

  # aws_s3_bucket.user_buckets["scratch-staging"] will be created
  + resource "aws_s3_bucket" "user_buckets" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "2i2c-aws-us-scratch-staging"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags_all                    = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + cors_rule {
          + allowed_headers = (known after apply)
          + allowed_methods = (known after apply)
          + allowed_origins = (known after apply)
          + expose_headers  = (known after apply)
          + max_age_seconds = (known after apply)
        }

      + grant {
          + id          = (known after apply)
          + permissions = (known after apply)
          + type        = (known after apply)
          + uri         = (known after apply)
        }

      + lifecycle_rule {
          + abort_incomplete_multipart_upload_days = (known after apply)
          + enabled                                = (known after apply)
          + id                                     = (known after apply)
          + prefix                                 = (known after apply)
          + tags                                   = (known after apply)

          + expiration {
              + date                         = (known after apply)
              + days                         = (known after apply)
              + expired_object_delete_marker = (known after apply)
            }

          + noncurrent_version_expiration {
              + days = (known after apply)
            }

          + noncurrent_version_transition {
              + days          = (known after apply)
              + storage_class = (known after apply)
            }

          + transition {
              + date          = (known after apply)
              + days          = (known after apply)
              + storage_class = (known after apply)
            }
        }

      + logging {
          + target_bucket = (known after apply)
          + target_prefix = (known after apply)
        }

      + object_lock_configuration {
          + object_lock_enabled = (known after apply)

          + rule {
              + default_retention {
                  + days  = (known after apply)
                  + mode  = (known after apply)
                  + years = (known after apply)
                }
            }
        }

      + replication_configuration {
          + role = (known after apply)

          + rules {
              + delete_marker_replication_status = (known after apply)
              + id                               = (known after apply)
              + prefix                           = (known after apply)
              + priority                         = (known after apply)
              + status                           = (known after apply)

              + destination {
                  + account_id         = (known after apply)
                  + bucket             = (known after apply)
                  + replica_kms_key_id = (known after apply)
                  + storage_class      = (known after apply)

                  + access_control_translation {
                      + owner = (known after apply)
                    }

                  + metrics {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }

                  + replication_time {
                      + minutes = (known after apply)
                      + status  = (known after apply)
                    }
                }

              + filter {
                  + prefix = (known after apply)
                  + tags   = (known after apply)
                }

              + source_selection_criteria {
                  + sse_kms_encrypted_objects {
                      + enabled = (known after apply)
                    }
                }
            }
        }

      + server_side_encryption_configuration {
          + rule {
              + bucket_key_enabled = (known after apply)

              + apply_server_side_encryption_by_default {
                  + kms_master_key_id = (known after apply)
                  + sse_algorithm     = (known after apply)
                }
            }
        }

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }

      + website {
          + error_document           = (known after apply)
          + index_document           = (known after apply)
          + redirect_all_requests_to = (known after apply)
          + routing_rules            = (known after apply)
        }
    }

  # aws_s3_bucket_lifecycle_configuration.user_bucket_expiry["scratch-dask-staging"] will be created
  + resource "aws_s3_bucket_lifecycle_configuration" "user_bucket_expiry" {
      + bucket = "2i2c-aws-us-scratch-dask-staging"
      + id     = (known after apply)

      + rule {
          + id     = "delete-after-expiry"
          + status = "Enabled"

          + expiration {
              + days                         = 7
              + expired_object_delete_marker = (known after apply)
            }
        }
    }

  # aws_s3_bucket_lifecycle_configuration.user_bucket_expiry["scratch-research-delight"] will be created
  + resource "aws_s3_bucket_lifecycle_configuration" "user_bucket_expiry" {
      + bucket = "2i2c-aws-us-scratch-research-delight"
      + id     = (known after apply)

      + rule {
          + id     = "delete-after-expiry"
          + status = "Enabled"

          + expiration {
              + days                         = 7
              + expired_object_delete_marker = (known after apply)
            }
        }
    }

  # aws_s3_bucket_lifecycle_configuration.user_bucket_expiry["scratch-staging"] will be created
  + resource "aws_s3_bucket_lifecycle_configuration" "user_bucket_expiry" {
      + bucket = "2i2c-aws-us-scratch-staging"
      + id     = (known after apply)

      + rule {
          + id     = "delete-after-expiry"
          + status = "Enabled"

          + expiration {
              + days                         = 7
              + expired_object_delete_marker = (known after apply)
            }
        }
    }

  # aws_s3_bucket_policy.user_bucket_access["dask-staging.scratch-dask-staging"] will be created
  + resource "aws_s3_bucket_policy" "user_bucket_access" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

  # aws_s3_bucket_policy.user_bucket_access["research-delight.scratch-research-delight"] will be created
  + resource "aws_s3_bucket_policy" "user_bucket_access" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

  # aws_s3_bucket_policy.user_bucket_access["staging.scratch-staging"] will be created
  + resource "aws_s3_bucket_policy" "user_bucket_access" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

Plan: 17 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + continuous_deployer_creds = (sensitive value)
  + db_helm_config            = (sensitive value)
  + eksctl_iam_command        = (known after apply)
  + kubernetes_sa_annotations = {
      + dask-staging     = (known after apply)
      + research-delight = (known after apply)
      + staging          = (known after apply)
    }
  + nfs_server_dns            = (known after apply)
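
For anyone reviewing the plan above: the three `aws_iam_role.irsa_role` trust policies appear to differ only in the Kubernetes namespace of the `user-sa` service account. Below is a minimal Python sketch that rebuilds the same document, so the three roles can be compared side by side. The helper name `irsa_trust_policy` is hypothetical, and the account ID and OIDC provider ID are copied from the plan output rather than looked up from AWS.

```python
import json

# Values copied verbatim from the plan output above (not queried from AWS).
ACCOUNT_ID = "790657130469"
OIDC_PROVIDER = "oidc.eks.us-west-2.amazonaws.com/id/E2ACE6437981F58A2BA31CE7F6F85AB8"


def irsa_trust_policy(namespace: str, service_account: str = "user-sa") -> dict:
    """Build an assume-role policy that lets one Kubernetes service account
    assume an IAM role through the cluster's OIDC provider.

    Hypothetical helper: it mirrors the shape shown in the plan and is not
    taken from the Terraform source.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Principal": {
                    "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{OIDC_PROVIDER}"
                },
                "Condition": {
                    "StringEquals": {
                        # Only this exact namespace/service-account pair may assume the role.
                        f"{OIDC_PROVIDER}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    for hub in ("staging", "dask-staging", "research-delight"):
        print(json.dumps(irsa_trust_policy(hub), indent=2))
```

The `StringEquals` condition is what scopes each role to a single hub: a pod in the `staging` namespace cannot assume the `research-delight` role, because its token's `sub` claim would not match.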

@sgibson91 sgibson91 requested a review from a team November 30, 2022 11:54
@sgibson91
Member Author

@2i2c-org/tech-team I am requesting early review of the eksctl/terraform files before I fully deploy and begin work on the hubs.

// Warning: version 1.23 introduces some breaking changes
// Check out the docs before upgrading
// ref: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html
version: '1.22'
Member

Explicitly stating that this is ok for now to get this out the door, and we should work on upgrading in the new year.

Member Author

Yes! Scale to zero node groups!

Member

@yuvipanda yuvipanda left a comment

LGTM @sgibson91. The only suggestion I have is that we name this to match the other shared projects we have, and call it 2i2c-aws or 2i2c-aws-us maybe? I think shared-cluster is a bit too broad.

@sgibson91
Member Author

> The only suggestion I have is that we name this to match the other shared projects we have, and call it 2i2c-aws or 2i2c-aws-us maybe? I think shared-cluster is a bit too broad.

Yeah, I was trying to look to the two-eye-two-see GCP project for inspiration but that's still called pilot-hubs-cluster 😬

shared-hubs-cluster was too broad, 2i2c-aws-us is more specific
@sgibson91
Member Author

sgibson91 commented Dec 2, 2022

Latest commit renames the cluster to 2i2c-aws-us

@sgibson91
Member Author

I just deployed a staging hub to this cluster more or less copying the staging hub from the 2i2c/GCP shared cluster folder.

It deployed fine, but I am not an admin when I log in, even though I have provided the correct config to make me an admin.

Screenshot 2022-12-02 at 17 49 48

@yuvipanda
Member

@sgibson91 what do the hub logs say?

@sgibson91
Member Author

> what do the hub logs say?

Nothing interesting at all. The config to make us admins runs without error.

@sgibson91
Member Author

There's this?

[W 2022-12-02 17:49:28.048 JupyterHub auth:298] Service Server at /user/[email protected]/ requested scopes access:[email protected]/,access:[email protected],read:users:name!user,read:users:groups!user for user [email protected], granting only access:[email protected]/,read:users:name!user,read:users:groups!user.

@sgibson91
Member Author

Initial start up logs:

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 01-custom-theme
Loading extra config: 02-custom-admin
Loading extra config: 03-cloud-storage-bucket
Loading extra config: 04-2i2c-add-staff-user-ids-to-admin-users
Loading extra config: 05-gh-teams
Loading extra config: 05-per-user-disk
[I 2022-12-02 17:38:51.717 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2022-12-02 17:38:51.717 JupyterHub app:2805] Using Authenticator: oauthenticator.auth0.Auth0OAuthenticator-15.1.0
[I 2022-12-02 17:38:51.717 JupyterHub app:2805] Using Spawner: builtins.CustomSpawner
[I 2022-12-02 17:38:51.717 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[I 2022-12-02 17:38:51.788 JupyterHub app:1934] Not using allowed_users. Any authenticated user will be allowed.
[I 2022-12-02 17:38:51.815 JupyterHub provider:653] Updating oauth client service-configurator
[I 2022-12-02 17:38:51.871 JupyterHub app:2844] Initialized 0 spawners in 0.003 seconds
[I 2022-12-02 17:38:51.872 JupyterHub app:3057] Not starting proxy
[I 2022-12-02 17:38:51.876 JupyterHub app:3093] Hub API listening on http://:8081/hub/
[I 2022-12-02 17:38:51.876 JupyterHub app:3095] Private Hub API connect url http://hub:8081/hub/
[I 2022-12-02 17:38:51.877 JupyterHub app:3104] Starting managed service jupyterhub-idle-culler
[I 2022-12-02 17:38:51.877 JupyterHub service:385] Starting service 'jupyterhub-idle-culler': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/hub/api', '--timeout=3600', '--cull-every=600', '--concurrency=10']
[I 2022-12-02 17:38:51.879 JupyterHub service:133] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=3600 --cull-every=600 --concurrency=10
[I 2022-12-02 17:38:51.883 JupyterHub app:3104] Starting managed service configurator at http://configurator:10101
[I 2022-12-02 17:38:51.883 JupyterHub service:385] Starting service 'configurator': ['python3', '-m', 'jupyterhub_configurator.app', '--Configurator.config_file=/usr/local/etc/jupyterhub-configurator/jupyterhub_configurator_config.py']
[I 2022-12-02 17:38:51.885 JupyterHub service:133] Spawning python3 -m jupyterhub_configurator.app --Configurator.config_file=/usr/local/etc/jupyterhub-configurator/jupyterhub_configurator_config.py
[I 2022-12-02 17:38:52.040 JupyterHub log:186] 200 GET /hub/api/ ([email protected]) 11.61ms
[I 2022-12-02 17:38:52.054 JupyterHub log:186] 200 GET /hub/api/users?state=[secret] ([email protected]) 12.19ms
[I 2022-12-02 17:38:52.899 JupyterHub app:3113] Adding external service hub-health
[I 2022-12-02 17:38:52.900 JupyterHub proxy:480] Adding route for Hub: / => http://hub:8081
[W 2022-12-02 17:38:52.901 JupyterHub proxy:448] Adding missing route for configurator (Server(url=http://configurator:10101/services/configurator/, bind_url=http://configurator:10101/services/configurator/))
[I 2022-12-02 17:38:52.902 JupyterHub proxy:314] Adding service configurator to proxy /services/configurator/ => http://configurator:10101
[I 2022-12-02 17:38:52.906 JupyterHub app:3162] JupyterHub is now running, internal Hub API at http://hub:8081/hub/

@sgibson91
Member Author

From when I logged in

[I 2022-12-02 17:41:46.867 JupyterHub oauth2:102] OAuth redirect: 'https://staging.aws.2i2c.cloud/hub/oauth_callback'
[I 2022-12-02 17:41:46.868 JupyterHub log:186] 302 GET /hub/oauth_login?next=%2Fhub%2F -> https://2i2c.us.auth0.com/authorize?response_type=code&redirect_uri=https%3A%2F%2Fstaging.aws.2i2c.cloud%2Fhub%2Foauth_callback&client_id=nTFsGt6LL2SPwLuQEsu5krQpUp8Xr18a&state=[secret]&scope=openid+name+profile+email (@192.168.6.151) 1.67ms
[I 2022-12-02 17:41:55.466 JupyterHub roles:238] Adding role user for User: [email protected]
[I 2022-12-02 17:41:55.482 JupyterHub base:810] User logged in: [email protected]
[I 2022-12-02 17:41:55.482 JupyterHub log:186] 302 GET /hub/oauth_callback?code=[secret]&state=[secret] -> /hub/ (@192.168.6.151) 397.10ms
[I 2022-12-02 17:41:55.774 JupyterHub log:186] 302 GET /hub/ -> /hub/spawn ([email protected]@192.168.6.151) 15.73ms
[I 2022-12-02 17:41:55.815 JupyterHub reflector:274] watching for pods with label selector='component=singleuser-server' in namespace staging
[I 2022-12-02 17:41:55.818 JupyterHub reflector:274] watching for events with field selector='involvedObject.kind=Pod' in namespace staging
[I 2022-12-02 17:41:55.987 JupyterHub provider:651] Creating oauth client jupyterhub-user-sgibson%402i2c.org
[I 2022-12-02 17:41:56.006 JupyterHub log:186] 302 GET /hub/spawn -> /hub/spawn-pending/[email protected] ([email protected]@192.168.6.151) 32.70ms
[I 2022-12-02 17:41:56.015 JupyterHub spawner:2489] Attempting to create pod jupyter-sgibson-402i2c-2eorg, with timeout 3
[I 2022-12-02 17:41:56.201 JupyterHub pages:394] [email protected] is pending spawn
[I 2022-12-02 17:41:56.204 JupyterHub log:186] 200 GET /hub/spawn-pending/[email protected] ([email protected]@192.168.6.151) 5.86ms
[I 2022-12-02 17:42:07.963 JupyterHub log:186] 200 GET /hub/home ([email protected]@192.168.6.151) 11.55ms
[I 2022-12-02 17:42:14.758 JupyterHub log:186] 302 GET / -> /hub/ (@192.168.6.151) 0.72ms
[I 2022-12-02 17:42:14.909 JupyterHub log:186] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@192.168.6.151) 0.72ms
[I 2022-12-02 17:42:15.062 JupyterHub log:186] 200 GET /hub/login?next=%2Fhub%2F (@192.168.6.151) 1.96ms
[W 2022-12-02 17:42:15.310 JupyterHub web:1796] 400 DELETE /hub/api/users/sgibson%402i2c.org/server (192.168.6.151): [email protected] is pending spawn, please wait
[W 2022-12-02 17:42:15.311 JupyterHub log:186] 400 DELETE /hub/api/users/sgibson%402i2c.org/server ([email protected]@192.168.6.151) 2.93ms
[I 2022-12-02 17:42:24.216 JupyterHub log:186] 200 GET /hub/metrics (@192.168.8.103) 6.03ms
[I 2022-12-02 17:43:24.216 JupyterHub log:186] 200 GET /hub/metrics (@192.168.8.103) 6.03ms
[I 2022-12-02 17:44:24.217 JupyterHub log:186] 200 GET /hub/metrics (@192.168.8.103) 6.96ms
[I 2022-12-02 17:44:58.610 JupyterHub log:186] 301 GET /user/[email protected] -> /user/[email protected]/ (@192.168.6.151) 0.82ms
[I 2022-12-02 17:44:58.981 JupyterHub log:186] 302 GET /user/[email protected]/ -> /hub/user/[email protected]/ (@192.168.6.151) 0.79ms
[I 2022-12-02 17:44:59.167 JupyterHub log:186] 303 GET /hub/user/[email protected]/ ([email protected]@192.168.6.151) 7.58ms
[I 2022-12-02 17:44:59.348 JupyterHub pages:394] [email protected] is pending spawn
[I 2022-12-02 17:44:59.349 JupyterHub log:186] 200 GET /hub/spawn-pending/[email protected]?next=%2Fhub%2Fuser%2Fsgibson%402i2c.org%2F ([email protected]@192.168.6.151) 3.58ms
[I 2022-12-02 17:45:03.044 JupyterHub log:186] 200 GET /hub/api (@192.168.6.204) 0.62ms
[I 2022-12-02 17:45:03.065 JupyterHub log:186] 200 POST /hub/api/users/[email protected]/activity ([email protected]@192.168.6.204) 14.62ms
[W 2022-12-02 17:45:03.312 JupyterHub _version:68] jupyterhub version 3.0.0 != jupyterhub-singleuser version 2.3.1. This could cause failure to authenticate and result in redirect loops!
[I 2022-12-02 17:45:03.312 JupyterHub base:963] User [email protected] took 187.337 seconds to start
[I 2022-12-02 17:45:03.312 JupyterHub proxy:333] Adding user [email protected] to proxy /user/[email protected]/ => http://192.168.6.204:8888
[I 2022-12-02 17:45:03.314 JupyterHub users:749] Server [email protected] is ready
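
With this much output, the warnings are easy to miss. As a rough sketch (not part of the 2i2c tooling), the following pulls out only warning-and-above lines. It assumes the log prefix format seen above, `[LEVEL date time app module:line] message`; the names `LOG_LINE` and `filter_by_level` are made up for illustration.

```python
import re

# Prefix format assumed from the pasted logs, e.g.
# "[W 2022-12-02 17:45:03.312 JupyterHub _version:68] message"
LOG_LINE = re.compile(
    r"^\[(?P<level>[DIWEC]) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<app>\S+) "
    r"(?P<module>[\w.]+):(?P<lineno>\d+)\] "
    r"(?P<message>.*)$"
)


def filter_by_level(log_text, levels=("W", "E", "C")):
    """Return (timestamp, module, message) tuples for lines at the given levels.

    Lines without the bracketed prefix (such as "Loading extra config: ...")
    are skipped.
    """
    matches = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") in levels:
            matches.append(
                (m.group("timestamp"), m.group("module"), m.group("message"))
            )
    return matches


# Two lines copied from the logs above: one info, one warning.
sample = """\
[I 2022-12-02 17:38:51.717 JupyterHub app:2775] Running JupyterHub version 3.0.0
[W 2022-12-02 17:45:03.312 JupyterHub _version:68] jupyterhub version 3.0.0 != jupyterhub-singleuser version 2.3.1. This could cause failure to authenticate and result in redirect loops!
"""

for timestamp, module, message in filter_by_level(sample):
    print(timestamp, module, message)
```

Run against the login logs above, this would surface the hub/singleuser version mismatch and the two "pending spawn" 400 responses, which are the only non-info lines in that excerpt.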

@yuvipanda
Member

My suggestion is to put some print statements here https://github.com/2i2c-org/infrastructure/blob/master/helm-charts/basehub/values.yaml#L492 and see what shows up in the hub logs.
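
A minimal sketch of what such print statements could look like. The snippet behind that link is not reproduced in this thread, so this assumes it merges a list of staff user IDs into the authenticator's admin list. The helper name `extend_admin_users` and both of its arguments are hypothetical, not the actual basehub code.

```python
def extend_admin_users(current_admins, staff_user_ids):
    """Merge staff IDs into the admin list, printing each step.

    Hypothetical helper: the print calls write to stdout, which is what
    ends up in the hub pod's logs. flush=True avoids buffered output
    getting lost if the process restarts.
    """
    print(f"[debug] admin_users before: {sorted(current_admins)}", flush=True)
    print(f"[debug] staff ids to add: {sorted(staff_user_ids)}", flush=True)
    merged = sorted(set(current_admins) | set(staff_user_ids))
    print(f"[debug] admin_users after: {merged}", flush=True)
    return merged


# Standalone demonstration with placeholder user names.
admins = extend_admin_users(["hub-admin"], ["staff-a", "staff-b", "hub-admin"])
```

Inside the hub's extra config, the return value would presumably be assigned to `c.Authenticator.admin_users`. If the "before" and "after" lists look right in the logs but the user still is not an admin, the problem is more likely in how the value is consumed than in the snippet itself.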

hyphen is not in the hub image name James has created, so let's be
consistent
pre-commit-ci bot and others added 19 commits January 5, 2023 15:58
updates:
- [github.com/pycqa/isort: 5.11.0 → v5.11.3](PyCQA/isort@5.11.0...v5.11.3)
We would like to be able to select the ML specific images not just for the GPU specific server, but also for the medium server. This option is available for more members and will in particular be used for an upcoming workshop. I am not sure if this simple change does the trick or if anything elsewhere needs to be specified.
updates:
- [github.com/pycqa/isort: v5.11.3 → 5.11.4](PyCQA/isort@v5.11.3...5.11.4)
Bumps [rich](https://github.com/Textualize/rich) from 12.6.0 to 13.0.0.
- [Release notes](https://github.com/Textualize/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](Textualize/rich@v12.6.0...v13.0.0)

---
updated-dependencies:
- dependency-name: rich
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
There are new changes here that will help a lot with
memory usage!

See https://gateway.dask.org/changelog.html#id1
Brings in fix to allow opening .json files that have hard tabs
This restores admin access to 2i2c staff members
@sgibson91
Member Author

sgibson91 commented Jan 5, 2023

Omg, I did git merge (not git rebase) to update my branch and I don't know why that has added so many commits instead of one 🤦🏻‍♀️

@sgibson91
Member Author

Ok, the dask-staging hub is now timing out "waiting for the condition" from helm and I don't know why because all the pods are up and running, so I'm just going to purge that one in view of getting this damn PR merged. staging hub completes fine.

@sgibson91
Member Author

Ugh, now the PR has changed 73 files, what the hell has happened 😭

@sgibson91
Member Author

Closing in favour of #2022 which isn't so much of a mess

@sgibson91 sgibson91 closed this Jan 5, 2023
@sgibson91 sgibson91 deleted the researchdelight branch January 5, 2023 17:08
@sgibson91 sgibson91 restored the researchdelight branch January 24, 2023 10:03
sgibson91 added a commit to sgibson91/infrastructure that referenced this pull request Jan 24, 2023
This file was missed in 2i2c-org#2022, and recovered from 2i2c-org#1967
@sgibson91 sgibson91 deleted the researchdelight branch January 24, 2023 10:13

Successfully merging this pull request may close these issues.

[Request deployment] New Hub: researchdelight
9 participants