[refactor]: Upgrade to support v18.X of community EKS module #91
Conversation
@jrsdav: Thanks for your contribution. For this PR, do we need to update docs?
LGTM.
@jrsdav are
@gengmao Good catch -- But the default behavior around
@jrsdav does the new version of the community aws eks module still self-manage the amazon vpc cni plugin? Is it possible to switch to aws-managed k8s add-ons to avoid issues like https://github.com/streamnative/eng-support-tickets/issues/102 (part of https://github.com/streamnative/cloud-ops/issues/264)
@gengmao we will want to manage the plugin in self-managed mode as part of the bootstrap, as opposed to using the aws-managed plugin. This gives us more control over versions and allows for a cleaner adoption for existing self-managed environments. I have created an issue to track this as part of the ArgoCD work.
@@ -151,21 +159,21 @@ resource "helm_release" "cluster_autoscaler" {
      }
    ]
    image = {
-     tag = lookup(local.k8s_to_autoscaler_version, var.cluster_version, "v1.20.1") # image.tag defaults to the version corresponding to var.cluster_version's default value and must manually be updated
+     tag = lookup(local.k8s_to_autoscaler_version, var.cluster_version, "v1.21.1") # image.tag defaults to the version corresponding to var.cluster_version's default value and must manually be updated
I think we should default to 1.22 or 1.23.
  description = "Desired number of worker nodes in the node pool."
  type        = number
}

variable "node_pool_disk_size" {
-  default     = 50
+  default     = 100
  description = "Disk size in GiB for worker nodes in the node pool. Defaults to 50."
Suggested change:
-  description = "Disk size in GiB for worker nodes in the node pool. Defaults to 50."
+  description = "Disk size in GiB for worker nodes in the node pool. Defaults to 100."
@@ -138,7 +114,7 @@ variable "cluster_autoscaler_helm_chart_repository" {
}

variable "cluster_autoscaler_helm_chart_version" {
-  default     = "9.19.2"
+  default     = "9.21.0"
  description = "Helm chart version for the cluster-autoscaler. Defaults to \"9.10.4\". See https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler for more details."
I think we should get rid of these "defaults to X" notes in the description. When we generate the docs, terraform-docs actually does put the default in the table.
variable "disable_public_pulsar_endpoint" {
  default     = false
  description = "Whether or not to make the Istio Gateway use a public facing or internal network load balancer. If set to \"true\", additional configuration is required in order to manage the cluster from the StreamNative console"
  type        = bool
}
I like this new variable name. `disable_public_pulsar_endpoint` is more descriptive than the old name of `istio_network_loadbancer`.
  default     = "gp3"
  description = "Disk type for function worker nodes. Defaults to gp3."

variable "hosted_zone_id" {
  default = "*"
Is `*` a good default?
@@ -109,11 +116,12 @@ locals {
    "1.20" = "v1.20.1",
    "1.21" = "v1.21.1",
    "1.22" = "v1.22.1",
+   "1.23" = "v1.23.0"
Suggested change:
-   "1.23" = "v1.23.0"
+   "1.23" = "v1.23.1"
The .1 is released. Should we use it?
1.24.0 and 1.25.0 are also released. Should we add them to the map?
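If we do pick up the newer minors, the version map in `locals` could be extended along these lines (a sketch only; the 1.24/1.25 image tags shown here are assumptions and would need to be confirmed against the kubernetes/autoscaler releases):

```hcl
locals {
  # Maps an EKS cluster version to the matching cluster-autoscaler image tag,
  # used as the lookup table for the helm_release image.tag value.
  k8s_to_autoscaler_version = {
    "1.20" = "v1.20.1",
    "1.21" = "v1.21.1",
    "1.22" = "v1.22.1",
    "1.23" = "v1.23.1",
    "1.24" = "v1.24.0", # illustrative; verify the released tag
    "1.25" = "v1.25.0"  # illustrative; verify the released tag
  }
}
```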
I fixed the merge conflict
Fixes:
Motivation

This Terraform module relies on a community-driven AWS EKS module and has been steadily using version `v17.X` for the majority of our releases.

In the most recent version of the community EKS module, `v18.X`, significant breaking changes were introduced which prevented us from staying current with updates.

It has been a long-standing desire to re-work this module to be compatible with `v18.X` of the community EKS module, which also presented a good opportunity for us to refine our configuration approach for the StreamNative platform.

This PR brings in the necessary changes to make this module compatible with `v18.X`, along with significant (and opinionated) changes to refine our use and configuration of AWS EKS environments compatible with StreamNative.

Modifications
Default Behavior

The behavior of the module has changed significantly. Using a "vanilla" configuration, the following will occur:

All Resources
- `additional_tags` applies custom sets of AWS resource tags to all applicable resources created by this module.

Cluster addons (`autoscaler`, `cert-manager`, `csi`, `external-dns`, `metrics-server`, `node-termination-handler`):

Node Groups
- Desired node counts default to `0` (with the exception of the `small` node group, which is set to `1`).
- Node disk size defaults to `100GiB`.
- Node groups are created from the `node_pool_instance_types` default value, one per availability zone and instance class (e.g. `3 AZs` * `4 Instance Classes` = `12 node groups`).
)Note: The node groups changed in this module will result in Terraform wanting to destroy existing node groups. Because of this, it is recommended that you remove node group references in existing configurations before upgrading to this version (you will have to manually migrate away from and decommission the legacy node groups). An upgrade guide will be provided with more details.
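As a rough illustration of the new defaults, a "vanilla" configuration might look like the sketch below (the module source and the `cluster_name`/`cluster_version` inputs are assumptions for illustration; only `additional_tags` and `node_pool_disk_size` are discussed in this PR):

```hcl
module "sn_platform_eks" {
  # Hypothetical source, for illustration only.
  source = "streamnative/cloud/aws"

  cluster_name    = "sn-platform"
  cluster_version = "1.21"

  # Applied to all applicable AWS resources created by this module.
  additional_tags = {
    Environment = "production"
    Team        = "cloud-ops"
  }

  # Node pool disk size now defaults to 100 GiB, so this line can be omitted;
  # shown here to make the new default explicit.
  node_pool_disk_size = 100
}
```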
Modified Behavior

Here is a brief overview of what is now able to be modified as part of the module:
- The `use_runtime_mode` input is sticking around for backwards compatibility. Its sole responsibility is management of the EKS cluster IAM role in this module, as opposed to it being created/managed in the community EKS module. Continue using it as `true` if done so previously.
- The `enable_bootstrap` input controls all of the add-ons installed by this module, including Istio (unless overridden by specifying `enable_istio = true`). If `enable_bootstrap` is set to `false`, nothing will be installed on the cluster (this allows us to use ArgoCD for installing add-ons for StreamNative's own configurations).
- When the `create_iam_policies` input is set to `false`, the module will attach an IAM policy to an add-on's IRSA IAM role rather than try to create one. It defaults to IAM policies named `StreamNativeCloudRuntimePolicy` and `StreamNativeCloudLBPolicy`, but a custom policy ARN can also be provided using the `sncloud_services_iam_policy_arn` and `sncloud_services_lb_policy_arn` inputs.

Removed
The following has been removed from the module:
Resources:

Sub-modules:
- `backup-resources` (added directly in the main module)
- `managed-cloud` (moved to github.com/streamnative/terraform-managed-cloud)
- `vault-resources` (no longer used or needed)
- `tiered-storage-resources` (added directly in the main module)

Variables:
- `aws_partition`
- `calico_*` (multiple inputs)
- `cluster_log_kms_key_id`
- `disk_encryption_kms_key_id` (renamed to `disk_encryption_kms_key_arn`)
- `enable_aws_load_balancer_controller`
- `enable_calico`
- `enable_cert_manager` (now installed with `enable_bootstrap`)
- `enable_cluster_autoscaler` (now installed with `enable_bootstrap`)
- `enable_csi` (now installed with `enable_bootstrap`)
- `enable_external_secrets`
- `enable_external_dns` (now installed with `enable_bootstrap`)
- `enable_func_pool_monitoring`
- `enable_metrics_server` (now installed with `enable_bootstrap`)
- `external_secrets_*` (multiple values)
- `extra_node_pool_instance_types`
- `func_pool_*` (multiple values)
- `istio_network_loadbancer` (now controlled with `disable_public_pulsar_endpoint`)
- `map_additional_aws_accounts`
- `map_additional_iam_users`
- `node_pool_ami_is_eks_optimized` (renamed to `node_pool_ebs_optimized`)
- `wait_for_cluster_timeout`

Outputs:
- `cloudwatch_log_group_arn`
- `node_groups` (changed to `eks_node_groups`)
- `worker_iam_role_arn`
- `worker_security_group_id` (changed to `eks_node_group_security_group_id`)
- `worker_https_ingress_security_group_rule`
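Taken together, a configuration being upgraded would swap the removed or renamed inputs for the new ones, roughly as in this sketch (the module source, account ID, and the `aws_kms_key.eks` reference are hypothetical; the input names and policy names come from this PR):

```hcl
module "sn_platform_eks" {
  source = "streamnative/cloud/aws" # hypothetical source, for illustration

  # Keep true if it was true previously; manages the EKS cluster IAM role
  # in this module rather than in the community EKS module.
  use_runtime_mode = true

  # Set false to skip all module-managed add-ons (e.g. when ArgoCD installs them).
  enable_bootstrap = true

  # Attach pre-created IAM policies to the add-on IRSA roles instead of
  # creating them in the module.
  create_iam_policies             = false
  sncloud_services_iam_policy_arn = "arn:aws:iam::123456789012:policy/StreamNativeCloudRuntimePolicy"
  sncloud_services_lb_policy_arn  = "arn:aws:iam::123456789012:policy/StreamNativeCloudLBPolicy"

  # Renamed or replaced inputs:
  disk_encryption_kms_key_arn    = aws_kms_key.eks.arn # was disk_encryption_kms_key_id
  node_pool_ebs_optimized        = true                # was node_pool_ami_is_eks_optimized
  disable_public_pulsar_endpoint = false               # replaces istio_network_loadbancer
}
```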
Added

The following has been added to the module:

Resources:

Variables:
- `s3_encryption_kms_key_arn`
- `cluster_security_group_additional_rules`
- `cluster_security_group_id`
- `create_cluster_security_group`
- `create_iam_policies`
- `create_node_security_group`
- `disable_public_pulsar_endpoint`
- `disable_public_eks_endpoint`
- `disk_encryption_kms_key_arn`
- `enable_bootstrap`
- `enable_sncloud_control_plane_access`
- `node_security_group_additional_rules`
- `node_security_group_id`
- `node_pool_disk_iops`
- `node_pool_ebs_optimized`
- `velero_backup_schedule`
- `velero_excluded_namespaces`
- `velero_helm_chart_name`
- `velero_helm_chart_repository`
- `velero_helm_chart_version`
- `velero_namespace`
- `velero_plugin_version`
- `velero_policy_arn`
- `velero_settings`

Outputs:
- `eks_cluster_endpoint`
- `eks_cluster_platform_version`
- `eks_node_group_iam_role_arn`
- `eks_node_group_security_group_id`
- `eks_node_groups`
- `tiered_storage_s3_bucket_arn`
- `velero_s3_bucket_arn`
Verifying this change
These changes are being verified against existing (upgrade/migration) and new environments.
Documentation
Check the box below.

Need to update docs?
- [ ] `doc-required` (If you need help on updating docs, create a doc issue)
- [ ] `no-need-doc` (Please explain why)
- [ ] `doc` (If this PR contains doc changes)
Todo