azurerm_kubernetes_cluster, azurerm_kubernetes_cluster_node_pool - Adding node_soak_duration_in_minutes and drain_timeout_in_minutes to the node pool upgrade_settings block #26022
Conversation
Thanks for this PR @bitcloud! Would you mind rebasing this? The version upgrade for AKS just went into main. Once that's done I'll continue with my review.
Co-authored-by: Itay Grudev <[email protected]>
Co-authored-by: Itay Grudev <[email protected]>
This reverts commit b87cbb2.
52a139a to 0e9f9d4
@stephybun I rebased the branch, resolved all the conflicts, ran the tests, and tested that drain and soak work correctly with the provider in development mode. I think it's ready. Tell us if you need any changes and we'll add them ASAP.
Thanks @bitcloud.
I made a first pass through this. Could you resolve the review comments? Once that's done we can run the tests and see if we're good to go or require more changes.
CHANGELOG.md (Outdated)
@@ -1,5 +1,10 @@
## 3.105.0 (Unreleased)

ENHANCEMENTS:
Updating the changelog in PRs is a frequent source of conflicts since it gets updated manually after merging PRs, so we should remove all changes to this file.
+ values := make(map[string]interface{})

  if input != nil && input.MaxSurge != nil {
-     maxSurge = *input.MaxSurge
+     values["max_surge"] = *input.MaxSurge
  }

- if maxSurge == "" {
-     return []interface{}{}
- }
+ if input != nil && input.DrainTimeoutInMinutes != nil {
+     values["drain_timeout_in_minutes"] = *input.DrainTimeoutInMinutes
+ }

- return []interface{}{
-     map[string]interface{}{
-         "max_surge": maxSurge,
-     },
+ if input != nil && input.NodeSoakDurationInMinutes != nil {
+     values["node_soak_duration_in_minutes"] = *input.NodeSoakDurationInMinutes
  }
+
+ return []interface{}{values}
We usually nil check the input first and return early if it's empty. GH won't allow me to use the suggestions feature on this snippet, but could we update this to:
if input == nil {
	return []interface{}{}
}

values := make(map[string]interface{})

if input.MaxSurge != nil {
	values["max_surge"] = *input.MaxSurge
}

if input.DrainTimeoutInMinutes != nil {
	values["drain_timeout_in_minutes"] = *input.DrainTimeoutInMinutes
}

if input.NodeSoakDurationInMinutes != nil {
	values["node_soak_duration_in_minutes"] = *input.NodeSoakDurationInMinutes
}

return []interface{}{values}
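For context, a minimal sketch of what the full flatten helper could look like with this early return applied. The function name and the agentpools.AgentPoolUpgradeSettings input type are illustrative assumptions and may not match the provider's actual identifiers:

// Sketch only: name and input type are illustrative, not the provider's actual identifiers.
func flattenAgentPoolUpgradeSettings(input *agentpools.AgentPoolUpgradeSettings) []interface{} {
	// Return early when the API response has no upgradeSettings block at all.
	if input == nil {
		return []interface{}{}
	}

	values := make(map[string]interface{})

	if input.MaxSurge != nil {
		values["max_surge"] = *input.MaxSurge
	}
	if input.DrainTimeoutInMinutes != nil {
		values["drain_timeout_in_minutes"] = *input.DrainTimeoutInMinutes
	}
	if input.NodeSoakDurationInMinutes != nil {
		values["node_soak_duration_in_minutes"] = *input.NodeSoakDurationInMinutes
	}

	return []interface{}{values}
}

The early return keeps each field check from having to repeat the input != nil guard.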
"drain_timeout_in_minutes": { | ||
Type: pluginsdk.TypeInt, | ||
Optional: true, | ||
}, |
According to the docs it looks like this has a default value of 30 minutes, so we should set that here as well. Is there also some validation that we could add? e.g. the value must be between 0 and 60 minutes?
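For illustration, a minimal sketch of what that schema entry could look like, assuming the plugin SDK's validation.IntBetween helper and using the default and bounds mentioned above (the exact values should be confirmed against the AKS documentation):

"drain_timeout_in_minutes": {
	Type:         pluginsdk.TypeInt,
	Optional:     true,
	Default:      30,                            // assumed from the AKS docs referenced above
	ValidateFunc: validation.IntBetween(0, 60), // bounds as suggested in this comment
},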
"node_soak_duration_in_minutes": { | ||
Type: pluginsdk.TypeInt, | ||
Optional: true, | ||
}, |
Same here, docs indicate this can be set between 0 and 30 minutes, so we should add that as validation here.
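Again as an illustrative sketch, with the 0–30 minute range from the docs applied via validation.IntBetween (an assumption about the exact helper used):

"node_soak_duration_in_minutes": {
	Type:         pluginsdk.TypeInt,
	Optional:     true,
	ValidateFunc: validation.IntBetween(0, 30), // range indicated by the AKS docs per the comment above
},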
"drain_timeout_in_minutes": { | ||
Type: pluginsdk.TypeInt, | ||
Optional: true, | ||
}, | ||
"node_soak_duration_in_minutes": { | ||
Type: pluginsdk.TypeInt, | ||
Optional: true, | ||
}, |
Same here, we should apply defaults and validation where applicable
@@ -945,6 +945,10 @@ A `http_proxy_config` block supports the following:

A `upgrade_settings` block supports the following:

* `drain_timeout_in_minutes` - (Optional) The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.

* `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
Suggested change:
- * `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
+ * `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging and moving on to next node. Defaults to `0`.
@@ -308,6 +308,10 @@ A `sysctl_config` block supports the following:

A `upgrade_settings` block supports the following:

* `drain_timeout_in_minutes` - (Optional) The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.

* `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
Suggested change:
- * `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
+ * `node_soak_duration_in_minutes` - (Optional) The amount of time in minutes to wait after draining a node and before reimaging and moving on to next node. Defaults to `0`.
@@ -945,6 +945,10 @@ A `http_proxy_config` block supports the following:

A `upgrade_settings` block supports the following:

* `drain_timeout_in_minutes` - (Optional) The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.
> This eviction wait time honors waiting on pod disruption budgets

I'm unsure if this is correct; in the context of deleting the cluster we actually ignore PDBs.
The text is more about upgrades. I would intuitively expect that it is ignored for cluster deletion. I can amend the description, but I think that is understood and expected anyway.
I haven't done much with PDBs so it wasn't obvious to me. I think to avoid any potential confusion it'd be good to specify that, e.g.

Suggested change:
- * `drain_timeout_in_minutes` - (Optional) The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.
+ * `drain_timeout_in_minutes` - (Optional) The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors pod disruption budgets for upgrades. If this time is exceeded, the upgrade fails.
* `drain_timeout_in_minutes` - The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.

* `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
Looks like there is a redundant space here, and we also don't need to specify any defaults or possible values for properties in the data source docs.
Suggested change:
- * `drain_timeout_in_minutes` - The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.
- * `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
+ * `drain_timeout_in_minutes` - The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.
+ * `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node.
@@ -94,6 +94,10 @@ In addition to the Arguments listed above - the following Attributes are exported:

A `upgrade_settings` block exports the following:

* `drain_timeout_in_minutes` - The amount of time in minutes to wait on eviction of pods and graceful termination per node. This eviction wait time honors waiting on pod disruption budgets. If this time is exceeded, the upgrade fails.

* `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
Suggested change:
- * `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node. If not specified, the default is 0 minutes.
+ * `node_soak_duration_in_minutes` - The amount of time in minutes to wait after draining a node and before reimaging it and moving on to next node.
@stephybun I added all of the requested changes. I am unsure about one of your documentation comments. Could you give me more guidance if you'd like it changed? I trimmed the tests, but I left the corner cases with 0 values.
Sure, could you specify which?
Never mind. You already replied to my comment 20 minutes ago. I just applied your proposal. Are you ok with the tests like this? I combined two of the tests into one, but I feel like the 0s and empty configuration should remain.
Thanks @bitcloud @itay-grudev
Apologies if some of my review comments were unclear. I've tried to clarify and be more specific in my suggestions this time around. I also missed that the state migration schema was updated in my initial review; this needs to be removed. Please let me know if you have any further questions. Once these comments are resolved I'll run the tests and we'll see if any further changes are required.
Config: r.upgradeSettingsConfig(data, map[string]interface{}{
	"max_surge":                "2",
	"drain_timeout_in_minutes": 35,
}),
These test steps could still be condensed further; I've added what the test steps should look like below:
data.ResourceTest(t, r, []acceptance.TestStep{
	{
		Config: r.upgradeSettingsConfig(data, map[string]interface{}{
			"max_surge":                     "2%",
			"drain_timeout_in_minutes":      35,
			"node_soak_duration_in_minutes": 18,
		}),
		Check: acceptance.ComposeTestCheckFunc(
			check.That(data.ResourceName).ExistsInAzure(r),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.#").HasValue("1"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.max_surge").HasValue("2%"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.drain_timeout_in_minutes").HasValue("35"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.node_soak_duration_in_minutes").HasValue("18"),
		),
	},
	data.ImportStep(),
	{
		Config: r.upgradeSettingsConfig(data, map[string]interface{}{
			"max_surge":                     "10%",
			"drain_timeout_in_minutes":      0,
			"node_soak_duration_in_minutes": 0,
		}),
		Check: acceptance.ComposeTestCheckFunc(
			check.That(data.ResourceName).ExistsInAzure(r),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.#").HasValue("1"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.max_surge").HasValue("10%"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.drain_timeout_in_minutes").HasValue("0"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.node_soak_duration_in_minutes").HasValue("0"),
		),
	},
	data.ImportStep(),
	{
		Config: r.upgradeSettingsConfig(data, map[string]interface{}{
			"max_surge": "10%",
		}),
		Check: acceptance.ComposeTestCheckFunc(
			check.That(data.ResourceName).ExistsInAzure(r),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.#").HasValue("1"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.max_surge").HasValue("10%"),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.drain_timeout_in_minutes").DoesNotExist(),
			check.That(data.ResourceName).Key("default_node_pool.0.upgrade_settings.0.node_soak_duration_in_minutes").DoesNotExist(),
		),
	},
	data.ImportStep(),
})
This will:
- Test the create path for drain_timeout_in_minutes and node_soak_duration_in_minutes
- Test the update path for drain_timeout_in_minutes and node_soak_duration_in_minutes
- Test the update path for drain_timeout_in_minutes and node_soak_duration_in_minutes by removing them from the config
As a note, max_surge needs to be specified at the moment since the API changed the default (this will be defaulted in the provider in 4.0). We get a plan diff if it isn't specified.
…urce.go Co-authored-by: stephybun <[email protected]>
…urce.go Co-authored-by: stephybun <[email protected]>
…urce.go Co-authored-by: stephybun <[email protected]>
…_pool.go Co-authored-by: stephybun <[email protected]>
@stephybun Done. I'm sorry I took so long. I was waiting for the tests to finish.
Thanks for the help.
@stephybun Can we do anything to help you and accelerate this to
@itay-grudev we have some test failures due to the way the
@stephybun Yeah, I don't mind of course. Ping me if I can assist in any way.
Apologies for needing to intervene here @bitcloud / @itay-grudev. The AKS API can be tricky at times, and while looking into this I found that the
I unfortunately did not have the necessary permissions to push those updates to your branch, and given the large number of changes made I've opened a separate PR that supersedes this one. I did my best to maintain the authorship of the original work in #26137.
I'm going to close this PR now in favour of #26137. The tests are running for this and I'll have to defer review of this to another member of the team, but please subscribe to #26137 for updates. Thanks again for all your effort on this!
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
Community Note
Description
Based on the great PR #24491 from @aristosvo we added the new parameters node_soak_duration_in_minutes and drain_timeout_in_minutes, and updated and fixed the test cases.
PR Checklist
For example: “resource_name_here - description of change e.g. adding property new_property_name_here”
Changes to existing Resource / Data Source
Testing
Change Log
Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.
azurerm_kubernetes_cluster, azurerm_kubernetes_cluster_node_pool - adding node_soak_duration_in_minutes and drain_timeout_in_minutes to the node pool upgrade_settings block
github.com/hashicorp/go-azure-sdk/resource-manager/containerservice - updated to version 2023-09-02-preview
This is a (please select all that apply):
Related Issue(s)
Fixes: #13066
Builds on top of: #24491
Note
If this PR changes meaningfully during the course of review please update the title and description as required.