
Error updating maintenance_window in AKS using Terraform when computed window start is in the past #22762

Closed
aydosman opened this issue Aug 1, 2023 · 21 comments · Fixed by #23985

Comments

@aydosman

aydosman commented Aug 1, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.5.4

AzureRM Provider Version

3.67.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration

Original Configuration:
maintenance_window = {
  utc_offset = "+01:00"
  maintenance_window_auto_upgrade = {
    frequency   = "WEEKLY"
    interval    = 1
    duration    = 4
    day_of_week = "TUESDAY"
    start_time  = "19:00"
  }
  maintenance_window_node_os = {
    frequency   = "WEEKLY"
    interval    = 1
    duration    = 4
    day_of_week = "TUESDAY"
    start_time  = "15:00"
  }
  not_allowed = []
}

Updated Configuration:
maintenance_window = {
  utc_offset = "+00:00"
  maintenance_window_auto_upgrade = {
    frequency   = "WEEKLY"
    interval    = 1
    duration    = 4
    day_of_week = "TUESDAY"
    start_time  = "09:00"
  }
  maintenance_window_node_os = {
    frequency   = "WEEKLY"
    interval    = 1
    duration    = 4
    day_of_week = "TUESDAY"
    start_time  = "09:00"
  }
  not_allowed = []
}

Debug Output/Panic Output

When I apply these changes, I get the following error:

Error: creating/updating Auto Upgrade Schedule Maintenance Configuration for Kubernetes Cluster (Subscription: "redact"
Resource Group Name: "redact-aks-2"
Kubernetes Cluster Name: "redact-aks-2"): maintenanceconfigurations.MaintenanceConfigurationsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidParameter" Message="The input 'maintenanceWindow.startDate' 2023-07-31 00:00:00 +0000 cxTimeZone is before the current time 2023-08-01 09:22:31.836739265 +0000 UTC m=+127068.839937897."

with module.aks.module.cluster.azurerm_kubernetes_cluster.default,
on .terraform/redact.tf line 30, in resource "azurerm_kubernetes_cluster" "default":
30: resource "azurerm_kubernetes_cluster" "default" {

Problem/Expected Behaviour

I am experiencing an issue when attempting to update an existing maintenance_window in Azure Kubernetes Service (AKS) using Terraform version 1.5.4 and the Azure provider (azurerm) version 3.67.0. The issue occurs when running the terraform apply command, specifically when the computed window start time falls in the past.

The AKS cluster is located in a specific region and is running the latest AKS version as of 2023-07-23. I am trying to update the day and time of the maintenance window. Above are the changes I am making.

The key point of this issue is that the logic to calculate the window start date doesn't seem to consider the current time, thereby allowing a past date to be used where only future dates should be valid. This results in a failure during the apply step.

I would greatly appreciate any insights into why this might be happening.

Extra

When modifying the maintenance window through the az command-line interface, there are no issues and the action succeeds. The start date is set as required.

Deleting the maintenance configuration and re-applying it has also worked without issue in the past.

`az aks maintenanceconfiguration update -g redact-aks-2 --cluster-name redact-aks-2 --name aksManagedNodeOSUpgradeSchedule --schedule-type Weekly --day-of-week Tuesday --interval-weeks 1 --start-time 09:00 --duration 4`

Details:

  {
    "id": "/subscriptions/redact-aks-2/resourceGroups/redact-aks-2/providers/Microsoft.ContainerService/managedClusters/redact-aks-2/maintenanceConfigurations/aksManagedAutoUpgradeSchedule",
    "maintenanceWindow": {
      "durationHours": 4,
      "notAllowedDates": null,
      "schedule": {
        "absoluteMonthly": null,
        "daily": null,
        "relativeMonthly": null,
        "weekly": {
          "dayOfWeek": "Tuesday",
          "intervalWeeks": 1
        }
      },
      **"startDate": "2023-08-01",**
      "startTime": "09:00",
      "utcOffset": "+00:00"
    },
    "name": "aksManagedNodeOSUpgradeSchedule",
    "notAllowedTime": null,
    "resourceGroup": "redact-aks-2",
    "systemData": null,
    "timeInWeek": null,
    "type": null
  },
@stephybun
Member

Thanks for raising this issue @aydosman.

I appreciate that there's information in your configuration that's sensitive, but I am unable to reproduce the error with the information provided. Would you be able to supply a minimal terraform config (no modules, variables etc.) that can reproduce the error?

@bamarch

bamarch commented Aug 9, 2023

Getting a similar bug here @stephybun @aydosman

The input 'maintenanceWindow.startDate' 2023-07-14 00:00:00 +0000 cxTimeZone is before the current time


It looks like when you set up a schedule, this startDate is written into the state.

Then if you wait a while and try to change the schedule after that date, you will get this error (the provider doesn't update the date).

Replication steps

  1. At some point in the past I created a schedule
  automatic_channel_upgrade = "patch"
  
  maintenance_window_auto_upgrade {
    frequency = "Weekly"
    interval  = 1
    day_of_week = "Friday"
    start_time  = "00:00"
    utc_offset  = "+00:00"
    duration = 4
  }

  node_os_channel_upgrade = "NodeImage"
  
  maintenance_window_node_os {
    frequency = "Daily"
    interval  = 1
    start_time  = "00:00"
    utc_offset  = "+00:00"
    duration = 4
  }
  2. Then in the present I tried to change that schedule to this:
automatic_channel_upgrade = "patch"

maintenance_window_auto_upgrade {
  frequency = "Weekly"
  interval  = 1
  day_of_week = "Saturday"
  start_time  = "00:00"
  utc_offset  = "+00:00"
  duration = 6
}

node_os_channel_upgrade = "NodeImage"

maintenance_window_node_os {
  frequency = "Weekly"
  interval  = 1
  day_of_week = "Saturday"
  start_time  = "00:00"
  utc_offset  = "+00:00"
  duration = 6
}

Extra info

This is the presented plan:

[screenshot of the plan output]

The current terraform state from terraform show -json:

                "maintenance_window_auto_upgrade": [
                  {
                    "day_of_month": 0,
                    "day_of_week": "Friday",
                    "duration": 4,
                    "frequency": "Weekly",
                    "interval": 1,
                    "not_allowed": [],
                    "start_date": "2023-07-14T00:00:00Z",
                    "start_time": "00:00",
                    "utc_offset": "+00:00",
                    "week_index": ""
                  }
                ],
                "maintenance_window_node_os": [
                  {
                    "day_of_month": 0,
                    "day_of_week": "",
                    "duration": 4,
                    "frequency": "Daily",
                    "interval": 1,
                    "not_allowed": [],
                    "start_date": "2023-07-14T00:00:00Z",
                    "start_time": "00:00",
                    "utc_offset": "+00:00",
                    "week_index": ""
                  }
                ],

@bamarch

bamarch commented Aug 9, 2023

Workaround

ℹ️ In my testing, manually calculating a future date in this format (2023-07-14T00:00:00Z) and specifying it via the optional start_date parameter (e.g. start_date = "2023-08-09T00:00:00Z") successfully works around the issue.
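In HCL that calculation can be sketched as follows (a minimal example; names are illustrative, and note that timestamp() is re-evaluated on every plan, so this variant proposes a fresh date on each apply):

locals {
  # A start date one week out, truncated to midnight UTC, matching the
  # "2023-08-09T00:00:00Z" shape above.
  future_start_date = formatdate("YYYY-MM-DD'T00:00:00Z'", timeadd(timestamp(), "168h"))
}

# then, in the maintenance block: start_date = local.future_start_date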


@bamarch

bamarch commented Aug 9, 2023

Suggestions

Solution 1: The provider could calculate a fresh timestamp when start_date is not specified in the HCL and the start_date in state is in the past.

Solution 2: It might be sufficient to print a helpful warning to the console when this happens, or to update the provider documentation.

@stevehipwell

@bamarch since the provider is already calculating the start date, and in this scenario the result doesn't meet the API's constraints, I think the only valid solution is for the provider to calculate the start date correctly.

@rogerioefonseca

I'm facing a similar issue.
I created the maintenance configuration using resource "time_static" "example" {} and passed its output as start_time, like so:
start_time = time_static.example.rfc3339

But as it comes with the hour, minute and second of the timestamp, I needed to format its value with formatdate to get the 00:00:00Z.
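For reference, the truncation described above can be sketched like this (assuming the hashicorp/time provider; names are illustrative):

resource "time_static" "example" {}

locals {
  # Drop the hour/minute/second of the captured timestamp so the value
  # matches the "...T00:00:00Z" format used elsewhere in this thread.
  start_date = formatdate("YYYY-MM-DD'T00:00:00Z'", time_static.example.rfc3339)
}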

@TheFuzz4

I am also currently running into this issue. I created a maintenance window last month. Now I want to modify the window to update the start time and duration to new values. When the update runs, it fails, complaining that my start date is in the past. However, I never provided a start date when I created it, and I'm not providing one now with this update.
Does anyone know if this has been addressed in an updated version?

@pankaj1203
Copy link

pankaj1203 commented Nov 20, 2023

@TheFuzz4
The issue still exists in the newest version too, but the workaround is quite simple: create a local variable, assign the timestamp function to it, and then use that value in the start_date parameter of the maintenance window. Below is a code snippet for the same.

locals {
  current_time = timestamp()
}

output "current_time" {
  value = local.current_time
}

maintenance_window_node_os {
  start_date = local.current_time
}

You can also use the time_rotating resource in Terraform, but time_static will not work here:
https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/rotating
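A sketch of the time_rotating variant (rotation_days is an illustrative value; unlike a bare timestamp(), the date only changes when the resource rotates, so it does not churn on every plan):

resource "time_rotating" "maintenance_start" {
  rotation_days = 30
}

maintenance_window_node_os {
  # ... other schedule arguments ...
  start_date = time_rotating.maintenance_start.rfc3339
}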

@stevehipwell

@pankaj1203 that's not a solution as it will churn every time a change is made. The workaround is to create a start date from the input and then lock it until the input changes.

@TheFuzz4

@stevehipwell my problem is that if you don't provide a start date, the system defaults to the date the window was created. With the az CLI you can run updates all day and night without providing a start date, so Terraform should be able to work the same way as the CLI.

@stevehipwell

@TheFuzz4 one of my team opened this issue so I fully understand the context and how it really isn't working correctly at the moment. We also have a working solution involving a terraform_data block per date we need to freeze and some messy Terraform date logic.
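One possible shape of that freeze, sketched with the built-in terraform_data resource (plantimestamp() needs Terraform >= 1.5; the trigger values are illustrative):

resource "terraform_data" "auto_upgrade_start" {
  # Replace (and therefore recompute the date) only when the schedule
  # inputs change; otherwise the stored date stays frozen in state.
  triggers_replace = {
    day_of_week = "Saturday"
    start_time  = "00:00"
  }

  # Tomorrow at midnight UTC, evaluated at plan time.
  input = formatdate("YYYY-MM-DD'T00:00:00Z'", timeadd(plantimestamp(), "24h"))

  lifecycle {
    ignore_changes = [input]
  }
}

# then, in the maintenance block: start_date = terraform_data.auto_upgrade_start.output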

@Klodjangogo

Klodjangogo commented Nov 21, 2023

I am facing the same issue when I try to update the time schedule.

This is my error from Terraform:

│ Kubernetes Cluster Name: "xxx"): maintenanceconfigurations.MaintenanceConfigurationsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidParameter" Message="The input 'maintenanceWindow.startDate' 2023-11-13 00:00:00 +0000 cxTimeZone is before the current time 2023-11-17 08:47:59.892932169 +0000 UTC m=+84895.203340692."
│ 
│   with module.service.module.aks-arch.module.cluster.azurerm_kubernetes_cluster.kubernetes_cluster,
│   on .terraform/modules/service.aks-arch.cluster/main.tf line 20, in resource "azurerm_kubernetes_cluster" "kubernetes_cluster":
│   20: resource "azurerm_kubernetes_cluster" "kubernetes_cluster" {

As a workaround I used start_date in the format below, but the main problem is that the OS (NodeImage) upgrade does not happen. Not sure if I am missing something?

start_date = "2023-11-20T15:00:00Z"

I have set these options:

{
  "nodeOsUpgradeChannel": "NodeImage",
  "upgradeChannel": "node-image"
}

and a maintenance window (maintenance_window_node_os) set to run every day (I tested last week too), but nothing happens or is triggered. AKS version 1.25.6.

[
  {
    "id": "/subscriptions/xxx/maintenanceConfigurations/aksManagedNodeOSUpgradeSchedule",
    "maintenanceWindow": {
      "durationHours": 4,
      "notAllowedDates": null,
      "schedule": {
        "absoluteMonthly": null,
        "daily": {
          "intervalDays": 1
        },
        "relativeMonthly": null,
        "weekly": null
      },
      "startDate": "2023-11-20",
      "startTime": "15:00",
      "utcOffset": "+00:00"
    },
    "name": "aksManagedNodeOSUpgradeSchedule",
    "notAllowedTime": null,
    "resourceGroup": "xxx",
    "systemData": null,
    "timeInWeek": null,
    "type": null
  }
]

Is there a way to check why it does not start, or what is preventing it from being triggered (AKS version etc.)?

@TheFuzz4

TheFuzz4 commented Nov 21, 2023

Thank you @stevehipwell for opening this issue. I'm working with my POC at MSFT on this issue as well, hoping to get some traction.
@Klodjangogo check the Activity Log and see what is going on in there. We had a PDB in place that was blocking one of our node pools from proceeding through an update. Once I fixed the PDB, things rolled on as they should. Also, did you register the preview flag? https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-image#register-the-nodeosupgradechannelpreview-feature-flag

@Klodjangogo

@TheFuzz4 thank you for your reply. I am not seeing anything in the Activity Log. I saw the part about the preview flag, but the section just above it lists the prerequisites for the SecurityPatch channel:

"The following prerequisites are only applicable when using the SecurityPatch channel. If you aren't using this channel, you can ignore these requirements.
Must be using API version 11-02-preview or later
If using Azure CLI, the aks-preview CLI extension version 0.5.127 or later must be installed
The NodeOsUpgradeChannelPreview feature flag must be enabled on your subscription"

Here it is specified that the NodeOsUpgradeChannelPreview feature flag must be enabled only if the SecurityPatch channel is used, whereas I have NodeImage in place.

{
  "nodeOsUpgradeChannel": "NodeImage",
  "upgradeChannel": "node-image"
}

Should I register the preview flag?

@TheFuzz4

@Klodjangogo so we set ours to none for the upgradeChannel, because there is no need for us to update our K8s automagically; we want to do it when we're ready. For the nodeOsUpgradeChannel we are set to SecurityPatch, so my apologies for not thinking about that only being applicable to that particular channel. We are currently patiently waiting for the next SecurityPatch image to be released. Right now my nodes are on kernel 5.15.0-1049. I had to use the az CLI to update some settings for our window because of this issue, so I'm hoping to see my nodes bounce any day now.

@ms-henglu
Contributor

Hi all,

I've opened a PR to fix this issue, and here's a workaround which uses the azapi provider to manage the maintenance configs: https://gist.github.com/ms-henglu/df1119f4243f86e25722ab9320c48bfc
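For anyone who wants the general shape of that approach inline, an azapi config looks roughly like this (a sketch only, reconstructed from the JSON shown earlier in this thread; the API version and the azapi 1.x jsonencode-style body are assumptions, and the gist above is authoritative):

resource "azapi_resource" "node_os_schedule" {
  type      = "Microsoft.ContainerService/managedClusters/maintenanceConfigurations@2023-06-01"
  name      = "aksManagedNodeOSUpgradeSchedule"
  parent_id = azurerm_kubernetes_cluster.default.id

  body = jsonencode({
    properties = {
      maintenanceWindow = {
        durationHours = 4
        startTime     = "09:00"
        utcOffset     = "+00:00"
        schedule = {
          weekly = {
            dayOfWeek     = "Tuesday"
            intervalWeeks = 1
          }
        }
      }
    }
  })
}

Omitting startDate from the body leaves the default to the service, mirroring the az CLI behaviour described earlier in the thread.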

@TheFuzz4

Thank you @ms-henglu. Any idea when your PR will be merged in? Do you know if this will be backwards compatible, or will we need to change all of our providers?

@ms-henglu
Contributor

Hi @TheFuzz4, it should be merged by the end of this month.

Yes, it will be backwards compatible as long as you didn't specify a start_date which is before the current date in the config.

@TheFuzz4

@ms-henglu yeah, we don't pass in the start date; we just want it to function like the az CLI does, where if you don't specify one it defaults to the current date/time.

@rgarcia89

@TheFuzz4 I have configured exactly that case as follows:

locals {
  current_time = timestamp()
  start_time   = timeadd(local.current_time, "1h")
}

resource "azurerm_kubernetes_cluster" "aks" {
...
  maintenance_window_auto_upgrade {
    frequency   = "RelativeMonthly"
    interval    = 1
    duration    = 4
    day_of_week = "Tuesday"
    week_index  = "First"
    start_time  = "08:00"
    utc_offset  = "+01:00"
    start_date  = local.start_time
  }
}


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 30, 2024