
Feature request: Please support parent_id on recurring Downtimes #109

Closed
ardrigh opened this issue Oct 26, 2018 · 15 comments

Comments

@ardrigh

ardrigh commented Oct 26, 2018

Our team is trying to use Terraform to manage a scheduled monthly downtime for Datadog. It occurs on the first day of the month for one hour.

I imported the existing downtime monitor to avoid manually adding the start and end values, and it worked fine until the next downtime was completed and the id value changed.

I asked about this behaviour in the Datadog Slack channel and was told this is how downtime monitors work: the downtime runs under its first id, and when that occurrence completes, a new downtime is created under a new id, with its parent_id set to the original id.

If the Datadog provider could process this extra information, the downtime would not appear in the plan as a new resource. It would hopefully manage the changing id value transparently in the state file somehow.
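To illustrate the rotation described above, here is a small Python sketch. The field names follow the downtime objects returned by the Datadog API, but the numeric ids and values are made up:

```python
# Hypothetical downtime objects illustrating the id rotation (ids are made up).
original = {"id": 1001, "parent_id": None,
            "recurrence": {"type": "months", "period": 1}}

# After the first occurrence completes, the API exposes a new downtime whose
# parent_id points back at the previous id:
next_occurrence = {"id": 1002, "parent_id": 1001,
                   "recurrence": {"type": "months", "period": 1}}

# Terraform's state still records id 1001, so a refresh finds nothing under
# that id, and the plan proposes a fresh create.
assert next_occurrence["id"] != original["id"]
assert next_occurrence["parent_id"] == original["id"]
```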

Terraform Version

Terraform v0.11.10

Affected Resource(s)

datadog_downtime

Terraform Configuration Files

resource "datadog_downtime" "scheduled_outage" {
  scope      = ["host:host.example.com"]
  monitor_id = 0000001

  recurrence {
    type   = "months"
    period = 1
  }

  message = "host downtime for monthly backup of vm. Notify: @slack-devops"

  lifecycle {
    ignore_changes = ["start", "end", "active", "disabled"]
  }
}

Expected Behavior

terraform plan will look for updates to the datadog_downtime resource but will not consider it an addition.

Actual Behavior

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + datadog_downtime.scheduled_outage
      id:                  <computed>
      message:             "host downtime for monthly backup of vm. Notify: @slack-devops"
      monitor_id:          "0000001"
      recurrence.#:        "1"
      recurrence.0.period: "1"
      recurrence.0.type:   "months"
      scope.#:             "1"
      scope.0:             "host:host.example.com"


Plan: 1 to add, 0 to change, 0 to destroy.

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform init
  2. terraform plan

References

The API code examples show the parent_id field, but it is not mentioned in the linked documentation.
https://docs.datadoghq.com/api/?lang=python#schedule-monitor-downtime

@vanvlack

Added ParentId to the client in PR zorkian/go-datadog-api#227. Once that's accepted, we can likely make changes here to bring it in.

@vanvlack

ParentId is now supported as of zorkian/go-datadog-api#227 being merged

@ardrigh
Author

ardrigh commented Apr 23, 2019

@vanvlack thanks very much for getting that piece done.

I don't know what code is required to add support in this provider.

I believe that if the provider could compare a new parent_id against the list of previous ids in the state file, it would avoid the downtime being seen as a new resource, and thus avoid creating duplicates.
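A minimal sketch of that matching rule, with hypothetical helper and field names (the real provider is written in Go; Python is used here only for brevity):

```python
def find_current_downtime(all_downtimes, known_ids):
    """Return the live downtime whose id or parent_id matches an id that
    the state file has previously recorded for this resource, if any."""
    for downtime in all_downtimes:
        if downtime["id"] in known_ids or downtime.get("parent_id") in known_ids:
            return downtime
    return None

# ids previously recorded in state for this resource (hypothetical values)
known_ids = {1001, 1002}

live_downtimes = [
    {"id": 2005, "parent_id": None},   # unrelated downtime
    {"id": 1003, "parent_id": 1002},   # current occurrence of our downtime
]

current = find_current_downtime(live_downtimes, known_ids)
assert current["id"] == 1003
```

This only works while the immediate predecessor's id is still in the known set, which is exactly the limitation discussed further down in this thread.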

@bal2ag

bal2ag commented May 1, 2019

I wanted to add some color to this issue. We've been trying to get our entire monitoring infrastructure defined in Terraform, and recurring downtimes are the only resource we've been unable to manage there. We use recurring downtimes almost exclusively, for example for anomaly monitors that get noisy during off hours.

Because the recurring downtime model changes the ID on every recurrence, it breaks Terraform's expectation that re-applying an unchanged configuration causes no backend resource changes. This was quite frustrating: from Terraform's perspective, the originally created downtime simply "disappears", and re-applying the same configuration causes a 400 error, since it tries to create a new recurring downtime in the past (you have to specify start and end in absolute epoch time).

I know I'm not being super helpful by describing a problem we already know exists, but it might be worth updating the Terraform documentation to note that recurring downtimes don't work as expected until this issue is resolved (I also think it's closer to a bug than a feature request, IMHO). Happy to provide additional insight from our experience, but I suspect many people have taken a similar path and simply reverted to managing downtimes through Datadog's UI.

@pdecat
Contributor

pdecat commented Sep 17, 2019

Hi, I've looked into implementing this using the downtime parent_id field.
As each new occurrence of the downtime takes its parent id from its immediate predecessor, this effectively forms a linked list of downtime items.
But because completed downtime items are deleted from the https://api.datadoghq.com/api/v1/downtime/ endpoint after a few hours, the full linked list back to the original downtime cannot be rebuilt, so the identity of the current downtime item cannot be determined with certainty.

Steps to reproduce:

  1. create a downtime for 1h with a 1 day recurring period, let's say it gets id 0001,
  2. verify its attributes with curl -s "https://api.datadoghq.com/api/v1/downtime/0001?api_key=${DATADOG_API_KEY}&application_key=${DATADOG_APP_KEY}", as expected its parent_id is null
  3. wait until it completes, it is still accessible using the above command for a few hours (must be a batch process of some kind)
  4. after a few hours, the above command will fail with HTTP/1.1 404 Not Found and payload {"errors":["Downtime not found"]}
  5. get all currently existing downtimes with curl -s "https://api.datadoghq.com/api/v1/downtime?api_key=${DATADOG_API_KEY}&application_key=${DATADOG_APP_KEY}"; you'll find a downtime whose parent_id field points at the original downtime, let's say it gets id 0002
  6. verify its attributes with curl -s "https://api.datadoghq.com/api/v1/downtime/0002?api_key=${DATADOG_API_KEY}&application_key=${DATADOG_APP_KEY}", as expected its parent_id is 0001
  7. wait until it completes and is replaced by another occurrence with id 0003
  8. after a few hours, the above command will fail with HTTP/1.1 404 Not Found and payload {"errors":["Downtime not found"]}
  9. at that point, there is no longer any link to the original downtime id.
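The break in the chain described by the steps above can be sketched in Python (the dicts stand in for API responses, and the ids are made up):

```python
def walk_to_original(downtime_by_id, start_id):
    """Follow parent_id links from the current downtime towards the
    original; return the original's id, or None once the chain is broken
    because an intermediate item was purged by the API."""
    current_id = start_id
    while True:
        downtime = downtime_by_id.get(current_id)
        if downtime is None:
            return None                 # link purged: chain cannot be rebuilt
        if downtime["parent_id"] is None:
            return current_id           # reached the original downtime
        current_id = downtime["parent_id"]

# Day 3 (step 9): only 0003 remains; 0002 has been purged, so the walk from
# 0003 can no longer reach the original 0001.
day3 = {"0003": {"parent_id": "0002"}}
assert walk_to_original(day3, "0003") is None
```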

I can think of at least two options that could make this work:

  • The downtime API could expose an original_parent_id field on each downtime item. That way, the link could always be restored. Better yet, it could add an option to query downtimes by that field to avoid having to retrieve all downtimes and search client side.
  • The Datadog API could expose some kind of downtime generator item whose id would stay stable over time.

@pdecat
Contributor

pdecat commented Sep 17, 2019

FWIW, I pushed a POC here: https://github.com/pdecat/terraform-provider-datadog/tree/recurrent_downtimes (https://github.com/pdecat/terraform-provider-datadog/commit/44f4ecd27b36371e9ca4cb8f0855d90c2d1a3947)

Applied this yesterday (Monday 2019/09/16):

resource "datadog_downtime" "test" {
  disabled   = false
  message    = "Managed by Terraform. Imported from web."
  monitor_id = null
  scope      = ["*"]

  start = 1568647200
  end   = 1568647300

  timezone = "Europe/Paris"

  recurrence {
    period = 1
    type   = "days"
  }
}

Today's plan with 2.4.0 (Tuesday 2019/09/17):

Terraform will perform the following actions:

  # datadog_downtime.test will be created
  + resource "datadog_downtime" "test" {
      + disabled = false
      + end      = 1568647300
      + id       = (known after apply)
      + message  = "Managed by Terraform. Imported from web."
      + scope    = [
          + "*",
        ]
      + start    = 1568647200
      + timezone = "Europe/Paris"

      + recurrence {
          + period = 1
          + type   = "days"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Update:

And as expected, the day after (Wednesday 2019/09/18), this no longer works because the first child of the original downtime was deleted:

Terraform will perform the following actions:

  # datadog_downtime.test will be created
  + resource "datadog_downtime" "test" {
      + disabled = false
      + end      = 1568647300
      + id       = (known after apply)
      + message  = "Managed by Terraform. Imported from web."
      + scope    = [
          + "*",
        ]
      + start    = 1568647200
      + timezone = "Europe/Paris"

      + recurrence {
          + period = 1
          + type   = "days"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

@vanvlack

vanvlack commented Sep 18, 2019

@pdecat any reason we need to know that original parent_id at all? We can assume that if a parent_id exists, it is a repeating downtime. Wonder if there is a way forward with that assumption?

edit: realizing this doesn't actually help us, as changes will need to somehow be tied to the new rotated monitors...

@ardrigh
Author

ardrigh commented Sep 18, 2019

@pdecat any reason we need to know that original parent_id at all? We can assume that if a parent_id exists, it is a repeating downtime. Wonder if there is a way forward with that assumption?

edit: realizing this doesn't actually help us, as changes will need to somehow be tied to the new rotated monitors...

The parent_id indicates it is an existing resource, but without the full history of that chain you can only use data from the resource as written in the Terraform code, and apart from its name, I don't think you could safely rely on those fields.

It might need a request to Datadog to support something like a grandparent_id field, if they don't provide a way to query the history of a parent_id back to the original id value.

I am happy to put in a support query to see what they say.

@platinummonkey
Contributor

👋 this is something we’re looking to address in the nearish future, among some other changes making downtimes (mostly) immutable (to address other edge cases people have run into).

Thank you for this helpful feedback 😄

@ardrigh
Author

ardrigh commented Oct 3, 2019

@platinummonkey that's great news. Please keep us updated on any progress 🍻

@bal2ag

bal2ag commented Jan 10, 2020

@platinummonkey has there been any progress towards making recurring downtimes manageable in Terraform?

@MrLemur

MrLemur commented Feb 18, 2021

@platinummonkey Just wondering if there has been any progress on this yet?

@platinummonkey
Contributor

@phillip-dd ^

@phillip-dd
Contributor

We're tracking this internally and have some work queued that should address this.

@NBParis

NBParis commented Jun 18, 2021

Hello,

Thanks for your patience on this.
I’m happy to share that the issue has been addressed and recurring downtimes are now properly handled with the new version of the terraform provider (v3.1.0 - see updates here).

You have to update your Terraform provider to version 3.1.0 to benefit from the fix.

The PR that addresses this issue contains a very detailed description of the change made and of the remaining caveat that we are still working to improve.

I'll go ahead and resolve this issue, but feel free to let us know if you have any questions or feedback.

Thanks again for reporting this issue and helping us improve the Terraform provider to better manage downtimes.

@NBParis NBParis closed this as completed Jun 18, 2021