Merge pull request #623 from cal-itp/feat/uptime-monitor
Feature: set up uptime monitoring
afeld authored Jun 10, 2022
2 parents 14c5255 + 81b275c commit e193198
Showing 11 changed files with 265 additions and 0 deletions.
31 changes: 31 additions & 0 deletions docs/deployment/azure.md
@@ -2,6 +2,8 @@

[dev-benefits.calitp.org](https://dev-benefits.calitp.org) is currently deployed into a Microsoft Azure account provided by [California Department of Technology (CDT)'s Office of Enterprise Technology (OET)](https://techblog.cdt.ca.gov/2020/06/cdt-taking-the-lead-in-digital-transformation/), a.k.a. the "DevSecOps" team. More specifically, it uses [custom containers](https://docs.microsoft.com/en-us/azure/app-service/configure-custom-container) on [Azure App Service](https://docs.microsoft.com/en-us/azure/app-service/overview).

The infrastructure is configured as code via [Terraform](https://www.terraform.io/), for [various reasons](https://techcommunity.microsoft.com/t5/fasttrack-for-azure/the-benefits-of-infrastructure-as-code/ba-p/2069350). We are adding existing resources to the configuration progressively. In other words, not _all_ our resources in Azure show up under [`terraform/`][terraform-dir], but we are [moving in that direction](https://github.com/cal-itp/benefits/issues/618).

## Architecture

### System interconnections
@@ -53,3 +55,32 @@ flowchart LR
```

WAF: [Web Application Firewall](https://azure.microsoft.com/en-us/services/web-application-firewall/)

## Monitoring

We have [ping tests](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) set up to monitor the availability of the dev, test, and prod deployments. Alerts go to [#benefits-notify](https://cal-itp.slack.com/archives/C022HHSEE3F).
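
To spot-check by hand what the ping tests verify, a minimal sketch along these lines may help. It assumes `curl` is installed. The URLs are the ones configured in `terraform/uptime.tf`, and the expected status of 200 comes from `webtest.xml`. The function names `is_healthy` and `check_all`, the `--run` flag, and the 30-second timeout are illustrative assumptions, not part of the repository. Unlike the Azure tests, this checks from a single location only.

```sh
#!/bin/sh
# Healthcheck URLs as configured in terraform/uptime.tf
URLS="https://dev-benefits.calitp.org/healthcheck
https://test-benefits.calitp.org/healthcheck
https://benefits.calitp.org/healthcheck"

# The web test expects HTTP 200 (ExpectedHttpStatusCode in webtest.xml)
is_healthy() {
  [ "$1" = "200" ]
}

# Hypothetical helper: request each URL once and report its status
check_all() {
  for url in $URLS; do
    # -s: silent, -o /dev/null: discard the body, -L: follow redirects,
    # -w: print only the status code; 30s timeout is an arbitrary choice
    status=$(curl -s -o /dev/null -L -w '%{http_code}' --max-time 30 "$url") || status="000"
    if is_healthy "$status"; then
      echo "OK   $url"
    else
      echo "FAIL $url ($status)"
    fi
  done
}

# Only touch the network when explicitly asked to
if [ "${1:-}" = "--run" ]; then
  check_all
fi
```

Saved as a script, it would be invoked with `--run`; without that flag it only defines the functions.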

## Making changes

1. Get access to the Azure account through the DevSecOps team.
1. Install dependencies:
   - [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
   - [Terraform](https://www.terraform.io/downloads)
1. [Authenticate using the Azure CLI](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/azure_cli), specifying the `CDT/ODI Production` Subscription.
1. Outside the [dev container](../../getting-started/), navigate to the [`terraform/`][terraform-dir] directory.
1. [Initialize Terraform.](https://www.terraform.io/cli/commands/init)

   ```sh
   terraform init
   ```

1. Make changes to Terraform files.
1. [Plan](https://www.terraform.io/cli/commands/plan)/[apply](https://www.terraform.io/cli/commands/apply) the changes, as necessary.

   ```sh
   terraform apply
   ```

1. [Submit the changes via pull request.](../development/commits-branches-merging/) Be sure to specify whether they've been applied, i.e. whether they're live or not.
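
The command-line portion of the steps above could be sketched as follows, run from the `terraform/` directory. This is an illustration under assumptions rather than the project's prescribed workflow: the `run` wrapper, the `DRY_RUN` variable, the `apply_terraform_changes` function, and the `tfplan` file name are all hypothetical. With `DRY_RUN` left at its default of `1`, the commands are only printed.

```sh
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

apply_terraform_changes() {
  # Authenticate and select the Subscription named in the steps above
  run az login
  run az account set --subscription "CDT/ODI Production"

  run terraform init

  # Saving the plan to a file means apply executes exactly what was reviewed
  run terraform plan -out=tfplan
  run terraform apply tfplan
}

apply_terraform_changes
```

Note that the committed `terraform/.gitignore` mentions `*tfplan*` only as a commented-out example, so a saved plan file would need to be kept out of version control by hand.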

[terraform-dir]: https://github.com/cal-itp/benefits/tree/dev/terraform
36 changes: 36 additions & 0 deletions terraform/.gitignore
@@ -0,0 +1,36 @@
# https://github.com/github/gitignore/blob/e5323759e387ba347a9d50f8b0ddd16502eb71d4/Terraform.gitignore

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc
22 changes: 22 additions & 0 deletions terraform/.terraform.lock.hcl

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions terraform/README.md
@@ -0,0 +1 @@
[Documentation](https://docs.calitp.org/benefits/deployment/azure/)
26 changes: 26 additions & 0 deletions terraform/main.tf
@@ -0,0 +1,26 @@
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.7.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "RG-CDT-PUB-VIP-CALITP-P-001"
    storage_account_name = "sacdtcalitpp001"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
  }
}

provider "azurerm" {
  # temporary workaround for permissions issue
  skip_provider_registration = true

  features {}
}

data "azurerm_resource_group" "benefits" {
  name = "RG-CDT-PUB-VIP-CALITP-P-001"
}
26 changes: 26 additions & 0 deletions terraform/monitor.tf
@@ -0,0 +1,26 @@
data "azurerm_key_vault" "main" {
  name                = "KV-CDT-PUB-CALITP-P-001"
  resource_group_name = data.azurerm_resource_group.benefits.name
}

# created manually
# https://slack.com/help/articles/206819278-Send-emails-to-Slack
data "azurerm_key_vault_secret" "slack_benefits_notify_email" {
  name         = "slack-benefits-notify-email"
  key_vault_id = data.azurerm_key_vault.main.id
}

resource "azurerm_monitor_action_group" "dev_email" {
  name                = "benefits-notify Slack channel email"
  resource_group_name = data.azurerm_resource_group.benefits.name
  short_name          = "slack-notify"

  email_receiver {
    name          = "Benefits engineering team"
    email_address = data.azurerm_key_vault_secret.slack_benefits_notify_email.value
  }

  lifecycle {
    ignore_changes = [tags]
  }
}
38 changes: 38 additions & 0 deletions terraform/uptime.tf
@@ -0,0 +1,38 @@
module "dev_healthcheck" {
  source = "./uptime"

  action_group_id     = azurerm_monitor_action_group.dev_email.id
  name                = "dev-healthcheck"
  resource_group_name = data.azurerm_resource_group.benefits.name
  url                 = "https://dev-benefits.calitp.org/healthcheck"
}

module "test_healthcheck" {
  source = "./uptime"

  action_group_id     = azurerm_monitor_action_group.dev_email.id
  name                = "test-healthcheck"
  resource_group_name = data.azurerm_resource_group.benefits.name
  url                 = "https://test-benefits.calitp.org/healthcheck"
}

module "prod_healthcheck" {
  source = "./uptime"

  action_group_id     = azurerm_monitor_action_group.dev_email.id
  name                = "prod-healthcheck"
  resource_group_name = data.azurerm_resource_group.benefits.name
  url                 = "https://benefits.calitp.org/healthcheck"
}

# migrations

moved {
  from = azurerm_application_insights_web_test.dev_healthcheck
  to   = module.dev_healthcheck.azurerm_application_insights_web_test.healthcheck
}

moved {
  from = azurerm_monitor_metric_alert.uptime
  to   = module.dev_healthcheck.azurerm_monitor_metric_alert.uptime
}
1 change: 1 addition & 0 deletions terraform/uptime/README.md
@@ -0,0 +1 @@
[Terraform module](https://www.terraform.io/language/modules) to set up [ping tests](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability).
55 changes: 55 additions & 0 deletions terraform/uptime/main.tf
@@ -0,0 +1,55 @@
data "azurerm_application_insights" "benefits" {
  name                = "AI-CDT-PUB-VIP-CALITP-P-001"
  resource_group_name = var.resource_group_name
}

resource "azurerm_application_insights_web_test" "healthcheck" {
  name                    = var.name
  location                = data.azurerm_application_insights.benefits.location
  resource_group_name     = var.resource_group_name
  application_insights_id = data.azurerm_application_insights.benefits.id
  kind                    = "ping"
  enabled                 = true

  # "We strongly recommend testing from … a minimum of five locations."
  # https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability#create-a-test
  geo_locations = [
    "us-fl-mia-edge", # Central US
    "us-va-ash-azr",  # East US
    "us-il-ch1-azr",  # North Central US
    "us-tx-sn1-azr",  # South Central US
    "us-ca-sjc-azr",  # West US
  ]

  configuration = templatefile("${path.module}/webtest.xml", { url = var.url })

  lifecycle {
    ignore_changes = [tags]
  }
}

resource "azurerm_monitor_metric_alert" "uptime" {
  name                = "uptime-${var.name}"
  resource_group_name = var.resource_group_name
  scopes = [
    azurerm_application_insights_web_test.healthcheck.id,
    data.azurerm_application_insights.benefits.id
  ]
  severity = var.severity

  application_insights_web_test_location_availability_criteria {
    web_test_id  = azurerm_application_insights_web_test.healthcheck.id
    component_id = data.azurerm_application_insights.benefits.id
    # "the optimal configuration is to have the number of test locations be equal to the alert location threshold + 2"
    # https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability#create-a-test
    failed_location_count = length(azurerm_application_insights_web_test.healthcheck.geo_locations) - 2
  }

  action {
    action_group_id = var.action_group_id
  }

  lifecycle {
    ignore_changes = [tags]
  }
}
22 changes: 22 additions & 0 deletions terraform/uptime/variables.tf
@@ -0,0 +1,22 @@
variable "action_group_id" {
  type = string
}

variable "name" {
  type        = string
  description = "What to call the ping test"
}

variable "resource_group_name" {
  type = string
}

variable "severity" {
  type        = number
  default     = 1
  description = "https://docs.microsoft.com/en-us/azure/azure-monitor/best-practices-alerts#alert-severity"
}

variable "url" {
  type = string
}
7 changes: 7 additions & 0 deletions terraform/uptime/webtest.xml
@@ -0,0 +1,7 @@
<!-- boilerplate configuration -->
<WebTest Name="dev-healthcheck" Enabled="True" CssProjectStructure="" CssIteration="" Timeout="120" WorkItemIds=""
  xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010" Description="" CredentialUserName="" CredentialPassword="" PreAuthenticate="True" Proxy="default" StopOnError="False" RecordedResultFile="" ResultsLocale="">
  <Items>
    <Request Method="GET" Version="1.1" Url="${url}" ThinkTime="0" Timeout="300" ParseDependentRequests="True" FollowRedirects="True" RecordResult="True" Cache="False" ResponseTimeGoal="0" Encoding="utf-8" ExpectedHttpStatusCode="200" ExpectedResponseUrl="" ReportingName="" IgnoreHttpStatusCode="False" />
  </Items>
</WebTest>
