ministryofjustice/cloud-platform-infrastructure

Cloud Platform Infrastructure

Introduction

This repository contains everything required to create an MoJ Cloud Platform Kubernetes cluster. Most of the repo is made up of Terraform code that is applied by a pipeline.

Here you'll also find instructions on how to operate an MoJ Cloud Platform cluster.

How to run Go tests

Prerequisites

To run test/modsec_logging_test.go, you need to add your AWS user ARN to OpenSearch. Navigate to the [OpenSearch dashboard](https://logs.cloud-platform.service.justice.gov.uk/_dashboards/app/security-dashboards-plugin#/roles/edit/all_access/mapuser) and add your user ARN under the Users section. Because the tests sign their requests to OpenSearch with AWS credentials, and the pipeline does not use profiles, you must explicitly set your AWS key and secret to mimic the pipeline before you run the tests.
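As a rough illustration of the "explicitly set your AWS key and secret" step, the sketch below checks that credentials are present in the environment before the tests are started. It assumes the tests pick up the standard AWS SDK variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; this README does not name the variables, and the helper missingEnv is hypothetical rather than part of the repository.

```go
package main

import (
	"fmt"
	"os"
)

// requiredAWSEnv lists the standard AWS SDK credential variables.
// Assumption: the tests read their credentials from these variables.
var requiredAWSEnv = []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

// missingEnv returns the names that lookup reports as unset or empty.
// lookup has the same signature as os.LookupEnv so it can be faked in tests.
func missingEnv(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredAWSEnv, os.LookupEnv); len(missing) > 0 {
		fmt.Printf("set these variables before running the tests: %v\n", missing)
		return
	}
	fmt.Println("AWS credentials are set in the environment")
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the check easy to exercise without touching the real environment.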

test/docker/

This directory holds custom ministryofjustice Dockerfiles needed for testing. If you need to amend a Dockerfile, you can build it locally and push it to the [ministryofjustice Docker Hub](https://hub.docker.com/repository/docker/ministryofjustice/curl-jq/general).

Running the tests

To run the integration tests on an MoJ Cloud Platform cluster you must have the following tools installed: (Tool versioning is very important, so it is best to refer to the official MoJ Cloud Platform tools Docker image.)

You can then either run:

make run-tests

or using Go:

go test -v ./...

or

cd test; ginkgo -r -v  # for realtime response

Arguments

-cluster # [optional] specifies the cluster name you'd like to use. [default] current context

-kubeconfig # [optional] define where your kubeconfig file is located. [default] ~/.kube/config
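How the test suite registers these two flags is not shown in this README. The sketch below is one way the documented defaults could be expressed with Go's standard flag package; the options type and parseOptions function are hypothetical names, and an empty cluster value stands in for "use the current context".

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// options holds the two documented arguments.
type options struct {
	Cluster    string // empty means "use the current context"
	Kubeconfig string // defaults to <home>/.kube/config
}

// parseOptions parses args, applying the defaults described above.
func parseOptions(args []string, homeDir string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("tests", flag.ContinueOnError)
	fs.StringVar(&opts.Cluster, "cluster", "", "cluster name to use")
	fs.StringVar(&opts.Kubeconfig, "kubeconfig",
		filepath.Join(homeDir, ".kube", "config"), "path to the kubeconfig file")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return opts, nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	opts, err := parseOptions(os.Args[1:], home)
	if err != nil {
		fmt.Println("invalid arguments:", err)
		return
	}
	fmt.Printf("cluster=%q kubeconfig=%q\n", opts.Cluster, opts.Kubeconfig)
}
```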

Running individual tests

A neat trick in Ginkgo is to place an "F" in front of the "Describe", "It" or "Context" functions. This marks it as focused.

So, if you have a spec like:

    It("should be idempotent", func() {

You rewrite it as:

    FIt("should be idempotent", func() {

And it will run exactly that one spec:

[Fail] testing Migrate setCurrentDbVersion [It] should be idempotent
...
Ran 1 of 5 Specs in 0.003 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 4 Skipped

Making changes to Ginkgo tests

Ginkgo works best from the command-line, and ginkgo watch makes it easy to rerun tests on the command line whenever changes are detected.

How to update Go dependencies

With the repository cloned:

cd test; go get -u ./...

Perform the tests as outlined above and confirm they pass.

Create a PR and merge to main.

Terraform and Cloud Platform environment management

Terraform is used to manage all AWS resources, with Terraform resources stored in the terraform/ directory.

Terraform resources are split into five directories, each with a matching state object in S3: terraform/global-resources, terraform/cloud-platform, terraform/cloud-platform-account, terraform/aws-accounts/cloud-platform-aws/vpc/eks/core and terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components.

  • global-resources contains 'global' AWS resources that are not part of specific clusters or platform environments, e.g. Elasticsearch and S3.
  • cloud-platform contains resources for the Cloud Platform environments, e.g. bastion hosts.
  • cloud-platform-account contains account specifics such as CloudTrail. We decided to separate account-level Terraform from the global "run once" resources because we currently run from multiple AWS accounts.
  • terraform/aws-accounts/cloud-platform-aws/vpc/eks/core contains the core Terraform that needs to be present in every cluster.
  • terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components contains optional Terraform components.

As all five sets of resources are defined with separate state backends, terraform plan and apply must be run separately in each directory:

$ cd terraform/global-resources
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
...
$ cd ../cloud-platform
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
...
$ cd ../cloud-platform-account
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
...
$ cd ../aws-accounts/cloud-platform-aws/vpc/eks/core
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
...
$ cd components
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
...

All resources share a single S3 state bucket called cloud-platform-terraform-state, located in the aws-cloud-platform account. The tfstate files, however, are separated by the workspace_key_prefix defined in each directory's main.tf and by environment, which is defined by the Terraform workspace.

The S3 state store structure appears as follows:

├── cloud-platform-terraform-state
    ├── cloud-platform-account/
    │   ├── cloud-platform/
    │   │   └── terraform.tfstate
    │   ├── mojdsd-platform-integration/
    │   │   └── terraform.tfstate
    ├── cloud-platform-core/
    │   ├── cloud-platform-live-0/
    │   │   └── terraform.tfstate
    │   ├── cloud-platform-test-1/
    │   │   └── terraform.tfstate
    ├── cloud-platform-components/
    │   ├── cloud-platform-live-0/
    │   │   └── terraform.tfstate
    │   ├── cloud-platform-test-1/
    │   │   └── terraform.tfstate
    ├── cloud-platform/
    │   ├── cloud-platform-live-0/
    │   │   └── terraform.tfstate
    │   ├── cloud-platform-test-1/
    │   │   └── terraform.tfstate
    ├── global-resources/
    │   └── terraform.tfstate
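The layout above matches how Terraform's S3 backend names state objects: a non-default workspace is stored at workspace_key_prefix/workspace/key, while the default workspace is stored at key alone. The sketch below reproduces that rule as a small Go function; stateObjectKey is a hypothetical helper written for illustration, not code from this repository, and the prefix and key values are read off the tree above.

```go
package main

import (
	"fmt"
	"path"
)

// stateObjectKey returns the S3 object key the S3 backend would use for a
// given workspace_key_prefix, workspace name and backend key.
// The default workspace ignores the prefix and uses the key as-is.
func stateObjectKey(workspaceKeyPrefix, workspace, key string) string {
	if workspace == "" || workspace == "default" {
		return key
	}
	return path.Join(workspaceKeyPrefix, workspace, key)
}

func main() {
	// A per-environment state object, as in the cloud-platform-core branch of the tree.
	fmt.Println(stateObjectKey("cloud-platform-core", "cloud-platform-live-0", "terraform.tfstate"))

	// global-resources has no per-workspace directory in the tree, which is
	// consistent with a full key used in the default workspace.
	fmt.Println(stateObjectKey("", "default", "global-resources/terraform.tfstate"))
}
```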

cloud-platform and cloud-platform-components resources can refer to the output values of other Terraform states by using the terraform_remote_state data source:

data "terraform_remote_state" "global" {
  backend = "s3"

  # Terraform 0.12 and later expect config as an argument (config = { ... }).
  config = {
    bucket  = "cloud-platform-terraform-state"
    region  = "eu-west-2"
    key     = "global-resources/terraform.tfstate"
    profile = "moj-cp"
  }
}

module "cluster_dns" {
  source = "../modules/cluster_dns"

  # Remote state values are exposed under the "outputs" attribute in 0.12+.
  parent_zone_id = data.terraform_remote_state.global.outputs.cp_zone_id
}

This structure allows us to reduce the blast radius of errors when compared to a single state store, and also allows us to separate infrastructure into multiple logical areas, with different access controls for each.

Cloud Platform environments

Terraform workspaces are used to manage multiple instances of the cloud-platform, cloud-platform-account and cloud-platform-components resources. To see the workspaces/environments that currently exist:

$ terraform workspace list
* default
  cloud-platform-live-0
  cloud-platform-test-1

Note: the default workspace is not used.

To select a workspace/environment:

$ terraform workspace select cloud-platform-test-1

The selected Terraform workspace is interpolated in Terraform resource declarations to create per-environment AWS resources, e.g.:

locals {
    cluster_name = terraform.workspace
}

Terraform modules

All cloud-platform resources are defined as Terraform modules, stored in terraform/modules. Any new resources should also be managed as modules and imported into terraform/cloud-platform/main.tf. This model allows us to encapsulate multiple resources as logical blocks, and will (later) allow us to manage and version modules separately from the main repository.

How to add your examples

Generally speaking, follow the Ministry of Justice's Using git guide.

1. Clone the repo

git clone git@github.com:ministryofjustice/cloud-platform-infrastructure.git

2. Create a branch

For example:

git checkout -b spike/monitoring-investigation

I used spike/monitoring-investigation as an example of a pattern for branch names. You can come up with your own branch name that matches the pattern (e.g. feature/a-new-monitoring-stack or idea/deploy-using-bash-scripts).

3. Add your work to the branch

Think about where to put it — perhaps in a directory with a useful name (e.g. "prometheus") and collect together similar things (e.g. put "prometheus" directory under a "monitoring" directory).

4. Document what you've done

Add (or add to) a README.md in the aforementioned folder describing the work you've done, why you've done it, how people can use it, or what it might mean.

If you need to add more documentation than seems appropriate for a readme, add a docs directory somewhere that makes sense, and create .md files with names describing what you're documenting.

5. Commit your code

Write a commit message that might be useful for people who come to the code to find out why you've made the change. This might be helpful: How to write a git commit message.

Here's an example:

Add contributing instructions

I added some instructions to the repo in a README file so that
other members of the team would know how to add code to the repo.

I aimed to make the instructions clear and simple to follow. I also
wanted to make sure that people left good context for the contributions
that they were making, so I added quite a lot about commit messages.

The first (subject) line should be written so that it completes the sentence "If applied, this commit will…", and not end with a full stop.

6. Raise a pull request

Raise a pull request by pushing your branch to GitHub:

git push origin spike/monitoring-investigation

and then navigating to the repo in GitHub and using the "create a new pull request" button.

When you do this you have the option of adding a reviewer. It's good to share your pull request for review, so add one. Let the reviewer know that you are adding them so they can plan some time to do the review.

Create/Delete a cluster

See the runbooks site