Breaking Change: Migrate to v7 #819
## Description

This PR adds the `wait_for_services_timeout` docker setting for the executor (see https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnersdocker-section). I found #513 while looking for a solution to our problem. After migrating from the Kubernetes executor to the docker-machine executor, we have had a couple of GitLab jobs that always wait 30s before actually running the defined steps. I am aware of #819, but I believe this might be a quick win.

## Migrations required

NO

## Test the change

In order to test my change, I recommend setting

```hcl
debug = {
  "output_runner_config_to_file" : true,
  "output_runner_user_data_to_file" : false
}
```

and then running `terraform plan`. It will print the locally rendered `config.toml`, which now contains the new setting:

```
[...]
pre_clone_script = ""
request_concurrency = 1
output_limit = 4096
limit = 0
wait_for_services_timeout = 30
[...]
```

---------

Co-authored-by: Matthias Kay <[email protected]>
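To make the test instructions above concrete, here is a minimal sketch of where the `debug` block sits in a module call. The module source and version are copied from other examples in this thread, all other required inputs are omitted, and whether the `debug` input is available depends on your module version:

```hcl
module "runner" {
  source  = "cattle-ops/gitlab-runner/aws"
  version = "6.5.2"

  # ... other required inputs (environment, vpc_id, subnet_id, ...) omitted ...

  # Write out the locally rendered config.toml so it can be inspected when
  # running `terraform plan`, as described above.
  debug = {
    output_runner_config_to_file    = true
    output_runner_user_data_to_file = false
  }
}
```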
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days. |
@npalm Can we go ahead with this one? |
NO objection |
Great, I will release the changes on Sunday, September 10th at 1900z |
The changes were released on Sunday. I will closely monitor the issues here this week. |
❗ Just found out that the conversion script has a bug. Please check the instance type of the Runner Worker machines. I guess they are all … (see #975). |
I have tried to migrate to the newer v7.x series of this module, but I am running into issues where my runners no longer register with GitLab. As far as I can tell from applying the configuration, there were no changes required to the actual resources. Can someone look at the v6.5.1 config vs the v7.1.0 one and tell me if I misplaced some config value? As far as I can tell they are equivalent:

v6.5.1

```hcl
module "runner" {
source = "npalm/gitlab-runner/aws"
version = "6.5.1"
aws_region = var.aws_region
# We need to override the environment name to be less than 64 characters in length.
#
# The module code uses the environment name + some postfix string by default, so we
# use that limited to a max length of 21 to allow for postfix string lengths.
environment = substr(var.environment_name, 0, 21)
# To allow the runner to access the internet without requiring a NAT gateway, we must
# give them a public IP.
runners_use_private_address = false
enable_eip = true
vpc_id = data.aws_vpc.main.id
subnet_ids_gitlab_runner = data.aws_subnets.public_subnet_ids.ids
subnet_id_runners = data.aws_subnets.public_subnet_ids.ids[0]
extra_security_group_ids_runner_agent = [data.aws_security_group.rds.id]
enable_cloudwatch_logging = false
cache_bucket_set_random_suffix = true
# This has the jobs run on the same EC2 instance as the agent, no autoscaling is used.
runners_executor = "docker"
runners_name = "django-project-${var.environment_name}"
runners_gitlab_url = "https://gitlab.com"
gitlab_runner_registration_config = {
registration_token = var.runner_token
tag_list = join(", ", var.runner_tags)
description = "Ephemeral runner for the project."
locked_to_project = "true"
run_untagged = "false"
maximum_timeout = "3600"
}
# Buff our runner instance size since we aren't using the docker+machine. This means
# the jobs run directly on a runner, so a t3.micro instance might not cut it.
instance_type = "m5.large"
gitlab_runner_version = "15.11.0"
# Allow SSM access to help debug if runner issues arise.
enable_runner_ssm_access = true
}
```

v7.1.0

```hcl
module "runner" {
source = "npalm/gitlab-runner/aws"
version = "7.1.0"
# We need to override the environment name to be less than 64 characters in length.
#
# The module code uses the environment name + some postfix string by default, so we
# use that limited to a max length of 21 to allow for postfix string lengths.
environment = substr(var.environment_name, 0, 21)
vpc_id = data.aws_vpc.main.id
subnet_id = data.aws_subnets.public_subnet_ids.ids[0]
runner_gitlab_registration_config = {
registration_token = var.runner_token
description = "Ephemeral runner for the project."
locked_to_project = "true"
run_untagged = "false"
maximum_timeout = "3600"
}
runner_instance = {
# Buff our runner instance size since we aren't using the docker+machine. This means
# the jobs run directly on a runner, so a t3.micro instance might not cut it.
type = "m5.large"
# To allow the runner to access the internet without requiring a NAT gateway, we must
# give them a public IP.
use_eip = true
private_address_only = false
name = "django-project-${var.environment_name}"
# Allow SSM access to help debug if runner issues arise.
ssm_access = true
tag_list = join(", ", var.runner_tags)
}
runner_gitlab = {
url = "https://gitlab.com"
runner_version = "15.11.0"
}
runner_cloudwatch = {
enable = false
}
# This has the jobs run on the same EC2 instance as the agent, no autoscaling is used.
runner_worker = {
type = "docker"
}
runner_worker_cache = {
random_suffix = true
}
runner_networking = {
security_group_ids = [data.aws_security_group.rds.id]
}
# This ends up taking precedence over the `subnet_id` input above, but that input is
# required.
# https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/5100efd3445c3f06e5089d970da5a3a0341624eb/main.tf#L177C42-L177C79
runner_worker_docker_machine_instance = {
subnet_ids = data.aws_subnets.public_subnet_ids.ids
}
}
```
@ryancausey Could you share the error message from the logs please? |
@kayman-mk here's what I get from
|
The last line looks strange. It says … Seems that something is broken with the URL and the name. The relevant parts should be these: …
Are there any lines in the config.toml which show null? |
@kayman-mk it looks like the token is missing. Here's the config.toml for module version 7.1.1:

```toml
# cat /etc/gitlab-runner/config.toml
concurrent = 10
check_interval = 3
sentry_dsn = ""
log_format = "json"
listen_address = ""
[[runners]]
name = "runner-nonprod259196356"
url = "https://gitlab.com"
clone_url = ""
token = "null"
executor = "docker"
environment = []
pre_build_script = ""
post_build_script = ""
pre_clone_script = ""
request_concurrency = 1
output_limit = 4096
limit = 0
[runners.docker]
disable_cache = false
image = "docker:18.03.1-ce"
privileged = true
pull_policies = ["always"]
shm_size = 0
tls_verify = false
volumes = ["/cache"]
[runners.docker.tmpfs]
[runners.docker.services_tmpfs]
[runners.cache]
Type = "s3"
Shared = false
[runners.cache.s3]
AuthenticationType = "iam"
ServerAddress = "s3.amazonaws.com"
BucketName = "<bucket name>"
BucketLocation = "us-west-2"
Insecure = false
[runners.machine]
IdleCount = 0
IdleTime = 600
MachineDriver = "amazonec2"
MachineName = "nonprod259196356-%s"
MachineOptions = [
"amazonec2-instance-type=m5.large",
"amazonec2-region=us-west-2",
"amazonec2-zone=b",
"amazonec2-vpc-id=<vpc id>",
"amazonec2-subnet-id=<subnet id>",
"amazonec2-subnet-id=<subnet id>",
"amazonec2-subnet-id=<subnet id>",
"amazonec2-subnet-id=<subnet id>",
"amazonec2-private-address-only=true",
"amazonec2-use-private-address=false",
"amazonec2-request-spot-instance=true",
"amazonec2-security-group=",
"amazonec2-tags=Environment,nonprod259196356,gitlab-runner-parent-id,i-0b9646157e05a5e79",
"amazonec2-use-ebs-optimized-instance=true",
"amazonec2-monitoring=false",
"amazonec2-iam-instance-profile=",
"amazonec2-root-size=8",
"amazonec2-volume-type=gp2",
"amazonec2-userdata=",
"amazonec2-ami="
,"amazonec2-metadata-token=required", "amazonec2-metadata-token-response-hop-limit=2",
]
MaxGrowthRate = 0
```

Compare the above to the config.toml for module version 6.5.2:

```toml
# cat /etc/gitlab-runner/config.toml
concurrent = 10
check_interval = 3
sentry_dsn = ""
log_format = "json"
listen_address = ""
[[runners]]
name = "runner-nonprod259196356"
url = "https://gitlab.com"
clone_url = ""
token = "<token is populated>"
executor = "docker"
environment = []
pre_build_script = ""
post_build_script = ""
pre_clone_script = ""
request_concurrency = 1
output_limit = 4096
limit = 0
[runners.docker]
tls_verify = false
image = "docker:18.03.1-ce"
privileged = true
disable_cache = false
volumes = ["/cache"]
extra_hosts = []
shm_size = 0
pull_policy = ["always"]
runtime = ""
helper_image = ""
wait_for_services_timeout = 30
[runners.docker.tmpfs]
[runners.docker.services_tmpfs]
[runners.cache]
Type = "s3"
Shared = false
[runners.cache.s3]
AuthenticationType = "iam"
ServerAddress = "s3.amazonaws.com"
BucketName = "<bucket name>"
BucketLocation = "us-west-2"
Insecure = false
[runners.machine]
IdleCount = 0
IdleTime = 600
MachineDriver = "amazonec2"
MachineName = "nonprod259196356-%s"
MachineOptions = [
"amazonec2-instance-type=m5.large",
"amazonec2-region=us-west-2",
"amazonec2-zone=b",
"amazonec2-vpc-id=<vpc id>",
"amazonec2-subnet-id=<subnet id>",
"amazonec2-private-address-only=false",
"amazonec2-use-private-address=true",
"amazonec2-request-spot-instance=true",
"amazonec2-security-group=",
"amazonec2-tags=Environment,nonprod259196356,gitlab-runner-parent-id,i-0063aa2426d25b3f5",
"amazonec2-use-ebs-optimized-instance=true",
"amazonec2-monitoring=false",
"amazonec2-iam-instance-profile=",
"amazonec2-root-size=16",
"amazonec2-volume-type=gp2",
"amazonec2-userdata=",
"amazonec2-ami="
,"amazonec2-metadata-token=required", "amazonec2-metadata-token-response-hop-limit=2",
]
```
@kayman-mk we are getting the errors below when trying to execute the migration script.
The runner.tf file that calls the module looks like the below.
Would you be able to help us with this? |
Running this on macOS. |
Yeah, I have heard about macOS before. As far as I remember there is a comment about it somewhere. In case it doesn't run on your machine, try starting an Alpine Linux container and running the script inside. This should fix the problems. |
I get the following errors running on Alpine:
The original file has the following content:

```hcl
module "gitlab_runner" {
source = "cattle-ops/gitlab-runner/aws"
version = "6.5.2"
environment = "gitlab-runner04"
vpc_id = aws_vpc.default.id
subnet_id = element(aws_subnet.private.*.id, 3)
runner_gitlab_registration_config = {
registration_token = var.GITLAB_RUNNER_TOKEN
tag_list = "docker"
description = "gitlab-runner04"
locked_to_project = "false"
run_untagged = "false"
maximum_timeout = "3600"
}
tags = {
Environment = local.environment
Tool = "Terraform"
}
runner_instance = {
type = "t3a.micro"
docker_machine_type = "m6a.large"
collect_autoscaling_metrics = ["GroupDesiredCapacity", "GroupInServiceCapacity"]
name = "gitlab-runner04"
ssm_access = true
}
runner_gitlab = {
url = "https://<redacted>"
}
runner_worker = {
type = "docker+machine"
}
runner_networking = {
allow_incoming_ping_security_group_ids = [data.aws_security_group.default.id]
}
```

And the generated file is:

```hcl
module "gitlab_runner" {
source = "cattle-ops/gitlab-runner/aws"
version = "6.5.2"
environment = "gitlab-runner04"
vpc_id = aws_vpc.default.id
subnet_id = element(aws_subnet.private.*.id, 3)
runner_gitlab_registration_config = {
registration_token = var.GITLAB_RUNNER_TOKEN
tag_list = "docker"
description = "gitlab-runner04"
locked_to_project = "false"
run_untagged = "false"
maximum_timeout = "3600"
}
tags = {
Environment = local.environment
Tool = "Terraform"
}
}
runner_instance = {
type = "t3a.micro"
docker_machine_type = "m6a.large"
collect_autoscaling_metrics = ["GroupDesiredCapacity", "GroupInServiceCapacity"]
name = "gitlab-runner04"
ssm_access = true
}
runner_gitlab = {
url = "https://<redacted>"
}
runner_worker = {
type = "docker+machine"
}
runner_networking = {
allow_incoming_ping_security_group_ids = [data.aws_security_group.default.id]
}
```
I will add some information on how to migrate to the new version in the next few days. Still some time to go, and it is not 100% discussed internally yet.

ToDo:

## Major Version 7

### Main reasons

### Feature added

- idle_scale_factor

### Migration

We know that this is a breaking change causing some pain, but we think it is worth it. We hope you agree. And to make the transition as smooth as possible, we have added a migration script. It will cover almost all cases, but some minor rework might still be needed.
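For illustration only (this is not the official migration guide): based on the before/after configs shared earlier in this thread, the general pattern is that flat v6.x inputs are regrouped into nested v7.x objects. A minimal sketch, using only names that appear in those examples; check the module docs for your exact version:

```hcl
# v6.x style: flat top-level inputs
#   runners_executor         = "docker"
#   runners_gitlab_url       = "https://gitlab.com"
#   instance_type            = "m5.large"
#   enable_runner_ssm_access = true

# v7.x style: the same settings regrouped into nested objects
module "runner" {
  source  = "cattle-ops/gitlab-runner/aws"
  version = "7.1.0"

  environment = "example"
  vpc_id      = data.aws_vpc.main.id
  subnet_id   = data.aws_subnets.public_subnet_ids.ids[0]

  runner_instance = {
    type       = "m5.large"
    name       = "example-runner"
    ssm_access = true
  }

  runner_gitlab = {
    url            = "https://gitlab.com"
    runner_version = "15.11.0"
  }

  runner_worker = {
    type = "docker"
  }
}
```

The migration script is meant to perform this regrouping automatically; the sketch only shows the shape of the result.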
Steps to follow: