This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Commit

feat(runners): add configurable eviction strategy to idle config (#3375)
We do some on-instance caching so when we scale down we'd prefer to keep
the older instances around instead of the new ones (because they will
have a hotter cache). This adds a configurable setting to the idleConfig
to pick a sorting strategy. Never contributed to this repo, so please
tell me if I'm doing something wrong!

---------

Co-authored-by: Niek Palm <[email protected]>
maschwenk and npalm authored Aug 8, 2023
1 parent 8b8116b commit 896f473
Showing 10 changed files with 103 additions and 40 deletions.
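As a rough model of the behaviour this commit makes configurable, the sketch below sorts runners by launch time according to the strategy, keeps the first `idleCount`, and reports the rest as eviction candidates. `Runner` and `selectForEviction` are hypothetical names. The sketch assumes every runner is idle and past its minimum running time, and it ignores owner grouping and the GitHub API calls that the real `evaluateAndRemoveRunners` makes.

```typescript
// Hypothetical, simplified stand-in for the module's RunnerInfo type.
type EvictionStrategy = 'newest_first' | 'oldest_first';

interface Runner {
  instanceId: string;
  launchTime: Date;
}

// Returns the instance IDs that would be terminated, assuming all runners are
// eligible for scale-down and the first `idleCount` runners in sort order are kept.
function selectForEviction(runners: Runner[], idleCount: number, strategy: EvictionStrategy): string[] {
  const sorted = [...runners].sort((a, b) =>
    strategy === 'oldest_first'
      ? b.launchTime.getTime() - a.launchTime.getTime() // newest first: keep new, evict old
      : a.launchTime.getTime() - b.launchTime.getTime(), // oldest first: keep old (warm cache), evict new
  );
  return sorted.slice(Math.max(idleCount, 0)).map((runner) => runner.instanceId);
}

const runners: Runner[] = [
  { instanceId: 'i-old', launchTime: new Date('2023-08-01T08:00:00Z') },
  { instanceId: 'i-mid', launchTime: new Date('2023-08-01T09:00:00Z') },
  { instanceId: 'i-new', launchTime: new Date('2023-08-01T10:00:00Z') },
];

console.log(selectForEviction(runners, 2, 'oldest_first')); // [ 'i-old' ]
console.log(selectForEviction(runners, 2, 'newest_first')); // [ 'i-new' ]
```

With `newest_first`, the instance that has been running longest (and, in the author's setup, has the hottest cache) survives the scale-down.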
12 changes: 8 additions & 4 deletions README.md
@@ -292,11 +292,15 @@ The pool is NOT enabled by default and can be enabled by setting at least one ob

The module will scale down to zero runners by default. By specifying an `idle_config`, idle runners can be kept active. The scale down lambda checks if any of the cron expressions matches the current time with a margin of 5 seconds. When there is a match, the number of runners specified in the idle config will be kept active. If multiple cron expressions match, only the first one is taken into account. Below is an idle configuration for keeping runners active from 9:00am to 5:59pm on working days. The [cron expression generator by Cronhub](https://crontab.cronhub.io/) is a great resource to set up your idle config.

By default, the oldest instances are evicted. This helps keep your environment up-to-date and reduces problems like running out of disk space or RAM. Alternatively, if your older instances hold a long-lived cache, you can set `evictionStrategy` to `newest_first` to evict the newest instances first instead. The strategy is set per `idle_config` entry and applies only while that entry's cron expression is active; outside any active period, `oldest_first` is used.

```hcl
idle_config = [{
  cron             = "* * 9-17 * * 1-5"
  timeZone         = "Europe/Amsterdam"
  idleCount        = 2
  # Defaults to 'oldest_first'
  evictionStrategy = "oldest_first"
}]
```

@@ -521,7 +525,7 @@ We welcome any improvement to the standard module to make the default as secure
| <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br> key_base64 = string<br> id = string<br> webhook_secret = string<br> })</pre> | n/a | yes |
| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> }))</pre> | `[]` | no |
| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> evictionStrategy = optional(string, "oldest_first")<br> }))</pre> | `[]` | no |
| <a name="input_instance_allocation_strategy"></a> [instance\_allocation\_strategy](#input\_instance\_allocation\_strategy) | The allocation strategy for spot instances. AWS recommends using `price-capacity-optimized` however the AWS default is `lowest-price`. | `string` | `"lowest-price"` | no |
| <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
| <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
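The Terraform type `optional(string, "oldest_first")` fills in the default, but nothing in the diff shown here restricts the value to the two supported strings. On the Lambda side the config is only type-cast (`JSON.parse(...) as [ScalingDownConfig]`), and `evaluateAndRemoveRunners` selects `oldestFirstStrategy` only for the exact string `'oldest_first'`, so a typo such as `oldest-first` would, as far as this diff shows, silently produce newest-first eviction. A minimal validation sketch, assuming the environment variable holds the JSON array seen in the tests; `parseScaleDownConfig` and `isEvictionStrategy` are hypothetical helpers, not part of the module:

```typescript
type EvictionStrategy = 'newest_first' | 'oldest_first';

interface ScalingDownConfig {
  cron: string;
  idleCount: number;
  timeZone: string;
  evictionStrategy?: EvictionStrategy;
}

const EVICTION_STRATEGIES: readonly string[] = ['newest_first', 'oldest_first'];

function isEvictionStrategy(value: unknown): value is EvictionStrategy {
  return typeof value === 'string' && EVICTION_STRATEGIES.includes(value);
}

// Hypothetical helper: parses a SCALE_DOWN_CONFIG-style JSON array, fills in the
// default strategy, and rejects unknown values instead of passing them through.
function parseScaleDownConfig(json: string): ScalingDownConfig[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error('SCALE_DOWN_CONFIG must be a JSON array');
  }
  return parsed.map((entry, index) => {
    const strategy: unknown = entry?.evictionStrategy;
    if (strategy !== undefined && !isEvictionStrategy(strategy)) {
      throw new Error(`idle_config[${index}]: unknown evictionStrategy '${String(strategy)}'`);
    }
    return { ...entry, evictionStrategy: strategy ?? 'oldest_first' } as ScalingDownConfig;
  });
}

const config = parseScaleDownConfig('[{"cron":"* * 9-17 * * 1-5","timeZone":"Europe/Amsterdam","idleCount":2}]');
console.log(config[0].evictionStrategy); // oldest_first
```

Rejecting unknown values at parse time turns a silent behaviour change into a visible Lambda error; a Terraform `validation` block on the variable would catch the same mistake earlier, at plan time.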
@@ -1,16 +1,21 @@
import moment from 'moment-timezone';

import { ScalingDownConfigList, getIdleRunnerCount } from './scale-down-config';
import { EvictionStrategy, ScalingDownConfigList, getEvictionStrategy, getIdleRunnerCount } from './scale-down-config';

const DEFAULT_TIMEZONE = 'America/Los_Angeles';
const DEFAULT_IDLE_COUNT = 1;
const DEFAULT_EVICTION_STRATEGY: EvictionStrategy = 'oldest_first';
const now = moment.tz(new Date(), 'America/Los_Angeles');

function getConfig(cronTabs: string[]): ScalingDownConfigList {
function getConfig(
cronTabs: string[],
evictionStrategy: EvictionStrategy | undefined = undefined,
): ScalingDownConfigList {
return cronTabs.map((cron) => ({
cron: cron,
idleCount: DEFAULT_IDLE_COUNT,
timeZone: DEFAULT_TIMEZONE,
evictionStrategy,
}));
}

@@ -31,4 +36,21 @@ describe('scaleDownConfig', () => {
expect(getIdleRunnerCount(scaleDownConfig)).toEqual(DEFAULT_IDLE_COUNT);
});
});

describe('Determine eviction strategy.', () => {
it('Default eviction strategy', async () => {
const scaleDownConfig = getConfig(['* * * * * *']);
expect(getEvictionStrategy(scaleDownConfig)).toEqual('oldest_first');
});

it('Overriding eviction strategy to newest_first', async () => {
const scaleDownConfig = getConfig(['* * * * * *'], 'newest_first');
expect(getEvictionStrategy(scaleDownConfig)).toEqual('newest_first');
});

it('No active cron configuration', async () => {
const scaleDownConfig = getConfig(['* * * * * ' + ((now.day() + 1) % 7)]);
expect(getEvictionStrategy(scaleDownConfig)).toEqual(DEFAULT_EVICTION_STRATEGY);
});
});
});
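The "No active cron configuration" test relies on cron-parser's six-field format (seconds first) and puts tomorrow's weekday in the day-of-week field, so the expression cannot match today. Because `now` is computed in `America/Los_Angeles`, the same zone as `DEFAULT_TIMEZONE`, "today" means the same day for both the test and `inPeriod`. A sketch of the same trick without moment, using `Intl.DateTimeFormat` to read the weekday in a given zone; `dayOfWeekIn` and `cronForTomorrowOnly` are illustrative names:

```typescript
const WEEKDAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];

// Day of week of `date` as seen in `timeZone` (0 = Sunday, like cron and moment's day()).
function dayOfWeekIn(date: Date, timeZone: string): number {
  const name = new Intl.DateTimeFormat('en-US', { weekday: 'short', timeZone }).format(date);
  return WEEKDAYS.indexOf(name);
}

// Six-field cron (second minute hour day-of-month month day-of-week) that matches
// every second of tomorrow in `timeZone`, and therefore never matches today.
function cronForTomorrowOnly(date: Date, timeZone: string): string {
  return `* * * * * ${(dayOfWeekIn(date, timeZone) + 1) % 7}`;
}

// 2023-08-08 was a Tuesday, so tomorrow is Wednesday (3).
console.log(cronForTomorrowOnly(new Date('2023-08-08T12:00:00Z'), 'America/Los_Angeles')); // * * * * * 3
```

Passing the zone explicitly matters: 03:00 UTC on a Tuesday is still Monday evening in Los Angeles. Since the test computes `now` once at module load, a run that crosses midnight could in principle see the expression become active.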
@@ -1,13 +1,18 @@
import { createChildLogger } from '@terraform-aws-github-runner/aws-powertools-util';
import parser from 'cron-parser';
import moment from 'moment';

export type ScalingDownConfigList = ScalingDownConfig[];
export type EvictionStrategy = 'newest_first' | 'oldest_first';
export interface ScalingDownConfig {
cron: string;
idleCount: number;
timeZone: string;
evictionStrategy?: EvictionStrategy;
}

const logger = createChildLogger('scale-down-config.ts');

function inPeriod(period: ScalingDownConfig): boolean {
const now = moment(new Date());
const expr = parser.parseExpression(period.cron, {
@@ -25,3 +30,14 @@ export function getIdleRunnerCount(scalingDownConfigs: ScalingDownConfigList): n
}
return 0;
}

export function getEvictionStrategy(scalingDownConfigs: ScalingDownConfigList): EvictionStrategy {
for (const scalingDownConfig of scalingDownConfigs) {
if (inPeriod(scalingDownConfig)) {
const evictionStrategy = scalingDownConfig.evictionStrategy ?? 'oldest_first';
logger.debug(`Using evictionStrategy '${evictionStrategy}' for period ${scalingDownConfig.cron}`);
return evictionStrategy;
}
}
return 'oldest_first';
}
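`getEvictionStrategy` returns the strategy of the first idle period whose cron matches, falling back to `oldest_first` both when that period omits the field and when no period matches. `getIdleRunnerCount` appears to follow the same first-match loop with a `0` fallback (most of its body is outside this diff). Because each call evaluates `inPeriod` against the current clock separately, the two lookups could in principle straddle a period boundary. A sketch that resolves the active period once, with the cron check injected as a predicate so it runs without cron-parser or moment; `resolveActivePeriod` and `ActivePeriodSettings` are hypothetical:

```typescript
type EvictionStrategy = 'newest_first' | 'oldest_first';

interface ScalingDownConfig {
  cron: string;
  idleCount: number;
  timeZone: string;
  evictionStrategy?: EvictionStrategy;
}

interface ActivePeriodSettings {
  idleCount: number;
  evictionStrategy: EvictionStrategy;
}

// First match wins; with no active period, keep 0 runners and evict oldest first.
function resolveActivePeriod(
  configs: ScalingDownConfig[],
  isActive: (config: ScalingDownConfig) => boolean,
): ActivePeriodSettings {
  const active = configs.find(isActive);
  return {
    idleCount: active?.idleCount ?? 0,
    evictionStrategy: active?.evictionStrategy ?? 'oldest_first',
  };
}

const configs: ScalingDownConfig[] = [
  { cron: '* * 9-17 * * 1-5', idleCount: 2, timeZone: 'Europe/Amsterdam', evictionStrategy: 'newest_first' },
  { cron: '* * * * * *', idleCount: 1, timeZone: 'Europe/Amsterdam' },
];

// Stub predicate standing in for inPeriod(): pretend only the catch-all entry is active.
const onlyCatchAll = (c: ScalingDownConfig) => c.cron === '* * * * * *';
console.log(resolveActivePeriod(configs, onlyCatchAll)); // { idleCount: 1, evictionStrategy: 'oldest_first' }
console.log(resolveActivePeriod(configs, () => true)); // { idleCount: 2, evictionStrategy: 'newest_first' }
```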
@@ -394,14 +394,13 @@ describe('scaleDown', () => {
});

describe('With idle config', () => {
const defaultConfig = {
idleCount: 3,
cron: '* * * * * *',
timeZone: 'Europe/Amsterdam',
};
beforeEach(() => {
process.env.SCALE_DOWN_CONFIG = JSON.stringify([
{
idleCount: 3,
cron: '* * * * * *',
timeZone: 'Europe/Amsterdam',
},
]);
process.env.SCALE_DOWN_CONFIG = JSON.stringify([defaultConfig]);
});

it('Terminates 1 runner owned by orgs', async () => {
@@ -431,6 +430,19 @@ describe('scaleDown', () => {
expect(mockOctokit.apps.getRepoInstallation).toBeCalled();
expect(terminateRunner).not.toBeCalled();
});

describe('With newest_first eviction strategy', () => {
beforeEach(() => {
process.env.SCALE_DOWN_CONFIG = JSON.stringify([{ ...defaultConfig, evictionStrategy: 'newest_first' }]);
});

it('Terminates the newest org', async () => {
mockListRunners.mockResolvedValue(RUNNERS_ORG_WITH_AUTO_SCALING_CONFIG);
await scaleDown();
expect(terminateRunner).toBeCalledTimes(1);
expect(terminateRunner).toHaveBeenCalledWith('i-idle-102');
});
});
});

it('No instances terminates when delete runner in github results in a non 204 status.', async () => {
38 changes: 22 additions & 16 deletions lambdas/functions/control-plane/src/scale-runners/scale-down.ts
@@ -6,7 +6,7 @@ import { createGithubAppAuth, createGithubInstallationAuth, createOctoClient } f
import { bootTimeExceeded, listEC2Runners, terminateRunner } from './../aws/runners';
import { RunnerInfo, RunnerList } from './../aws/runners.d';
import { GhRunners, githubCache } from './cache';
import { ScalingDownConfig, getIdleRunnerCount } from './scale-down-config';
import { ScalingDownConfig, getEvictionStrategy, getIdleRunnerCount } from './scale-down-config';

const logger = createChildLogger('scale-down');

@@ -148,10 +148,13 @@ async function evaluateAndRemoveRunners(
scaleDownConfigs: ScalingDownConfig[],
): Promise<void> {
let idleCounter = getIdleRunnerCount(scaleDownConfigs);
const evictionStrategy = getEvictionStrategy(scaleDownConfigs);
const ownerTags = new Set(ec2Runners.map((runner) => runner.owner));

for (const ownerTag of ownerTags) {
const ec2RunnersFiltered = ec2Runners.filter((runner) => runner.owner === ownerTag);
const ec2RunnersFiltered = ec2Runners
.filter((runner) => runner.owner === ownerTag)
.sort(evictionStrategy === 'oldest_first' ? oldestFirstStrategy : newestFirstStrategy);
logger.debug(`Found: '${ec2RunnersFiltered.length}' active GitHub runners with owner tag: '${ownerTag}'`);
for (const ec2Runner of ec2RunnersFiltered) {
const ghRunners = await listGitHubRunners(ec2Runner);
@@ -191,17 +194,21 @@ async function terminateOrphan(instanceId: string): Promise<void> {
}
}

async function listAndSortRunners(environment: string) {
return (
await listEC2Runners({
environment,
})
).sort((a, b): number => {
if (a.launchTime === undefined) return 1;
if (b.launchTime === undefined) return 1;
if (a.launchTime < b.launchTime) return 1;
if (a.launchTime > b.launchTime) return -1;
return 0;
function oldestFirstStrategy(a: RunnerInfo, b: RunnerInfo): number {
if (a.launchTime === undefined) return 1;
if (b.launchTime === undefined) return 1;
if (a.launchTime < b.launchTime) return 1;
if (a.launchTime > b.launchTime) return -1;
return 0;
}

function newestFirstStrategy(a: RunnerInfo, b: RunnerInfo): number {
return oldestFirstStrategy(a, b) * -1;
}

async function listRunners(environment: string) {
return await listEC2Runners({
environment,
});
}

@@ -214,8 +221,7 @@ export async function scaleDown(): Promise<void> {
const scaleDownConfigs = JSON.parse(process.env.SCALE_DOWN_CONFIG) as [ScalingDownConfig];
const environment = process.env.ENVIRONMENT;

// list and sort runners, newest first. This ensure we keep the newest runners longer.
const ec2Runners = await listAndSortRunners(environment);
const ec2Runners = await listRunners(environment);
const activeEc2RunnersCount = ec2Runners.length;
logger.info(`Found: '${activeEc2RunnersCount}' active GitHub EC2 runner instances before clean-up.`);

@@ -227,6 +233,6 @@ export async function scaleDown(): Promise<void> {
const runners = filterRunners(ec2Runners);
await evaluateAndRemoveRunners(runners, scaleDownConfigs);

const activeEc2RunnersCountAfter = (await listAndSortRunners(environment)).length;
const activeEc2RunnersCountAfter = (await listRunners(environment)).length;
logger.info(`Found: '${activeEc2RunnersCountAfter}' active GitHub EC2 runners instances after clean-up.`);
}
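The comparator names describe eviction order, not array order: assuming the loop in `evaluateAndRemoveRunners` keeps the first runners it visits, `oldestFirstStrategy` sorts newest to oldest so that the oldest instances are evicted. It also returns `1` when either `a.launchTime` or `b.launchTime` is undefined, so comparing a runner with and without a launch time gives `1` in both directions. `Array.prototype.sort` does not guarantee any particular order for such an inconsistent comparator, and negating it in `newestFirstStrategy` does not change that. A sketch of a consistent alternative, assuming (as a design choice, not the module's behaviour) that runners without a launch time should always be evicted first; `RunnerLike` and `keepOrder` are hypothetical:

```typescript
type EvictionStrategy = 'newest_first' | 'oldest_first';

interface RunnerLike {
  instanceId: string;
  launchTime?: Date;
}

// Orders runners by keep priority: earlier entries are kept idle, later ones evicted.
// Runners without a launchTime always sort last, i.e. they are evicted first.
function keepOrder(strategy: EvictionStrategy): (a: RunnerLike, b: RunnerLike) => number {
  return (a, b) => {
    if (a.launchTime === undefined && b.launchTime === undefined) return 0;
    if (a.launchTime === undefined) return 1;
    if (b.launchTime === undefined) return -1;
    const diff = a.launchTime.getTime() - b.launchTime.getTime(); // > 0 when a is newer
    // oldest_first eviction keeps the newest runners, so newer runners sort first.
    return strategy === 'oldest_first' ? -diff : diff;
  };
}

const sample: RunnerLike[] = [
  { instanceId: 'i-1', launchTime: new Date('2023-08-01T08:00:00Z') },
  { instanceId: 'i-2' },
  { instanceId: 'i-3', launchTime: new Date('2023-08-01T10:00:00Z') },
];

console.log([...sample].sort(keepOrder('oldest_first')).map((r) => r.instanceId)); // [ 'i-3', 'i-1', 'i-2' ]
console.log([...sample].sort(keepOrder('newest_first')).map((r) => r.instanceId)); // [ 'i-1', 'i-3', 'i-2' ]
```

Expressing both strategies as one comparator parameterised by the strategy keeps the two orders exact mirrors of each other for runners with a launch time, while the undefined case is handled identically under either strategy.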