Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added changes to enable tracing in lambdas. #3554

Merged
merged 39 commits into from
Nov 8, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
6eabc8a
feat: added changes to enable tracing in lambdas.
GuptaNavdeep1983 Oct 20, 2023
9060e15
docs: auto update terraform docs
github-actions[bot] Oct 20, 2023
864e5ba
Merge branch 'main' into nav/enable-tracing
GuptaNavdeep1983 Oct 20, 2023
dad0312
fix: missed this file.
GuptaNavdeep1983 Oct 20, 2023
91abc18
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Oct 20, 2023
85ae608
fix: multi runners.
GuptaNavdeep1983 Oct 21, 2023
77a0aa6
docs: auto update terraform docs
github-actions[bot] Oct 21, 2023
afdd086
fix: multi runner.
GuptaNavdeep1983 Oct 21, 2023
8138f8c
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Oct 21, 2023
dcd9611
fix: default.
GuptaNavdeep1983 Oct 21, 2023
751443d
docs: auto update terraform docs
github-actions[bot] Oct 21, 2023
a08798d
fix: more changes.
GuptaNavdeep1983 Oct 25, 2023
926e345
docs: auto update terraform docs
github-actions[bot] Oct 25, 2023
856f7ef
fix: added tracing for github apis.
GuptaNavdeep1983 Oct 28, 2023
e4f62f2
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Oct 28, 2023
9a306b1
docs: auto update terraform docs
github-actions[bot] Oct 28, 2023
0c8a9b1
fix: more changes.
GuptaNavdeep1983 Oct 30, 2023
696cc6d
fix: start script.
GuptaNavdeep1983 Oct 30, 2023
a5caa41
fix: added tracing config section.
GuptaNavdeep1983 Oct 31, 2023
1fb8f9e
Merge branch 'main' into nav/enable-tracing
GuptaNavdeep1983 Oct 31, 2023
ab0a1c8
docs: auto update terraform docs
github-actions[bot] Oct 31, 2023
0aab19e
fix: comments.
GuptaNavdeep1983 Oct 31, 2023
115eb42
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Oct 31, 2023
190ff01
docs: auto update terraform docs
github-actions[bot] Oct 31, 2023
b23df30
fix: ami housekeeper.
GuptaNavdeep1983 Oct 31, 2023
3fa0e8e
docs: auto update terraform docs
github-actions[bot] Oct 31, 2023
b1e59e9
fix: ssm housekeeper.
GuptaNavdeep1983 Oct 31, 2023
93f6adb
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Oct 31, 2023
c88e508
fix: tests.
GuptaNavdeep1983 Oct 31, 2023
fe3342d
comments.
GuptaNavdeep1983 Oct 31, 2023
930ed0d
fix: added comment.
GuptaNavdeep1983 Nov 2, 2023
f165fd0
Merge branch 'main' into nav/enable-tracing
npalm Nov 3, 2023
1677f9a
fix: comments.
GuptaNavdeep1983 Nov 6, 2023
916b2c8
Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor…
GuptaNavdeep1983 Nov 6, 2023
a6d6df9
fix: comments.
GuptaNavdeep1983 Nov 6, 2023
d7a5319
docs: auto update terraform docs
github-actions[bot] Nov 6, 2023
ed7f05b
fix: comments.
GuptaNavdeep1983 Nov 7, 2023
e9ffa1a
Merge branch 'main' into nav/enable-tracing
npalm Nov 8, 2023
1ed4561
Merge branch 'main' into nav/enable-tracing
GuptaNavdeep1983 Nov 8, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions README.md
npalm marked this conversation as resolved.
Show resolved Hide resolved
GuptaNavdeep1983 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ This [Terraform](https://www.terraform.io/) module creates the required infrastr
- [Examples](#examples)
- [Sub modules](#sub-modules)
- [Logging](#logging)
- [Tracing](#tracing)
- [Debugging](#debugging)
- [Security Considerations](#security-considerations)
- [Requirements](#requirements)
Expand Down Expand Up @@ -427,6 +428,16 @@ An example log message of the scale-up function:
}
}
```
## Tracing
For the distributed architecture of this application it can be difficult to troubleshoot this application.
We support the option to enable tracing for all the lambda functions created by this application. To enable tracing user can simply provide the `tracing_config` option inside the root module or inner modules.

This tracing config generates timelines for following events:
GuptaNavdeep1983 marked this conversation as resolved.
Show resolved Hide resolved
- Basic lifecycle of lambda function
- Traces for Github API calls (can be configured by capture_http_requests).
- Traces for all AWS SDK calls



## Debugging

Expand Down Expand Up @@ -543,6 +554,7 @@ We welcome any improvement to the standard module to make the default as secure
| <a name="input_lambda_s3_bucket"></a> [lambda\_s3\_bucket](#input\_lambda\_s3\_bucket) | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | `string` | `null` | no |
| <a name="input_lambda_security_group_ids"></a> [lambda\_security\_group\_ids](#input\_lambda\_security\_group\_ids) | List of security group IDs associated with the Lambda function. | `list(string)` | `[]` | no |
| <a name="input_lambda_subnet_ids"></a> [lambda\_subnet\_ids](#input\_lambda\_subnet\_ids) | List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`. | `list(string)` | `[]` | no |
| <a name="input_lambda_tracing_mode"></a> [lambda\_tracing\_mode](#input\_lambda\_tracing\_mode) | DEPRECATED: Replaced by `tracing_config`. | `string` | `null` | no |
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | `string` | `"info"` | no |
| <a name="input_logging_kms_key_id"></a> [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with. | `string` | `null` | no |
| <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
Expand Down
2 changes: 1 addition & 1 deletion lambdas/functions/control-plane/src/aws/runners.d.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,5 +39,5 @@ export interface RunnerInputParameters {
};
numberOfRunners?: number;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}
12 changes: 6 additions & 6 deletions lambdas/functions/control-plane/src/aws/runners.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -240,10 +240,10 @@
it('calls create fleet of 1 instance with runner tracing enabled', async () => {
tracer.getRootXrayTraceId = jest.fn().mockReturnValue('123');

await createRunner(createRunnerConfig({ ...defaultRunnerConfig, runnerTracingEnabled: true }));
await createRunner(createRunnerConfig({ ...defaultRunnerConfig, tracingEnabled: true }));

expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, {
...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, runnerTracingEnabled: true }),
...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, tracingEnabled: true }),
});
});
});
Expand Down Expand Up @@ -360,7 +360,7 @@
allocationStrategy: SpotAllocationStrategy;
maxSpotPrice?: string;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function createRunnerConfig(runnerConfig: RunnerConfig): RunnerInputParameters {
Expand All @@ -377,7 +377,7 @@
},
subnets: ['subnet-123', 'subnet-456'],
amiIdSsmParameterName: runnerConfig.amiIdSsmParameterName,
runnerTracingEnabled: runnerConfig.runnerTracingEnabled,
tracingEnabled: runnerConfig.tracingEnabled,
};
}

Expand All @@ -388,7 +388,7 @@
maxSpotPrice?: string;
totalTargetCapacity: number;
imageId?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function expectedCreateFleetRequest(expectedValues: ExpectedFleetRequestValues): CreateFleetCommandInput {
Expand All @@ -398,10 +398,10 @@
{ Key: 'ghr:Type', Value: expectedValues.type },
{ Key: 'ghr:Owner', Value: REPO_NAME },
];
if (expectedValues.runnerTracingEnabled) {
if (expectedValues.tracingEnabled) {
const traceId = tracer.getRootXrayTraceId();
tags.push({ Key: 'ghr:trace_id', Value: traceId! });
}

Check warning on line 404 in lambdas/functions/control-plane/src/aws/runners.test.ts

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

❌ New issue: Complex Method

expectedCreateFleetRequest has a cyclomatic complexity of 9, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
const request: CreateFleetCommandInput = {
LaunchTemplateConfigs: [
{
Expand Down
2 changes: 1 addition & 1 deletion lambdas/functions/control-plane/src/aws/runners.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
import {

Check notice on line 1 in lambdas/functions/control-plane/src/aws/runners.ts

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

ℹ Getting worse: Overall Code Complexity

The mean cyclomatic complexity increases from 6.00 to 6.13, threshold = 4. This file has many conditional statements (e.g. if, for, while) across its implementation, leading to lower code health. Avoid adding more conditionals.
CreateFleetCommand,
CreateFleetResult,
DescribeInstancesCommand,
Expand Down Expand Up @@ -127,38 +127,38 @@
},
});

const ec2Client = getTracedAWSV3Client(new EC2Client({ region: process.env.AWS_REGION }));

let amiIdOverride = undefined;

if (runnerParameters.amiIdSsmParameterName) {
try {
amiIdOverride = await getParameter(runnerParameters.amiIdSsmParameterName);
logger.debug(`AMI override SSM parameter (${runnerParameters.amiIdSsmParameterName}) set to: ${amiIdOverride}`);
} catch (e) {
logger.error(
`Failed to lookup runner AMI ID from SSM parameter: ${runnerParameters.amiIdSsmParameterName}. ` +
'Please ensure that the given parameter exists on this region and contains a valid runner AMI ID',
{ error: e },
);
throw e;
}
}

const numberOfRunners = runnerParameters.numberOfRunners ? runnerParameters.numberOfRunners : 1;

const tags = [
{ Key: 'ghr:Application', Value: 'github-action-runner' },
{ Key: 'ghr:created_by', Value: numberOfRunners === 1 ? 'scale-up-lambda' : 'pool-lambda' },
{ Key: 'ghr:Type', Value: runnerParameters.runnerType },
{ Key: 'ghr:Owner', Value: runnerParameters.runnerOwner },
];

if (runnerParameters.runnerTracingEnabled) {
if (runnerParameters.tracingEnabled) {
const traceId = tracer.getRootXrayTraceId();
tags.push({ Key: 'ghr:trace_id', Value: traceId! });
}

Check warning on line 161 in lambdas/functions/control-plane/src/aws/runners.ts

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

❌ Getting worse: Complex Method

createRunner increases in cyclomatic complexity from 19 to 20, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
let fleet: CreateFleetResult;
try {
// see for spec https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html
Expand Down
1 change: 1 addition & 0 deletions lambdas/functions/control-plane/src/lambda.ts
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ export const addMiddleware = () => {
middy(scaleUpHandler).use(handler);
middy(scaleDownHandler).use(handler);
middy(adjustPool).use(handler);
GuptaNavdeep1983 marked this conversation as resolved.
Show resolved Hide resolved
middy(ssmHousekeeper).use(handler);
};
addMiddleware();

Expand Down
4 changes: 2 additions & 2 deletions lambdas/functions/control-plane/src/pool/pool.ts
Original file line number Diff line number Diff line change
Expand Up @@ -36,90 +36,90 @@
const instanceAllocationStrategy = process.env.INSTANCE_ALLOCATION_STRATEGY || 'lowest-price'; // same as AWS default
const runnerOwner = process.env.RUNNER_OWNER;
const amiIdSsmParameterName = process.env.AMI_ID_SSM_PARAMETER_NAME;
const runnerTracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });
const tracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });

let ghesApiUrl = '';
if (ghesBaseUrl) {
ghesApiUrl = `${ghesBaseUrl}/api/v3`;
}

const installationId = await getInstallationId(ghesApiUrl, runnerOwner);
const ghAuth = await createGithubInstallationAuth(installationId, ghesApiUrl);
const githubInstallationClient = await createOctoClient(ghAuth.token, ghesApiUrl);

// Look up the runners registered in GitHub, could be also non managed by this module.
const runners = await githubInstallationClient.paginate(
githubInstallationClient.actions.listSelfHostedRunnersForOrg,
{
org: runnerOwner,
per_page: 100,
},
);
const runnerStatus = new Map<string, RunnerStatus>();
for (const runner of runners) {
runner.name = runnerNamePrefix ? runner.name.replace(runnerNamePrefix, '') : runner.name;
runnerStatus.set(runner.name, { busy: runner.busy, status: runner.status });
}

// Look up the managed ec2 runners in AWS, but running does not mean idle
const ec2runners = await listEC2Runners({
environment,
runnerOwner,
runnerType: 'Org',
statuses: ['running'],
});

// Runner should be considered idle if it is still booting, or is idle in GitHub
let numberOfRunnersInPool = 0;
for (const ec2Instance of ec2runners) {
if (
runnerStatus.get(ec2Instance.instanceId)?.busy === false &&
runnerStatus.get(ec2Instance.instanceId)?.status === 'online'
) {
numberOfRunnersInPool++;
logger.debug(`Runner ${ec2Instance.instanceId} is idle in GitHub and counted as part of the pool`);
} else if (runnerStatus.get(ec2Instance.instanceId) != null) {
logger.debug(`Runner ${ec2Instance.instanceId} is not idle in GitHub and NOT counted as part of the pool`);
} else if (!bootTimeExceeded(ec2Instance)) {
numberOfRunnersInPool++;
logger.info(`Runner ${ec2Instance.instanceId} is still booting and counted as part of the pool`);
} else {
logger.debug(
`Runner ${ec2Instance.instanceId} is not idle in GitHub nor booting and not counted as part of the pool`,
);
}
}

const topUp = event.poolSize - numberOfRunnersInPool;
if (topUp > 0) {
logger.info(`The pool will be topped up with ${topUp} runners.`);
await createRunners(
{
ephemeral,
enableJitConfig,
ghesBaseUrl,
runnerLabels,
runnerGroup,
runnerOwner,
runnerNamePrefix,
runnerType: 'Org',
disableAutoUpdate: disableAutoUpdate,
ssmTokenPath,
ssmConfigPath,
},
{
ec2instanceCriteria: {
instanceTypes,
targetCapacityType: instanceTargetTargetCapacityType,
maxSpotPrice: instanceMaxSpotPrice,
instanceAllocationStrategy: instanceAllocationStrategy,
},
environment,
launchTemplateName,
subnets,
numberOfRunners: topUp,
amiIdSsmParameterName,
runnerTracingEnabled,
tracingEnabled,

Check warning on line 122 in lambdas/functions/control-plane/src/pool/pool.ts

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

❌ Getting worse: Complex Method

adjust already has high cyclomatic complexity, and now it increases in Lines of Code from 100 to 102. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
},
githubInstallationClient,
);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ const EXPECTED_RUNNER_PARAMS: RunnerInputParameters = {
instanceAllocationStrategy: 'lowest-price',
},
subnets: ['subnet-123'],
runnerTracingEnabled: false,
tracingEnabled: false,
};
let expectedRunnerParams: RunnerInputParameters;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@
ec2instanceCriteria: RunnerInputParameters['ec2instanceCriteria'];
numberOfRunners?: number;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function generateRunnerServiceConfig(githubRunnerConfig: CreateGitHubRunnerConfig, token: string) {
Expand Down Expand Up @@ -236,77 +236,77 @@
const amiIdSsmParameterName = process.env.AMI_ID_SSM_PARAMETER_NAME;
const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX || '';
const ssmConfigPath = process.env.SSM_CONFIG_PATH || '';
const runnerTracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });
const tracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });

if (ephemeralEnabled && payload.eventType !== 'workflow_job') {
logger.warn(`${payload.eventType} event is not supported in combination with ephemeral runners.`);
throw Error(
`The event type ${payload.eventType} is not supported in combination with ephemeral runners.` +
`Please ensure you have enabled workflow_job events.`,
);
}
const ephemeral = ephemeralEnabled && payload.eventType === 'workflow_job';
const runnerType = enableOrgLevel ? 'Org' : 'Repo';
const runnerOwner = enableOrgLevel ? payload.repositoryOwner : `${payload.repositoryOwner}/${payload.repositoryName}`;

addPersistentContextToChildLogger({
runner: {
type: runnerType,
owner: runnerOwner,
},
github: {
event: payload.eventType,
workflow_job_id: payload.id.toString(),
},
});

logger.info(`Received event`);

let ghesApiUrl = '';
if (ghesBaseUrl) {
ghesApiUrl = `${ghesBaseUrl}/api/v3`;
}

const installationId = await getInstallationId(ghesApiUrl, enableOrgLevel, payload);
const ghAuth = await createGithubInstallationAuth(installationId, ghesApiUrl);
const githubInstallationClient = await createOctoClient(ghAuth.token, ghesApiUrl);
if (!enableJobQueuedCheck || (await isJobQueued(githubInstallationClient, payload))) {
const currentRunners = await listEC2Runners({
environment,
runnerType,
runnerOwner,
});
logger.info(`Current runners: ${currentRunners.length} of ${maximumRunners}`);

if (currentRunners.length < maximumRunners) {
logger.info(`Attempting to launch a new runner`);

await createRunners(
{
ephemeral,
enableJitConfig,
ghesBaseUrl,
runnerLabels,
runnerGroup,
runnerNamePrefix,
runnerOwner,
runnerType,
disableAutoUpdate,
ssmTokenPath,
ssmConfigPath,
},
{
ec2instanceCriteria: {
instanceTypes,
targetCapacityType: instanceTargetTargetCapacityType,
maxSpotPrice: instanceMaxSpotPrice,
instanceAllocationStrategy: instanceAllocationStrategy,
},
environment,
launchTemplateName,
subnets,
amiIdSsmParameterName,
runnerTracingEnabled,
tracingEnabled,

Check notice on line 309 in lambdas/functions/control-plane/src/scale-runners/scale-up.ts

View check run for this annotation

CodeScene Delta Analysis / CodeScene Cloud Delta Analysis (main)

ℹ Getting worse: Complex Method

scaleUp already has high cyclomatic complexity, and now it increases in Lines of Code from 96 to 98. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
},
githubInstallationClient,
);
Expand Down
9 changes: 1 addition & 8 deletions modules/runners/templates/start-runner.sh
Original file line number Diff line number Diff line change
Expand Up @@ -65,13 +65,6 @@ cleanup() {

if [ "$exit_code" -ne 0 ]; then
echo "ERROR: runner-start-failed with exit code $exit_code occurred on $error_location"
# Create a CloudWatch metric for error
aws cloudwatch put-metric-data \
--metric-name "RunnerInstanceUnhealthy" \
--namespace "Github Runners metrics" \
--value 1 \
--region "$region" \
--dimensions "InstanceId=$instance_id"
create_xray_error_segment "$SEGMENT" "runner-start-failed with exit code $exit_code occurred on $error_location - $error_lineno"
GuptaNavdeep1983 marked this conversation as resolved.
Show resolved Hide resolved
else
create_xray_success_segment "$SEGMENT"
Expand Down Expand Up @@ -160,7 +153,7 @@ if [[ "$xray_trace_id" != "" ]]; then
# run xray service
curl https://s3.us-east-2.amazonaws.com/aws-xray-assets.us-east-2/xray-daemon/aws-xray-daemon-linux-3.x.zip -o aws-xray-daemon-linux-3.x.zip
unzip aws-xray-daemon-linux-3.x.zip -d aws-xray-daemon-linux-3.x
sudo chmod +x ./aws-xray-daemon-linux-3.x/xray
chmod +x ./aws-xray-daemon-linux-3.x/xray
./aws-xray-daemon-linux-3.x/xray -o -n "$region" &


Expand Down
10 changes: 10 additions & 0 deletions variables.deprecated.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
variable "lambda_tracing_mode" {
description = "DEPRECATED: Replaced by `tracing_config`."
type = string
default = null

validation {
condition = anytrue([var.lambda_tracing_mode == null])
error_message = "DEPRECATED, Replaced by `tracing_config`."
}
}
Loading