Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added changes to enable tracing in lambdas. #3554

Merged
merged 39 commits into from
Nov 8, 2023

Conversation

GuptaNavdeep1983
Copy link
Contributor

@GuptaNavdeep1983 GuptaNavdeep1983 commented Oct 20, 2023

This PR addresses the need to enable tracing for the lambdas used in the runners architecture:

Highlights:

  • This feature enables the tracing in all lambdas which allow to debug/investigate any issues that arise out of day-to-day use of runners infrastructure.
  • If user decides to add all the features provisioned in the PR, user should be able to find the complete linked trace between the time a webhook is triggered with workflow job event to the API gateway endpoint to the execution of scale up lambda which creates a new runner to fulfill the need of creating a new runner and also find the relevant logs linked to the trace in AWS CloudWatch ServiceLens. As of result, user need not navigate to various log groups to find any issue in any given service.
  • Please find the X-ray costing in this link detailing the cost involved in enabling this feature.

Additions:

  • Provide an option to enable traces in EC2 bash script which allows to find and link any issues that may arise out of starting the runner and find this information linked in the trace created out of this feature.

Options:

Use Cloudwatch config agent which now supports to capture traces (link) can be used to capture traces and link them to the log groups.

modules/webhook/webhook.tf Outdated Show resolved Hide resolved
Copy link
Collaborator

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuptaNavdeep1983 nice work. Made some comments. If you like we can pair next week to get them resolved.

I also will try to run some tests later this week.

README.md Show resolved Hide resolved
lambdas/functions/control-plane/src/aws/runners.test.ts Outdated Show resolved Hide resolved
lambdas/functions/control-plane/src/lambda.ts Show resolved Hide resolved
lambdas/functions/control-plane/src/lambda.ts Show resolved Hide resolved
modules/runners/templates/start-runner.sh Outdated Show resolved Hide resolved
modules/runners/templates/start-runner.sh Outdated Show resolved Hide resolved
modules/runners/templates/start-runner.sh Show resolved Hide resolved
modules/webhook/variables.tf Show resolved Hide resolved
@GuptaNavdeep1983
Copy link
Contributor Author

@npalm are these comments blocker to the PR? Can we not address them separately?

README.md Show resolved Hide resolved
variables.tf Show resolved Hide resolved
variables.tf Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link
Collaborator

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Let's finish the PR in tomorrows call

@npalm
Copy link
Collaborator

npalm commented Nov 8, 2023

Notes about testing:

Testing deployment of main -> updateing to PR -> enable tracing

  • multi-runner example
  • ephemeral with pool

@GuptaNavdeep1983 GuptaNavdeep1983 merged commit 970e8a6 into main Nov 8, 2023
36 of 37 checks passed
@GuptaNavdeep1983 GuptaNavdeep1983 deleted the nav/enable-tracing branch November 8, 2023 16:31
GuptaNavdeep1983 pushed a commit that referenced this pull request Nov 8, 2023
🤖 I have created a release *beep* *boop*
---


##
[5.4.0](philips-labs/terraform-aws-github-runner@v5.3.0...v5.4.0)
(2023-11-08)


### Features

* added changes to enable tracing in lambdas.
([#3554](https://github.com/philips-labs/terraform-aws-github-runner/issues/3554))
([970e8a6](philips-labs/terraform-aws-github-runner@970e8a6))


### Bug Fixes

* **lambda:** Bump the aws group in /lambdas with 5 updates
([#3595](https://github.com/philips-labs/terraform-aws-github-runner/issues/3595))
([581a4bf](philips-labs/terraform-aws-github-runner@581a4bf))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
npalm added a commit that referenced this pull request Aug 7, 2024
## Problem
Updating octkit auth app to 6.1.1. fails runtime. Not in our tests due
to most of the SDK layer is mocked.

Only information we catch is a generic Error.
```
 HttpError: response.json(...).catch is not a function
```

Via [issue](octokit/auth-app.js#634)
@wolfy1339 provides feedback that the cause could be related to
overriding the fetch in octokit. In PR #3554 where tracing was
introduced also the fetch command was overriden, this due the fact the
node fetch was not working properly for tracing HTTP calls, which are in
our case the cals to GitHub. This problem seems now to be fixed in the
[AWS xray
sdk](https://github.com/aws/aws-xray-sdk-node/blob/master/CHANGELOG.md#370).

## Solution
The solution is quite simple. Just remove axios as fetch implementation
(override) and use the default provide by octokit.


## Testing
The problem was not detected in the unit tests. Reason is that the unit
test mocking the octokit SDK. In case of an ovrride we have doen. We
hsould have added a test using nock to mock the HTTP calls and not the
full SDK to catch problems like this in the unit test. By removing the
override, we can relay again on the test efforts done by octokit.

## References
- fix #3966
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants