Implement AWS extension as an actor #1277

Merged
merged 34 commits into from
Jun 7, 2022

Member

@sl0thentr0py sl0thentr0py commented May 19, 2022

  • Implement an AwsExtension actor that is started in an Arbiter only when the --aws-runtime-api CLI option is passed
  • Also add --aws-upstream-dsn, which sets the UpstreamDescriptor from the explicitly passed DSN
  • Internally uses a reqwest client with 0 timeout in a separate tokio::runtime with 1 worker thread to register and then continuously poll for next events
    • note that there is no sleep here because requests to next_event block until the next invocation
    • the container might also get frozen in case of inactivity
  • Add a mock Flask server for testing the lifecycle
  • Also note that we don't do anything really useful with the responses from AWS yet, but the protocol is fully implemented, and we could use some of this information in the future to add to the envelope/transaction

https://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html#runtimes-extensions-registration-api
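
For readers unfamiliar with the protocol, here is a minimal sketch of the register-then-poll lifecycle described above. It is not the code from this PR: the `ExtensionsApi` trait, `run_lifecycle`, and `ScriptedApi` are hypothetical names, and the HTTP calls are replaced by an in-memory script so the loop can be exercised without a Lambda environment. Only the `/2020-01-01/extension` base path is taken from the PR itself.

```rust
use std::collections::VecDeque;

/// Event types the Extensions API delivers to an external extension.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventType {
    Invoke,
    Shutdown,
}

/// Hypothetical abstraction over the two calls of the lifecycle.
/// A real implementation would issue HTTP requests instead.
pub trait ExtensionsApi {
    /// Registers the extension and returns its identifier.
    fn register(&mut self) -> Result<String, String>;
    /// Blocks until the next event is available for this extension.
    fn next_event(&mut self, extension_id: &str) -> Result<EventType, String>;
}

/// Builds the base URL the same way the PR does.
pub fn base_url(runtime_api: &str) -> String {
    format!("http://{}/2020-01-01/extension", runtime_api)
}

/// Registers once, then polls for events until SHUTDOWN arrives.
/// Returns the number of INVOKE events seen. There is no sleep in the
/// loop because `next_event` is expected to block.
pub fn run_lifecycle<A: ExtensionsApi>(api: &mut A) -> Result<usize, String> {
    let extension_id = api.register()?;
    let mut invocations = 0;
    loop {
        match api.next_event(&extension_id)? {
            EventType::Invoke => invocations += 1,
            EventType::Shutdown => return Ok(invocations),
        }
    }
}

/// Test double that replays a fixed list of events (assumption: stands in
/// for the mock server mentioned in the description).
pub struct ScriptedApi {
    pub events: VecDeque<EventType>,
}

impl ExtensionsApi for ScriptedApi {
    fn register(&mut self) -> Result<String, String> {
        Ok("mock-extension-id".to_string())
    }

    fn next_event(&mut self, _extension_id: &str) -> Result<EventType, String> {
        self.events
            .pop_front()
            .ok_or_else(|| "script exhausted".to_string())
    }
}
```

Keeping the transport behind a trait is only one possible design; it makes the loop testable without a network, which is roughly the role the mock server plays here.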

@sl0thentr0py sl0thentr0py requested review from a team and antonpirker May 19, 2022 14:13
Member

@antonpirker antonpirker left a comment

I cannot review the Rust code, so my review is more from a UX/product angle.

Member

@jan-auer jan-auer left a comment

Initial review covering only technicalities. A review of the actor follows momentarily.

CHANGELOG.md Outdated
@@ -7,6 +7,7 @@
**Features**:

- Session metrics extraction: Count crashed+abnormal towards errored_preaggr. ([#1274](https://github.com/getsentry/relay/pull/1274))
- Implement AWS extension as an actor. ([#1277](https://github.com/getsentry/relay/pull/1277))
Member

Let's make this a slightly more descriptive changelog entry around what the feature enables.

Member

this

pip install -r requirements.txt

# Start mock API server
FLASK_APP="mock-aws-lambda-extensions-api" flask run
Member

Since we're already running Flask apps in integration tests (see tests/integration/*), what if you move this Flask app there as a fixture and then create a new integration test file for it?

@@ -143,6 +145,12 @@ impl ServiceState {
let outcome_aggregator = OutcomeAggregator::new(&config, outcome_producer.recipient());
registry.set(outcome_aggregator.start());

if let Ok(aws_runtime_api) = env::var("AWS_LAMBDA_RUNTIME_API") {
Member

nit: Move this into a (documented) function in the relay-aws-extension crate.

Member

For as long as we're just experimenting (if not for longer), I would strongly prefer a config flag for this in addition to the envvar. There is no telling what would happen if an existing, working customer relay is updated to include this code.

Member Author

added a --aws-api-url=<URL> cli arg
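
To illustrate the opt-in behavior requested in this thread, here is a small sketch, not the merged implementation. `resolve_runtime_api` and its `opt_in` parameter are hypothetical: the idea is simply that the environment variable alone never turns the extension on, so an existing relay that happens to run with that variable set is unaffected.

```rust
/// Decides whether the AWS extension should start and, if so, against
/// which runtime API address.
///
/// * `opt_in` - hypothetical explicit switch (config flag or CLI option).
/// * `cli_value` - address passed explicitly on the command line, if any.
/// * `env_value` - value of the environment variable, if set.
///
/// Without the explicit opt-in the environment variable is ignored.
pub fn resolve_runtime_api(
    opt_in: bool,
    cli_value: Option<&str>,
    env_value: Option<&str>,
) -> Option<String> {
    if !opt_in {
        return None;
    }

    // An explicit CLI value takes precedence over the environment.
    cli_value
        .or(env_value)
        .map(str::trim)
        .filter(|value| !value.is_empty())
        .map(str::to_owned)
}
```

A caller could pass `std::env::var("AWS_LAMBDA_RUNTIME_API").ok().as_deref()` as `env_value`, which keeps the function itself free of global state and easy to test.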

impl AwsExtension {
    /// Creates a new `AwsExtension` instance.
    pub fn new(aws_runtime_api: String) -> Result<Self, AwsExtensionError> {
        let base_url = format!("http://{}/2020-01-01/extension", aws_runtime_api);
Member

Which additional validation do we need to put here to ensure this doesn't become an attack vector?
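
One possible answer, sketched under assumptions: since the value is interpolated directly into a URL, it could be restricted to a plain `host:port` pair before use. `validate_runtime_api` is a hypothetical helper, the accepted character set is deliberately strict, and IPv6 literals are not handled. This is not what the PR implements.

```rust
/// Accepts only `host:port`, where host consists of ASCII letters, digits,
/// dots and hyphens, and port is a decimal number that fits into u16.
/// Anything that could change the meaning of the formatted URL (path,
/// query, fragment, userinfo, whitespace) is rejected.
pub fn validate_runtime_api(value: &str) -> Result<(String, u16), String> {
    let (host, port) = value
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in {:?}", value))?;

    // Require plain digits so that forms like "+80" are not accepted.
    if port.is_empty() || !port.chars().all(|c| c.is_ascii_digit()) {
        return Err(format!("invalid port in {:?}", value));
    }
    let port: u16 = port
        .parse()
        .map_err(|_| format!("port out of range in {:?}", value))?;

    let host_ok = !host.is_empty()
        && host
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-');
    if !host_ok {
        return Err(format!("invalid host in {:?}", value));
    }

    Ok((host.to_string(), port))
}
```

With a check like this, a value such as `user@host:80` or `example.org/path?x=:80` is refused before `format!` ever builds the base URL.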

.post(&url)
.header(EXTENSION_NAME_HEADER, EXTENSION_NAME)
.json(&map)
.send()
Member

You're using the blocking client here, which is not ideal since it blocks the arbiter this actor is running in. You have two options:

  1. Use the non-blocking client in a tokio 1 runtime. We will soon clean this all up to make it easier, but in the meantime, you'll want to implement it like here (which means you can use .await to chain several calls):
    self.reqwest_runtime.spawn(async move {
        let res = client
            .execute(client_request.0)
            .await
            .map_err(UpstreamRequestError::SendFailed);
        tx.send(res)
    });
  2. Document that this actor must run in its own dedicated thread and place a big TODO on top of this to clean it up as soon as we've migrated to tokio 1. However, that also means you cannot receive messages such as the shutdown while you're waiting for a response, so I'm not sure if that's even a feasible option.

Given the similarity to UpstreamRelay, I'd prefer option 1.

Member

I would also prefer option 1. Note that with option 1, the entire actor can be one big async block containing a loop {} (no need for context.notify)

Member

It may be useful to keep the outermost loop in a notify cycle so we can break it more easily from the outside, although that doesn't seem required for AWS environments.

Member Author

This is done now, @jan-auer @untitaker. Could you take a look and confirm that the approach is OK?
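
The concern in this thread, that a blocking request prevents the actor from seeing a shutdown message, can be shown with a std-only analog. This sketch swaps in plain threads and `std::sync::mpsc` for the tokio runtime and reqwest client discussed above; `Message`, `spawn_poll`, and `demo_shutdown_during_poll` are hypothetical names. The blocking call runs on its own thread and reports back over a channel, so the receiving side stays free to handle other messages first.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the receiving loop can observe.
#[derive(Debug, PartialEq, Eq)]
pub enum Message {
    NextEvent(String),
    Shutdown,
}

/// Runs `blocking_call` on a separate thread and sends its result to `tx`,
/// so the caller's thread is never blocked by the long poll.
pub fn spawn_poll<F>(tx: Sender<Message>, blocking_call: F) -> JoinHandle<()>
where
    F: FnOnce() -> String + Send + 'static,
{
    thread::spawn(move || {
        let body = blocking_call();
        // The receiver may already be gone during shutdown; ignore that.
        let _ = tx.send(Message::NextEvent(body));
    })
}

/// Demonstrates that a shutdown message is received while the poll is
/// still in flight. The "long poll" is simulated by waiting on a gate.
pub fn demo_shutdown_during_poll() -> Vec<Message> {
    let (tx, rx) = mpsc::channel();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let worker = spawn_poll(tx.clone(), move || {
        // Stands in for a request that blocks until the next invocation.
        let _ = release_rx.recv();
        "INVOKE".to_string()
    });

    // Shutdown arrives while the worker is still blocked.
    tx.send(Message::Shutdown).expect("receiver is alive");
    let first = rx.recv().expect("sender is alive");

    // Only now let the simulated poll complete.
    release_tx.send(()).expect("worker is waiting");
    let second = rx.recv().expect("worker sends its result");

    worker.join().expect("worker does not panic");
    vec![first, second]
}
```

With a blocking client on the actor's own thread, the shutdown message would only be observed after the poll returned; here it is observed first.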

}
_ => {
relay_log::debug!("Next event request failed");
context.notify(NextEvent);
Member

This will keep hammering the API endpoint if it is not available. Is that desired behavior? If so, I would ask you to hide this actor behind a config option; today it turns on based on whether some envvar is present.

Member Author

I guess I need to distinguish between 404s and other failures here.
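
A rough sketch of what such a distinction could look like, assuming that a 404 means the endpoint does not exist and polling should stop, while other failures are retried with a capped exponential backoff. `NextStep`, `next_step`, and `backoff` are hypothetical names and the delays are arbitrary; none of this is taken from the PR.

```rust
use std::time::Duration;

/// What the polling loop should do after a next-event request.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// The request succeeded; poll again right away.
    PollAgain,
    /// The endpoint does not exist (assumed meaning of 404); stop polling.
    Stop,
    /// A transient failure; wait before retrying.
    RetryAfter(Duration),
}

/// Exponential backoff starting at 100 ms and capped at 6.4 s.
pub fn backoff(consecutive_failures: u32) -> Duration {
    let exponent = consecutive_failures.min(6);
    Duration::from_millis(100 * (1u64 << exponent))
}

/// Classifies the outcome of a request. `status` is `None` when no HTTP
/// response was received at all (for example, connection refused).
pub fn next_step(status: Option<u16>, consecutive_failures: u32) -> NextStep {
    match status {
        Some(200..=299) => NextStep::PollAgain,
        Some(404) => NextStep::Stop,
        _ => NextStep::RetryAfter(backoff(consecutive_failures)),
    }
}
```

The caller would reset `consecutive_failures` to zero after each successful poll, so a single hiccup does not slow down later requests.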

Member

@untitaker untitaker left a comment

I think this is fine for now. Let's just make sure we can't break existing customer relays by accident.

Member

@jan-auer jan-auer left a comment

Thanks for the effort on this, let's get this merged.

@sl0thentr0py sl0thentr0py enabled auto-merge (squash) June 7, 2022 10:38
@sl0thentr0py sl0thentr0py merged commit 564b7e5 into master Jun 7, 2022
@sl0thentr0py sl0thentr0py deleted the neel/aws-ext-actor branch June 7, 2022 10:55
jan-auer added a commit that referenced this pull request Jun 9, 2022
* master:
  ref(metrics): Stop logging relative bucket size (#1302)
  fix(metrics): Rename misnamed aggregator option (#1298)
  fix(server): Avoid a panic in the Sentry middleware (#1301)
  build: Update dependencies with known vulnerabilities (#1294)
  fix(metrics): Stop logging statsd metric per project key (#1295)
  feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] (#1287)
  fix(metrics): Track memory footprint more accurately (#1288)
  build(deps): Bump dependencies (#1293)
  feat(aws): Add relay-aws-extension crate which implements AWS extension as an actor (#1277)
  fix(meta): Update codeowners for the release actions (#1286)
  feat(metrics): Track memory footprint of metrics buckets (#1284)