-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement AWS extension as an actor #1277
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not review the rust code so my review is more from a UX/product angle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review with merely technicalities. Review of the actor follows momentarily.
CHANGELOG.md
Outdated
@@ -7,6 +7,7 @@ | |||
**Features**: | |||
|
|||
- Session metrics extraction: Count crashed+abnormal towards errored_preaggr. ([#1274](https://github.com/getsentry/relay/pull/1274)) | |||
- Implement AWS extension as an actor. ([#1277](https://github.com/getsentry/relay/pull/1277)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a slightly more descriptive changelog entry around what the feature enables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
relay-aws-extension/mock/run.sh
Outdated
pip install -r requirements.txt | ||
|
||
# Start mock API server | ||
FLASK_APP="mock-aws-lambda-extensions-api" flask run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already running flask apps in integration tests (see tests/integration/*
), what if you move this flask app as fixture there, and then create a new integration test file for this?
relay-server/src/service.rs
Outdated
@@ -143,6 +145,12 @@ impl ServiceState { | |||
let outcome_aggregator = OutcomeAggregator::new(&config, outcome_producer.recipient()); | |||
registry.set(outcome_aggregator.start()); | |||
|
|||
if let Ok(aws_runtime_api) = env::var("AWS_LAMBDA_RUNTIME_API") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Move this into a (documented) function in the relay-aws-extension
crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For as long as we're just experimenting (if not for longer), I would strongly prefer a config flag for this in addition to the envvar. There is no telling what would happen if an existing, working customer relay is updated to include this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a --aws-api-url=<URL>
cli arg
impl AwsExtension { | ||
/// Creates a new `AwsExtension` instance. | ||
pub fn new(aws_runtime_api: String) -> Result<Self, AwsExtensionError> { | ||
let base_url = format!("http://{}/2020-01-01/extension", aws_runtime_api); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which additional validation do we need to put here to ensure this doesn't become an attack vector?
.post(&url) | ||
.header(EXTENSION_NAME_HEADER, EXTENSION_NAME) | ||
.json(&map) | ||
.send() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're using the blocking client here, which is not ideal since it blocks the arbiter this actor is running in. You have two options:
- Use the non-blocking client in a tokio 1 runtime. We will soon clean this all up to make it easier, but in the meanwhile, you'll want to implement it like here (which means, you can use
.await
to chain several calls):relay/relay-server/src/actors/upstream.rs
Lines 581 to 587 in ef952cb
self.reqwest_runtime.spawn(async move { let res = client .execute(client_request.0) .await .map_err(UpstreamRequestError::SendFailed); tx.send(res) }); - Document that this actor must run in its completely own thread and place a big TODO on top of this to clean it up as soon as we've migrated to tokio 1. However, that also means you cannot receive messages such as the shutdown while you're waiting for a response, so I'm not sure if that's even a feasible option.
Given the similarity to UpstreamRelay
, I'd prefer option 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer option 1. Note that with option 1, the entire actor can be one big async block containing a loop {}
(no need for context.notify)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be useful to keep the outermost loop in a notify
cycle so we can break it more easily from the outside, although that doesn't seem required for AWS environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done now @jan-auer @untitaker, if you guys can take a look that the approach is ok?
} | ||
_ => { | ||
relay_log::debug!("Next event request failed"); | ||
context.notify(NextEvent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will keep hammering the API endpoint if it is not available. Is that desired behavior? If so I would ask you to hide this actor behind a config option, today it turns on based on whether some envvar is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I need to distinguish between 404s and other failures here.
.post(&url) | ||
.header(EXTENSION_NAME_HEADER, EXTENSION_NAME) | ||
.json(&map) | ||
.send() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer option 1. Note that with option 1, the entire actor can be one big async block containing a loop {}
(no need for context.notify)
Co-authored-by: Jan Michael Auer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now. Let's just make sure we can't break existing customer relays by accident.
relay-server/src/service.rs
Outdated
@@ -143,6 +145,12 @@ impl ServiceState { | |||
let outcome_aggregator = OutcomeAggregator::new(&config, outcome_producer.recipient()); | |||
registry.set(outcome_aggregator.start()); | |||
|
|||
if let Ok(aws_runtime_api) = env::var("AWS_LAMBDA_RUNTIME_API") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For as long as we're just experimenting (if not for longer), I would strongly prefer a config flag for this in addition to the envvar. There is no telling what would happen if an existing, working customer relay is updated to include this code.
CHANGELOG.md
Outdated
@@ -7,6 +7,7 @@ | |||
**Features**: | |||
|
|||
- Session metrics extraction: Count crashed+abnormal towards errored_preaggr. ([#1274](https://github.com/getsentry/relay/pull/1274)) | |||
- Implement AWS extension as an actor. ([#1277](https://github.com/getsentry/relay/pull/1277)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
41a606d
to
03268b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort on this, let's get this merged.
* master: ref(metrics): Stop logging relative bucket size (#1302) fix(metrics): Rename misnamed aggregator option (#1298) fix(server): Avoid a panic in the Sentry middleware (#1301) build: Update dependencies with known vulnerabilities (#1294) fix(metrics): Stop logging statsd metric per project key (#1295) feat(metrics): Limits on bucketing cost in aggregator [INGEST-1132] (#1287) fix(metrics): Track memory footprint more accurately (#1288) build(deps): Bump dependencies (#1293) feat(aws): Add relay-aws-extension crate which implements AWS extension as an actor (#1277) fix(meta): Update codeowners for the release actions (#1286) feat(metrics): Track memory footprint of metrics buckets (#1284)
AwsExtension
that starts in anArbiter
conditionally on the--aws-runtime-api
cli option--aws-upstream-dsn
that sets theUpstreamDescriptor
from the explicitly passed in DSNreqwest
client with 0 timeout in a separatetokio::runtime
with 1 worker thread internally to register and poll for next events continuouslyhttps://docs.aws.amazon.com/lambda/latest/dg/runtimes-extensions-api.html#runtimes-extensions-registration-api