-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Leader async aggregation. #3564
Conversation
Part of #3436. |
d5191bc
to
f9f6a25
Compare
This change includes unit tests, but no integration tests -- those will need to come with the Helper async aggregation implementation, as without it we do not have anything to integration test against. A few implementation notes: * I renamed the report aggregation states to better match their functionality (IMO). * If the Helper does not provide a retry-after header, the Leader will poll each "processing" aggregation job (at most) once per minute. * The retry-after header can specify either a number of seconds, or a specific date. Currently, we only support receiving a number of seconds.
f9f6a25
to
52fd456
Compare
.headers() | ||
.get(RETRY_AFTER) | ||
.map(parse_retry_after) | ||
.transpose()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably substitute a default poll delay and log a warning rather than fail if we can't parse the Retry-After header. (here and in the other methods)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree here: I think if we get bad input, we should fail loudly rather than continuing on "incorrectly."
report_aggregation_success_counter: self.aggregation_success_counter.clone(), | ||
aggregate_step_failure_counter: self.aggregate_step_failure_counter.clone(), | ||
aggregated_report_share_dimension_histogram: self | ||
.aggregated_report_share_dimension_histogram | ||
.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rethink Prometheus metrics in the context of AggregationJobResp::Processing
responses as well. I haven't looked at the metrics in detail yet, but we may want additional counters, or new labels, in order to distinguish forward progress versus mere polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wise, but these metrics only concern themselves with completed report aggregations & failed steps, both of which are agnostic to the aggregation mode. I think we can get pretty far by looking to the request verb & the response code [edit: in our standard HTTP metrics], though these metrics won't give full clarity (i.e. we can differentiate between a Leader's attempt to poll vs an attempt to send a new step via the HTTP verb; but I don't think we can tell whether the Helper responded with a processing
or finished
response with these metrics). However, I strongly suspect that our current integrations will always use exactly one of asynchronous or synchronous responses, so this may be moot at the moment, at least for us.
314c2eb
to
8dfe337
Compare
This change includes unit tests, but no integration tests -- those will need to come with the Helper async aggregation implementation, as without it we do not have anything to integration test against.
A few implementation notes: