Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Leader async aggregation. #3564

Merged
merged 2 commits into from
Dec 13, 2024
Merged

Implement Leader async aggregation. #3564

merged 2 commits into from
Dec 13, 2024

Conversation

branlwyd
Copy link
Contributor

@branlwyd branlwyd commented Dec 9, 2024

This change includes unit tests, but no integration tests -- those will need to come with the Helper async aggregation implementation, as without it we do not have anything to integration test against.

A few implementation notes:

  • I renamed the report aggregation states to better match their functionality (IMO).
  • If the Helper does not provide a retry-after header, the Leader will poll each "processing" aggregation job (at most) once per minute.
  • The retry-after header can specify either a number of seconds, or a specific date. Currently, we only support receiving a number of seconds.

@branlwyd branlwyd added the allow-changed-migrations Override the ci-migrations check to allow migrations that have changed. label Dec 9, 2024
@branlwyd branlwyd requested a review from a team as a code owner December 9, 2024 22:31
@branlwyd
Copy link
Contributor Author

branlwyd commented Dec 9, 2024

Part of #3436.

@branlwyd branlwyd force-pushed the bran/async-aggregation branch from d5191bc to f9f6a25 Compare December 9, 2024 22:33
@branlwyd branlwyd mentioned this pull request Dec 9, 2024
20 tasks
This change includes unit tests, but no integration tests -- those will
need to come with the Helper async aggregation implementation, as
without it we do not have anything to integration test against.

A few implementation notes:
 * I renamed the report aggregation states to better match their
   functionality (IMO).
 * If the Helper does not provide a retry-after header, the Leader will
   poll each "processing" aggregation job (at most) once per minute.
 * The retry-after header can specify either a number of seconds, or a
   specific date. Currently, we only support receiving a number of
   seconds.
@branlwyd branlwyd force-pushed the bran/async-aggregation branch from f9f6a25 to 52fd456 Compare December 11, 2024 01:14
aggregator_core/src/datastore/models.rs Outdated Show resolved Hide resolved
aggregator/src/aggregator/aggregation_job_driver.rs Outdated Show resolved Hide resolved
.headers()
.get(RETRY_AFTER)
.map(parse_retry_after)
.transpose()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably substitute a default poll delay and log a warning rather than fail if we can't parse the Retry-After header. (here and in the other methods)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree here: I think if we get bad input, we should fail loudly rather than continuing on "incorrectly."

aggregator/src/aggregator/aggregation_job_driver.rs Outdated Show resolved Hide resolved
Comment on lines +985 to +989
report_aggregation_success_counter: self.aggregation_success_counter.clone(),
aggregate_step_failure_counter: self.aggregate_step_failure_counter.clone(),
aggregated_report_share_dimension_histogram: self
.aggregated_report_share_dimension_histogram
.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rethink Prometheus metrics in the context of AggregationJobResp::Processing responses as well. I haven't looked at the metrics in detail yet, but we may want additional counters, or new labels, in order to distinguish forward progress versus mere polling.

Copy link
Contributor Author

@branlwyd branlwyd Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wise, but these metrics only concern themselves with completed report aggregations & failed steps, both of which are agnostic to the aggregation mode. I think we can get pretty far by looking to the request verb & the response code [edit: in our standard HTTP metrics], though these metrics won't give full clarity (i.e. we can differentiate between a Leader's attempt to poll vs an attempt to send a new step via the HTTP verb; but I don't think we can tell whether the Helper responded with a processing or finished response with these metrics). However, I strongly suspect that our current integrations will always use exactly one of asynchronous or synchronous responses, so this may be moot at the moment, at least for us.

@branlwyd branlwyd enabled auto-merge (squash) December 13, 2024 00:15
@branlwyd branlwyd force-pushed the bran/async-aggregation branch from 314c2eb to 8dfe337 Compare December 13, 2024 00:15
@branlwyd branlwyd merged commit e4aab44 into main Dec 13, 2024
8 checks passed
@branlwyd branlwyd deleted the bran/async-aggregation branch December 13, 2024 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
allow-changed-migrations Override the ci-migrations check to allow migrations that have changed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants