[aggregator] Propagate cancellation through tick #3895
Codecov Report
@@ Coverage Diff @@
## master #3895 +/- ##
========================================
- Coverage 56.9% 56.7% -0.2%
========================================
Files 553 553
Lines 63286 63323 +37
========================================
- Hits 36051 35961 -90
- Misses 24047 24163 +116
- Partials 3188 3199 +11
Flags with carried forward coverage won't be shown.
@@ -51,7 +66,14 @@ func Serve(
	if err := m3msgServer.ListenAndServe(); err != nil {
		return fmt.Errorf("could not start m3msg server at: addr=%s, err=%v", m3msgAddr, err)
	}
	defer m3msgServer.Close()

	defer func() {
You could probably add all of this logging into the x/server server.go Close func, so you don't need these custom defer funcs
Though prob only a subset of the servers we want this for
Good idea
Hm, actually not all of these are using this x/server, e.g. the http handler just implements x/server here, so this may not be the best approach
the logging feels like debug to me and kind of noisy when things are actually working as intended.
src/aggregator/aggregator/map.go
Outdated
	)

	// NB: if no doneChan provided, do not interrupt the tick.
	if doneCh == nil {
when is a done chan not provided?
It shouldn't be if everything is hooked up correctly; this is largely just to be defensive. I might remove it and hard panic, or better yet return an error, if no chan is passed in, to make it more explicit
Huh, actually... TIL that you can use nil channels in a select https://play.golang.org/p/DltRooS3v7D, just going to get rid of this section
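For readers unfamiliar with the behaviour mentioned above, here is a minimal sketch of it. A receive from a nil channel blocks forever, so inside a select that case is simply never ready and the default branch runs instead. The helper name interrupted is hypothetical and is not taken from the PR.

```go
package main

import "fmt"

// interrupted reports, without blocking, whether doneCh has been signalled.
// If doneCh is nil, the receive case can never become ready, so the default
// branch always runs and the function returns false.
func interrupted(doneCh <-chan struct{}) bool {
	select {
	case <-doneCh:
		return true
	default:
		return false
	}
}

func main() {
	var nilCh chan struct{} // nil channel: never ready in a select
	open := make(chan struct{})
	closed := make(chan struct{})
	close(closed)

	fmt.Println(interrupted(nilCh))  // false
	fmt.Println(interrupted(open))   // false
	fmt.Println(interrupted(closed)) // true
}
```

This is why an explicit nil guard is not needed for correctness: a nil doneCh just means the tick is never interrupted.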
	shardTickResult := shard.Tick(perShardTickDuration)
	tickResult = tickResult.merge(shardTickResult)
	select {
	case <-agg.doneCh:
nit: we don't actually need to check here, right? We already check within the tickShardFn when iterating over every metric. Just checking my understanding.
The only reason I mention this nit is that at first it reads like we only check once per shard, which would be bad.
The idea here is that if we've signaled on doneCh, we can avoid ticking the remainder of the shards. I'll make this more explicit
ah right. good point. nah it makes sense.
Aggregator's tickInternal method can soft-block aggregator.Close() for a long
period of time if the doneCh is signalled during a tick; a tick ends up
iterating over internal shard ticks, which can take up to EntryCheckInterval,
which defaults to an hour. This is much longer than the graceful close period,
which is 15s by default, so aggregator graceful close almost never completes,
and ends up timing out instead.