modelindexer: Scale active indexers based on load #9393
Conversation
Modifies the `modelindexer` to scale up and down the number of "active" indexers (goroutines pulling out of the internal modelindexer queue), based on the number of consecutive flushes, up to 25% of `GOMAXPROCS`.

When N full flushes (flushes that reached or exceeded `FlushBytes`) occur, the scale action creates a new active indexer, as long as the scaling cooldown has elapsed. Equally, when N timed flushes (due to `FlushInterval`) occur, an active indexer is scaled down, since not enough load is going through the server to warrant the current number of active indexers.

Active indexer downscaling can also be triggered by a change in `GOMAXPROCS`. This is particularly important for containerized or cgroup environments where CPU quotas may be updated "live". In this case, the downscale cooldown is ignored and active indexers are scaled down until `active <= math.RoundToEven(GOMAXPROCS / 4)`.

When no traffic is going through an active indexer, a new timer allows completely idle indexers to be scaled back. The idle check interval can be configured via `IdleInterval`.

Scaling is enabled by default, but can be disabled via the configuration option `output.elasticsearch.scaling.enabled: false`.

Last, modifies the default settings for `max_requests` and `flush_bytes` to `50` and `1MB` respectively. This sends smaller payloads to Elasticsearch, lets instances with more processing power use more of the available indexers, and cycles the indexers faster, which results in better utilization and performance.

Signed-off-by: Marc Lopez Rubio <[email protected]>
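As a rough illustration of the flush-driven scaling decision described above, here is a minimal, self-contained Go sketch. The type and function names (`scaler`, `onFullFlush`, `onTimedFlush`), the threshold, and the cooldown handling are assumptions for illustration, not the actual modelindexer code, which additionally handles per-direction cooldowns and the GOMAXPROCS-change downscale path.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// scaler counts consecutive flush outcomes and decides when to add or
// remove an active indexer. All names and values are illustrative.
type scaler struct {
	active    int           // current number of active indexers
	full      int           // consecutive flushes that reached FlushBytes
	timed     int           // consecutive flushes triggered by FlushInterval
	threshold int           // consecutive flushes required before acting
	coolDown  time.Duration // minimum time between scaling actions
	lastScale time.Time
}

// limit caps active indexers at roughly 25% of GOMAXPROCS.
func (s *scaler) limit() int {
	if l := runtime.GOMAXPROCS(0) / 4; l > 1 {
		return l
	}
	return 1
}

// onFullFlush records a flush that reached FlushBytes and may scale up.
func (s *scaler) onFullFlush(now time.Time) {
	s.timed, s.full = 0, s.full+1
	if s.full >= s.threshold && s.active < s.limit() &&
		now.Sub(s.lastScale) >= s.coolDown {
		s.active, s.full, s.lastScale = s.active+1, 0, now
		fmt.Println("scaled up, active indexers:", s.active)
	}
}

// onTimedFlush records a flush triggered by FlushInterval and may scale down.
func (s *scaler) onTimedFlush(now time.Time) {
	s.full, s.timed = 0, s.timed+1
	if s.timed >= s.threshold && s.active > 1 &&
		now.Sub(s.lastScale) >= s.coolDown {
		s.active, s.timed, s.lastScale = s.active-1, 0, now
		fmt.Println("scaled down, active indexers:", s.active)
	}
}

func main() {
	s := &scaler{active: 1, threshold: 60, coolDown: time.Minute}
	now := time.Now()
	for i := 0; i < 120; i++ { // sustained full flushes simulate high load
		s.onFullFlush(now.Add(time.Duration(i) * time.Second))
	}
	fmt.Println("final active indexers:", s.active)
}
```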
📚 Go benchmark report: diff with the report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

Since the benchmark diff is not very readable, here's the output from my laptop:
Benchstat diff

1 to 4g APM Server

All had similar performance to main, since the active indexers were only scaled up briefly when CPU credits were available, which led to significant bursts in throughput when that occurred.

8g APM Server

This one shows a less noticeable increase, but the CPU was utilized more fully. Bursts had way higher throughput.

$ benchstat -alpha 0.11 sizes/main-active/8g-12s-12n.txt sizes/scaling/8g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name old time/op new time/op delta
AgentAll-512 606ms ± 1% 593ms ±25% ~ (p=0.700 n=3+3)
name old error_responses/sec new error_responses/sec delta
AgentAll-512 0.03 ±58% 0.00 -100.00% (p=0.100 n=3+3)
name old events/sec new events/sec delta
AgentAll-512 29.5k ± 1% 31.5k ±28% ~ (p=0.700 n=3+3)
name old gc_cycles new gc_cycles delta
AgentAll-512 318 ± 1% 312 ±26% ~ (p=0.700 n=3+3)
name old max_goroutines new max_goroutines delta
AgentAll-512 382 ± 2% 398 ±29% ~ (p=0.700 n=3+3)
name old max_heap_alloc new max_heap_alloc delta
AgentAll-512 1.11G ± 2% 1.15G ± 5% ~ (p=0.200 n=3+3)
name old max_heap_objects new max_heap_objects delta
AgentAll-512 10.0M ± 2% 10.1M ± 6% ~ (p=1.000 n=3+3)
name old max_rss new max_rss delta
AgentAll-512 1.22G ± 1% 1.26G ± 5% ~ (p=0.400 n=3+3)
name old mean_available_indexers new mean_available_indexers delta
AgentAll-512 16.1 ± 1% 40.7 ± 8% +153.35% (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
AgentAll-512 570MB ± 1% 581MB ± 1% +2.03% (p=0.100 n=3+3)
name old allocs/op new allocs/op delta
AgentAll-512 7.99M ± 1% 8.15M ± 1% +2.09% (p=0.100 n=3+3)

15g APM Server

From this size onwards is where the autoscaling really shines. CPU utilization increased and so did throughput.

$ benchstat -alpha 0.11 sizes/main-active/15g-12s-12n.txt sizes/scaling/15g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name old time/op new time/op delta
AgentAll-960 561ms ±10% 336ms ± 0% -40.08% (p=0.100 n=3+3)
name old error_responses/sec new error_responses/sec delta
AgentAll-960 0.06 ±85% 0.00 ±200% -95.79% (p=0.100 n=3+3)
name old events/sec new events/sec delta
AgentAll-960 31.8k ±10% 53.6k ± 0% +68.34% (p=0.100 n=3+3)
name old gc_cycles new gc_cycles delta
AgentAll-960 317 ± 3% 448 ± 3% +41.28% (p=0.100 n=3+3)
name old max_goroutines new max_goroutines delta
AgentAll-960 386 ± 3% 562 ± 2% +45.64% (p=0.100 n=3+3)
name old max_heap_alloc new max_heap_alloc delta
AgentAll-960 1.11G ± 0% 1.19G ± 1% +7.72% (p=0.100 n=3+3)
name old max_heap_objects new max_heap_objects delta
AgentAll-960 9.89M ± 1% 10.67M ± 0% +7.91% (p=0.100 n=3+3)
name old max_rss new max_rss delta
AgentAll-960 1.23G ± 2% 1.33G ± 1% +7.84% (p=0.100 n=3+3)
name old mean_available_indexers new mean_available_indexers delta
AgentAll-960 15.5 ± 7% 33.4 ± 0% +115.79% (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
AgentAll-960 564MB ± 1% 571MB ± 0% +1.20% (p=0.100 n=3+3)
name old allocs/op new allocs/op delta
AgentAll-960 7.92M ± 1% 8.07M ± 0% +1.82% (p=0.100 n=3+3)

30g APM Server

$ benchstat -alpha 0.11 sizes/main-active/30g-12s-12n.txt sizes/scaling/30g-24s-24n1mb-50av-gomaxprocs-by-4-pr.txt
name old time/op new time/op delta
AgentAll-1920 585ms ± 5% 183ms ± 0% -68.71% (p=0.100 n=3+3)
name old error_responses/sec new error_responses/sec delta
AgentAll-1920 0.04 ±18% 0.01 ±50% -76.61% (p=0.100 n=3+3)
name old events/sec new events/sec delta
AgentAll-1920 30.7k ± 5% 98.1k ± 0% +219.84% (p=0.100 n=3+3)
name old gc_cycles new gc_cycles delta
AgentAll-1920 335 ± 5% 772 ± 1% +130.55% (p=0.100 n=3+3)
name old max_goroutines new max_goroutines delta
AgentAll-1920 390 ± 2% 1026 ± 1% +163.22% (p=0.100 n=3+3)
name old max_heap_alloc new max_heap_alloc delta
AgentAll-1920 1.09G ± 2% 1.41G ± 1% +29.59% (p=0.100 n=3+3)
name old max_heap_objects new max_heap_objects delta
AgentAll-1920 9.18M ± 3% 13.15M ± 1% +43.31% (p=0.100 n=3+3)
name old max_rss new max_rss delta
AgentAll-1920 1.23G ± 2% 1.58G ± 0% +28.20% (p=0.100 n=3+3)
name old mean_available_indexers new mean_available_indexers delta
AgentAll-1920 15.8 ± 3% 12.2 ± 7% -22.64% (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
AgentAll-1920 573MB ± 0% 560MB ± 0% -2.19% (p=0.100 n=3+3)
name old allocs/op new allocs/op delta
AgentAll-1920 8.00M ± 0% 7.96M ± 0% -0.59% (p=0.100 n=3+3)
Signed-off-by: Marc Lopez Rubio <[email protected]>
Signed-off-by: Marc Lopez Rubio <[email protected]>
I have a bunch of comments and suggestions, but this is very nice :)
// When the queue utilization is below 5%, reset the idleTimer. When
// traffic to the APM Server is interrupted or stopped, it allows excess
// active indexers that have been idle for the IdleInterval to be
// scaled down.
activeIndexers := atomic.LoadInt64(&i.activeBulkRequests)
lowChanCapacity := float64(len(i.bulkItems))/float64(cap(i.bulkItems)) <= 0.05
if lowChanCapacity && activeIndexers > 1 {
	idleTimer.Reset(i.config.Scaling.IdleInterval)
}
Doing this on every iteration of the loop concerns me a little bit. Would it make sense to require FlushInterval >= IdleInterval, and only reset the idle timer before the first iteration of the loop, and then after flushing? i.e. whenever the flush timer is inactive.
Also, I think the idle timer should only be started when autoscaling is enabled?
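For reference, a self-contained sketch of the loop shape this suggestion describes, where the idle timer is only reset while the flush timer is inactive (before the first iteration and right after flushing). All names and intervals are illustrative and assume the suggested FlushInterval >= IdleInterval; this is not the merged implementation.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative intervals honouring the suggested FlushInterval >= IdleInterval.
	const (
		flushInterval = 500 * time.Millisecond
		idleInterval  = 300 * time.Millisecond
	)

	bulkItems := make(chan int, 10)
	go func() { // a short burst of traffic, then silence
		bulkItems <- 1
		bulkItems <- 2
	}()

	activeIndexers := 2 // pretend this indexer is one of several
	buffered := 0
	flush := func() {
		fmt.Println("flushing", buffered, "buffered events")
		buffered = 0
	}

	flushTimer := time.NewTimer(flushInterval)
	flushTimer.Stop() // the flush timer is inactive until an event is buffered
	idleTimer := time.NewTimer(idleInterval)

	for {
		select {
		case ev := <-bulkItems:
			if buffered == 0 {
				flushTimer.Reset(flushInterval) // flush timer becomes active
			}
			buffered++
			_ = ev
		case <-flushTimer.C:
			flush()
			// The flush timer is inactive again: restart the idle countdown.
			idleTimer.Reset(idleInterval)
		case <-idleTimer.C:
			if buffered == 0 && activeIndexers > 1 {
				fmt.Println("idle for", idleInterval, "- scale this active indexer down")
				return
			}
			idleTimer.Reset(idleInterval) // not idle: events are still buffered
		}
	}
}
```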
Excited about these changes!
Left a couple of mostly nitpicks around naming and comments.
Have you tested this against an undersized ES? Given that the number of active indexers is based on events processed in the APM Server, and not also on ES responses and pushback, I wonder whether things might get worse for setups where the APM Server processes more events than ES can handle, by increasing the pressure on ES.
// Disabled toggles active indexer scaling on.
//
// It is enabled by default.
Disabled bool
Again, I'd switch to Enabled for consistency.
FWIW I think this is okay for the moment, maybe we can come back to it if it's a pain. The zero value for ScalingConfig currently means to use the default config; it should be enabled by default. So we would need to make it a *bool to preserve that while also making it "enabled".
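A tiny sketch of the *bool idea described here, so that the zero-value config still means "enabled by default" while allowing an explicit opt-out. The type, field, and method names are illustrative, not the actual apm-server code.

```go
package main

import "fmt"

// ScalingConfig is a hypothetical config struct: a nil Enabled keeps the
// zero value meaning "use the defaults" (scaling on), while still letting
// users set enabled: false explicitly.
type ScalingConfig struct {
	Enabled *bool
}

// enabled reports whether scaling is on, defaulting to true when unset.
func (c ScalingConfig) enabled() bool {
	return c.Enabled == nil || *c.Enabled
}

func main() {
	var zero ScalingConfig // zero value: scaling enabled by default
	off := false
	disabled := ScalingConfig{Enabled: &off}

	fmt.Println(zero.enabled())     // true
	fmt.Println(disabled.enabled()) // false
}
```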
Signed-off-by: Marc Lopez Rubio <[email protected]>
Signed-off-by: Marc Lopez Rubio <[email protected]>
Signed-off-by: Marc Lopez Rubio <[email protected]>
Signed-off-by: Marc Lopez Rubio <[email protected]>
Thanks for the updates, it's looking pretty good now. Only one more comment on the metric names from me - I think the main remaining question is @simitt's one about behaviour when ES is underpowered.
Signed-off-by: Marc Lopez Rubio <[email protected]>
…scale-active-indexers
Signed-off-by: Marc Lopez Rubio <[email protected]>
Signed-off-by: Marc Lopez Rubio <[email protected]>
@simitt @axw Thanks for the reviews. I have tested the performance of an 8GB APM Server (4 vCPUs, with 8 as the burstable max) and it does seem like the pressure is a bit worse, since the autoscaling takes place after 60 consecutive flushes. The backing Elasticsearch cluster is a 3-zone 8GB cluster. Comparing the current PR with main (main has 25 available indexers of 2.5MB each):

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2n-pr.txt
name old time/op new time/op delta
AgentAll-512 1.91s ± 4% 2.01s ± 3% ~ (p=0.200 n=3+3)
name old error_responses/sec new error_responses/sec delta
AgentAll-512 0.00 0.00 ~ (all equal)
name old events/sec new events/sec delta
AgentAll-512 9.46k ± 4% 8.98k ± 3% ~ (p=0.200 n=3+3)
name old gc_cycles new gc_cycles delta
AgentAll-512 130 ± 8% 120 ± 2% -7.93% (p=0.100 n=3+3)
name old max_goroutines new max_goroutines delta
AgentAll-512 232 ±10% 294 ± 4% +26.69% (p=0.100 n=3+3)
name old max_heap_alloc new max_heap_alloc delta
AgentAll-512 989M ± 3% 1015M ± 2% ~ (p=0.400 n=3+3)
name old max_heap_objects new max_heap_objects delta
AgentAll-512 8.82M ± 5% 9.00M ± 5% ~ (p=0.700 n=3+3)
name old max_rss new max_rss delta
AgentAll-512 1.10G ± 2% 1.13G ± 1% +2.65% (p=0.100 n=3+3)
name old mean_available_indexers new mean_available_indexers delta
AgentAll-512 2.65 ±11% 0.00 -100.00% (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
AgentAll-512 638MB ± 0% 644MB ± 1% ~ (p=0.200 n=3+3)
name old allocs/op new allocs/op delta
AgentAll-512 8.59M ± 0% 8.63M ± 0% ~ (p=0.200 n=3+3)

The number of 429s was significant (up to 23% of all requests). I made some small changes (not pushed to this PR) which disallowed scaling when 1% or more of the total indexed documents resulted in a 429; that gave very similar performance to main:

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2npr-block-as.txt
name old time/op new time/op delta
AgentAll-512 1.91s ± 4% 1.92s ± 1% ~ (p=1.000 n=3+3)
name old error_responses/sec new error_responses/sec delta
AgentAll-512 0.00 0.00 ~ (all equal)
name old events/sec new events/sec delta
AgentAll-512 9.46k ± 4% 9.39k ± 1% ~ (p=1.000 n=3+3)
name old gc_cycles new gc_cycles delta
AgentAll-512 130 ± 8% 123 ± 5% ~ (p=0.500 n=3+3)
name old max_goroutines new max_goroutines delta
AgentAll-512 232 ±10% 304 ± 6% +30.85% (p=0.100 n=3+3)
name old max_heap_alloc new max_heap_alloc delta
AgentAll-512 989M ± 3% 974M ± 0% ~ (p=0.700 n=3+3)
name old max_heap_objects new max_heap_objects delta
AgentAll-512 8.82M ± 5% 8.21M ± 3% -6.96% (p=0.100 n=3+3)
name old max_rss new max_rss delta
AgentAll-512 1.10G ± 2% 1.11G ± 0% ~ (p=0.700 n=3+3)
name old mean_available_indexers new mean_available_indexers delta
AgentAll-512 2.65 ±11% 0.00 -100.00% (p=0.100 n=3+3)
name old alloc/op new alloc/op delta
AgentAll-512 638MB ± 0% 642MB ± 0% +0.60% (p=0.100 n=3+3)
name old allocs/op new allocs/op delta
AgentAll-512 8.59M ± 0% 8.62M ± 0% ~ (p=0.200 n=3+3)

patch:

diff --git a/internal/model/modelindexer/indexer.go b/internal/model/modelindexer/indexer.go
index de50be0a1..e50844c68 100644
--- a/internal/model/modelindexer/indexer.go
+++ b/internal/model/modelindexer/indexer.go
@@ -594,6 +594,14 @@ func (i *Indexer) maybeScaleDown(now time.Time, info scalingInfo, timedFlush *ui
}
info = i.scalingInformation() // refresh scaling info if CAS failed.
}
+ // If more than 1% of the requests result in 429, scale down.
+ if i.indexFailureRate() >= 0.01 {
+ if new := info.ScaleDown(now); i.scalingInfo.CompareAndSwap(info, new) {
+ i.logger.Infof("Elasticsearch 429 too many rate exceeded 1%, scaling down to: %d", new)
+ return true
+ }
+ return false
+ }
if *timedFlush < i.config.Scaling.ScaleDown.Threshold {
return false
}
@@ -624,6 +632,10 @@ func (i *Indexer) maybeScaleUp(now time.Time, info scalingInfo, fullFlush *uint)
+ // Reset fullFlush after it has exceeded the threshold,
+ // which avoids scaling up too eagerly.
*fullFlush = 0
+ // If more than 1% of the requests result in 429, do not scale up.
+ if i.indexFailureRate() >= 0.01 {
+ return false
+ }
if info.withinCoolDown(i.config.Scaling.ScaleUp.CoolDown, now) {
return false
}
@@ -642,6 +654,11 @@ func (i *Indexer) scalingInformation() scalingInfo {
return i.scalingInfo.Load().(scalingInfo)
}
+func (i *Indexer) indexFailureRate() float64 {
+ return float64(atomic.LoadInt64(&i.tooManyRequests)) /
+ float64(atomic.LoadInt64(&i.eventsAdded))
+}
+
// activeLimit returns the value of GOMAXPROCS / 4. Which should limit the
// maximum number of active indexers to 25% of GOMAXPROCS.
// NOTE: There is also a sweet spot between Config.MaxRequests and the number
If we're OK merging this PR and doing some more investigation on back-pressure indicators as a follow-up, then we can properly test all these changes and perhaps think about a different way to detect an overwhelmed Elasticsearch.
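For context, a minimal sketch of the GOMAXPROCS-based ceiling that the activeLimit context lines in the diff above refer to. The exact rounding and the minimum-of-one handling are assumptions, not necessarily identical to the merged code.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// activeLimit caps active indexers at roughly 25% of GOMAXPROCS, using
// math.RoundToEven as in the `active <= math.RoundToEven(GOMAXPROCS / 4)`
// condition described in the PR summary.
func activeLimit() int64 {
	limit := math.RoundToEven(float64(runtime.GOMAXPROCS(0)) / 4)
	if limit < 1 {
		return 1
	}
	return int64(limit)
}

func main() {
	// On a GOMAXPROCS=16 machine this prints 4. When GOMAXPROCS is lowered
	// (e.g. after a cgroup CPU-quota update), the limit shrinks, which is
	// what allows downscaling to ignore the cooldown.
	fmt.Println("active indexer limit:", activeLimit())
}
```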
Co-authored-by: Silvia Mitter <[email protected]>
> If we're OK merging this PR and doing some more investigation on back-pressure indicators as a follow-up, then we can properly test all these changes and perhaps think about a different way to detect an overwhelmed Elasticsearch.
SGTM.
The changes all look good apart from a couple of issues with the metrics.
Signed-off-by: Marc Lopez Rubio <[email protected]>
One last thing, I think.
Signed-off-by: Marc Lopez Rubio <[email protected]>
Thank you!
This should be tested as part of #9182
Motivation/summary

Modifies the `modelindexer` to scale up and down the number of "active" indexers (goroutines pulling out of the internal modelindexer queue), based on the number of consecutive flushes, up to 25% of `GOMAXPROCS`.

When N full flushes (flushes that reached or exceeded `FlushBytes`) occur, the scale action creates a new active indexer, as long as the scaling cooldown has elapsed.

Equally, when N timed flushes (due to `FlushInterval`) occur, an active indexer is scaled down, since not enough load is going through the server to warrant the current number of active indexers.

Active indexer downscaling can also be triggered by a change in `GOMAXPROCS`. This is particularly important for containerized or cgroup environments where CPU quotas may be updated "live". In this case, the downscale cooldown is ignored and active indexers are scaled down until `active <= math.RoundToEven(GOMAXPROCS / 4)`.

When no traffic is going through an active indexer, a new timer allows completely idle indexers to be scaled back. The idle check interval can be configured via `IdleInterval`.

Scaling is enabled by default, but can be disabled via the configuration option `output.elasticsearch.scaling.enabled: false`.

Last, modifies the default settings for `max_requests` and `flush_bytes` to `50` and `1MB` respectively. This sends smaller payloads to Elasticsearch, lets instances with more processing power use more of the available indexers, and cycles the indexers faster, which results in better utilization and performance.

Checklist

- [ ] Update package changelog.yml (only if changes to apmpackage have been made)

How to test these changes

Run benchmarks on instances of 8g and bigger in ESS and observe CPU utilization reaching ~100%.

Related issues

Closes #9181