upstream: avoid copies of all cluster endpoints for every resolve target #15013

Merged
merged 3 commits into from
Feb 12, 2021

Conversation

rojkov
Member

@rojkov rojkov commented Feb 10, 2021

Commit Message: upstream: avoid copies of all cluster endpoints for every resolve target
Additional Description:
Currently, every Envoy::Upstream::StrictDnsClusterImpl::ResolveTarget, which is instantiated once per endpoint, also creates a full copy of the envoy::config::endpoint::v3::LocalityLbEndpoints message the endpoint belongs to. Because that message contains all the endpoints defined for it, memory consumption grows quadratically as the number of endpoints increases, even though those copies of the endpoints are never used.

Instead of creating a copy of envoy::config::endpoint::v3::LocalityLbEndpoints, use a reference to a single copy that is stored in Envoy::Upstream::StrictDnsClusterImpl and is accessible from all resolve targets for their whole life span.
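
The difference between the two ownership models described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `LocalityLbEndpoints` here is a simplified stand-in struct rather than the real protobuf message, and `Cluster`, `CopyingResolveTarget`, `SharingResolveTarget`, `makeCopyingTargets` and `endpointEntriesHeld` are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for
// envoy::config::endpoint::v3::LocalityLbEndpoints.
struct LocalityLbEndpoints {
  std::vector<std::string> lb_endpoints; // one address per endpoint
  unsigned priority = 0;
};

// Before: every target owns a full copy of the message.
struct CopyingResolveTarget {
  CopyingResolveTarget(const LocalityLbEndpoints& message, std::string address)
      : locality_lb_endpoints_(message), dns_address_(std::move(address)) {}
  LocalityLbEndpoints locality_lb_endpoints_; // holds all N endpoints again
  std::string dns_address_;
};

// After: every target refers to the single message owned by the cluster.
struct SharingResolveTarget {
  SharingResolveTarget(const LocalityLbEndpoints& message, std::string address)
      : locality_lb_endpoints_(message), dns_address_(std::move(address)) {}
  const LocalityLbEndpoints& locality_lb_endpoints_;
  std::string dns_address_;
};

// Hypothetical owner of the message and of the targets that refer to it.
class Cluster {
public:
  explicit Cluster(LocalityLbEndpoints message) : message_(std::move(message)) {
    targets_.reserve(message_.lb_endpoints.size());
    for (const std::string& address : message_.lb_endpoints) {
      targets_.emplace_back(message_, address);
    }
  }
  // Copying or moving the cluster would leave the references dangling.
  Cluster(const Cluster&) = delete;
  Cluster& operator=(const Cluster&) = delete;

  const LocalityLbEndpoints& message() const { return message_; }
  const std::vector<SharingResolveTarget>& targets() const { return targets_; }

private:
  LocalityLbEndpoints message_; // declared first, so it is destroyed after targets_
  std::vector<SharingResolveTarget> targets_;
};

// Builds one copying target per endpoint, as in the "before" model.
std::vector<CopyingResolveTarget> makeCopyingTargets(const LocalityLbEndpoints& message) {
  std::vector<CopyingResolveTarget> targets;
  targets.reserve(message.lb_endpoints.size());
  for (const std::string& address : message.lb_endpoints) {
    targets.emplace_back(message, address);
  }
  return targets;
}

// Counts the endpoint entries held across all copying targets.
std::size_t endpointEntriesHeld(const std::vector<CopyingResolveTarget>& targets) {
  std::size_t total = 0;
  for (const CopyingResolveTarget& target : targets) {
    total += target.locality_lb_endpoints_.lb_endpoints.size();
  }
  return total;
}
```

With N endpoints, the copying targets hold N entries each, N * N in total, while the sharing targets add only one reference each. The price is a lifetime requirement: the owner has to outlive every target that refers to its message.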

Risk Level: Low
Testing: unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
May contribute to #12138, #14993

…cality

Currently, every Envoy::Upstream::StrictDnsClusterImpl::ResolveTarget,
which is instantiated once per endpoint, also creates a full copy of
the envoy::config::endpoint::v3::LocalityLbEndpoints message the
endpoint belongs to. Because that message contains all the endpoints
defined for it, memory consumption grows quadratically as the number
of endpoints increases, even though those copies of the endpoints are
never used.

Instead of creating a copy of
envoy::config::endpoint::v3::LocalityLbEndpoints, introduce a
WeightedLocality class, which is a wrapper around
envoy::config::core::v3::Locality together with the priority and load
balancing weight that are actually used in the upstream implementation.

Signed-off-by: Dmitry Rozhkov <[email protected]>
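
A wrapper of the kind this commit message describes might look roughly like the sketch below. The actual class from this commit is not shown in the conversation, so everything here is an assumption made for illustration: `Locality` is a simplified stand-in for the envoy::config::core::v3::Locality message, and the accessor names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Simplified stand-in for envoy::config::core::v3::Locality.
struct Locality {
  std::string region;
  std::string zone;
  std::string sub_zone;
};

// Hypothetical sketch: keeps only the locality, priority and load
// balancing weight instead of the whole list of endpoints.
class WeightedLocality {
public:
  WeightedLocality(Locality locality, std::uint32_t priority,
                   std::uint32_t load_balancing_weight)
      : locality_(std::move(locality)), priority_(priority),
        load_balancing_weight_(load_balancing_weight) {}

  const Locality& locality() const { return locality_; }
  std::uint32_t priority() const { return priority_; }
  std::uint32_t loadBalancingWeight() const { return load_balancing_weight_; }

private:
  Locality locality_;
  std::uint32_t priority_;
  std::uint32_t load_balancing_weight_;
};
```

The size of such an object does not depend on how many endpoints the locality contains, which is what removes the per-target cost of copying the full message.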
@htuch
Member

htuch commented Feb 10, 2021

@rojkov do you have before/after performance figures or flame graphs? :D I can believe this had a major impact, curious what it was.

@dmitri-d
Contributor

Couldn't load_assignment (here: https://github.com/envoyproxy/envoy/blob/main/source/common/upstream/strict_dns_cluster.cc#L27) be made into a member var, so that locality_lb_endpoint can be stored as a reference in ResolveTarget?

@rojkov
Member Author

rojkov commented Feb 11, 2021

Couldn't load_assignment be made into a member var, so that locality_lb_endpoint can be stored as a reference in ResolveTarget?

Yes, that's possible with all the tests passing. Here's the v2 patch.

I've measured how quickly ResolveTargets are created for 9002 endpoints with perf annotations (and with -c opt):

the current master

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category   Description
    13166317     9002   1462599     77592.5  1325699  2418618  done       create ResolveTarget

v1 with WeightedLocality (I did 4 runs)

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category   Description
        6956     9002       772     1356.23      266    18082  done      create ResolveTarget
        6250     9002       694     1257.25      287    26519  done      create ResolveTarget
        6519     9002       724     1339.45      303    30662  done      create ResolveTarget
        5937     9002       659     1222.48      263    20861  done      create ResolveTarget

v2 with locality_lb_endpoints kept around

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category   Description
        7793     9002       865     1360.81      293    18177  done      create ResolveTarget
        5732     9002       636     1208.12      260    24967  done      create ResolveTarget
        6340     9002       704     1156.34      325    21298  done      create ResolveTarget
        7507     9002       833     1349.85      309    21394  done      create ResolveTarget

The v2 patch seems to be as fast as v1, but it makes StrictDnsClusterImpl keep the endpoints data in memory longer than needed (only the locality, priority and weight are used after StrictDnsClusterImpl's ctor is done).

Though in both cases memory consumption is negligible compared to the current code (pprof doesn't show anything with cumulative allocations below 268k, and I couldn't find how to make it show them). With the current code, 9002 endpoints consume >20G, which can probably explain #14993.
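
As a back-of-the-envelope check of the figures above, the snippet below counts how many endpoint entries exist when every resolve target copies the whole message, and computes the ratio of two mean durations taken from the tables. It assumes, which this thread does not state, that all 9002 endpoints sit in a single LocalityLbEndpoints message; the function names are hypothetical.

```cpp
#include <cstdint>

// Endpoint entries held when each of n resolve targets copies a message
// that contains all n endpoints (single locality assumed).
constexpr std::uint64_t endpointEntriesWithCopies(std::uint64_t n) { return n * n; }

// Endpoint entries held when all targets share one message.
constexpr std::uint64_t endpointEntriesShared(std::uint64_t n) { return n; }

// Ratio of two mean durations, e.g. current master vs. a patched build.
constexpr double speedup(double before_ns, double after_ns) {
  return before_ns / after_ns;
}

// 9002 targets, each copying 9002 endpoints.
static_assert(endpointEntriesWithCopies(9002) == 81036004ULL,
              "9002 * 9002 endpoint entries");
static_assert(endpointEntriesShared(9002) == 9002ULL,
              "one shared message");
```

Under that assumption, roughly 81 million endpoint entries are allocated where about nine thousand would do, which is in line with the reported memory use of more than 20G. Likewise, `speedup(1462599.0, 772.0)` for the first v1 run comes out somewhere between 1800x and 2000x.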

Screenshot (11)

@htuch
Copy link
Member

htuch commented Feb 12, 2021

Yeah, much prefer the v2 patch. Good call @dmitri-d. I am still somewhat convinced that using protos in various places where we don't need them is going to bite us, but not this time.

…int's Locality"

This reverts commit 3de74c1.

Signed-off-by: Dmitry Rozhkov <[email protected]>
for an instance of StrictDnsClusterImpl while it exists.

Signed-off-by: Dmitry Rozhkov <[email protected]>
@rojkov rojkov changed the title from "upstream: introduce weighted locality as a wrapper around endpoint's Locality" to "upstream: avoid copies of all cluster endpoints for every resolve target" on Feb 12, 2021
@rojkov
Member Author

rojkov commented Feb 12, 2021

Switched to v2 and updated the description accordingly.

@rojkov
Member Author

rojkov commented Feb 12, 2021

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15013 (comment) was created by @rojkov.

see: more, trace.

Member

@htuch htuch left a comment

LGTM, thanks!

@htuch htuch merged commit c676c00 into envoyproxy:main Feb 12, 2021

3 participants