eds: improve performance of updates #11362
Comments
@htuch The results of your performance analysis are consistent with my own earlier analysis of EDS performance. In addition, I found that Metadata comparison is also a very CPU-intensive point.
Just so the community is aware, we are seeing performance issues at Lyft also. Going to try sending v3 data for EDS to see if that helps alleviate some of the CPU pressure.
I think for folks who switch to v3, there probably isn't a big regression compared to before, but there might be a small amount of room to improve, see #10875 (comment). The most attractive low hanging fruit here is to avoid running the xDS resource through
One other thing to note is that LEDS will also be one way to address this overhead (other than the initial update), see #10373. Backend subsetting is another option.
This shows that the EDS onConfigUpdate() is 2-3x slower when working with v2 config and doing version conversion with original version recovery. Followup PRs will optimize.

Relates to envoyproxy#11362 and envoyproxy#10875.

Risk level: Low
Testing: Ran benchmark with -c opt binary.

Signed-off-by: Harvey Tuch <[email protected]>
There's also another optimization here that the discussion today from @mattklein123 on SotW/delta handling made me think about. If the
Tagging for collection in #10943
A lot of the overhead we see in envoyproxy#11362 comes from the GrpcMuxImpl ingestion path; exercising that path helps put the EDS cluster overhead in context. This patch adds a GrpcMuxImpl and GrpcSubscription to the test, allowing a fairly realistic cost of an EDS update to be obtained. Below is the example output; we're seeing v3 ingestion cost ~3.2x that of v2. Future patches will narrow this.

---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
priorityAndLocalityWeighted/0/0/2000         6277063 ns      6276456 ns        113
priorityAndLocalityWeighted/1/0/2000        20236878 ns     20236505 ns         35
priorityAndLocalityWeighted/0/1/2000         4589757 ns      4588833 ns        154
priorityAndLocalityWeighted/1/1/2000        14853274 ns     14850006 ns         42
priorityAndLocalityWeighted/0/0/4096        11865838 ns     11865314 ns         59
priorityAndLocalityWeighted/1/0/4096        39787876 ns     39787716 ns         18
priorityAndLocalityWeighted/0/1/4096         8186874 ns      8185724 ns         87
priorityAndLocalityWeighted/1/1/4096        29376139 ns     29374980 ns         24
priorityAndLocalityWeighted/0/0/32768      110248743 ns    110246101 ns          6
priorityAndLocalityWeighted/1/0/32768      343826306 ns    343761359 ns          2
priorityAndLocalityWeighted/0/1/32768       76962915 ns     76953406 ns          7
priorityAndLocalityWeighted/1/1/32768      254426231 ns    254401890 ns          3
priorityAndLocalityWeighted/0/0/100000     352576987 ns    352550994 ns          2
priorityAndLocalityWeighted/1/0/100000    1100804152 ns   1100646263 ns          1
priorityAndLocalityWeighted/0/1/100000     253870688 ns    253857626 ns          3
priorityAndLocalityWeighted/1/1/100000     808080918 ns    808028852 ns          1

Risk level: Low (test only)
Testing: Added ASSERTs to validate that we've loaded all the hosts at the end of the benchmark.

Signed-off-by: Harvey Tuch <[email protected]>
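For readers trying to interpret rows like these, the benchmark name encodes its arguments: priorityAndLocalityWeighted/<arg0>/<arg1>/<num hosts>. Below is a minimal, hypothetical sketch of how such a parameterized Google Benchmark can be registered; the function body, the meaning of the two boolean flags, and the host-building step are assumptions standing in for the real eds_speed_test fixture. With the library's default range multiplier of 8, Ranges() should expand {2000, 100000} to exactly the 2000/4096/32768/100000 host counts seen above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical stand-in for building and applying an EDS update; the real
// eds_speed_test drives a GrpcMuxImpl/GrpcSubscription and an EDS cluster.
static void deliverEdsUpdate(bool use_alt_path, bool other_flag, size_t num_hosts) {
  std::vector<std::string> hosts(num_hosts, use_alt_path ? "a" : "b");
  if (other_flag) {
    hosts.emplace_back("extra");
  }
  benchmark::DoNotOptimize(hosts.data());
}

// Parameterized benchmark: the three argument values show up in the benchmark
// name, e.g. priorityAndLocalityWeighted/1/0/32768.
static void priorityAndLocalityWeighted(benchmark::State& state) {
  const bool flag_a = state.range(0) != 0;  // meaning assumed (e.g. which ingestion path)
  const bool flag_b = state.range(1) != 0;  // meaning assumed
  const size_t num_hosts = state.range(2);
  for (auto _ : state) {
    deliverEdsUpdate(flag_a, flag_b, num_hosts);
  }
}

// Default multiplier of 8 expands {2000, 100000} to 2000, 4096, 32768, 100000,
// matching the rows in the output above.
BENCHMARK(priorityAndLocalityWeighted)->Ranges({{0, 1}, {0, 1}, {2000, 100000}});

BENCHMARK_MAIN();
```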
Previously, the gRPC muxes required decoding an opaque resource to obtain its name, then dispatching to the relevant subscription, which would again decode the opaque resource. This is pretty horrible efficiency-wise, in particular when upgrading from v2 -> v3.

In this patch, we introduce a DecodedResource wrapper and OpaqueResourceDecoder. The config ingestion module, e.g. GrpcMuxImpl, uses the OpaqueResourceDecoder to produce a typed DecodedResource, performing the decode once. This DecodedResource is then dispatched to the watching subscription.

This provides > 20% speedup on the v2 -> v3 tax for eds_speed_test, decreasing the overhead from 3.2x to 2.5x. It's also likely to unlock further optimizations, as we now have a wrapper resource, and it simplifies subscription implementations, as they no longer need to deal with delta vs. SotW resource decoding in different ways.

Risk level: Medium (configuration ingestion path changes).
Testing: New unit tests for DecodedResourceImpl/OpaqueResourceDecoderImpl, updated existing unit tests to work with new interfaces.

Partial solution to envoyproxy#11362

Signed-off-by: Harvey Tuch <[email protected]>
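To make the decode-once idea concrete, here is a simplified, hypothetical sketch of the pattern that patch describes. The type and method names below are illustrative and are not Envoy's actual DecodedResource/OpaqueResourceDecoder interfaces.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

#include "google/protobuf/any.pb.h"
#include "google/protobuf/message.h"

// Simplified sketch: a decoded resource carries its name plus the fully
// unpacked message, so downstream consumers never touch the opaque Any again.
struct DecodedResource {
  std::string name;
  std::unique_ptr<google::protobuf::Message> message;
};

// The decoder knows the concrete resource type and how to extract its name.
class OpaqueResourceDecoder {
public:
  virtual ~OpaqueResourceDecoder() = default;
  virtual DecodedResource decodeResource(const google::protobuf::Any& resource) const = 0;
};

// Mux-side ingestion: decode each opaque resource exactly once and hand the
// typed result to the watching subscription, instead of letting both the mux
// (for the name) and the subscription (for the payload) unpack the same Any.
void onDiscoveryResponse(
    const std::vector<google::protobuf::Any>& resources,
    const OpaqueResourceDecoder& decoder,
    const std::function<void(const DecodedResource&)>& dispatch_to_subscription) {
  for (const auto& any : resources) {
    const DecodedResource decoded = decoder.decodeResource(any);
    dispatch_to_subscription(decoded);
  }
}
```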
Makes BaseDynamicClusterImpl::updateDynamicHostList O(n) rather than O(n^2). Instead of calling .erase() on list iterators as we find them, we swap with the end of the list and erase after iterating over the list. This shows a ~3x improvement in execution time in the included benchmark test.

Risk Level: Medium. No reordering happens to the endpoint list. Not runtime guarded.
Testing: New benchmark, existing unit tests pass (and cover the affected function).
Docs Changes: N/A
Release Notes: N/A
Relates to #2874 #11362

Signed-off-by: Phil Genera <[email protected]>
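The general shape of that change: the quadratic version erases elements mid-scan (each erase shifts the remainder of the container), while the linear version moves removals out of the way in a single pass and erases once at the end. The sketch below uses a plain std::vector<std::string> and std::remove_if as a stable-order stand-in for the swap-with-end technique the patch describes; it is not Envoy's updateDynamicHostList code.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

using HostVector = std::vector<std::string>; // stand-in for Envoy's host list

// Quadratic pattern: erasing while scanning makes each removal O(n), so
// removing k hosts from a list of n costs O(n * k).
void removeHostsQuadratic(HostVector& hosts,
                          const std::function<bool(const std::string&)>& is_stale) {
  for (auto it = hosts.begin(); it != hosts.end();) {
    if (is_stale(*it)) {
      it = hosts.erase(it); // shifts every later element left on each erase
    } else {
      ++it;
    }
  }
}

// Linear pattern: one pass moves stale entries to the tail (kept hosts stay in
// their original relative order), followed by a single erase of the tail.
void removeHostsLinear(HostVector& hosts,
                       const std::function<bool(const std::string&)>& is_stale) {
  hosts.erase(std::remove_if(hosts.begin(), hosts.end(), is_stale), hosts.end());
}
```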
While looking at eds_speed_test flamegraphs as part of envoyproxy#10875, it seemed we were doing some redundant work, namely first unpacking to a v2 message and then upgrading from v2 to v3. Since v2 and v3 are wire compatible, and the upgrade is just a wirecast, we might as well just unpack to v2. This improves eds_speed_test significantly; the penalty for v3 vs. v2 drops from 2.6x to 2x.

Part of envoyproxy#11362 envoyproxy#10875.

Risk level: Low
Testing: Coverage of behavior from existing tests.

Signed-off-by: Harvey Tuch <[email protected]>
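A hedged illustration of the "upgrade is just a wirecast" point: when two protobuf message definitions are wire compatible, the serialized payload of one can be parsed directly into the other, so the intermediate unpack-then-upgrade round trip can be skipped. The helper below is a hypothetical sketch, not Envoy's version converter, and it deliberately bypasses the Any type-URL check on the assumption that the caller already knows the versions share a wire format.

```cpp
#include <string>

#include "google/protobuf/any.pb.h"

// Hypothetical helper: parse an Any's raw payload straight into TargetMsg,
// skipping the "unpack into one version, then re-serialize and parse as the
// other" round trip. Only valid because the source and target versions are
// wire compatible; the type URL is intentionally not checked here.
template <class TargetMsg>
bool unpackWireCompatible(const google::protobuf::Any& any, TargetMsg& out) {
  return out.ParseFromString(any.value());
}
```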
I think as a result of a combination of the improvements from #11442 and #10875 (comment), we can close this one out. Please reopen if #10875 (comment) and HEAD Envoy is not sufficient.
Looking at performance profiles, there are a number of places we can optimize EDS updates:

- We are burning 30-50% of time in version conversion. We can follow #10875 (Guarantee and optimize v3 internally until close to v4alpha) to improve this.
- There is a lot of highly redundant computation happening as we are doing multiple anyConverts, with proto deserialization overhead, version conversions, etc. E.g. when we access the resource name, this does a full conversion. This will also apply to other resource types. Ideally we have a cheaper way to get at the resource name (see the sketch after this list). Also, message validation has to recover the original type; we can possibly supply it directly.

I think EDS update performance can be improved somewhere in the 50-70% range if the above is done right.
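As a rough illustration of the "cheaper way to get at resource name" point above, one option is to cache the result of the first expensive conversion so repeated name lookups do not re-run anyConvert. This is a hypothetical sketch under that assumption; the class and helper names are made up and the costly step is stubbed out.

```cpp
#include <optional>
#include <string>
#include <utility>

#include "google/protobuf/any.pb.h"

// Hypothetical wrapper: the expensive full conversion runs at most once, and
// subsequent accesses to the resource name hit the cached value.
class CachedResourceName {
public:
  explicit CachedResourceName(google::protobuf::Any resource)
      : resource_(std::move(resource)) {}

  const std::string& name() {
    if (!name_.has_value()) {
      name_ = extractNameViaFullConversion(resource_); // costly path, done once
    }
    return *name_;
  }

private:
  // Stub standing in for the anyConvert + deserialization described above.
  static std::string extractNameViaFullConversion(const google::protobuf::Any&) {
    return "example_cluster";
  }

  google::protobuf::Any resource_;
  std::optional<std::string> name_;
};
```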