Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade from v1.12.3 to v1.14.4 leads to more CPU usage without traffic #12080

Closed
fxposter opened this issue Jul 14, 2020 · 45 comments
Closed

Upgrade from v1.12.3 to v1.14.4 leads to more CPU usage without traffic #12080

fxposter opened this issue Jul 14, 2020 · 45 comments
Labels
area/perf question Questions that are neither investigations, bugs, nor enhancements stale stalebot believes this issue/PR has not been touched recently

Comments

@fxposter
Copy link

Description:
While trying to upgrade to latest version of envoy, we faced the problem (maybe) - envoy started to consume 1.5x more cpu (from 7-9% cpu to 12-20%) when there are no traffic to this particular envoy instance. Also, cpu usage graph now have way more spikes than it had before(previously it looked like a flat line - now it is full of spikes).

We have our own control plane and our bootstrap config looks like this:

node:
  cluster: private-cluster

admin:
  access_log_path: /var/log/envoy/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

stats_flush_interval: 120s

dynamic_resources:
  lds_config: {ads: {}}
  cds_config: {ads: {}}
  ads_config:
    api_type: GRPC
    grpc_services:
      envoy_grpc:
        cluster_name: xds_cluster

static_resources:
  clusters:
  - name: xds_cluster
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: RANDOM
    http2_protocol_options: {}
    upstream_connection_options:
      tcp_keepalive:
        keepalive_time: 300
        keepalive_interval: 10
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: private-address
                port_value: 8080

we have quite a lot of routes/clusters/clusterLoadAssignments there which change quite often, though in this particular case number of changes is not that big - change every 20-300 seconds, but they are still present. Full config dump after initialization is about 6mb.

Doing CPU profiling gives me something like this:
image
image

We don't use envoy's healthchecking/rate limiting/etc - I am not sure whether their appearance in cpu profile says anything, but it bothers me a bit. Also, "__ZNSt3__1L20__throw_length_errorEPKc" looks strange.

Full profile of 10 minutes of time is here: envoy.1.14.prof.zip

@mattklein123
Copy link
Member

I'm pretty skeptical of these symbols. Can you get a trace with a binary with debug symbols?

@fxposter
Copy link
Author

I forgot to mention that I also have a custom envoy (we use some of internal plugins). Currently I build envoy with ci/do_ci.sh bazel.release.server_only and take binary from build_release_stripped dir. Should I build it differently, ie: bazel.debug.server_only? or something else

@mattklein123
Copy link
Member

If you are stripping the symbols you have to actually make sure perf, etc. can find them, or just don't strip.

@fxposter
Copy link
Author

I just reused https://github.com/envoyproxy/envoy/blob/master/ci/Dockerfile-envoy, which uses stripped dir, I guess, I need to use build_release dir instead. Will update, thanks.

@fxposter
Copy link
Author

btw, are there any performance downsides about using stripped/non-stripped binaries? ie: slower startup due to ~1gb envoy binary size/etc?

@mattklein123
Copy link
Member

No, no downsides other than increased disk cost, transfer, etc.

@lizan
Copy link
Member

lizan commented Jul 15, 2020

@fxposter Those are just mangled symbol regardless whether you have debug symbols or not. For example

echo _ZNSt3__1L20__throw_length_errorEPKc | c++filt
std::__1::__throw_length_error(char const*)

pprof should be able to demangle it with --symbols flag though

@fxposter
Copy link
Author

Hm, while running with the non-stripped binary - at the end I get the same flamegraph/top with same symbols :(

@fxposter
Copy link
Author

@lizan yes, I tried c++filt, flamegraph stacktraces are still quite weird though :(

@fxposter
Copy link
Author

Regarding "--symbols" - are you talking about https://github.com/google/pprof? Don't see such option there.

@fxposter
Copy link
Author

BTW, the screenshots are taken from pprof that was ran on mac os x, while envoys were running in docker on linux machines.

@fxposter
Copy link
Author

To this point

  1. I collect .prof file by running /cpuprofiler?enable=y and /cpuprofiler?enable=n
  2. Even with release (but non-stripped) version of envoy when I try running pprof in docker with the same image where it was collected - I get this
pprof -text /usr/local/bin/envoy /data/envoy.1.14.stats.prof
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
File: envoy
Type: cpu
Showing nodes accounting for 16.03s, 99.63% of 16.09s total
Dropped 2 nodes (cum <= 0.08s)
      flat  flat%   sum%        cum   cum%
    13.30s 82.66% 82.66%     15.99s 99.38%  [envoy]
     1.43s  8.89% 91.55%     14.60s 90.74%  [libc-2.23.so]
     1.30s  8.08% 99.63%      2.85s 17.71%  [libpthread-2.23.so]

@fxposter
Copy link
Author

When running pprof on mac os (without specifying the binary) - it shows mangled names

@junr03 junr03 added area/perf question Questions that are neither investigations, bugs, nor enhancements labels Jul 15, 2020
@fxposter
Copy link
Author

Managed to get proper data:
image

CPU usage spikes look like:
image

I suspect code that applies new envoy configurations coming from control plane now takes much more time than it was before.

Flat %:

(pprof) top 100
Showing nodes accounting for 13.38s, 79.88% of 16.75s total
Dropped 440 nodes (cum <= 0.08s)
Showing top 100 nodes out of 261
      flat  flat%   sum%        cum   cum%
     2.22s 13.25% 13.25%      2.22s 13.25%  __write_nocancel
     2.15s 12.84% 26.09%      2.15s 12.84%  inotify_init
     0.80s  4.78% 30.87%      0.80s  4.78%  tc_newarray
     0.50s  2.99% 33.85%      0.50s  2.99%  __read_nocancel
     0.50s  2.99% 36.84%      0.50s  2.99%  operator delete
     0.44s  2.63% 39.46%      1.79s 10.69%  std::__1::basic_regex::__match_at_start_ecma (partial-inline)
     0.35s  2.09% 41.55%      0.35s  2.09%  google::protobuf::Descriptor::FindFieldByNumber
     0.34s  2.03% 43.58%      0.63s  3.76%  google::protobuf::Reflection::HasBit
     0.26s  1.55% 45.13%      0.26s  1.55%  google::protobuf::internal::ReflectionSchema::GetFieldOffset
     0.25s  1.49% 46.63%      0.25s  1.49%  tcmalloc::CentralFreeList::FetchFromOneSpansSafe
     0.24s  1.43% 48.06%      0.44s  2.63%  google::protobuf::DynamicMessage::~DynamicMessage
     0.18s  1.07% 49.13%      0.18s  1.07%  __nss_passwd_lookup
     0.18s  1.07% 50.21%      0.18s  1.07%  absl::hash_internal::CityHash64
     0.18s  1.07% 51.28%      0.33s  1.97%  google::protobuf::DynamicMessage::SharedCtor
     0.18s  1.07% 52.36%      0.25s  1.49%  std::__1::vector::__append
     0.17s  1.01% 53.37%      0.42s  2.51%  tcmalloc::CentralFreeList::RemoveRange
     0.16s  0.96% 54.33%      0.16s  0.96%  __GI___pthread_mutex_lock
     0.16s  0.96% 55.28%      0.16s  0.96%  __GI___pthread_mutex_unlock
     0.16s  0.96% 56.24%      0.25s  1.49%  __state (inline)
     0.16s  0.96% 57.19%      0.39s  2.33%  google::protobuf::Reflection::GetRaw
     0.15s   0.9% 58.09%      0.15s   0.9%  std::__1::vector::size (inline)
     0.14s  0.84% 58.93%      0.47s  2.81%  google::protobuf::internal::ReflectionOps::IsInitialized
     0.14s  0.84% 59.76%      0.14s  0.84%  tcmalloc::CentralFreeList::ReleaseToSpans
     0.13s  0.78% 60.54%      0.13s  0.78%  Envoy::Stats::SymbolTableImpl::Encoding::decodeNumber (inline)
     0.13s  0.78% 61.31%     16.50s 98.51%  event_base_loop
     0.13s  0.78% 62.09%      0.22s  1.31%  google::protobuf::Reflection::FieldSize
     0.13s  0.78% 62.87%      0.65s  3.88%  google::protobuf::Reflection::ListFields
     0.13s  0.78% 63.64%      0.13s  0.78%  std::__1::__match_char::__exec
     0.12s  0.72% 64.36%      0.50s  2.99%  google::protobuf::DynamicMessageFactory::GetPrototypeNoLock
     0.11s  0.66% 65.01%      0.17s  1.01%  std::__1::__state::__state
     0.11s  0.66% 65.67%      0.32s  1.91%  tcmalloc::ThreadCache::ReleaseToCentralCache
     0.10s   0.6% 66.27%      0.10s   0.6%  absl::Mutex::Lock
     0.10s   0.6% 66.87%      0.12s  0.72%  google::protobuf::UnknownField::Delete
     0.09s  0.54% 67.40%      0.09s  0.54%  __pthread_enable_asynccancel
     0.09s  0.54% 67.94%      0.34s  2.03%  google::protobuf::internal::WireFormat::InternalSerializeField
     0.09s  0.54% 68.48%      0.09s  0.54%  std::__1::__empty_state::__exec
     0.09s  0.54% 69.01%      0.19s  1.13%  std::__1::__hash_table::__emplace_unique_key_args
     0.09s  0.54% 69.55%      0.30s  1.79%  std::__1::vector::assign
     0.08s  0.48% 70.03%      0.13s  0.78%  google::protobuf::DynamicMessageFactory::~DynamicMessageFactory
     0.08s  0.48% 70.51%      0.33s  1.97%  google::protobuf::internal::WireFormat::FieldDataOnlyByteSize
     0.08s  0.48% 70.99%      0.09s  0.54%  std::__1::__hash_table::__rehash
     0.08s  0.48% 71.46%      0.19s  1.13%  std::__1::__shared_count::__release_shared (inline)
     0.07s  0.42% 71.88%      0.14s  0.84%  std::__1::__fill_n (inline)
     0.07s  0.42% 72.30%      0.12s  0.72%  std::__1::basic_string::basic_string
     0.07s  0.42% 72.72%      0.16s  0.96%  vector (inline)
     0.06s  0.36% 73.07%      2.39s 14.27%  event_callback_activate_nolock_
     0.06s  0.36% 73.43%      2.91s 17.37%  std::__1::basic_regex::__search
     0.05s   0.3% 73.73%      0.16s  0.96%  Envoy::(anonymous namespace)::UnexpectedFieldProtoVisitor::onField
     0.05s   0.3% 74.03%      0.18s  1.07%  epoll_dispatch
     0.05s   0.3% 74.33%      0.11s  0.66%  google::protobuf::UnknownFieldSet::MergeFrom
     0.05s   0.3% 74.63%      0.26s  1.55%  std::__1::vector::__push_back_slow_path
     0.05s   0.3% 74.93%      0.21s  1.25%  tcmalloc::CentralFreeList::InsertRange
     0.04s  0.24% 75.16%      0.67s  4.00%  Envoy::ProtobufMessage::traverseMutableMessage
     0.04s  0.24% 75.40%      0.21s  1.25%  envoy::config::route::v3::RouteAction::SharedDtor
     0.04s  0.24% 75.64%     13.76s 82.15%  event_process_active_single_queue
     0.04s  0.24% 75.88%      0.48s  2.87%  google::protobuf::DynamicMessage::CrossLinkPrototypes
     0.04s  0.24% 76.12%      0.38s  2.27%  google::protobuf::TextFormat::Printer::Print
     0.04s  0.24% 76.36%      0.78s  4.66%  google::protobuf::internal::ParseContext::ParseMessage
     0.03s  0.18% 76.54%      0.16s  0.96%  envoy::config::route::v3::RetryPolicy::~RetryPolicy
     0.03s  0.18% 76.72%      0.22s  1.31%  envoy::config::route::v3::Route::SharedDtor
     0.03s  0.18% 76.90%      0.13s  0.78%  evmap_io_active_
     0.03s  0.18% 77.07%     16.41s 97.97%  std::__1::__function::__value_func::operator() (inline)
     0.03s  0.18% 77.25%      0.21s  1.25%  std::__1::__vector_base::__destruct_at_end (inline)
     0.03s  0.18% 77.43%      0.22s  1.31%  ~__vector_base (inline)
     0.02s  0.12% 77.55%      8.07s 48.18%  Envoy::Config::GrpcSubscriptionImpl::onConfigUpdate
     0.02s  0.12% 77.67%      2.49s 14.87%  Envoy::Event::DispatcherImpl::post
     0.02s  0.12% 77.79%      2.39s 14.27%  Envoy::Event::TimerImpl::internalEnableTimer (inline)
     0.02s  0.12% 77.91%      0.16s  0.96%  envoy::config::route::v3::RetryPolicy::MergeFrom
     0.02s  0.12% 78.03%      0.09s  0.54%  google::protobuf::Map::~Map
     0.02s  0.12% 78.15%      0.16s  0.96%  google::protobuf::Reflection::MutableMessage
     0.02s  0.12% 78.27%      0.35s  2.09%  google::protobuf::internal::WireFormat::FieldByteSize
     0.02s  0.12% 78.39%      3.35s 20.00%  std::__1::__function::__func::operator()
     0.02s  0.12% 78.51%      2.55s 15.22%  std::__1::basic_regex::__match_at_start (inline)
     0.01s  0.06% 78.57%      3.34s 19.94%  Envoy::Event::DispatcherImpl::runPostCallbacks
     0.01s  0.06% 78.63%      0.11s  0.66%  Envoy::Router::RetryPolicyImpl::RetryPolicyImpl
     0.01s  0.06% 78.69%      0.11s  0.66%  Envoy::Stats::AbslHashValue
     0.01s  0.06% 78.75%      0.14s  0.84%  Envoy::Stats::StatName::dataSize
     0.01s  0.06% 78.81%      2.60s 15.52%  Envoy::ThreadLocal::InstanceImpl::runOnAllThreads
     0.01s  0.06% 78.87%      0.28s  1.67%  absl::hash_internal::HashSelect::HashValueProbe::Invoke (inline)
     0.01s  0.06% 78.93%      0.26s  1.55%  envoy::config::route::v3::Route::~Route
     0.01s  0.06% 78.99%      0.14s  0.84%  envoy::config::route::v3::RouteConfiguration::_InternalParse
     0.01s  0.06% 79.04%      0.09s  0.54%  envoy::config::route::v3::RouteMatch::~RouteMatch
     0.01s  0.06% 79.10%      0.15s   0.9%  envoy::config::route::v3::VirtualHost::Clear
     0.01s  0.06% 79.16%      0.28s  1.67%  envoy::config::route::v3::VirtualHost::MergeFrom
     0.01s  0.06% 79.22%      0.73s  4.36%  google::protobuf::Message::_InternalParse
     0.01s  0.06% 79.28%      0.69s  4.12%  google::protobuf::ParseLenDelim
     0.01s  0.06% 79.34%      0.37s  2.21%  google::protobuf::TextFormat::Printer::PrintField
     0.01s  0.06% 79.40%      0.14s  0.84%  google::protobuf::UnknownFieldSet::ClearFallback
     0.01s  0.06% 79.46%      0.22s  1.31%  google::protobuf::internal::InternalMetadataWithArenaBase::~InternalMetadataWithArenaBase
     0.01s  0.06% 79.52%      0.36s  2.15%  google::protobuf::internal::WireFormat::InternalSerializeWithCachedSizesToArray
     0.01s  0.06% 79.58%      6.32s 37.73%  std::__1::__invoke (inline)
     0.01s  0.06% 79.64%      0.47s  2.81%  std::__1::allocator::construct (inline)
     0.01s  0.06% 79.70%      2.93s 17.49%  std::__1::regex_search
     0.01s  0.06% 79.76%      0.51s  3.04%  std::__1::vector::resize (inline)
     0.01s  0.06% 79.82%      0.43s  2.57%  tcmalloc::ThreadCache::FetchFromCentralCache
     0.01s  0.06% 79.88%      0.09s  0.54%  ~__value_func (inline)
         0     0% 79.88%      0.50s  2.99%  Envoy::Config::(anonymous namespace)::annotateWithOriginalType (inline)
         0     0% 79.88%      0.11s  0.66%  Envoy::Config::(anonymous namespace)::annotateWithOriginalType(google::protobuf::Descriptor const&, google::protobuf::Message&)::TypeAnnotatingProtoVisitor::onMessage
         0     0% 79.88%      0.74s  4.42%  Envoy::Config::(anonymous namespace)::createForDescriptorWithCast
         0     0% 79.88%      1.46s  8.72%  Envoy::Config::(anonymous namespace)::wireCast

Cum%:

(pprof) top 100
Showing nodes accounting for 6.64s, 39.64% of 16.75s total
Dropped 440 nodes (cum <= 0.08s)
Showing top 100 nodes out of 261
      flat  flat%   sum%        cum   cum%
     0.13s  0.78%  0.78%     16.50s 98.51%  event_base_loop
         0     0%  0.78%     16.41s 97.97%  operator() (inline)
     0.03s  0.18%  0.96%     16.41s 97.97%  std::__1::__function::__value_func::operator() (inline)
         0     0%  0.96%     16.41s 97.97%  std::__1::function::operator() (inline)
     0.04s  0.24%  1.19%     13.76s 82.15%  event_process_active_single_queue
         0     0%  1.19%     12.85s 76.72%  Envoy::MainCommon::run (inline)
         0     0%  1.19%     12.85s 76.72%  Envoy::MainCommonBase::run
         0     0%  1.19%     12.85s 76.72%  Envoy::Server::InstanceImpl::run
         0     0%  1.19%     12.85s 76.72%  __libc_start_main
         0     0%  1.19%     12.85s 76.72%  main
         0     0%  1.19%      9.70s 57.91%  Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_0::__invoke
         0     0%  1.19%      9.67s 57.73%  Envoy::Network::ConnectionImpl::onFileEvent
         0     0%  1.19%      9.64s 57.55%  Envoy::Network::ConnectionImpl::onReadReady
         0     0%  1.19%      9.63s 57.49%  Envoy::Http::CodecClient::CodecReadFilter::onData
         0     0%  1.19%      9.63s 57.49%  Envoy::Http::CodecClient::onData
         0     0%  1.19%      9.63s 57.49%  Envoy::Network::ConnectionImpl::onRead (inline)
         0     0%  1.19%      9.63s 57.49%  Envoy::Network::FilterManagerImpl::onContinueReading
         0     0%  1.19%      9.62s 57.43%  Envoy::Http::ResponseDecoderWrapper::decodeData
         0     0%  1.19%      9.62s 57.43%  Envoy::Router::UpstreamRequest::decodeData
         0     0%  1.19%      9.61s 57.37%  Envoy::Grpc::AsyncStreamCallbacks::onReceiveMessageRaw
         0     0%  1.19%      9.61s 57.37%  Envoy::Grpc::AsyncStreamImpl::onData
         0     0%  1.19%      9.61s 57.37%  Envoy::Http::AsyncStreamImpl::encodeData
         0     0%  1.19%      9.61s 57.37%  Envoy::Http::Http2::ConnectionImpl::dispatch
         0     0%  1.19%      9.61s 57.37%  Envoy::Http::Http2::ConnectionImpl::onFrameReceived
         0     0%  1.19%      9.61s 57.37%  nghttp2_session_mem_recv
         0     0%  1.19%      9.61s 57.37%  nghttp2_session_on_data_received
         0     0%  1.19%      9.60s 57.31%  Envoy::Config::GrpcMuxImpl::onDiscoveryResponse
     0.02s  0.12%  1.31%      8.07s 48.18%  Envoy::Config::GrpcSubscriptionImpl::onConfigUpdate
         0     0%  1.31%      6.34s 37.85%  Envoy::Router::RdsRouteConfigSubscription::onConfigUpdate
     0.01s  0.06%  1.37%      6.32s 37.73%  std::__1::__invoke (inline)
         0     0%  1.37%      6.32s 37.73%  std::__1::__invoke_void_return_wrapper::__call (partial-inline)
         0     0%  1.37%      3.70s 22.09%  Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function)::$_0::__invoke
         0     0%  1.37%      3.70s 22.09%  start_thread
         0     0%  1.37%      3.66s 21.85%  Envoy::Server::WorkerImpl::threadRoutine
         0     0%  1.37%      3.62s 21.61%  Envoy::Router::ConfigImpl::ConfigImpl
         0     0%  1.37%      3.62s 21.61%  Envoy::Router::RouteMatcher::RouteMatcher
         0     0%  1.37%      3.62s 21.61%  std::__1::make_unique (inline)
         0     0%  1.37%      3.59s 21.43%  Envoy::Router::VirtualHostImpl::VirtualHostImpl
     0.02s  0.12%  1.49%      3.35s 20.00%  std::__1::__function::__func::operator()
     0.01s  0.06%  1.55%      3.34s 19.94%  Envoy::Event::DispatcherImpl::runPostCallbacks
         0     0%  1.55%      3.33s 19.88%  std::__1::__function::__alloc_func::operator() (inline)
         0     0%  1.55%      3.30s 19.70%  Envoy::Router::VirtualHostImpl::CatchAllVirtualCluster::CatchAllVirtualCluster
         0     0%  1.55%      3.28s 19.58%  Envoy::Router::VirtualCluster::generateStats
         0     0%  1.55%      3.28s 19.58%  Envoy::Stats::Scope::counterFromStatName (inline)
         0     0%  1.55%      3.28s 19.58%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::counterFromStatNameWithTags
         0     0%  1.55%      3.28s 19.58%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::counterFromString
         0     0%  1.55%      3.28s 19.58%  VirtualClusterBase (inline)
         0     0%  1.55%      3.23s 19.28%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::safeMakeStat
         0     0%  1.55%      3.04s 18.15%  Envoy::Stats::StatNameTagHelper::StatNameTagHelper
         0     0%  1.55%      2.99s 17.85%  Envoy::Stats::FakeSymbolTableImpl::callWithStringView
         0     0%  1.55%      2.98s 17.79%  Envoy::Stats::TagProducerImpl::produceTags
         0     0%  1.55%      2.96s 17.67%  Envoy::Stats::TagProducerImpl::forEachExtractorMatching
         0     0%  1.55%      2.95s 17.61%  Envoy::MessageUtil::anyConvert (inline)
         0     0%  1.55%      2.95s 17.61%  Envoy::Stats::TagExtractorImpl::extractTag
         0     0%  1.55%      2.94s 17.55%  Envoy::MessageUtil::unpackTo
     0.01s  0.06%  1.61%      2.93s 17.49%  std::__1::regex_search
     0.06s  0.36%  1.97%      2.91s 17.37%  std::__1::basic_regex::__search
         0     0%  1.97%      2.60s 15.52%  Envoy::ThreadLocal::InstanceImpl::Bookkeeper::runOnAllThreads
         0     0%  1.97%      2.60s 15.52%  Envoy::ThreadLocal::InstanceImpl::SlotImpl::runOnAllThreads
     0.01s  0.06%  2.03%      2.60s 15.52%  Envoy::ThreadLocal::InstanceImpl::runOnAllThreads
         0     0%  2.03%      2.59s 15.46%  Envoy::Stats::ThreadLocalStoreImpl::clearScopeFromCaches
     0.02s  0.12%  2.15%      2.55s 15.22%  std::__1::basic_regex::__match_at_start (inline)
     0.02s  0.12%  2.27%      2.49s 14.87%  Envoy::Event::DispatcherImpl::post
         0     0%  2.27%      2.39s 14.27%  Envoy::Event::TimerImpl::enableTimer
     0.02s  0.12%  2.39%      2.39s 14.27%  Envoy::Event::TimerImpl::internalEnableTimer (inline)
     0.06s  0.36%  2.75%      2.39s 14.27%  event_callback_activate_nolock_
         0     0%  2.75%      2.35s 14.03%  Envoy::MessageUtil::anyConvertAndValidate (inline)
         0     0%  2.75%      2.34s 13.97%  event_active
     2.22s 13.25% 16.00%      2.22s 13.25%  __write_nocancel
     2.15s 12.84% 28.84%      2.15s 12.84%  inotify_init
         0     0% 28.84%      1.88s 11.22%  Envoy::Router::RdsRouteConfigProviderImpl::validateConfig
         0     0% 28.84%      1.79s 10.69%  Envoy::Router::RdsRouteConfigProviderImpl::onConfigUpdate
     0.44s  2.63% 31.46%      1.79s 10.69%  std::__1::basic_regex::__match_at_start_ecma
         0     0% 31.46%      1.51s  9.01%  Envoy::Config::GrpcSubscriptionImpl::resourceName (inline)
         0     0% 31.46%      1.51s  9.01%  Envoy::Upstream::CdsApiImpl::onConfigUpdate
         0     0% 31.46%      1.51s  9.01%  non-virtual thunk to Envoy::Config::GrpcSubscriptionImpl::resourceName(google::protobuf::Any const&)
         0     0% 31.46%      1.46s  8.72%  Envoy::Config::(anonymous namespace)::wireCast
         0     0% 31.46%      1.45s  8.66%  Envoy::MessageUtil::validate
         0     0% 31.46%      1.44s  8.60%  google::protobuf::MessageLite::ParseFrom
         0     0% 31.46%      1.41s  8.42%  Envoy::Config::VersionConverter::upgrade
         0     0% 31.46%      1.41s  8.42%  Envoy::MessageUtil::checkForUnexpectedFields
         0     0% 31.46%      0.94s  5.61%  Envoy::Router::RouteConfigUpdateReceiverImpl::onRdsUpdate
         0     0% 31.46%      0.90s  5.37%  Envoy::Config::VersionConverter::recoverOriginal
         0     0% 31.46%      0.87s  5.19%  Envoy::Router::RdsRouteConfigSubscription::resourceName
         0     0% 31.46%      0.87s  5.19%  google::protobuf::MessageLite::AppendPartialToString
         0     0% 31.46%      0.84s  5.01%  google::protobuf::MessageLite::SerializeAsString
     0.80s  4.78% 36.24%      0.80s  4.78%  tc_newarray
     0.04s  0.24% 36.48%      0.78s  4.66%  google::protobuf::internal::ParseContext::ParseMessage
         0     0% 36.48%      0.74s  4.42%  Envoy::Config::(anonymous namespace)::createForDescriptorWithCast
     0.01s  0.06% 36.54%      0.73s  4.36%  google::protobuf::Message::_InternalParse
         0     0% 36.54%      0.73s  4.36%  google::protobuf::internal::WireFormatParser
     0.01s  0.06% 36.60%      0.69s  4.12%  google::protobuf::ParseLenDelim
     0.04s  0.24% 36.84%      0.67s  4.00%  Envoy::ProtobufMessage::traverseMutableMessage
     0.13s  0.78% 37.61%      0.65s  3.88%  google::protobuf::Reflection::ListFields
     0.34s  2.03% 39.64%      0.63s  3.76%  google::protobuf::Reflection::HasBit
         0     0% 39.64%      0.58s  3.46%  Envoy::Upstream::CdsApiImpl::resourceName (inline)
         0     0% 39.64%      0.58s  3.46%  non-virtual thunk to Envoy::Upstream::CdsApiImpl::resourceName(google::protobuf::Any const&)
         0     0% 39.64%      0.58s  3.46%  std::__1::default_delete::operator() (inline)
         0     0% 39.64%      0.58s  3.46%  std::__1::unique_ptr::reset (inline)
         0     0% 39.64%      0.57s  3.40%  ~unique_ptr (inline)

@lizan
Copy link
Member

lizan commented Jul 16, 2020

macOS has different mangling rule of symbols so if the pprof is from Linux it might not demangle well.

Aside from that, looking at the pprof, it seems the VersionConverter is the source of CPU usage. I believe it's because your control plane is sending Envoy v2 config instead of v3, so Envoy will convert it to v3 internally. The v3 API took in place since 1.13. If your control plane send v3 protos, I think it will be down closer to the original CPU usage.

cc @htuch

@htuch
Copy link
Member

htuch commented Jul 16, 2020

@lizan yes, that's right to the extent that is the source of the extra CPU usage. This is tracked in #10875. Since 1.14 I think we've reduced the overhead from 3.6x down to 2x for v2 ingest vs. v3. So, you might see a real improvement by moving to HEAD @ master. Otherwise, I suggest moving to v3.

@fxposter
Copy link
Author

Should it be v3 "all-in", ie: if I just start returning v3 resources while keeping all envoy configs the same (we don't specify it and hence use default AUTO resource version) - it will still be subject to extra CPU usage? ie: in this case control plane will receive request for V2 resource, it will respond with v3 one, envoy will parse it as v3 and convert to v2?

@fxposter
Copy link
Author

cause by just switching to sending v3 resources does not seem like it's helping with CPU usage

@htuch
Copy link
Member

htuch commented Jul 16, 2020

If you send all your resources as v3, there should be no conversion happening in Envoy at all; you should not see VersionConverter in the CPU profile. You need to set V3 in the resource version explicitly to make this happen.

@fxposter
Copy link
Author

I mean - I changed my control plane (based on go-control-plane) to still using cache/v2 and server/v2 from the library and register API from discovery/v2 (we use ADS). According to docs - I can do that and envoy will accept v3 resources from v2 APIs. But when I do config_dump afterwards - I see that under, for example, type.googleapis.com/envoy.admin.v3.ClustersConfigDump there are type.googleapis.com/envoy.api.v2.Cluster resources still, while I think I send v3 resources already

@htuch
Copy link
Member

htuch commented Jul 16, 2020

Envoy can accept v2 APIs and convert internally to v3. The point above is that if you want performance, you need to put v3 resource on the wire at the control plane, so Envoy doesn't need to do the work of converting.

@fxposter
Copy link
Author

Yes, I got that. My question is different. If I send v3 resources on the wire (ie: actual v3 resources are marshaled on the control plane side), but lds/cds/rds references in envoy's config will still be auto (ie: envoy will ask v2 resources from control plane, but will receive v3 resources back on the wire) - is that enough or not to not do conversion all the time?
Cause that's what I did and it seems like envoy treats them as v2, though go-control-plane sends v3.

@htuch
Copy link
Member

htuch commented Jul 16, 2020

I think you probably need to change resource versions to v3, otherwise I don't see how it works given #10776

@fxposter
Copy link
Author

image

this one is for purely v3, cpu graph still has spikes

Now I suspect this part :)
image

Will generate graph for v1.12 with v2 (which was flat in terms of CPU usage) and compare

@htuch
Copy link
Member

htuch commented Jul 16, 2020

CC @jmarantz for stats overheads.

@jmarantz
Copy link
Contributor

Thanks for the heads up. Looks like RDS is re-creating a lot of stats on every update?

I suspect this code could be optimized a bit:

  for (const auto& virtual_cluster : virtual_host.virtual_clusters()) {
    virtual_clusters_.push_back(
        VirtualClusterEntry(virtual_cluster, stat_name_pool_, *vcluster_scope_));
  }

in source/common/router/config_impl.cc function VirtualHostImpl::VirtualHostImpl. I think it's re-creating the same exact block of stats for every virtual cluster in every virtual host.

Not sure of the cardinality of this in general, or for @fxposter's config in particular. Does anyone have more context around this code path?

@fxposter
Copy link
Author

Our config may be quite different from others. yes - we have infrequent (once a couple of minutes) updates of clusters and each cluster have separate path in same domain in routeconfiguration (or our control plane is a bit broken at this point, this is separate cluster where I was playing with new envoy version - changing envoy configs/versions/control plane code). but with the same control plane envoy 1.12 uses less cpu and is more stable in terms of CPU usage.

@jmarantz
Copy link
Contributor

Oh, and the last flame graph looks like regexes in particular, but I don't think there's been a change in that area of stats recently, and that may be a side effect of wasteful re-creation of the same stats. I would have thought we'd have skipped tag extraction if a stat of the same name was already known.

@fxposter
Copy link
Author

(pprof) top 100
Showing nodes accounting for 7.87s, 78.00% of 10.09s total
Dropped 394 nodes (cum <= 0.05s)
Showing top 100 nodes out of 197
      flat  flat%   sum%        cum   cum%
     2.01s 19.92% 19.92%      2.01s 19.92%  inotify_init
     1.76s 17.44% 37.36%      1.76s 17.44%  __write_nocancel
     0.54s  5.35% 42.72%      0.54s  5.35%  __read_nocancel
     0.38s  3.77% 46.48%      1.45s 14.37%  std::__1::basic_regex::__match_at_start_ecma
     0.37s  3.67% 50.15%      0.37s  3.67%  operator delete
     0.34s  3.37% 53.52%      0.34s  3.37%  tc_newarray
     0.23s  2.28% 55.80%      0.23s  2.28%  Envoy::Stats::SymbolTableImpl::Encoding::decodeNumber (inline)
     0.16s  1.59% 57.38%      0.16s  1.59%  __nss_passwd_lookup
     0.15s  1.49% 58.87%      0.15s  1.49%  absl::hash_internal::CityHash64
     0.15s  1.49% 60.36%      0.26s  2.58%  std::__1::vector::__append (partial-inline)
     0.11s  1.09% 61.45%      0.11s  1.09%  absl::Mutex::Unlock
     0.11s  1.09% 62.54%      0.14s  1.39%  tcmalloc::ThreadCache::ReleaseToCentralCache
     0.10s  0.99% 63.53%      0.19s  1.88%  __state (inline)
     0.09s  0.89% 64.42%      9.86s 97.72%  event_base_loop
     0.09s  0.89% 65.31%      0.11s  1.09%  std::__1::__function::__func::__clone
     0.08s  0.79% 66.11%      0.08s  0.79%  __GI___pthread_mutex_lock
     0.08s  0.79% 66.90%      0.08s  0.79%  absl::Mutex::Lock
     0.08s  0.79% 67.69%      0.10s  0.99%  std::__1::__state::__state
     0.07s  0.69% 68.38%      0.07s  0.69%  RefcountPtr (inline)
     0.07s  0.69% 69.08%      0.07s  0.69%  std::__1::vector::size (inline)
     0.07s  0.69% 69.77%      0.13s  1.29%  vector (inline)
     0.06s  0.59% 70.37%      0.06s  0.59%  __pthread_enable_asynccancel
     0.06s  0.59% 70.96%      0.17s  1.68%  std::__1::vector::assign
     0.05s   0.5% 71.46%      0.06s  0.59%  Envoy::Stats::StatsSharedImpl::incRefCount
     0.05s   0.5% 71.95%      0.17s  1.68%  std::__1::__shared_count::__release_shared (inline)
     0.05s   0.5% 72.45%      0.25s  2.48%  std::__1::vector::__push_back_slow_path
     0.05s   0.5% 72.94%      0.16s  1.59%  ~__vector_base (inline)
     0.04s   0.4% 73.34%      0.06s  0.59%  epoll_dispatch
     0.04s   0.4% 73.74%      7.53s 74.63%  event_process_active_single_queue
     0.03s   0.3% 74.03%      0.10s  0.99%  Envoy::(anonymous namespace)::UnexpectedFieldProtoVisitor::onField
     0.03s   0.3% 74.33%      0.07s  0.69%  Envoy::Stats::RefcountPtr::resetInternal (inline)
     0.03s   0.3% 74.63%      0.06s  0.59%  envoy::config::route::v3::RetryPolicy::MergeFrom
     0.03s   0.3% 74.93%      0.39s  3.87%  std::__1::allocator::construct (inline)
     0.02s   0.2% 75.12%      0.09s  0.89%  Envoy::Stats::ThreadLocalStoreImpl::gauges
     0.02s   0.2% 75.32%      0.06s  0.59%  absl::container_internal::raw_hash_set::EqualElement::operator() (inline)
     0.02s   0.2% 75.52%      0.12s  1.19%  envoy::config::route::v3::VirtualHost::~VirtualHost
     0.02s   0.2% 75.72%      1.84s 18.24%  event_callback_activate_nolock_
     0.02s   0.2% 75.92%      0.07s  0.69%  std::__1::__split_buffer::__construct_at_end (inline)
     0.02s   0.2% 76.11%      2.10s 20.81%  std::__1::basic_regex::__search
     0.02s   0.2% 76.31%      0.07s  0.69%  tcmalloc::CentralFreeList::RemoveRange
     0.01s 0.099% 76.41%      1.87s 18.53%  Envoy::Event::DispatcherImpl::post
     0.01s 0.099% 76.51%      3.71s 36.77%  Envoy::Network::ConnectionImpl::onFileEvent
     0.01s 0.099% 76.61%      0.08s  0.79%  Envoy::Router::RetryPolicyImpl::RetryPolicyImpl
     0.01s 0.099% 76.71%      0.13s  1.29%  Envoy::Router::RouteEntryImplBase::RouteEntryImplBase
     0.01s 0.099% 76.81%      0.11s  1.09%  Envoy::Stats::AbslHashValue
     0.01s 0.099% 76.91%      0.24s  2.38%  Envoy::Upstream::CdsApiImpl::onConfigUpdate
     0.01s 0.099% 77.01%      0.08s  0.79%  absl::container_internal::raw_hash_set::find
     0.01s 0.099% 77.11%      0.39s  3.87%  absl::container_internal::raw_hash_set::find_or_prepare_insert
     0.01s 0.099% 77.21%      0.10s  0.99%  envoy::config::route::v3::RouteAction::MergeFrom
     0.01s 0.099% 77.30%      0.11s  1.09%  google::protobuf::internal::ParseContext::ParseMessage
     0.01s 0.099% 77.40%      9.77s 96.83%  std::__1::__function::__value_func::operator() (inline)
     0.01s 0.099% 77.50%      1.88s 18.63%  std::__1::basic_regex::__match_at_start (inline)
     0.01s 0.099% 77.60%      0.13s  1.29%  std::__1::unique_ptr::reset (inline)
     0.01s 0.099% 77.70%      0.45s  4.46%  std::__1::vector::push_back (inline)
     0.01s 0.099% 77.80%      0.08s  0.79%  tcmalloc::ThreadCache::FetchFromCentralCache
     0.01s 0.099% 77.90%      0.11s  1.09%  ~__value_func (inline)
     0.01s 0.099% 78.00%      0.18s  1.78%  ~shared_ptr (inline)
         0     0% 78.00%      0.07s  0.69%  <unknown>
         0     0% 78.00%      3.58s 35.48%  Envoy::Config::GrpcMuxImpl::onDiscoveryResponse
         0     0% 78.00%      3.49s 34.59%  Envoy::Config::GrpcSubscriptionImpl::onConfigUpdate
         0     0% 78.00%      0.06s  0.59%  Envoy::Config::GrpcSubscriptionImpl::resourceName (inline)
         0     0% 78.00%      2.91s 28.84%  Envoy::Event::DispatcherImpl::runPostCallbacks
         0     0% 78.00%      3.75s 37.17%  Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_0::__invoke
         0     0% 78.00%      1.85s 18.33%  Envoy::Event::TimerImpl::enableTimer
         0     0% 78.00%      1.85s 18.33%  Envoy::Event::TimerImpl::internalEnableTimer (inline)
         0     0% 78.00%      3.59s 35.58%  Envoy::Grpc::AsyncStreamCallbacks::onReceiveMessageRaw
         0     0% 78.00%      3.59s 35.58%  Envoy::Grpc::AsyncStreamImpl::onData
         0     0% 78.00%      3.59s 35.58%  Envoy::Http::AsyncStreamImpl::encodeData
         0     0% 78.00%      3.60s 35.68%  Envoy::Http::CodecClient::CodecReadFilter::onData
         0     0% 78.00%      3.60s 35.68%  Envoy::Http::CodecClient::onData
         0     0% 78.00%      3.59s 35.58%  Envoy::Http::Http2::ConnectionImpl::dispatch
         0     0% 78.00%      3.59s 35.58%  Envoy::Http::Http2::ConnectionImpl::onFrameReceived
         0     0% 78.00%      3.60s 35.68%  Envoy::Http::ResponseDecoderWrapper::decodeData
         0     0% 78.00%      6.41s 63.53%  Envoy::MainCommon::run
         0     0% 78.00%      6.41s 63.53%  Envoy::MainCommonBase::run
         0     0% 78.00%      0.12s  1.19%  Envoy::MessageUtil::anyConvert (inline)
         0     0% 78.00%      0.35s  3.47%  Envoy::MessageUtil::anyConvertAndValidate (inline)
         0     0% 78.00%      0.26s  2.58%  Envoy::MessageUtil::checkForUnexpectedFields
         0     0% 78.00%      0.17s  1.68%  Envoy::MessageUtil::hash
         0     0% 78.00%      0.12s  1.19%  Envoy::MessageUtil::unpackTo
         0     0% 78.00%      0.28s  2.78%  Envoy::MessageUtil::validate
         0     0% 78.00%      3.64s 36.08%  Envoy::Network::ConnectionImpl::onRead (inline)
         0     0% 78.00%      3.66s 36.27%  Envoy::Network::ConnectionImpl::onReadReady
         0     0% 78.00%      3.64s 36.08%  Envoy::Network::FilterManagerImpl::onContinueReading
         0     0% 78.00%      0.20s  1.98%  Envoy::ProtobufMessage::traverseMessage
         0     0% 78.00%      2.57s 25.47%  Envoy::Router::ConfigImpl::ConfigImpl
         0     0% 78.00%      0.09s  0.89%  Envoy::Router::ConfigImpl::~ConfigImpl
         0     0% 78.00%      0.15s  1.49%  Envoy::Router::PrefixRouteEntryImpl::PrefixRouteEntryImpl
         0     0% 78.00%      1.28s 12.69%  Envoy::Router::RdsRouteConfigProviderImpl::onConfigUpdate
         0     0% 78.00%      1.34s 13.28%  Envoy::Router::RdsRouteConfigProviderImpl::validateConfig
         0     0% 78.00%      3.22s 31.91%  Envoy::Router::RdsRouteConfigSubscription::onConfigUpdate
         0     0% 78.00%      0.06s  0.59%  Envoy::Router::RdsRouteConfigSubscription::resourceName
         0     0% 78.00%      0.10s  0.99%  Envoy::Router::RouteConfigUpdateReceiverImpl::initializeRdsVhosts (inline)
         0     0% 78.00%      0.32s  3.17%  Envoy::Router::RouteConfigUpdateReceiverImpl::onRdsUpdate
         0     0% 78.00%      0.08s  0.79%  Envoy::Router::RouteEntryImplBase::buildRetryPolicy (inline)
         0     0% 78.00%      2.56s 25.37%  Envoy::Router::RouteMatcher::RouteMatcher
         0     0% 78.00%      0.09s  0.89%  Envoy::Router::RouteMatcher::~RouteMatcher
         0     0% 78.00%      3.60s 35.68%  Envoy::Router::UpstreamRequest::decodeData
         0     0% 78.00%      2.32s 22.99%  Envoy::Router::VirtualCluster::generateStats
         0     0% 78.00%      2.32s 22.99%  Envoy::Router::VirtualHostImpl::CatchAllVirtualCluster::CatchAllVirtualCluster
(pprof) sort = cum
(pprof) top 150
Showing nodes accounting for 7.17s, 71.06% of 10.09s total
Dropped 394 nodes (cum <= 0.05s)
Showing top 150 nodes out of 197
      flat  flat%   sum%        cum   cum%
     0.09s  0.89%  0.89%      9.86s 97.72%  event_base_loop
         0     0%  0.89%      9.77s 96.83%  operator() (inline)
     0.01s 0.099%  0.99%      9.77s 96.83%  std::__1::__function::__value_func::operator() (inline)
         0     0%  0.99%      9.77s 96.83%  std::__1::function::operator() (inline)
     0.04s   0.4%  1.39%      7.53s 74.63%  event_process_active_single_queue
         0     0%  1.39%      6.41s 63.53%  Envoy::MainCommon::run (inline)
         0     0%  1.39%      6.41s 63.53%  Envoy::MainCommonBase::run
         0     0%  1.39%      6.41s 63.53%  Envoy::Server::InstanceImpl::run
         0     0%  1.39%      6.41s 63.53%  __libc_start_main
         0     0%  1.39%      6.41s 63.53%  main
         0     0%  1.39%      5.05s 50.05%  std::__1::__invoke (inline)
         0     0%  1.39%      5.05s 50.05%  std::__1::__invoke_void_return_wrapper::__call (partial-inline)
         0     0%  1.39%      3.75s 37.17%  Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_0::__invoke
     0.01s 0.099%  1.49%      3.71s 36.77%  Envoy::Network::ConnectionImpl::onFileEvent
         0     0%  1.49%      3.66s 36.27%  Envoy::Network::ConnectionImpl::onReadReady
         0     0%  1.49%      3.64s 36.08%  Envoy::Network::ConnectionImpl::onRead (inline)
         0     0%  1.49%      3.64s 36.08%  Envoy::Network::FilterManagerImpl::onContinueReading
         0     0%  1.49%      3.60s 35.68%  Envoy::Http::CodecClient::CodecReadFilter::onData
         0     0%  1.49%      3.60s 35.68%  Envoy::Http::CodecClient::onData
         0     0%  1.49%      3.60s 35.68%  Envoy::Http::ResponseDecoderWrapper::decodeData
         0     0%  1.49%      3.60s 35.68%  Envoy::Router::UpstreamRequest::decodeData
         0     0%  1.49%      3.59s 35.58%  Envoy::Grpc::AsyncStreamCallbacks::onReceiveMessageRaw
         0     0%  1.49%      3.59s 35.58%  Envoy::Grpc::AsyncStreamImpl::onData
         0     0%  1.49%      3.59s 35.58%  Envoy::Http::AsyncStreamImpl::encodeData
         0     0%  1.49%      3.59s 35.58%  Envoy::Http::Http2::ConnectionImpl::dispatch
         0     0%  1.49%      3.59s 35.58%  Envoy::Http::Http2::ConnectionImpl::onFrameReceived
         0     0%  1.49%      3.59s 35.58%  nghttp2_session_mem_recv
         0     0%  1.49%      3.59s 35.58%  nghttp2_session_on_data_received
         0     0%  1.49%      3.58s 35.48%  Envoy::Config::GrpcMuxImpl::onDiscoveryResponse
         0     0%  1.49%      3.49s 34.59%  Envoy::Config::GrpcSubscriptionImpl::onConfigUpdate
         0     0%  1.49%      3.47s 34.39%  Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function)::$_0::__invoke
         0     0%  1.49%      3.47s 34.39%  start_thread
         0     0%  1.49%      3.44s 34.09%  Envoy::Server::WorkerImpl::threadRoutine
         0     0%  1.49%      3.22s 31.91%  Envoy::Router::RdsRouteConfigSubscription::onConfigUpdate
         0     0%  1.49%      2.91s 28.84%  Envoy::Event::DispatcherImpl::runPostCallbacks
         0     0%  1.49%      2.87s 28.44%  std::__1::__function::__alloc_func::operator() (inline)
         0     0%  1.49%      2.87s 28.44%  std::__1::__function::__func::operator()
         0     0%  1.49%      2.58s 25.57%  std::__1::make_unique (inline)
         0     0%  1.49%      2.57s 25.47%  Envoy::Router::ConfigImpl::ConfigImpl
         0     0%  1.49%      2.56s 25.37%  Envoy::Router::RouteMatcher::RouteMatcher
         0     0%  1.49%      2.53s 25.07%  Envoy::Router::VirtualHostImpl::VirtualHostImpl
         0     0%  1.49%      2.34s 23.19%  Envoy::Stats::Scope::counterFromStatName (inline)
         0     0%  1.49%      2.34s 23.19%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::counterFromStatNameWithTags
         0     0%  1.49%      2.34s 23.19%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::counterFromString
         0     0%  1.49%      2.32s 22.99%  Envoy::Router::VirtualCluster::generateStats
         0     0%  1.49%      2.32s 22.99%  Envoy::Router::VirtualHostImpl::CatchAllVirtualCluster::CatchAllVirtualCluster
         0     0%  1.49%      2.32s 22.99%  VirtualClusterBase (inline)
         0     0%  1.49%      2.30s 22.79%  Envoy::Stats::ThreadLocalStoreImpl::ScopeImpl::safeMakeStat
         0     0%  1.49%      2.20s 21.80%  Envoy::Stats::StatNameTagHelper::StatNameTagHelper
         0     0%  1.49%      2.18s 21.61%  Envoy::Stats::FakeSymbolTableImpl::callWithStringView
         0     0%  1.49%      2.17s 21.51%  Envoy::Stats::TagProducerImpl::produceTags
         0     0%  1.49%      2.16s 21.41%  Envoy::Stats::TagExtractorImpl::extractTag
         0     0%  1.49%      2.16s 21.41%  Envoy::Stats::TagProducerImpl::forEachExtractorMatching
         0     0%  1.49%      2.12s 21.01%  std::__1::regex_search
     0.02s   0.2%  1.68%      2.10s 20.81%  std::__1::basic_regex::__search
         0     0%  1.68%      2.03s 20.12%  Envoy::Stats::ThreadLocalStoreImpl::clearScopeFromCaches
         0     0%  1.68%      2.02s 20.02%  Envoy::ThreadLocal::InstanceImpl::Bookkeeper::runOnAllThreads
         0     0%  1.68%      2.02s 20.02%  Envoy::ThreadLocal::InstanceImpl::SlotImpl::runOnAllThreads
         0     0%  1.68%      2.01s 19.92%  Envoy::ThreadLocal::InstanceImpl::runOnAllThreads
     2.01s 19.92% 21.61%      2.01s 19.92%  inotify_init
     0.01s 0.099% 21.70%      1.88s 18.63%  std::__1::basic_regex::__match_at_start (inline)
     0.01s 0.099% 21.80%      1.87s 18.53%  Envoy::Event::DispatcherImpl::post
         0     0% 21.80%      1.85s 18.33%  Envoy::Event::TimerImpl::enableTimer
         0     0% 21.80%      1.85s 18.33%  Envoy::Event::TimerImpl::internalEnableTimer (inline)
         0     0% 21.80%      1.84s 18.24%  event_active
     0.02s   0.2% 22.00%      1.84s 18.24%  event_callback_activate_nolock_
     1.76s 17.44% 39.44%      1.76s 17.44%  __write_nocancel
     0.38s  3.77% 43.21%      1.45s 14.37%  std::__1::basic_regex::__match_at_start_ecma
         0     0% 43.21%      1.34s 13.28%  Envoy::Router::RdsRouteConfigProviderImpl::validateConfig
         0     0% 43.21%      1.28s 12.69%  Envoy::Router::RdsRouteConfigProviderImpl::onConfigUpdate
         0     0% 43.21%      0.61s  6.05%  Envoy::Server::InstanceImpl::flushStatsInternal
         0     0% 43.21%      0.61s  6.05%  Envoy::Server::InstanceUtil::flushMetricsToSinks (inline)
         0     0% 43.21%      0.61s  6.05%  Envoy::Stats::ThreadLocalStoreImpl::mergeInternal
     0.54s  5.35% 48.56%      0.54s  5.35%  __read_nocancel
         0     0% 48.56%      0.51s  5.05%  Envoy::Server::MetricSnapshotImpl::MetricSnapshotImpl
     0.01s 0.099% 48.66%      0.45s  4.46%  std::__1::vector::push_back (inline)
         0     0% 48.66%      0.43s  4.26%  absl::container_internal::hash_policy_traits::apply (inline)
         0     0% 48.66%      0.42s  4.16%  absl::container_internal::DecomposeValue (inline)
         0     0% 48.66%      0.42s  4.16%  absl::container_internal::FlatHashSetPolicy::apply (inline)
         0     0% 48.66%      0.40s  3.96%  std::__1::allocator_traits::__construct (inline)
         0     0% 48.66%      0.40s  3.96%  std::__1::allocator_traits::construct (inline)
     0.01s 0.099% 48.76%      0.39s  3.87%  absl::container_internal::raw_hash_set::find_or_prepare_insert
     0.03s   0.3% 49.06%      0.39s  3.87%  std::__1::allocator::construct (inline)
         0     0% 49.06%      0.37s  3.67%  Envoy::Stats::ThreadLocalStoreImpl::counters
         0     0% 49.06%      0.37s  3.67%  absl::container_internal::raw_hash_set::EmplaceDecomposable::operator() (inline)
         0     0% 49.06%      0.37s  3.67%  absl::container_internal::raw_hash_set::emplace (inline)
         0     0% 49.06%      0.37s  3.67%  absl::container_internal::raw_hash_set::insert (inline)
     0.37s  3.67% 52.73%      0.37s  3.67%  operator delete
         0     0% 52.73%      0.35s  3.47%  Envoy::MessageUtil::anyConvertAndValidate (inline)
     0.34s  3.37% 56.10%      0.34s  3.37%  tc_newarray
         0     0% 56.10%      0.33s  3.27%  std::__1::vector::resize (inline)
         0     0% 56.10%      0.32s  3.17%  Envoy::Router::RouteConfigUpdateReceiverImpl::onRdsUpdate
         0     0% 56.10%      0.32s  3.17%  absl::container_internal::raw_hash_set::prepare_insert
         0     0% 56.10%      0.32s  3.17%  absl::hash_internal::CityHashState::hash (inline)
         0     0% 56.10%      0.32s  3.17%  absl::hash_internal::HashImpl::operator() (inline)
         0     0% 56.10%      0.32s  3.17%  absl::hash_internal::HashSelect::HashValueProbe::Invoke (inline)
         0     0% 56.10%      0.32s  3.17%  absl::hash_internal::HashStateBase::combine (inline)
         0     0% 56.10%      0.31s  3.07%  absl::container_internal::raw_hash_set::resize
         0     0% 56.10%      0.28s  2.78%  Envoy::MessageUtil::validate
         0     0% 56.10%      0.27s  2.68%  absl::container_internal::raw_hash_set::HashElement::operator() (inline)
         0     0% 56.10%      0.26s  2.58%  Envoy::MessageUtil::checkForUnexpectedFields
     0.15s  1.49% 57.58%      0.26s  2.58%  std::__1::vector::__append (partial-inline)
     0.05s   0.5% 58.08%      0.25s  2.48%  std::__1::vector::__push_back_slow_path
     0.01s 0.099% 58.18%      0.24s  2.38%  Envoy::Upstream::CdsApiImpl::onConfigUpdate
         0     0% 58.18%      0.23s  2.28%  Envoy::Stats::StatName::dataSize
     0.23s  2.28% 60.46%      0.23s  2.28%  Envoy::Stats::SymbolTableImpl::Encoding::decodeNumber (inline)
         0     0% 60.46%      0.20s  1.98%  Envoy::ProtobufMessage::traverseMessage
     0.10s  0.99% 61.45%      0.19s  1.88%  __state (inline)
         0     0% 61.45%      0.19s  1.88%  std::__1::allocator_traits::__destroy (inline)
         0     0% 61.45%      0.19s  1.88%  std::__1::allocator_traits::destroy (inline)
     0.01s 0.099% 61.55%      0.18s  1.78%  ~shared_ptr (inline)
         0     0% 61.55%      0.17s  1.68%  Envoy::MessageUtil::hash
         0     0% 61.55%      0.17s  1.68%  google::protobuf::TextFormat::Printer::Print
         0     0% 61.55%      0.17s  1.68%  google::protobuf::TextFormat::Printer::PrintToString
     0.05s   0.5% 62.04%      0.17s  1.68%  std::__1::__shared_count::__release_shared (inline)
         0     0% 62.04%      0.17s  1.68%  std::__1::__shared_weak_count::__release_shared (inline)
     0.06s  0.59% 62.64%      0.17s  1.68%  std::__1::vector::assign
     0.16s  1.59% 64.22%      0.16s  1.59%  __nss_passwd_lookup
     0.05s   0.5% 64.72%      0.16s  1.59%  ~__vector_base (inline)
         0     0% 64.72%      0.16s  1.59%  ~vector (inline)
         0     0% 64.72%      0.15s  1.49%  Envoy::Router::PrefixRouteEntryImpl::PrefixRouteEntryImpl
     0.15s  1.49% 66.20%      0.15s  1.49%  absl::hash_internal::CityHash64
         0     0% 66.20%      0.15s  1.49%  google::protobuf::TextFormat::Printer::PrintField
     0.11s  1.09% 67.29%      0.14s  1.39%  tcmalloc::ThreadCache::ReleaseToCentralCache
     0.01s 0.099% 67.39%      0.13s  1.29%  Envoy::Router::RouteEntryImplBase::RouteEntryImplBase
         0     0% 67.39%      0.13s  1.29%  std::__1::default_delete::operator() (inline)
     0.01s 0.099% 67.49%      0.13s  1.29%  std::__1::unique_ptr::reset (inline)
     0.07s  0.69% 68.19%      0.13s  1.29%  vector (inline)
         0     0% 68.19%      0.12s  1.19%  Envoy::MessageUtil::anyConvert (inline)
         0     0% 68.19%      0.12s  1.19%  Envoy::MessageUtil::unpackTo
     0.02s   0.2% 68.38%      0.12s  1.19%  envoy::config::route::v3::VirtualHost::~VirtualHost
         0     0% 68.38%      0.12s  1.19%  google::protobuf::MessageLite::ParseFrom
     0.01s 0.099% 68.48%      0.11s  1.09%  Envoy::Stats::AbslHashValue
     0.11s  1.09% 69.57%      0.11s  1.09%  absl::Mutex::Unlock
         0     0% 69.57%      0.11s  1.09%  envoy::config::route::v3::VirtualHost::MergeFrom
     0.01s 0.099% 69.67%      0.11s  1.09%  google::protobuf::internal::ParseContext::ParseMessage
     0.09s  0.89% 70.56%      0.11s  1.09%  std::__1::__function::__func::__clone
         0     0% 70.56%      0.11s  1.09%  std::__1::__vector_base::__destruct_at_end (inline)
         0     0% 70.56%      0.11s  1.09%  std::__1::__vector_base::clear (inline)
         0     0% 70.56%      0.11s  1.09%  std::__1::allocator::destroy (inline)
     0.01s 0.099% 70.66%      0.11s  1.09%  ~__value_func (inline)
         0     0% 70.66%      0.11s  1.09%  ~function (inline)
         0     0% 70.66%      0.11s  1.09%  ~unique_ptr (inline)
     0.03s   0.3% 70.96%      0.10s  0.99%  Envoy::(anonymous namespace)::UnexpectedFieldProtoVisitor::onField
         0     0% 70.96%      0.10s  0.99%  Envoy::Router::RouteConfigUpdateReceiverImpl::initializeRdsVhosts (inline)
         0     0% 70.96%      0.10s  0.99%  absl::hash_internal::AbslHashValue (inline)
         0     0% 70.96%      0.10s  0.99%  absl::hash_internal::HashStateBase::combine_contiguous (inline)
         0     0% 70.96%      0.10s  0.99%  absl::hash_internal::hash_range_or_bytes (inline)
     0.01s 0.099% 71.06%      0.10s  0.99%  envoy::config::route::v3::RouteAction::MergeFrom
         0     0% 71.06%      0.10s  0.99%  std::__1::__function::__alloc_func::destroy (inline)
(pprof)

If it's needed

@fxposter
Copy link
Author

This one is from 1.12, overall structure is the same, but sample time for 1.14 is 10seconds, while for 1.12 it is 3 seconds
image

CPU usage (blue and green are 1.14, red is 1.12)
image

@jmarantz
Copy link
Contributor

jmarantz commented Jul 16, 2020 via email

@fxposter
Copy link
Author

fxposter commented Jul 16, 2020

in this case - no, but sending same configs result in this behaviour. in this case I send v3 resources instead of v2 (it doesn't matter much - situation is the same in both cases), but content of resources is same.

ie: to eliminate the possibility that resource conversion (v2 to v3) in new envoy adds this cpu usage - I tried to "improve" situation with 1.14

@jmarantz
Copy link
Contributor

In my recent experiences, translating between API versions during xDS updates has been a significant bottleneck. So from a functional perspective you may be right but I am skeptical that it is a no-op for perf.

Having said that it does look like there's some potential perf improvements, having looked at the code. This will become actionable if we have a testcase to repro though, otherwise we kind of shooting in the dark.

@fxposter
Copy link
Author

After re-checking what is actually in rds/cds - I would say the pattern is to add and remove 2-5 clusters and corresponding routes each 2-10 minutes. 2 resources in RDS (egress/ingress) and path and domain-based mappings for each cluster (ie: for each cluster there is a separate domain and a separate mapping within some single domain).
Overall number of clusters is ~1500 (and ~3000 corresponding routes).

@stale
Copy link

stale bot commented Aug 16, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 16, 2020
@rene-m-hernandez
Copy link

May we please have the stale tag removed? I'm curious to know more about the increased CPU utilization that still occurred after the v3 upgrade.

Regarding the v3 upgrade, it would seem there's not much that can be done other than to upgrade as soon as possible. Was this called out in the v14 release notes or documentation? At quick glance, I don't see mention of it & this does seem to be a well-known "issue"

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Sep 2, 2020
@mattklein123
Copy link
Member

Regarding the v3 upgrade, it would seem there's not much that can be done other than to upgrade as soon as possible. Was this called out in the v14 release notes or documentation? At quick glance, I don't see mention of it & this does seem to be a well-known "issue"

Probably not, doc updates appreciated. It is a "known issue" at this point. And yes, I would upgrade to v3 as soon as you are able. I don't think there is anything else actionable on this issue unfortunately. @htuch has done a bunch of recent changes to improve v2 -> v3 upgrade perf fwiw.

@htuch
Copy link
Member

htuch commented Sep 2, 2020

Tagging for #10943 postmortem; I don't think we had clear evidence that performance was problematic at the time of the release, most discussion has taken place since then, but worth keeping in mind for the future. Some of this comes down to the fact that OSS Envoy has no control plane performance regression tests.

@oschaaf is helping to coordinate some of the work on data plane performance regression testing; Otto, do you know if there are any places to add regression tests for control plane on anyone's radar?

@fxposter
Copy link
Author

fxposter commented Sep 3, 2020

I will provide a sample control plane this week.

@oschaaf
Copy link
Member

oschaaf commented Sep 4, 2020

do you know if there are any places to add regression tests for control plane on anyone's radar?

There's a design doc here: https://docs.google.com/document/d/14Iz8j--Mvb06QFB8RurtYlwmy657YbAVfqDr-jKgtaQ/edit#heading=h.grkfe6onmtgv
Also, cross linking #11266

@fxposter
Copy link
Author

fxposter commented Sep 7, 2020

https://gist.github.com/fxposter/8887788a575090601ba6b106e80e3230

this is the go-control-plane that shows the effect of that. when changing only paths under some wildcard domain - it consumes less cpu, when constantly adding/removing domains in routes - all those metric generation starts to consume much more CPU. This was not the case for 1.12.

@fxposter
Copy link
Author

fxposter commented Sep 7, 2020

and looking at graphs above - it seems that quite a lot of time is spent in Envoy::Stats::TagProducerImpl::produceTags/etc.

@stale
Copy link

stale bot commented Oct 12, 2020

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 12, 2020
@mattklein123 mattklein123 removed the stale stalebot believes this issue/PR has not been touched recently label Dec 9, 2020
@github-actions
Copy link

github-actions bot commented Jan 8, 2021

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 8, 2021
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/perf question Questions that are neither investigations, bugs, nor enhancements stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

8 participants