Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory: switch to the latest tcmalloc for x86_64 builds #13251

Merged
merged 21 commits into from
Oct 7, 2020

Conversation

rojkov
Copy link
Member

@rojkov rojkov commented Sep 24, 2020

Commit Message: memory: switch to the latest tcmalloc for x86_64 builds
Additional Description: Switch to the new tcmalloc for linux x86_64 builds. All other builds still get the old tcmalloc from gperftools. Switching profiling or the debugalloc feature on will result in linking to the old tcmalloc too. The old tcmalloc can still be enabled for x86_64 builds with --define tcmalloc=gperftools.
Risk Level: High
Testing: run unit tests
Docs Changes: added a note about the new --define tcmalloc=gperftools option to bazel/README.md.
Release Notes: updated the "Minor Behavior Changes" section of version_hstory/current.rst
Fixes #10053

@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-watchers: FYI only for changes made to (bazel/repository_locations\.bzl)|(api/bazel/repository_locations\.bzl)|(.*/requirements\.txt).

🐱

Caused by: #13251 was opened by rojkov.

see: more, trace.

Signed-off-by: Dmitry Rozhkov <[email protected]>
@lizan lizan self-requested a review September 24, 2020 16:20
@lizan lizan self-assigned this Sep 24, 2020
bazel/repository_locations.bzl Outdated Show resolved Hide resolved
source/common/memory/utils.cc Show resolved Hide resolved
Signed-off-by: Dmitry Rozhkov <[email protected]>
Signed-off-by: Dmitry Rozhkov <[email protected]>
@rojkov
Copy link
Member Author

rojkov commented Sep 25, 2020

Hm.. the way how generic.current_allocated_bytes is calculated in the new tcmalloc has changed a bit (per-cpu cache is substructed now). I need more tests to run -> converting to Draft for now.

@rojkov rojkov marked this pull request as draft September 25, 2020 15:03
@rojkov
Copy link
Member Author

rojkov commented Sep 29, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13251 (comment) was created by @rojkov.

see: more, trace.

@rojkov
Copy link
Member Author

rojkov commented Sep 30, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13251 (comment) was created by @rojkov.

see: more, trace.

rojkov added 7 commits October 1, 2020 18:05
When Envoy is linked with the new tcmalloc and compiled with gcc it
crashes upon assert in tcmalloc because a buffer slice is allocated
with `new []`, but deallocated with `delete`.

Signed-off-by: Dmitry Rozhkov <[email protected]>
Signed-off-by: Dmitry Rozhkov <[email protected]>
@rojkov
Copy link
Member Author

rojkov commented Oct 6, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13251 (comment) was created by @rojkov.

see: more, trace.

@rojkov
Copy link
Member Author

rojkov commented Oct 6, 2020

@jmarantz With the new tcmalloc the amount of consumed memory becomes non deterministic and from run to run I see different results of memory_test.consumedBytes() in the same test. Can we relax the tests a bit and avoid the strict equivalence checks?

@rojkov rojkov marked this pull request as ready for review October 6, 2020 08:06
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz With the new tcmalloc the amount of consumed memory becomes non deterministic and from run to run I see different results of memory_test.consumedBytes() in the same test. Can we relax the tests a bit and avoid the strict equivalence checks?

Are you sure that deterministic byte-counts are not still available via the API in some way? It looks like maybe you changed the memory-stats APIs to include unmapped bytes?

@@ -310,7 +311,7 @@ TEST_P(ClusterMemoryTestRunner, MemoryLargeClusterSizeWithFakeSymbolTable) {
// https://github.com/envoyproxy/envoy/issues/12209
// EXPECT_MEMORY_EQ(m_per_cluster, 44949);
}
EXPECT_MEMORY_LE(m_per_cluster, 47500); // Round up to allow platform variations.
EXPECT_MEMORY_LE(m_per_cluster, 47700); // Round up to allow platform variations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole test is now gone, once you merge, so you can just keep the edit for the real symbol-tables test below, and drop this one.

namespace Memory {

uint64_t Stats::totalCurrentlyAllocated() {
return tcmalloc::MallocExtension::GetNumericProperty("generic.current_allocated_bytes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe for deterministic memory checks we can just use generic.current_allocated_bytes and not include tcmalloc.cpu_free?

We can make 2 APIs if that helps.

I'm also confused by the github UI; it is not showing me the prior impl of this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part (from #if to #elif) is new for tcmalloc. The previous impl is below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe for deterministic memory checks we can just use generic.current_allocated_bytes and not include tcmalloc.cpu_free?

Right, my bad. It shouldn't be included. I'll drop it. Though after the last merge all tests passed for a6d4c42. But the version without tcmalloc.cpu_free passes the release tests too on my local host. Let's see...

I'm also confused by the github UI; it is not showing me the prior impl of this method.

The old version is still there, just wrapped in #if defined(GPERFTOOLS_TCMALLOC). It's needed for non-x86 builds.

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Defer to @jmarantz for the deterministic memory checks and then merge.

@rojkov
Copy link
Member Author

rojkov commented Oct 7, 2020

Are you sure that deterministic byte-counts are not still available via the API in some way? It looks like maybe you changed the memory-stats APIs to include unmapped bytes?

I did include unmapped bytes to Stats::totalCurrentlyReserved() only. The reason is that the new tcmalloc doesn't include them now for generic.heap_size. See https://github.com/google/tcmalloc/blob/43cb87553037d858d819437e294bd6a4eb0210ae/tcmalloc/tcmalloc.cc#L877 vs https://github.com/gperftools/gperftools/blob/bda3c82e11615ca9e7751d1f3cfb161026ee742a/src/tcmalloc.cc#L717

// symbol_table_mem_used: 1726056 (3.9x) -- does not seem to depend on STL sizes.
EXPECT_MEMORY_LE(string_mem_used, 7759488);
EXPECT_MEMORY_LE(string_mem_used, 7775872);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI fails on gcc now with:

Expected: (string_mem_used) <= (7775872), actual: 7792256 vs 7775872

I'm guessing this may be related to varying STL versions for std::string across platforms than tcmalloc. I think we can just remove the exact check for string_mem_used. We can keep the test that symbol tables cut the usage by 2/3.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@mattklein123 mattklein123 merged commit 4c0d2d2 into envoyproxy:master Oct 7, 2020
@rojkov rojkov deleted the tcmalloc branch October 8, 2020 06:02
@alyssawilk
Copy link
Contributor

This morning my default build was using clang+llvm-8.0.0 and build failed with "error: 'asm goto' constructs are not supported yet". I updated clang to 9 and I'm good, but based on that and an unsubstantiated google search (http://llvm.1065342.n5.nabble.com/llvm-dev-is-clang9-supporting-asm-goto-td128867.html) I wonder if we need to update the build requirements doc (https://www.envoyproxy.io/docs/envoy/latest/install/building#requirements)

@alyssawilk
Copy link
Contributor

for the record, I just restored clang-8 as default and verified it builds with --define tcmalloc=gperftools so ideally we can just update requirements doc to note folks should use that option with older versions of clang

@rojkov
Copy link
Member Author

rojkov commented Oct 8, 2020

for the record, I just restored clang-8 as default and verified it builds with --define tcmalloc=gperftools so ideally we can just update requirements doc to note folks should use that option with older versions of clang

Right. Thanks for letting know! I'll make a PR to mention it.

Basically for non-x86 builds the current requirement is still sufficient too. Those who build Envoy for embedded very often have to deal with quite outdated toolchains from BSPs they can get from board vendors.

@alyssawilk
Copy link
Contributor

I added some comments in #13438 as we're hoping to cut the release today or tomorrow.
LMK what you think?

@rojkov
Copy link
Member Author

rojkov commented Oct 8, 2020

Great you corrected it already! I commented the PR.

@rgs1
Copy link
Member

rgs1 commented Oct 8, 2020

Hmm this change is triggering this for our build:

14:21:45 warning: unknown warning option '-Wno-attribute-alias'; did you mean '-Wno-attributes'? [-Wunknown-warning-option]
14:21:45 In file included from external/com_github_google_tcmalloc/tcmalloc/static_vars.cc:26:
14:21:45 In file included from external/com_github_google_tcmalloc/tcmalloc/cpu_cache.h:30:
14:21:45 external/com_github_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:255:7: error: 'asm goto' constructs are not supported yet
14:21:45   asm goto(
14:21:45       ^
14:21:45 external/com_github_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:459:25: error: invalid output constraint '=@ccbe' in asm
14:21:45             [underflow] "=@ccbe"(underflow),

@rgs1
Copy link
Member

rgs1 commented Oct 8, 2020

nvm just saw the prev comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow new tcmalloc to be used as an option
7 participants