Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LSan by fixing getauxval() (resubmit v4) #39430

Merged
merged 2 commits into from
Jul 25, 2022

Conversation

azat
Copy link
Member

@azat azat commented Jul 20, 2022

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix LSan by fixing getauxval()

getauxval() from glibc-compatibility did not work always correctly:

  • it does not work after setenv(), and this breaks vsyscalls,
    like sched_getcpu() 1 (and BaseDaemon.cpp always set TZ if timezone
    is defined, which is true for CI 2).

  • another think that is definitely broken is LSan (Leak Sanitizer), it
    relies on worked getauxval() but it does not work if __environ is not
    initialized yet (there is even a commit about this).

Fix this by using /proc/self/auxv with fallback to environ solution to
make it compatible with environment that does not allow reading from
auxv (or no procfs), since previous attempt w/o it failed - #33957 (comment)

Fixes: #39299 (cc @rschu1ze )
Fixes: #39452

Note: I have few patches on top, but I want to see this error again.

@azat azat marked this pull request as draft July 20, 2022 18:32
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jul 20, 2022
@azat azat changed the title Fix getauxval (for now debugging) Fix abort() in getauxval (for now debugging) Jul 20, 2022
@azat azat changed the title Fix abort() in getauxval (for now debugging) Fix abort() in getauxval() (for now debugging) Jul 20, 2022
@azat azat changed the title Fix abort() in getauxval() (for now debugging) Fix LSan by fixing getauxval() (resubmit v4) Jul 21, 2022
@robot-ch-test-poll2 robot-ch-test-poll2 added pr-build Pull request with build/testing/packaging improvement and removed pr-not-for-changelog This PR should not be mentioned in the changelog labels Jul 21, 2022
@rschu1ze rschu1ze self-assigned this Jul 21, 2022
@azat
Copy link
Member Author

azat commented Jul 21, 2022

Stress test (address, actions) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

2022.07.21 22:03:58.327180 [ 277225 ] {} <Error> test_11.src_11 (38153510-aab5-41e8-8285-b595336f35fd): The set of parts restored in place of 3_32_32_7 looks incomplete. There might or might not be a data loss. Suspicious parts: 2_59_59_0 (state Active) 

Not interesting failure for this PR

@azat azat marked this pull request as ready for review July 24, 2022 12:16
azat added 2 commits July 25, 2022 01:21
(cherry picked from commit 51e7c41)
v2: fix type check
Signed-off-by: Azat Khuzhin <[email protected]>
getauxval() from glibc-compatibility did not work always correctly:

- It does not work after setenv(), and this breaks vsyscalls,
  like sched_getcpu() [1] (and BaseDaemon.cpp always set TZ if timezone
  is defined, which is true for CI [2]).

  Also note, that fixing setenv() will not fix LSan,
  since the culprit is getauxval()

  [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
  [2]: ClickHouse#32928 (comment)

- Another think that is definitely broken is LSan (Leak Sanitizer), it
  relies on worked getauxval() but it does not work if __environ is not
  initialized yet (there is even a commit about this).

  And because of, at least, one leak had been introduced [3]:

    [3]: ClickHouse#33840

Fix this by using /proc/self/auxv with fallback to environ solution to
make it compatible with environment that does not allow reading from
auxv (or no procfs).

v2: add fallback to environ solution
v3: fix return value for __auxv_init_procfs()
(cherry picked from commit f187c34)
v4: more verbose message on errors, CI founds [1]:
    AUXV already has value (529267711)
    [1]: https://s3.amazonaws.com/clickhouse-test-reports/39103/2325f7e8442d1672ce5fb43b11039b6a8937e298/stress_test__memory__actions_.html
v5: break at AT_NULL
v6: ignore AT_IGNORE
v7: suppress TSan and remove superior check to avoid abort() in case of race
v8: proper suppressions (not inner function but itself)
Refs: ClickHouse#33957
Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-build Pull request with build/testing/packaging improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abort in getauxval from musl
3 participants