Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shadow memory range interleaves with an existing memory mapping #1630

Open
jason-infra opened this issue Mar 8, 2023 · 4 comments
Open

Shadow memory range interleaves with an existing memory mapping #1630

jason-infra opened this issue Mar 8, 2023 · 4 comments

Comments

@jason-infra
Copy link

jason-infra commented Mar 8, 2023

I have an ASAN test suite that gives me the following error:

==25==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==25==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==25==This might be related to ELF_ET_DYN_BASE change in Linux 4.12.
==25==See https://github.com/google/sanitizers/issues/856 for possible workarounds.
==25==Process memory map follows:
	0x00007fff7000-0x00008fff7000	
	0x000091ff6000-0x004091ff7000	
	0x02008fff7000-0x10007fff8000	
         ...

I am running

  • Nvidia 525.60.13 drivers
  • Ubuntu 20.04.5
  • Linux kernel 5.15.0-1027-gcp

IMPORTANTLY, All Asan test pass with no errors if I run the ASAN tests using nvidia 470 drivers:

  • Nvidia 470.82.01 drivers
  • Ubuntu 20.04.5
  • Linux kernel 5.15.0-1027-gcp

Other Info

  • I have kept all variables the same, including Asan testsuite, OS, kernel, nvidia gpu t4. The only difference being the Nvidia drivers.

  • I do not believe -fsanitize=address is related in #856, because the test suite runs flawlessly with Nvidia 470 drivers.

  • All my other tests pass, including tsan.

Why am I still getting the "Shadow memory range interleaves" error?

@mjj48
Copy link

mjj48 commented Mar 29, 2023

Standalone command to repro this:

#!/bin/bash
set -e 

# Things you must change:
LIB_ASAN=/usr/lib/x86_64-linux-gnu/libasan.so.5
LIB_CUDART=cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
NVCC_PATH=/home/michael.johnson/cuda-11.2/bin/nvcc
CLANG10_PATH=/home/michael.johnson/clang/bin/clang
# End of things you must change.

# Create .h
cat <<EOT >> hello.h
void call_hello();
EOT
# Create .cu
cat <<EOT >> hello.cu
#include <cstdio>
__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}
void call_hello() {
    cuda_hello<<<1,1>>>(); 
}
EOT
# Create .cpp
cat <<EOT >> hello_bin.cpp
#include "hello.h"
int main() {
    call_hello(); 
    return 1;
}
EOT
mkdir -p needed_libs/
cp $LIB_ASAN needed_libs/
cp $LIB_CUDART needed_libs/

$NVCC_PATH  --objdir-as-tempdir  --compiler-options "-fPIC" --compiler-bindir=$CLANG10_PATH  -x cu  -O2 -c hello.cu -o hello.pic.o

$CLANG10_PATH -shared -o libhello.so hello.pic.o -fsanitize=address -stdlib=libstdc++ -Lneeded_libs -lstdc++

cp libhello.so needed_libs/

$CLANG10_PATH  -fPIC  -nostdinc '-std=c++17' -nostdinc++ -c hello_bin.cpp -o hello_bin.pic.o

$CLANG10_PATH -o hello_bin -Lneeded_libs hello_bin.pic.o -lhello -l:libcudart.so.11.0 -l:libstdc++.so.6 -pie -fsanitize=address -fuse-ld=gold  -stdlib=libstdc++ -lstdc++

LD_LIBRARY_PATH=needed_libs/ ./hello_bin 2>&1 | head

@noaxp
Copy link

noaxp commented Mar 6, 2024

It seems the problem only happens when you use both clang and gold ld.
I also encountered this issue on gcc, but it's weird I couldn't reproduce it now .
As a temporary solution, you could try gcc or other ld.

Reproduce:
clang demo.cc -fsanitize=address -I/usr/local/cuda/include -lcuda -o demo -fuse-ld=gold
Remove -fuse-ld=gold the program would work well.

// demo.cc
#include <cuda.h>
int main() {
  cuInit(0);
  return 0;
}

@noaxp
Copy link

noaxp commented Mar 14, 2024

It's caused by duplicated invoking of InitializeShadowMemory.
First invoking is before main(), second is before cuInit(0).

The memory address of variable asan_inited is different between the twice invoking, so the program consider asan uninitialized and call InitializeShadowMemory in the second time. Then it try to allocate shadow memory on the same address i.e. 0x00007fff7000-0x10007fff7fff and cause error.

But I'm still confused why there are two asan_inited object, and why cuda driver & ld could effect it.

@noaxp
Copy link

noaxp commented Mar 19, 2024

I found the root cause, there is below code in libcuda.so

Dl_info attr[2];
dladdr((void*)&pthread_join, attr);
dlopen(attr[0].dli_fname, 1);

So the program try to dlopen itself, and dlopen pie file is undefined behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants