About static link to cudnn and cublas #5

Open
gangmul12 opened this issue Jul 13, 2019 · 10 comments

Comments

@gangmul12

I think nobody is managing this repo now, but I'm posting this for future reference.

my env:
OS : Ubuntu 16.04
cudnn : 7.1.4
cuda : 8.0

I installed PyTorch following @cng123's instructions, linked below:
https://docs.google.com/document/d/17fSM2vrWodP8rWR7ctpgaggVXEw0uD2VCAh0Gi4Gpb4/edit?usp=sharing

I downloaded https://github.com/pytorch/examples and ran the imagenet benchmark. I found that GPGPU-Sim often hits a deadlock, a segfault, or CUDNN_STATUS_INTERNAL_ERROR.
I analyzed the problem using the LD_DEBUG environment variable and found that the PyTorch library dynamically loads libcuda.so.1, which should not be linked.
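
The LD_DEBUG check described above can be scripted. The following is a minimal sketch, not the exact procedure used here: it assumes a glibc-based Linux system, where setting LD_DEBUG=libs makes the dynamic loader print a "find library=NAME" line to stderr for each library it searches for. The helper names (requested_libraries, driver_library_requested, trace_command) are hypothetical.

```python
import os
import re
import subprocess
import sys

# glibc's loader prints lines such as
#   "12345:  find library=libcuda.so.1 [0]; searching"
# when LD_DEBUG=libs is set (assumption: glibc-based Linux).
FIND_RE = re.compile(r"find library=(\S+)")


def requested_libraries(ld_debug_text):
    """Return the library names the loader searched for, in first-seen order."""
    seen = []
    for line in ld_debug_text.splitlines():
        match = FIND_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def driver_library_requested(ld_debug_text):
    """True if the real CUDA driver library (libcuda.so*) was requested."""
    return any(name.startswith("libcuda.so")
               for name in requested_libraries(ld_debug_text))


def trace_command(argv):
    """Run argv with LD_DEBUG=libs and return the captured loader trace."""
    env = dict(os.environ, LD_DEBUG="libs")
    proc = subprocess.run(argv, env=env, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    trace = trace_command(sys.argv[1:])
    for name in requested_libraries(trace):
        print(name)
    if driver_library_requested(trace):
        print("warning: libcuda.so was requested; the simulator is being bypassed")
```

Saved under a hypothetical name such as trace_libs.py, it could be run as "python trace_libs.py python main.py" followed by the benchmark's own arguments. If libcuda.so.1 shows up in the list, some library is still reaching for the real driver.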

I've found two reasons why it tries to link to libcuda.so.1 instead of gpgpu-sim's libcudart.so.

  1. In the compilation stage for _nvrtc.so, there is a link flag for libcuda.so.

    thnvrtc_link_flags += ['-lcudart', '-lcuda', '-lnvrtc']

    I could work around this either by using the static cuDNN library or by deleting the -lcuda link flag (if you want to use the shared version of cuDNN). I personally prefer the static library.

  2. libcublas.so has a link to libcuda.so. Strangely, I can't find an explicit linkage from libcublas.so to libcuda.so when I check it with the ldd command, but the LD_DEBUG output shows that libcublas.so calls functions in libcuda.so.
    At first, I tried to resolve this by making a copy of libcudart.so named libcuda.so.1. However, there are so many unimplemented CUDA driver functions in cuda_runtime_api.cc that my terminal produced CUDNN_STATUS_INTERNAL_ERROR very quickly.
    Then I simply built PyTorch with libcublas_static.a by modifying some CMake values, like this:

    option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)

    if(CAFFE2_STATIC_LINK_CUDA)
      set_property(
        TARGET caffe2::cublas PROPERTY INTERFACE_LINK_LIBRARIES
        "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcublas_static.a")
    endif()

After that, many of the errors went away.
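
The ldd puzzle in the second point has a plausible explanation: ldd reports only dependencies recorded at link time (DT_NEEDED entries, resolved transitively), while a library opened with dlopen() at run time appears only in a loader trace such as LD_DEBUG. Whether libcublas.so reaches libcuda.so this way is an assumption here, not something confirmed in this thread. A minimal sketch of the comparison, with hypothetical helper names:

```python
import os


def ldd_names(ldd_text):
    """Collect the library file names listed in `ldd` output."""
    names = set()
    for line in ldd_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Typical lines: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        # or "/lib64/ld-linux-x86-64.so.2 (0x...)".
        names.add(os.path.basename(line.split()[0]))
    return names


def runtime_only(ldd_text, runtime_names):
    """Names seen at run time that ldd did not report as link-time deps."""
    linked = ldd_names(ldd_text)
    return [name for name in runtime_names if name not in linked]


if __name__ == "__main__":
    # Illustrative input only, not output captured from a real system.
    ldd_text = (
        "\tlinux-vdso.so.1 (0x00007ffd00000000)\n"
        "\tlibcudart.so.8.0 => /opt/sim/lib/libcudart.so.8.0 (0x00007f0000000000)\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000400000)\n"
    )
    seen_at_runtime = ["libcudart.so.8.0", "libcuda.so.1"]
    print(runtime_only(ldd_text, seen_at_runtime))
```

Any name this reports is a candidate for a run-time load that link-time tools cannot see.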
I also think this is closely related to the merged pull request in gpgpu-sim_distribution, gpgpu-sim/gpgpu-sim_distribution#129

I'm not sure whether it is meaningful to improve such an old version of PyTorch (v0.4), but anyway, I hope this issue helps your simulation.

Thank you

@ohcurrent

Hello gangmul12!
It has been a long time since we last talked.

Are you still working on pytorch-gpgpusim?
Did you make any progress getting one of the PyTorch examples to run fully on GPGPU-Sim?

@gangmul12
Author

Hi! I worked on pytorch-gpgpusim, but I realized that many kernels in the cuDNN library are implemented only in SASS, so now I'm actually studying SASS XD
However, by ignoring the SASS-only kernels instead of failing on them, I have successfully run an example!

@ohcurrent

@gangmul12
I see....
Do the SASS-only kernels you mentioned in the cuDNN library include the "maxwell_sgemm_128x64_raggedMn_tn_splitK" kernel?
Thanks for answering.

@gangmul12
Author

@ohcurrent
Exactly. None of the kernels named maxwell_something_blahblah have PTX code... Also, some kernels do have a PTX version, but their function bodies are just {ret;}...
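
The {ret;} stubs described above can be listed mechanically. This is a rough sketch under stated assumptions: the PTX text has already been extracted from the library (for example with cuobjdump -ptx), and a simple regular expression is good enough to find entry functions. The names stub_kernels and strip_comments, and the kernel names in the sample, are hypothetical. Kernels that ship with no PTX at all will not appear in the PTX dump, so they have to be found by comparing against the SASS listing instead.

```python
import re

# Match ".entry NAME (params) [directives] { body }" up to the first closing
# brace. That is enough to recognise a body consisting only of "ret;".
ENTRY_RE = re.compile(r"\.entry\s+([\w$]+)\s*(?:\([^)]*\))?[^{]*\{([^}]*)\}")


def strip_comments(body):
    """Remove // line comments from a PTX function body."""
    return re.sub(r"//[^\n]*", "", body)


def stub_kernels(ptx_text):
    """Return names of entry functions whose whole body is a single ret."""
    stubs = []
    for match in ENTRY_RE.finditer(ptx_text):
        name = match.group(1)
        body = strip_comments(match.group(2))
        statements = [s.strip() for s in body.split(";") if s.strip()]
        if statements == ["ret"]:
            stubs.append(name)
    return stubs


if __name__ == "__main__":
    # Illustrative PTX only; these kernel names are made up.
    sample = """
.visible .entry stub_kernel(
    .param .u64 p0
)
{
    ret;
}

.visible .entry real_kernel(
    .param .u64 p0
)
{
    .reg .b32 %r<2>;
    mov.u32 %r1, 0;
    ret;
}
"""
    print(stub_kernels(sample))
```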

@cng123
Contributor

cng123 commented Nov 5, 2019

To add onto this, the maxwell_something_xxxx function headers are not in the newer CuDNN versions, so it might have been a mistake that they were there in the first place.

@ohcurrent

@cng123, then did you simulate with a cuDNN version higher than 7.1.4?

@gangmul12
Author

@ohcurrent, I saw that many cuDNN kernels are optimized for a specific architecture. Newer versions have volta_xxxx kernels, and those also contain only SASS code.
According to a few articles such as https://arxiv.org/abs/1804.06826 and https://hal.inria.fr/hal-00789958/document, it seems that some optimizations can only be done at the SASS level and cannot be produced by nvcc. I think that is why some kernels are implemented only in SASS.

@ohcurrent

@gangmul12
I see, thanks for sharing. I thought that kernel was related to cuBLAS.

@Azuresonance

@gangmul12
This may be off-topic, but I am trying to obtain some information on your fork of this repository (gangmul12/pytorch-v1.1-gpgpu-sim), which unfortunately does not have the Issues tab enabled.

I was trying to build your project with CUDA 8.0 and cuDNN 7.1.3, since versions above these don't work according to the developer of GPGPU-Sim (gpgpu-sim/gpgpu-sim_distribution#166 (comment))

After installing, I attempted to run an MNIST example (https://github.com/pytorch/examples/blob/master/mnist/main.py), and got the following output:
Traceback (most recent call last):
  File "./main.py", line 139, in <module>
    main()
  File "./main_original.py", line 128, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "./main_original.py", line 42, in train
    output = model(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "./main.py", line 22, in forward
    x = self.conv1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

I would be grateful for some help, whether it's a solution or some hint to narrow the problem down.

@gangmul12
Author

@Azuresonance
Hi,
At that time, the GPGPU-Sim version I used was the dev branch of ver. 3.x, so I'm not sure where the error comes from.
One possibility:
when you execute any CUDA-related command, GPGPU-Sim should start.
However, it seems that GPGPU-Sim has not started (or did you just not print the GPGPU-Sim log?).
Maybe it is because you use different CUDA versions for GPGPU-Sim and PyTorch, or because the rpath option was not removed when you installed PyTorch.
That is a good starting point to check.
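
One way to run that check is to look at which CUDA libraries are actually mapped into the Python process. This is a minimal sketch, assuming Linux, where /proc/self/maps lists every mapped file. The helper names mapped_cuda_libraries and uses_real_driver are hypothetical.

```python
import os


def mapped_cuda_libraries(maps_text):
    """Return paths of libcudart / libcuda files found in /proc/<pid>/maps text."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        if name.startswith(("libcudart", "libcuda.so")) and path not in paths:
            paths.append(path)
    return paths


def uses_real_driver(paths):
    """True if the real CUDA driver library (libcuda.so*) is mapped."""
    return any(os.path.basename(p).startswith("libcuda.so") for p in paths)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as maps_file:
        found = mapped_cuda_libraries(maps_file.read())
    for path in found:
        print(path)
    if uses_real_driver(found):
        print("libcuda.so is mapped: the real driver is in use, not the simulator")
```

To make the result meaningful, import torch and touch the GPU (for example by moving a small tensor to the device) in the same process before reading /proc/self/maps. The libcudart.so path should point into the GPGPU-Sim build directory, and libcuda.so.1 should not appear at all.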
