Building (and using) libtensorflow.so #316
Comments
Hi, I'm wondering whether you could try building _pywrap_tensorflow_internal.so as well. In the current setup, libtensorflow.so won't contain any GPU kernels, which is why you are seeing this issue. Also, could you shed some light on how Rust builds, links, and invokes TensorFlow? |
Hi, thanks for your reply!
I'm currently building with this command; that's the closest I could find to _pywrap_tensorflow_internal.so. Once this goes through, what would you recommend doing with the built file?
edit: Build was successful, and I can import the library in Python without errors. Unfortunately, I'm experienced neither in using Bazel nor in building TensorFlow from source.
Rust calls the TensorFlow C API, so it requires building libtensorflow.so. |
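As a quick sanity check that a freshly built libtensorflow.so exposes the C API at all, something like the following sketch could be used. It loads the library with `ctypes` and calls `TF_Version`, which is part of the TensorFlow C API. The library path shown is an assumption about where the build placed the file, and the helper name is made up for illustration. This only shows that the library loads and exports the symbol; it says nothing about whether GPU kernels are present.

```python
import ctypes
from typing import Optional


def tensorflow_c_api_version(library_path: str) -> Optional[str]:
    """Return the version string reported by TF_Version, or None if the
    library cannot be loaded or does not export that symbol.

    Hypothetical helper for illustration; not part of any TensorFlow package.
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # The file is missing, or the dynamic loader could not resolve it.
        return None
    try:
        tf_version = lib.TF_Version
    except AttributeError:
        # The library loaded but does not export the C API entry point.
        return None
    tf_version.restype = ctypes.c_char_p
    tf_version.argtypes = []
    return tf_version().decode("utf-8")


if __name__ == "__main__":
    # Assumed location of the build output; adjust to your own setup.
    print(tensorflow_c_api_version("bazel-bin/tensorflow/libtensorflow.so"))
```

A result of `None` means the library could not be loaded from that path or does not export `TF_Version`.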
Let me look into this and get back to you. So far, the applications I've encountered are either Python-based (which work fine) or C++ applications built inside TensorFlow (which also work fine, with some known limitations). For other high-level languages that use the TF C API, I'll need some more information. Could you provide some pointers to the Rust binding for TensorFlow and how to build and use it? |
I'll check TensorFlow rust binding, and see how it loads We have an internal ticket tracking this and I'm working on the fix. |
I've just encountered the same runtime error after building
    import tensorflow as tf
    x = tf.random.uniform(shape=(10, 10))
    with tf.Session() as sess:
        print(sess.run(x))
|
@pricebenjamin Since you are using Python, please refer to this article on how to build TensorFlow ROCm: Once the pip package is built and installed, On the other hand, from your log it seems you want to run your application by other means:
Could you share more details on how you plan to load and run your TensorFlow-based applications? We have an internal ticket tracking a bug in the HIP runtime where GPU kernels within shared libraries loaded after TensorFlow is initialized won't be properly identified. Should a fix be devised, it may help resolve the issue from @sebpuetz, but I'd like to understand your application better to see if it's the same issue. |
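To make the load-order problem described above easier to picture, here is a toy model in Python. It is not HIP code and does not claim to reflect how the HIP runtime is implemented: every class, function, and kernel name is hypothetical. It only illustrates the general failure mode where a runtime records available kernels once at initialization and therefore misses kernels from a library loaded afterwards. The `rescan_on_lookup` switch is an assumed remedy for the sake of the illustration, not a description of the actual fix.

```python
class ToyKernelRegistry:
    """Toy stand-in for a runtime that tracks kernels from loaded libraries."""

    def __init__(self, rescan_on_lookup: bool):
        self.rescan_on_lookup = rescan_on_lookup
        self.loaded_libraries = {}  # library name -> set of kernel names
        self.known_kernels = set()

    def load_library(self, name, kernels):
        # Loading a library makes its kernels available to be discovered.
        self.loaded_libraries[name] = set(kernels)

    def initialize(self):
        # Record the kernels of every library loaded so far.
        self._scan()

    def _scan(self):
        for kernels in self.loaded_libraries.values():
            self.known_kernels |= kernels

    def can_launch(self, kernel: str) -> bool:
        if self.rescan_on_lookup:
            # Assumed remedy: pick up libraries loaded after initialization.
            self._scan()
        return kernel in self.known_kernels


def simulate(rescan_on_lookup: bool) -> bool:
    registry = ToyKernelRegistry(rescan_on_lookup)
    registry.load_library("early_library", ["early_kernel"])
    registry.initialize()
    # This library arrives only after initialization has already happened.
    registry.load_library("late_library", ["late_kernel"])
    return registry.can_launch("late_kernel")


if __name__ == "__main__":
    print("scan once at init:", simulate(rescan_on_lookup=False))
    print("rescan on lookup: ", simulate(rescan_on_lookup=True))
```

In the scan-once configuration the late kernel is never found, which mirrors the symptom reported in this thread at a purely conceptual level.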
I can actually confirm what @pricebenjamin describes. I built a wheel from
I am building a wheel from edit: building on
|
@whchung I've just rebuilt the package in a Docker container using a minimally modified version of Dockerfile.rocm. The major differences are:
I'm able to build and install the wheel successfully, but the same runtime error is thrown. To be clear, the |
@pricebenjamin is it possible to share your Dockerfile somewhere so I can reproduce it? I've recently been working on #318, which changes how ROCm components get loaded by TensorFlow to cope with upcoming changes in future versions of TensorFlow. I'm at the point where the issue in this ticket is blocking me, so I'm actively working on getting a fix implemented. Please stay tuned. |
@whchung Sure thing. Edit: Hopefully you saw the hyperlink there. I just realized that it's not obvious on some screens: https://github.com/pricebenjamin/tf-bionic-rocm-docker |
The HIP runtime has to be changed, and all user-level ROCm components, including TensorFlow ROCm, will need to be rebuilt. A work-in-progress branch is at: https://github.com/ROCm-Developer-Tools/HIP/compare/feature_maybe_dlopen . It doesn't solve all the corner cases yet and we're still working on it. |
The branch in the HIP runtime is maturing and passing some initial tests. Checking the Rust binding now. |
Thanks for the update! |
@sebpuetz I'm new to Rust; could you check if this sounds right to you:
|
@whchung
This might then result in the following error:
which can be mitigated by:
|
@sebpuetz my environment got a bit mixed up, so it took me some time to rebuild everything:
also tried
Since everything above HIP has to be rebuilt from source (HIP / rocRAND / rocFFT / rocBLAS / MIOpen / TensorFlow), and the change in HIP is still under validation for other ROCm applications, we plan to release this fix in ROCm 2.3, scheduled for (late?) March 2019. Meanwhile, would it help if I gave you a Docker container image so you can validate the fix on your end? |
Nice!
Sure, I could check if my program works with the fix. Although I might run into #325 along the way ;) |
@sebpuetz could you try this tag on dockerhub:
I'm still pushing the tag, so you'll need to wait a bit until it appears on Docker Hub. Note that you'll need ROCm 2.1 on your bare-metal host. I run it with:
TensorFlow rust binding is stored under |
I'll probably get around to checking this tomorrow; I want to switch to the fully supported Ubuntu 18.04 before testing things. |
Everything seems to be loaded properly, everything gets properly initialized and training works! Thanks |
@pricebenjamin could you also give the Docker container a shot?
|
@whchung Pulling now. I'll test it some time this afternoon. |
@whchung The issue appears to be resolved in the |
@whchung just wondering whether the changes shipped with 2.3? I installed 2.3 and tried to compile
I was able to execute a graph by copying the Thanks in advance! |
edit: Building in the Compiling the tensorflow wheel for python also doesn't work on that branch:
failed because it tried to install for python 3.5?
Installed the .whl manually by:
Trying to import tensorflow throws an exception in python:
|
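If the install failure above was indeed an interpreter mismatch (the comment only guesses at this), the wheel's filename can be checked against the running interpreter before installing. The sketch below relies on the standard wheel naming scheme, `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The filename used is a made-up example, not the one from this thread, and the compatibility check is deliberately simplified: it ignores ABI and platform tags.

```python
import sys


def wheel_python_tag(wheel_filename: str) -> str:
    """Extract the python tag (e.g. 'cp35') from a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + wheel_filename)
    parts = wheel_filename[: -len(".whl")].split("-")
    # 5 components without a build tag, 6 with one.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout: " + wheel_filename)
    return parts[-3]


def matches_interpreter(wheel_filename: str, version_info=sys.version_info) -> bool:
    """Simplified check: does the wheel's python tag cover this interpreter?

    Only the python tag is compared; ABI and platform tags are ignored.
    """
    major, minor = version_info[0], version_info[1]
    accepted = {
        "cp{}{}".format(major, minor),  # CPython-specific wheel
        "py{}{}".format(major, minor),  # generic, version-specific wheel
        "py{}".format(major),           # generic, major-version wheel
    }
    # A python tag may be compressed, e.g. "py2.py3".
    return any(tag in accepted for tag in wheel_python_tag(wheel_filename).split("."))


if __name__ == "__main__":
    # Hypothetical filename for illustration only.
    name = "example_pkg-1.0.0-cp35-cp35m-linux_x86_64.whl"
    print(wheel_python_tag(name), matches_interpreter(name))
```

A `False` result suggests the wheel was built for a different Python version than the one running the check.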
Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template
System information
Describe the problem
I want to use TensorFlow from Rust; to do so I need to build the libtensorflow.so shared library. Compilation goes through on r1.12, but when trying to execute the graph I get a runtime exception (see the other info/logs section).
I don't encounter any issues with TensorFlow in Python; running a graph and training a model work like a charm there, although that was not compiled from source but installed from PyPI.
Provide the exact sequence of commands / steps that you executed before running into the problem
Any other info / logs
Runtime exception:
hipconfig
hcc --version
rocminfo