zsh issues #50

Closed
coodoo opened this issue Feb 28, 2016 · 22 comments

@coodoo

coodoo commented Feb 28, 2016

It seems a number of people have run into zsh not working properly with cltorch (see #31 and #24). For now the only workaround is switching back to bash, which is less than ideal. Has anyone figured out a solution to this? Thanks!

@hughperkins
Owner

Ok, I might take a look sometime.... seems like a reasonable request, and I think I can install zsh on ubuntu, afaik.

@hughperkins
Owner

Hi. Ok, I installed zsh, and tried running cltorch, and so far no issues. See below. Can you produce a tiny test-case that demonstrates the issue please?

ubuntu@orange ~ % torch         
zsh: command not found: torch
ubuntu@orange ~ % source ~/torch/install/bin/torch-activate 
ubuntu@orange ~ % luajit
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

 _____              _     
|_   _|            | |    
  | | ___  _ __ ___| |__  
  | |/ _ \| '__/ __| '_ \ 
  | | (_) | | | (__| | | |
  \_/\___/|_|  \___|_| |_|

JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'torch'
th> a = torch.Tensor(2,3):uniform()
th> require 'cutorch'
th> a = torch.CudaTensor(2,3):uniform()
th> a
th>> 
th>> print(a)
stdin:3: '=' expected near 'print'
th> print(a)
 0.2988  0.0395  0.7658
 0.8750  0.2667  0.3257
[torch.CudaTensor of size 2x3]

th> require 'cltorch'
th> a = torch.ClTensor(3,2,5):uniform()
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
th> a
th>> print(a)
stdin:2: '=' expected near 'print'
th> print(a)
(1,.,.) = 
  0.8787  0.8070  0.0460  0.5314  0.4063
  0.6157  0.8125  0.1887  0.3200  0.8569

(2,.,.) = 
  0.2003  0.0125  0.4530  0.7228  0.4754
  0.1571  0.4689  0.2694  0.7298  0.5362

(3,.,.) = 
  0.5828  0.9367  0.6495  0.0552  0.9694
  0.6742  0.1739  0.5109  0.2835  0.8508
[torch.ClTensor of size 3x2x5]

@coodoo
Author

coodoo commented Feb 29, 2016

Yes, here is a minimal reproducible case (on OS X 10.11 El Capitan):

with zsh

th> require 'cltorch'
/Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/jlu/torch/install/share/lua/5.1/cltorch/init.lua:19: cannot load '/Users/jlu/torch/install/lib/lua/5.1/libcltorch.so'
stack traceback:
    [C]: in function 'error'
    /Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    [string "_RESULT={require 'cltorch'}"]:1: in main chunk
    [C]: in function 'xpcall'
    /Users/jlu/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
    .../jlu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
    [C]: at 0x0100bc3be0

with bash

th> require 'cltorch'
{
  finish : function: 0x04d17080
  about : function: 0x04d16a00
  getDeviceCount : function: 0x04d17198

  [redacted]
}

So, strictly speaking, this is a zsh-on-OS X issue: for some reason libcltorch.so is not found, no matter how I set the environment variables...

@hughperkins
Owner

Ah, I don't have Mac OS X ;-) Is this something you might be able to help fix?

@coodoo
Author

coodoo commented Feb 29, 2016

Sure, I would love to if I knew how. Any hints?

@hughperkins
Owner

Obviously if I knew how, I would have fixed it ;-) But I would imagine it's something to do with env vars. But as to what and how it's hard to say.

Things I would try:

  • open a zsh and a bash, and compare the values of:
    • LD_LIBRARY_PATH
    • DYLD_LIBRARY_PATH (Mac-specific; you should see it in the bash vars, I think)
    • PATH
    • LUA_CPATH
    • LUA_PATH
    You can see the vars by typing env or set. (But env is better, since it shows only the exported ones, which are the ones child processes inherit.)
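
To make that comparison mechanical, here is a rough sketch: the function below prints those five variables as the current shell itself sees them, so the output from bash and zsh can be diffed. The function name and the UNSET marker are made up for illustration; it sticks to plain POSIX sh, so it should behave the same in both shells. Note that, unlike env, it also shows variables that are set but not exported.

```shell
# Print the loader-related variables as the *current shell* sees them.
# dump_loader_vars and the UNSET marker are illustrative, not part of torch.
dump_loader_vars() {
  for name in LD_LIBRARY_PATH DYLD_LIBRARY_PATH PATH LUA_CPATH LUA_PATH; do
    # Indirect lookup via eval works in sh, bash and zsh alike.
    eval "value=\${$name-UNSET}"
    printf '%s=%s\n' "$name" "$value"
  done
}

# Usage, after sourcing torch-activate in each shell:
#   dump_loader_vars > /tmp/vars.bash    (run in bash)
#   dump_loader_vars > /tmp/vars.zsh     (run in zsh)
#   diff /tmp/vars.bash /tmp/vars.zsh
```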

If that throws up nothing, you're going to have to start loading stuff, and hack around a bit. If it was me, I'd probably start writing little C programs, to load stuff from C, and find out what works, and what doesn't. Here's an example I was using to try to fix Mac problems in cltorch yesterday:

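#include <iostream>
using namespace std;

#include <dlfcn.h>

extern "C" {
  #include "luaT.h"
  #include "lualib.h"
  int luaopen_libpaths(lua_State *L);
}

int main(int argc, char *argv[]) {
  // dlopen returns a handle, or NULL on failure
  void *handle = dlopen("libPyTorchLua.so", RTLD_NOW | RTLD_GLOBAL);
  cout << "handle " << handle << endl;
  // dlerror returns NULL when there was no error, and streaming a NULL
  // char pointer to cout is undefined, so guard against it
  const char *msg = dlerror();
  cout << "dlerror " << (msg ? msg : "(none)") << endl << endl;
  handle = dlopen("/home/ubuntu/torch/install/lib/lua/5.1/libpaths.so", RTLD_NOW | RTLD_GLOBAL);
  cout << "handle " << handle << endl;

  // start a Lua interpreter with the standard libraries
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);

  // register libpaths directly (this is the call that needs linking against it)
  luaopen_libpaths(L);

  // equivalent to running: require 'torch'
  lua_getglobal(L, "require");
  lua_pushstring(L, "torch");
  lua_call(L, 1, 0);

  return 0;
}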

Obviously you'll need to modify the names of the libraries being loaded and so on. And you might comment out the first bit, which loads libPyTorchLua.so and libpaths, or not. Or modify it. Etc. To build it, I would think it's something like:

g++ -o mytest mytest.cpp

... and that might be all you need. Oh.. this version actually links directly against paths, so comment out the luaT and lualib headers, everything in that extern "C" section, and remove the call to luaopen_libpaths, and it should compile without many other libraries. You might need to add the dl and m libraries, like:

g++ -o mytest mytest.cpp -ldl -lm

(dl is the library for dynamic loading, and m is the maths library, which isn't actually used here, so you might not need it)

Edit: Oh, you'll need those Lua headers... which means ... ummm.... you probably need to link with the Lua library somehow. That bit is always a bit tricky... and so... at this point... you'd have to start thinking about what is failing and where really.... ummm....

@hughperkins
Owner

How about starting by trying to load libcltorch.so from the C program, and seeing what happens?

If that works (or if it doesn't), then maybe try loading the Lua library, initializing Lua (luaL_openlibs), and then loading it, or requiring it.

I can't tell you an exact recipe.

Something I often do is hack around in the Lua library itself, and build it myself. For example, loading the library is done by loadlib.c, in the Lua source. You can sprinkle printfs liberally around that, build it, link with that, run in gdb, etc... Oh, I usually put a floating point exception into loaderror, so that load errors trigger gdb to halt, instead of the program just exiting:

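static void loaderror (lua_State *L, const char *filename) {
  /* Deliberate integer divide-by-zero: on x86 this raises SIGFPE, so gdb
     halts here. volatile stops the compiler from dropping the unused division. */
  volatile int a = 0;
  volatile int b = 5 / a;
  (void)b;
  luaL_error(L, "error loading module " LUA_QS " from file " LUA_QS ":\n\t%s",
                lua_tostring(L, 1), filename, lua_tostring(L, -1));
}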

It might need a certain amount of effort, and time :-P

@hughperkins
Owner

So, I added a call to zsh to my travis script https://travis-ci.org/hughperkins/cltorch/builds/112530954#L1268 , which runs zsh against https://github.com/hughperkins/cltorch/blob/master/src/travis/install-torch.sh , but it seems like it doesn't really run zsh for some reason, since: 1. it doesn't fail 2. when I do ps, there is no zsh, which sort of hints it's not running zsh.

edit: sorry, I mean against https://github.com/hughperkins/cltorch/blob/master/src/test/test-zsh.zsh
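
As a complement to checking ps, a script can report for itself which shell is interpreting it. This is only a minimal sketch with a made-up function name: it relies on zsh setting ZSH_VERSION and bash setting BASH_VERSION, which other shells normally leave unset.

```shell
# Report which shell is interpreting this script, based on the version
# variables that zsh and bash set for themselves.
# shell_flavor is an illustrative helper, not part of the travis scripts.
shell_flavor() {
  if [ -n "${ZSH_VERSION-}" ]; then
    echo zsh
  elif [ -n "${BASH_VERSION-}" ]; then
    echo bash
  else
    echo other
  fi
}

echo "this script is running under: $(shell_flavor)"
```

Dropping a line like the last one at the top of the test script would show in the build log whether zsh is really the one running it.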

@coodoo
Author

coodoo commented Mar 1, 2016

Just did a quick comparison in both zsh and bash, see results below.

Interestingly LD_LIBRARY_PATH and DYLD_LIBRARY_PATH never showed up, but bash still works alright.

zsh

PATH=/Users/jlu/torch/install/bin:/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:/usr/local/share/npm/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VirtualBox.app/Contents/MacOS:/Users/jlu/temp/arc/arcanist/bin/
LUA_PATH=/Users/jlu/.luarocks/share/lua/5.1/?.lua;/Users/jlu/.luarocks/share/lua/5.1/?/init.lua;/Users/jlu/torch/install/share/lua/5.1/?.lua;/Users/jlu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/Users/jlu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LUA_CPATH=/Users/jlu/torch/install/lib/?.dylib;/Users/jlu/.luarocks/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/?.dylib;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
_=/usr/bin/env

bash

PATH=/Users/jlu/torch/install/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
LUA_PATH=/Users/jlu/.luarocks/share/lua/5.1/?.lua;/Users/jlu/.luarocks/share/lua/5.1/?/init.lua;/Users/jlu/torch/install/share/lua/5.1/?.lua;/Users/jlu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/Users/jlu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LANG=en_US.UTF-8
LUA_CPATH=/Users/jlu/torch/install/lib/?.dylib;/Users/jlu/.luarocks/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/lua/5.1/?.so;/Users/jlu/torch/install/lib/?.dylib;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
_=/usr/bin/env

More interestingly, I tried to manually export DYLD_LIBRARY_PATH via the command line, and it seems anything starting with DYLD_ is not recognized. Looking into that aspect now.

@coodoo
Author

coodoo commented Mar 1, 2016

I copied everything from /Users/jlu/torch/install/bin/torch-activate to my ~/.zshrc and verified all variables can be found by checking things like $ echo $DYLD_LIBRARY_PATH, but still no dice, libcltorch.so still can't be found.

@coodoo
Author

coodoo commented Mar 1, 2016

It seems to have something to do with DYLD paths not working on OS X unless SIP is disabled, details here.

What I don't understand is why bash still works without those environment variables being set?
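
A possibly related detail, offered tentatively: with SIP enabled, OS X removes DYLD_* variables from the environment of protected system binaries such as /bin/sh and /usr/bin/env, which would explain env not listing DYLD_LIBRARY_PATH even when the shell has exported it. A small sketch to check whether an exported variable actually reaches a child process (the helper name is made up):

```shell
# Succeeds if the variable named $1 is non-empty inside a child /bin/sh.
# child_sees is an illustrative helper name.
child_sees() {
  /bin/sh -c 'eval "v=\${$1-}"; [ -n "$v" ]' sh "$1"
}

# Usage: compare what the shell has with what a child process receives.
#   export DYLD_LIBRARY_PATH="$HOME/torch/install/lib"
#   child_sees DYLD_LIBRARY_PATH && echo "reaches child" || echo "stripped or unset"
```

Running the same check with LD_LIBRARY_PATH gives a baseline, since that name is not treated specially.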

@hughperkins
Owner

I think you should have an LD_LIBRARY_PATH. On my system:

$ echo $LD_LIBRARY_PATH
/home/ubuntu/torch/install/lib:
$ cat ~/torch/install/bin/torch-activate 
export LUA_PATH='/home/ubuntu/.luarocks/share/lua/5.1/?.lua;/home/ubuntu/.luarocks/share/lua/5.1/?/init.lua;/home/ubuntu/torch/install/share/lua/5.1/?.lua;/home/ubuntu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/home/ubuntu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
export LUA_CPATH='/home/ubuntu/.luarocks/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
export PATH=/home/ubuntu/torch/install/bin:$PATH
export LD_LIBRARY_PATH=/home/ubuntu/torch/install/lib:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=/home/ubuntu/torch/install/lib:$DYLD_LIBRARY_PATH
export LUA_CPATH='/home/ubuntu/torch/install/lib/?.so;'$LUA_CPATH

@coodoo
Author

coodoo commented Mar 1, 2016

Yep, you are correct, both bash and zsh have that variable set, unfortunately zsh just won't work 😓

$ echo $LD_LIBRARY_PATH  
/Users/jlu/torch/install/lib:

@hughperkins
Owner

For SIP, what is the recommended approach, instead of using LD_LIBRARY_PATH?

@hughperkins
Owner

What happens if you copy everything from ~/torch/install/lib into ~/lib? (create ~/lib if it doesn't exist)

@hughperkins
Owner

It seems likely to be an RPATH issue. Andresy pointed this out a while ago actually, but I hadn't had a moment to find out more about it before: #15

Relevant references: http://linuxmafia.com/faq/Admin/ld-lib-path.html https://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the

@hughperkins
Owner

Interestingly, if I clear my LD_LIBRARY_PATH, and move my build directories, everything continues to run:

(envs)ubuntu:~/git$ env | grep PATH
GLADE_PIXMAP_PATH=:
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
GLADE_MODULE_PATH=:
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
DEFAULTS_PATH=/usr/share/gconf/xfce.default.path
PATH=/home/ubuntu/torch/install/bin:/home/ubuntu/envs/bin:/home/ubuntu/bin:/home/ubuntu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
LUA_PATH=/home/ubuntu/.luarocks/share/lua/5.1/?.lua;/home/ubuntu/.luarocks/share/lua/5.1/?/init.lua;/home/ubuntu/torch/install/share/lua/5.1/?.lua;/home/ubuntu/torch/install/share/lua/5.1/?/init.lua;./?.lua;/home/ubuntu/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua
LUA_CPATH=/home/ubuntu/.luarocks/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/lua/5.1/?.so;/home/ubuntu/torch/install/lib/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so
MANDATORY_PATH=/usr/share/gconf/xfce.mandatory.path
GLADE_CATALOG_PATH=:
(envs)ubuntu:~/git$ luajit -l cltorch -e 'print(torch.ClTensor(2,3):uniform())'
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
 0.3308  0.3602  0.4100
 0.6792  0.7481  0.7736
[torch.ClTensor of size 2x3]

@hughperkins
Owner

My RPATHs look like:

 objdump -x ~/torch/install/lib/lua/5.1/libcltorch.so | grep RPATH
  RPATH                $ORIGIN/../lib:/home/ubuntu/torch/install/lib

Per the bottom of https://cmake.org/Wiki/CMake_RPATH_handling , on a Mac you would need to use otool to view these.
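
On OS X the rough equivalent of the objdump line above is otool -l, which lists every load command; RPATH entries show up as LC_RPATH. A minimal sketch for pulling out just the paths (the function name is made up, and the awk filter assumes paths without spaces):

```shell
# Read `otool -l` output on stdin and print the path of each LC_RPATH command.
# rpaths_from_otool is an illustrative helper; it assumes no spaces in paths.
rpaths_from_otool() {
  awk '
    $1 == "cmd"              { in_rpath = ($2 == "LC_RPATH") }
    in_rpath && $1 == "path" { print $2 }
  '
}

# Usage on OS X:
#   otool -l ~/torch/install/lib/lua/5.1/libcltorch.so | rpaths_from_otool
#   otool -L ~/torch/install/lib/lua/5.1/libcltorch.so   # lists the libraries it links to
```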

@coodoo
Author

coodoo commented Mar 1, 2016

Hahaha, copying everything from ~/torch/install/lib into ~/lib did the trick!

In the end I just symlinked it with $ ln -s ~/torch/install/lib/ ~/lib and it worked fine. Not sure this is the best possible solution, but at least it works. Thanks for helping out, you rock (as usual)!
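
A likely reason this works, stated as an assumption rather than something verified in this thread: according to the dyld man page, a library that cannot be found at its recorded install path is looked up in DYLD_FALLBACK_LIBRARY_PATH, whose default is $HOME/lib:/usr/local/lib:/lib:/usr/lib, so anything reachable through ~/lib gets picked up. A small sketch that checks those default directories for a given library file (the function name and the example library name are made up):

```shell
# List where a library file exists in dyld's default fallback directories.
# find_in_fallback_dirs is an illustrative helper name.
find_in_fallback_dirs() {
  for dir in "$HOME/lib" /usr/local/lib /lib /usr/lib; do
    if [ -e "$dir/$1" ]; then
      printf '%s\n' "$dir/$1"
    fi
  done
}

# Usage (libexample.dylib is a placeholder name):
#   find_in_fallback_dirs libexample.dylib
```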

@hughperkins
Owner

Ok, that's interesting. Not sure that is the most sustainable solution, but good that it is working :-)

@coodoo
Author

coodoo commented Mar 1, 2016

Absolutely!

@coodoo coodoo closed this as completed Mar 1, 2016