In an installation which worked before, I suddenly see an import problem popping up in the agent:
```
2017-02-23 19:01:56,719: agent_0.AgentWorker.0.child: agent_0.AgentWorker.0 : MainThread : ERROR : ERROR in agent main loop: cannot import name jsonapi
Traceback (most recent call last):
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/worker/agent.py", line 515, in idle_cb
    return self.check_units()
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/worker/agent.py", line 560, in check_units
    self.advance(cu_list, publish=False, push=True, prof=False)
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/component.py", line 961, in advance
    Component.advance(self, units, state, publish, push, prof)
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/component.py", line 903, in advance
    output.put(_unit)
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/queue.py", line 513, in put
    _uninterruptible(self._q.send_json, msg)
  File "/lustre/atlas/scratch/merzky1/bip103/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017220.0001-pilot.0000/rp_install/lib/python2.7/site-packages/radical/pilot/utils/queue.py", line 45, in _uninterruptible
    return f(*args, **kwargs)
  File "/lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/lib/python2.7/site-packages/zmq/sugar/socket.py", line 506, in send_json
ImportError: cannot import name jsonapi
```
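The last frame shows the `ImportError` being raised from inside `send_json` itself, which suggests that `jsonapi` is imported lazily, at call time, rather than when `zmq` is first imported. If that is the case, a node that cannot see the VE's files consistently at that moment would fail mid-run even though `import zmq` had already succeeded earlier. A minimal sketch of that behaviour, assuming nothing about pyzmq's actual code: `send_json` here is a hypothetical stand-in, and `missing_helper_pkg` is a deliberately absent module.

```python
def send_json(obj):
    """Hypothetical stand-in for a method that defers an import to call time."""
    # This import statement only runs when send_json is called, so a module
    # that cannot be found *now* fails here, not when this file was loaded.
    from missing_helper_pkg import jsonapi  # hypothetical, deliberately absent
    return jsonapi.dumps(obj)


def try_send(obj):
    """Return None on success, or the ImportError message on failure."""
    try:
        send_json(obj)
    except ImportError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Defining send_json above succeeded; only the call itself fails.
    print(try_send({"uid": "unit.000000"}))
```

The point is only that "module imported fine at startup" does not rule out an import failure later in the same process.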
`jsonapi`, however, is installed and loadable in the pilot VE:
I gave up trying to find the cause of this. My current assumption is that the Lustre file system cache is inconsistent between nodes, but I don't really want to spend the time to rule this out or confirm it. Just as a data point, I also see things like this on some nodes:
```
~/sandbox/rp.session.titan-ext1.merzky1.017224.0008-pilot.0000 $ cat agent_1.err
/lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017224.0008-pilot.0000/bootstrap_2.sh: line 40: /lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/bin/python: No such file or directory
/lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/rp.session.titan-ext1.merzky1.017224.0008-pilot.0000/bootstrap_2.sh: line 40: exec: /lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/bin/python: cannot execute: No such file or directory
~/sandbox/rp.session.titan-ext1.merzky1.017224.0008-pilot.0000 $ l /lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/bin/python
-rwxr-xr-x 1 merzky1 merzky1 14258 Feb 27 03:16 /lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/bin/python*
```
And this happens right after agent_0 used the very same Python executable on a different node...
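If someone does want to test the Lustre hypothesis cheaply, one option is a tiny probe launched once per compute node that records the hostname, whether a given path is visible, and whether a given module imports, retrying a few times. A failure that clears on retry, or that correlates with particular hosts, would point at filesystem visibility rather than a broken install. This is a minimal sketch under those assumptions; `probe` and its `attempts`/`delay` parameters are hypothetical and not part of RADICAL-Pilot. Note that the path check is most useful when run from a system Python, since a missing VE interpreter cannot run the probe itself, while the import check only means something when run with the VE's own `bin/python`.

```python
import importlib
import os
import socket
import time


def probe(path, module, attempts=3, delay=1.0):
    """Check, on this host, that `path` exists and `module` imports.

    Hypothetical helper. Retries a few times: a failure that clears on
    retry would hint at a filesystem visibility problem.
    """
    result = {"host": socket.gethostname(), "path_ok": False,
              "import_ok": False, "tries": 0, "error": None}
    # importlib.invalidate_caches exists on Python 3.3+ only; no-op otherwise.
    invalidate = getattr(importlib, "invalidate_caches", lambda: None)
    for attempt in range(1, attempts + 1):
        result["tries"] = attempt
        result["path_ok"] = os.path.exists(path)
        invalidate()  # make the import system rescan directories on retry
        try:
            importlib.import_module(module)
            result["import_ok"] = True
            result["error"] = None
        except ImportError as exc:
            result["import_ok"] = False
            result["error"] = str(exc)
        if result["path_ok"] and result["import_ok"]:
            break
        if attempt < attempts:
            time.sleep(delay)  # give a lagging client cache a moment
    return result
```

For example, `probe("/lustre/atlas1/bip103/scratch/merzky1/radical.pilot.sandbox/ve_titan/bin/python", "zmq.utils.jsonapi")`, run on every node of a job and printed together with the hostname, would show whether the failures cluster on specific nodes.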
So I'm going to close this; if deployment hiccups like this become too much of a problem, we'll have to take this to ORNL support.
PS: I pasted a version with the wrong VE location; it's corrected above now.
No idea what's up as of yet, but that needs fixing first before looking into #1235 and #1237...