Is your feature request related to a problem? Please describe.
The singularity container is sensitive to having the user's home environment available/bound when being instantiated. Since most htcondor environments call singularity with -C --no-home, this leads to complications, e.g. with the python egg cache.
Describe the solution you'd like
I would like the container to be more "self contained", so that it relies as little as possible on the host environment when being called. The reason is that we usually don't know which node in a cluster will execute a given bidsonym job. In addition, the entry point definition complicates matters as well, since it calls a startup script with command line parameters rather than an executable in a fully prepared environment. Lastly, we can assume that all nodes in an htcondor cluster can see a shared file system, so data transfer within jobs is reduced to a minimum.
Describe alternatives you've considered
Let's start with a fully working singularity call from the host, where /srv/home is bind-mounted on all nodes. The dataset is in /srv/home/user/test/BIDS. Then we can successfully call the container along the lines of the sketch below.
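(The original command did not survive in this issue text; the following is a minimal sketch, assuming the image was pulled to bidsonym.simg, that the user's home lives under /srv/home so the default $HOME bind already covers the dataset, and that the CLI follows the usual BIDS-App pattern — the participant label and deid method are illustrative.)

```bash
# sketch of the working call: home is bound by default, so the dataset path resolves
singularity run bidsonym.simg \
    /srv/home/user/test/BIDS participant \
    --participant_label 01 --deid pydeface
```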
Changing the call to simulate an htcondor-like call, we eliminate binding home and bring in the temporary directories available locally on each node. To prevent the container from crashing, we need to explicitly tell singularity where to set up the python egg cache (this is one annoyance that should be fixed). A simulated call could look like the sketch below.
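(Again a sketch rather than the original command; the egg cache location and bind paths are illustrative. Setuptools honours PYTHON_EGG_CACHE, which can be pushed into the container via singularity's SINGULARITYENV_ prefix.)

```bash
# point the egg cache somewhere writable inside the contained environment
export SINGULARITYENV_PYTHON_EGG_CACHE=/tmp/egg-cache

# -C --no-home mimics what the htcondor starter does; bind the dataset and a node-local /tmp
singularity run -C --no-home \
    --bind /srv/home/user/test/BIDS:/BIDS \
    --bind /tmp \
    bidsonym.simg \
    /BIDS participant --participant_label 01 --deid pydeface
```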
The above call would work on an htcondor cluster if we were able to have user-defined singularity bind mounts (apart from user-definable singularity images). In general, htcondor clusters are not set up for that, but it can be done by configuring htcondor accordingly (first sketch below) and then introducing a +SingularityBind = "/srv/home/user/test/BIDS:/BIDS" attribute in the job description files. This gives a job description file that executes well in the cluster (second sketch below).
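(Both sketches are reconstructions, not the original files. If I recall the knob correctly, per-job bind mounts can be honoured on the execute nodes via SINGULARITY_BIND_EXPR; the image path, entry-point script path, and bidsonym arguments are illustrative assumptions.)

```
# condor_config snippet on the execute nodes: honour a per-job SingularityBind attribute
SINGULARITY_BIND_EXPR = TARGET.SingularityBind
```

```
# bidsonym.submit (sketch): entry point as executable, parameters in the arguments section
universe              = vanilla
+SingularityImage     = "/srv/containers/bidsonym.simg"   # assumes the cluster maps this attribute to the image
+SingularityBind      = "/srv/home/user/test/BIDS:/BIDS"
executable            = /neurodocker/startup.sh           # hypothetical entry-point script path inside the image
arguments             = "/BIDS participant --participant_label 01 --deid pydeface"
should_transfer_files = NO
output                = bidsonym.$(Cluster).out
error                 = bidsonym.$(Cluster).err
log                   = bidsonym.$(Cluster).log
queue
```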
As can be seen in the job description above, we have to specify the entry point as the executable, and then give the parameters in the arguments section. This leads to a problem where relative paths for the data directory no longer work properly, so we have to specify absolute paths. I am not sure why this is; it would be great to get this fixed.
To get rid of the +SingularityBind hack and be much closer to the htcondor standard, we need to change a few things around. To simulate the cluster call, we can use the first sketch below. Note how we have to specify an absolute path for the data directory. This translates into a job description like the second sketch below.
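(Sketches again, under the same assumptions as above; the key point is that no +SingularityBind attribute is needed, and the data directory is given as an absolute path on the shared file system.)

```bash
# simulate the condor-style call: contained, no home, node-local /tmp,
# shared file system reached via its absolute host path
singularity exec -C --no-home \
    --bind /srv/home \
    --bind /tmp \
    bidsonym.simg \
    /neurodocker/startup.sh \
    /srv/home/user/test/BIDS participant --participant_label 01 --deid pydeface
```

```
# bidsonym_nohack.submit (sketch): no +SingularityBind, absolute data path instead
universe              = vanilla
+SingularityImage     = "/srv/containers/bidsonym.simg"
executable            = /neurodocker/startup.sh           # hypothetical entry-point script path
arguments             = "/srv/home/user/test/BIDS participant --participant_label 01 --deid pydeface"
should_transfer_files = NO
output                = bidsonym.$(Cluster).out
error                 = bidsonym.$(Cluster).err
log                   = bidsonym.$(Cluster).log
queue
```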
Interestingly, when using the above job description (or the simulation), we end up with some additional directories and files that are not there when calling without the --no-home option. These are brainextraction_wf, deface_wf, and report_wf.
Apparently, these files contain an html report that tries to pull some data from /tmp, for example in graph1.json (see the illustrative snippet below).
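(The actual graph1.json content was not preserved here; the field names below are purely illustrative. The point is that the report stores absolute references into the node-local /tmp working directory.)

```json
{
  "nodes": [
    {
      "name": "deface_wf.pydeface",
      "report": "/tmp/deface_wf/pydeface/_report/report.rst"
    }
  ]
}
```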
Since we have to assume that /tmp is bind-mounted locally on each cluster node and its content deleted regularly, this html report will almost always be broken and will not endure for long. I do not see a way to influence the location of these reports; on the other hand, they do not seem to be vital. It would be great if this could be controlled.
Additional context
Long story short, it would be good if we could make the container more self-contained, have a more concise entry point, and a better way to control where output goes, especially when we want to exploit parallel execution on an htcondor cluster.
Happy to discuss and debug.