Fixed cluster with SGE and DIND capabilities fails to start #392

Closed
tcibinan opened this issue Jun 14, 2019 · 1 comment
Labels
kind/bug Something isn't working sys/core Issues related to core functionality (API, VM management, ...)

Comments

@tcibinan
Contributor

Version: 0.16.0.1380.1015af98ba9d2900a8b086391356672cf626b7d8

Currently, a fixed cluster with both the CP_CAP_SGE and CP_CAP_DIND_CONTAINER options enabled and more than one worker fails to start. Some of the workers fail on either the SGEWorkerSetup or the SetupDind task with different errors. The most common error is the following:

Started DIND setup
...
DIND dependencies installed
tar: docker.tgz: Cannot open: No such file or directory

The scripts turned out to be executed in the shared analysis directory. The problem is that all workers download, use and remove the same files in the same shared working directory, so one worker may delete files that another worker has downloaded and not yet used.
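
To make the collision concrete, here is a minimal Python sketch that replays one unlucky interleaving of two workers, first in a shared directory and then in private ones. It is an illustration only, not the project's setup code: the helpers `download`, `unpack`, `cleanup` and `run_interleaved` are hypothetical stand-ins, and only the file name `docker.tgz` comes from the log above.

```python
import os
import tempfile


def download(workdir, name="docker.tgz"):
    """Hypothetical stand-in for a worker downloading an archive."""
    path = os.path.join(workdir, name)
    with open(path, "wb") as f:
        f.write(b"placeholder archive contents")
    return path


def cleanup(workdir, name="docker.tgz"):
    """Hypothetical stand-in for a worker removing the archive when done."""
    path = os.path.join(workdir, name)
    if os.path.exists(path):
        os.remove(path)


def unpack(workdir, name="docker.tgz"):
    """Hypothetical stand-in for tar: fails if the archive is gone."""
    with open(os.path.join(workdir, name), "rb") as f:
        return len(f.read())


def run_interleaved(dir_a, dir_b):
    """Replay one unlucky ordering of the steps of workers A and B."""
    download(dir_a)      # worker A downloads
    download(dir_b)      # worker B downloads (overwrites A's copy if shared)
    unpack(dir_b)        # worker B unpacks
    cleanup(dir_b)       # worker B removes its archive
    try:
        unpack(dir_a)    # worker A unpacks last
        return "ok"
    except FileNotFoundError:
        return "missing"


# Both workers use the same directory: A's archive is deleted by B.
with tempfile.TemporaryDirectory() as shared:
    shared_result = run_interleaved(shared, shared)

# Each worker uses its own directory: the same ordering is harmless.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    private_result = run_interleaved(a, b)

print(shared_result, private_result)  # missing ok
```

The `missing` outcome corresponds to the `Cannot open: No such file or directory` error from tar. With more workers the real ordering is nondeterministic, which would explain why different workers fail on different tasks.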

@tcibinan tcibinan added kind/bug Something isn't working sys/core Issues related to core functionality (API, VM management, ...) labels Jun 14, 2019
@tcibinan tcibinan self-assigned this Jun 14, 2019
tcibinan added a commit that referenced this issue Jun 14, 2019
Capabilities installation working directory is expected to be a local directory. Therefore capabilities installation should be performed before changing the working directory to the analysis directory which is a shared folder.
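
The ordering described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the code from the referenced commit: `install_capabilities` and the marker command are hypothetical, and the real launch scripts may be structured quite differently.

```python
import os
import subprocess
import sys
import tempfile


def install_capabilities(commands, analysis_dir):
    """Run each install command in a node-local directory, then enter analysis_dir.

    Hypothetical helper: `commands` is a list of argv lists.
    """
    local_dir = tempfile.mkdtemp(prefix="cap_install_")  # node-local scratch space
    for command in commands:
        # cwd keeps downloaded and temporary files out of the shared folder
        subprocess.run(command, cwd=local_dir, check=True)
    os.chdir(analysis_dir)  # switch to the shared folder only after installation
    return local_dir


# Demo: a fake installer that drops a file into its working directory.
analysis_dir = tempfile.mkdtemp(prefix="analysis_")
write_marker = [sys.executable, "-c", "open('docker.tgz', 'w').close()"]
local_dir = install_capabilities([write_marker], analysis_dir)

print(os.listdir(local_dir), os.listdir(analysis_dir))  # ['docker.tgz'] []
```

The installer's file ends up in the local directory and the shared analysis directory stays untouched, so workers cannot remove each other's downloads during setup.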
@tcibinan tcibinan added the state/underway Issues that are currently being solved/implemented label Jun 14, 2019
@tcibinan
Contributor Author

Fixed by bbdbd4f.

@tcibinan tcibinan removed the state/underway Issues that are currently being solved/implemented label Jun 17, 2019
evgeniimv pushed a commit to evgeniimv/cloud-pipeline that referenced this issue Oct 14, 2019
…ectory.

Capabilities installation working directory is expected to be a local directory. Therefore capabilities installation should be performed before changing the working directory to the analysis directory which is a shared folder.