set TMPDIR variable to soft link pointing to PWD #21419
The code-checks are being triggered in jenkins.
-code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2126
Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2126/git-diff.patch You can run
The code-checks are being triggered in jenkins.
+code-checks
please test
The tests are being triggered in jenkins.
A new Pull Request was created by @perrozzi for master.
It involves the following packages: GeneratorInterface/SherpaInterface
@cmsbuild, @efeyazgan, @perrozzi, @thuer, @govoni can you please review it and eventually sign? Thanks.
cms-bot commands are listed here
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
Comparison is ready Comparison Summary:
please test
The tests are being triggered in jenkins.
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
+1
merge
hi @perrozzi - could you back port this to 93x?
Comparison is ready Comparison Summary:
Unfortunately this PR broke this relval https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/slc6_amd64_gcc630/CMSSW_10_0_X_2017-11-26-2300/pyRelValMatrixLogs/run/534.0_sherpa_ZtoEE_0j_OpenLoops_13TeV_MASTER+sherpa_ZtoEE_0j_OpenLoops_13TeV_MASTER+HARVESTGEN/step1_sherpa_ZtoEE_0j_OpenLoops_13TeV_MASTER+sherpa_ZtoEE_0j_OpenLoops_13TeV_MASTER+HARVESTGEN.log
can you please provide a cmsDriver command or cfg to reproduce the error?
I tried with CMSSW_10_0_X_2017-11-26-2300 after adding the commits of this PR and using the command
Try again with CMSSW_10_0_X_2017-11-27-2300, same command or
ok, I can reproduce the error.
first thing to note: if
might be related to this discussion, but I don't have any clue...
std::string tmpdir = getenv("TMPDIR");
if (tmpdir.size() > 50 ){
  std::string command = "ln -s $PWD "+tmpdir+"/tmp;";
  system(command.c_str());
This does not check if system returns a failure value.
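For illustration, one way to check the result of the `system` call is sketched below. The helper name `runCommand` is hypothetical and not part of the PR; the sketch assumes a POSIX platform, where `std::system` returns -1 if the child process could not be created and otherwise a wait status that can be decoded with `WIFEXITED` and `WEXITSTATUS`.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Hypothetical helper (not in the PR): runs a shell command and reports
// success only if the shell was started and exited with status 0.
bool runCommand(const std::string& command) {
  const int status = std::system(command.c_str());
  if (status == -1) {
    return false;  // the child process could not be created
  }
  // On POSIX the return value is a wait status, not the raw exit code.
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A caller could then throw or log when `runCommand(command)` returns false instead of continuing silently.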
I added printouts and confirmed that things get created properly
A few things to consider in addition to Chris's comment:
- What happens if this path already exists (but is owned by someone else)?
- What happens if $TMPDIR is not unique to the job (several jobs run in the same directory, potentially owned by different users)?
- What happens if $PWD contains spaces or special characters in it?
- Who cleans up this symlink when the job finishes?
As written, if there's a shared $TMPDIR with a long name, one job will run successfully and then all subsequent ones on the worker node will fail.
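A minimal sketch of how these concerns might be addressed, not the approach taken in the PR: create a private directory with `mkdtemp` (unique per job, owned by the caller), create the link with `symlink` rather than a shell command (so spaces or special characters in the working directory are not interpreted by a shell), and remove both when the job finishes. The function names and the `/tmp/sherpa_XXXXXX` template are assumptions made for this example.

```cpp
#include <limits.h>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: creates a unique directory under /tmp and, inside it,
// a symlink named "cwd" pointing at the current working directory.
// Returns the (short) path of the symlink, or an empty string on failure.
std::string makeShortLinkToCwd() {
  char cwd[PATH_MAX];
  if (getcwd(cwd, sizeof(cwd)) == nullptr) {
    return "";
  }
  char dirTemplate[] = "/tmp/sherpa_XXXXXX";  // assumed name, filled in by mkdtemp
  if (mkdtemp(dirTemplate) == nullptr) {
    return "";
  }
  const std::string link = std::string(dirTemplate) + "/cwd";
  if (symlink(cwd, link.c_str()) != 0) {
    rmdir(dirTemplate);  // do not leave the empty directory behind
    return "";
  }
  return link;
}

// Hypothetical cleanup: removes the symlink and its enclosing directory.
bool removeShortLink(const std::string& link) {
  if (unlink(link.c_str()) != 0) {
    return false;
  }
  const std::string dir = link.substr(0, link.rfind('/'));
  return rmdir(dir.c_str()) == 0;
}
```

The job would call `makeShortLinkToCwd()` before starting Sherpa and `removeShortLink()` afterwards; whether `/tmp` is an acceptable location on the worker nodes is a separate question.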
@Dr15Jones @bbockelm investigating a bit further, I came across this:
https://www.open-mpi.org/faq/?category=osx#startup-errors-with-open-mpi-2.0.x
and the conclusion is:
The workaround for the Open MPI 2.0.x and v2.1.x release series is to set the TMPDIR environment variable to /tmp or other short directory name.
As far as I can see from the example, the directory created in the TMPDIR path should be unique and related to the MPI session.
Would it be an acceptable solution to assign /tmp/ or /tmp/$USER/ as the TMPDIR value in SherpackUtilities.cc? I don't see how we could move forward otherwise. Thanks a lot
Most importantly, the written data seems to be cleaned up by MPI after use (i.e. if I point TMPDIR to any directory, it is empty after the job has either succeeded or failed).
No description provided.