set TMPDIR variable to soft link pointing to PWD #21419

Merged: 4 commits, merged Nov 24, 2017

perrozzi

No description provided.

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2126

Code check has found code style and quality issues which could be resolved by applying a patch in https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2126/git-diff.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2126/git-diff.patch | patch -p1

You can run scram build code-checks to apply code checks directly

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-21419/2127

@perrozzi
Author

please test

@cmsbuild
Contributor

cmsbuild commented Nov 21, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/24589/console Started: 2017/11/21 18:43

@cmsbuild
Contributor

A new Pull Request was created by @perrozzi for master.

It involves the following packages:

GeneratorInterface/SherpaInterface

@cmsbuild, @efeyazgan, @perrozzi, @thuer, @govoni can you please review it and, if appropriate, sign? Thanks.
@alberto-sanchez, @agrohsje, @mkirsano, @thuer this is something you requested to watch as well.
@davidlange6, @slava77 you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-21419/24589/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 27
  • DQMHistoTests: Total histograms compared: 2833444
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2833265
  • DQMHistoTests: Total skipped: 178
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.01 KiB (23 files compared)
  • Checked 111 log files, 8 edm output root files, 27 DQM output files

@cmsbuild
Contributor

Pull request #21419 was updated. @cmsbuild, @efeyazgan, @perrozzi, @thuer, @govoni can you please check and sign again.

@perrozzi
Author

please test

@cmsbuild
Contributor

cmsbuild commented Nov 23, 2017

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/24662/console Started: 2017/11/23 22:16

@cmsbuild
Contributor

Comparison job queued.

@davidlange6
Contributor

+1

@davidlange6
Contributor

merge

@davidlange6
Contributor

Hi @perrozzi, could you backport this to 93X?

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-21419/24662/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 27
  • DQMHistoTests: Total histograms compared: 2833444
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2833265
  • DQMHistoTests: Total skipped: 178
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.43 KiB (23 files compared)
  • Checked 111 log files, 8 edm output root files, 27 DQM output files

@perrozzi
Author

Can you please provide a cmsDriver command or cfg to reproduce the error?

@perrozzi
Author

perrozzi commented Nov 28, 2017

I tried with CMSSW_10_0_X_2017-11-26-2300 after adding the commits of this PR

    Merge pull request #21419 from perrozzi/setenv
    set TMPDIR variable to soft link pointing to PWD
commit 565fdd170c8639a4382bc7df5fcaac85a044ee10
Merge: dc1e4ec5d67 e150ca74ccb
--
Author: perrozzi <[email protected]>
Date:   Thu Nov 23 22:12:03 2017 +0100

and using the command

    cmsDriver.py sherpa_ZtoEE_0j_OpenLoops_13TeV_MASTER_cff -s GEN,VALIDATION:genvalid --relval 250000,5000 --eventcontent RAWSIM,DQM --datatier GEN,DQMIO --conditions auto:run2_mc_FULL

I don't observe any crash.

@mrodozov
Contributor

mrodozov commented Nov 28, 2017

Try again with CMSSW_10_0_X_2017-11-27-2300, using the same command or
runTheMatrix -l 534.0 -i all. It fails both ways.

@perrozzi
Author

OK, I can reproduce the error.
It is not directly caused by the modification itself, which runs smoothly,
but changing TMPDIR has consequences that I am still trying to understand. In my case TMPDIR is set to
$TMPDIR = /tmp/perrozzi/tmp

@perrozzi
Author

First thing to note: if /tmp/perrozzi/tmp/ is a directory, things work; if it is a soft link, it crashes.
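To record which of these two cases a given job actually hits, a check along the following lines could be logged before Sherpa starts. This is a C++17 sketch; describeTmpdir is a hypothetical helper, not part of SherpackUtilities.cc, and it only reports the type of the path, it does not change anything.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical diagnostic helper (not part of SherpackUtilities.cc):
// reports whether a TMPDIR-style path is a real directory or a soft link.
// symlink_status() inspects the link itself instead of following it.
std::string describeTmpdir(const fs::path& p) {
  std::error_code ec;
  const fs::file_status st = fs::symlink_status(p, ec);
  if (ec || !fs::exists(st)) {
    return "missing or inaccessible";
  }
  if (fs::is_symlink(st)) {
    // read_symlink() returns the stored target; passing ec avoids throwing.
    const fs::path target = fs::read_symlink(p, ec);
    return ec ? std::string("symlink (target unreadable)")
              : "symlink -> " + target.string();
  }
  if (fs::is_directory(st)) {
    return "directory";
  }
  return "other file type";
}
```

For example, logging describeTmpdir(std::getenv("TMPDIR")) (after checking that the variable is set) would distinguish the working and the crashing configuration in the job output.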

@perrozzi
Author

This might be related to the following discussion, but I don't have any clue yet:
open-mpi/ompi#2328
Anyway, this PR indeed doesn't seem to solve the issue.
@Dr15Jones @davidlange6

    std::string tmpdir = getenv("TMPDIR");
    if (tmpdir.size() > 50) {
      std::string command = "ln -s $PWD " + tmpdir + "/tmp;";
      system(command.c_str());
Contributor

This does not check if system returns a failure value.
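One way to act on this comment is to inspect what std::system() returns instead of discarding it. The sketch below is POSIX-only (it relies on the WIFEXITED/WEXITSTATUS macros from sys/wait.h), and runChecked is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Hypothetical helper (POSIX only): runs a shell command with std::system()
// and returns true only if the shell started and the command exited with 0.
bool runChecked(const std::string& command) {
  const int status = std::system(command.c_str());
  if (status == -1) {
    return false;  // no child process could be created, or its status was unavailable
  }
  if (!WIFEXITED(status)) {
    return false;  // e.g. the command was terminated by a signal
  }
  return WEXITSTATUS(status) == 0;  // non-zero exit code: the command failed
}
```

In the snippet under review this would replace the bare system(command.c_str()) call, for instance as if (!runChecked(command)) { ... report the error ... }. Printouts during a manual test confirm one run, whereas a checked return value also catches failures on other worker nodes.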

Author

I added printouts and can confirm that things get created properly.

Contributor

A few things to consider in addition to Chris's comment:

  • What happens if this path already exists (but is owned by someone else)?
  • What happens if $TMPDIR is not unique to the job (several jobs run in the same directory, potentially owned by different users)?
  • What happens if $PWD contains spaces or special characters in it?
  • Who cleans up this symlink when the job finishes?

As written, if there's a shared $TMPDIR with a long name, one job will run successfully and then all subsequent ones on the worker node will fail.
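The four points above could be handled roughly as in the following C++17 sketch, which assumes POSIX mkdtemp(3). The link is placed in a freshly created per-job directory with a random suffix, it is created without a shell (so spaces in the target path need no quoting), failures throw instead of being ignored, and an RAII guard removes the link and its directory when the job scope ends. ScopedJobLink is a hypothetical class; the sketch only addresses the review points and does not by itself explain or fix the crash seen when TMPDIR is a soft link.

```cpp
#include <cerrno>
#include <filesystem>
#include <stdlib.h>  // mkdtemp (POSIX)
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical RAII guard: creates base/job.XXXXXX/tmp -> target and removes
// both the link and its directory when the guard is destroyed.
class ScopedJobLink {
 public:
  ScopedJobLink(const fs::path& base, const fs::path& target) {
    // mkdtemp() replaces the trailing XXXXXX with a unique suffix and creates
    // the directory with mode 0700; it never reuses an existing path.
    const std::string pattern = (base / "job.XXXXXX").string();
    std::vector<char> buf(pattern.begin(), pattern.end());
    buf.push_back('\0');
    if (mkdtemp(buf.data()) == nullptr) {
      throw std::system_error(errno, std::generic_category(), "mkdtemp");
    }
    dir_ = buf.data();
    link_ = dir_ / "tmp";

    // No shell involved: spaces or special characters in target are harmless.
    std::error_code ec;
    fs::create_directory_symlink(target, link_, ec);
    if (ec) {
      std::error_code ignored;
      fs::remove(dir_, ignored);  // best-effort cleanup before reporting
      throw std::system_error(ec, "create_directory_symlink");
    }
  }

  ~ScopedJobLink() {
    std::error_code ec;     // never throw from a destructor
    fs::remove(link_, ec);  // removes the link only, not the directory it points to
    fs::remove(dir_, ec);
  }

  ScopedJobLink(const ScopedJobLink&) = delete;
  ScopedJobLink& operator=(const ScopedJobLink&) = delete;

  const fs::path& link() const { return link_; }

 private:
  fs::path dir_;
  fs::path link_;
};
```

Because each job gets its own mkdtemp directory, a second job on the same worker node with the same $TMPDIR creates a different link instead of failing on an existing one.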

Author

@Dr15Jones @bbockelm investigating a bit further, I came across this FAQ entry
https://www.open-mpi.org/faq/?category=osx#startup-errors-with-open-mpi-2.0.x
whose conclusion is:
"The workaround for the Open MPI 2.0.x and v2.1.x release series is to set the TMPDIR environment variable to /tmp or other short directory name."
As far as I can see from the example, the directory created inside the TMPDIR path should be unique and tied to the MPI session.
Would it be an acceptable solution to assign /tmp/ or /tmp/$USER/ as the TMPDIR value in SherpackUtilities.cc? I don't see how else we could move forward. Thanks a lot.

Author

perrozzi commented Dec 11, 2017

Most importantly, the written data seems to be cleaned up by MPI after use (i.e. if I point TMPDIR to any directory, it is empty after the job has either succeeded or failed).

7 participants