[SiPixelAli PCL] Update the pede options to avoid issues with too many binaries open #28306

Merged
merged 1 commit into from
Oct 30, 2019

Conversation


@mmusich mmusich commented Oct 29, 2019

PR description:

During Run-2 pp operations it was noticed that the PCL alignment workflow occasionally stalled for very long runs (typically exceeding 1000 LS); examples are run 317182 and run 317320.
In those relatively rare cases no measurement was available despite the large number of tracks available.
Inspection of the log files revealed that the issue lay in the inability of the pede routine to open the input binary files and access the data within them:

STOP FILETC: open error
%MSG-e MillePedeFileReader: AlignmentProducerAsAnalyzer:SiPixelAliPedeAlignmentProducer@endJob 10-Jun-2018 13:58:10 CEST PostGlobalEndRun
Could not read millepede result-file.
%MSG
%MSG-e MillePedeFileReader:  MillePedeDQMModule:SiPixelAliDQMModule@endJob 10-Jun-2018 13:58:10 CEST PostGlobalEndRun
Could not read millepede result-file.
%MSG 

The issue has since been traced back to having too many pede binary files open at the same time.
This issue has been solved in recent pede releases (cf. http://www.desy.de/~kleinwrt/MP2/doc/html/index.html) by providing the option closeandreopen (from the pede manual: sets flag keepOpen to zero to enable closing and reopening of binary files to limit the number of concurrently open files).
The option is available starting from the 19.04.12 revision, either in the privately distributed pede executable at /afs/cern.ch/user/c/ckleinw/bin/rev183/pede or in the official MillePede release V04-06-00 (which has been requested here: cms-sw/cmsdist#5309).
This PR updates the configuration of SiPixelAliPedeAlignmentProducer in order to use the closeandreopen option and is a companion of cms-sw/cmsdist#5309 (N.B. the two should be tested together).
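
To illustrate the general idea behind closing and reopening files, here is a minimal Python sketch. It is not pede's implementation (pede is a Fortran program); the class name `ReopeningReader`, the file names and the chunk sizes are all hypothetical. Each reader remembers its offset and the file's modification time, opens the file only for the duration of a read, and refuses to continue if the file changed in between.

```python
import os
import tempfile


class ReopeningReader:
    """Reads chunks from a file without keeping it open between calls."""

    def __init__(self, path):
        self.path = path
        self.offset = 0                      # where the next read resumes
        self.mtime = os.path.getmtime(path)  # remembered to detect modification

    def read_chunk(self, size):
        # Guard against the file changing while it was closed.
        if os.path.getmtime(self.path) != self.mtime:
            raise RuntimeError(f"{self.path} was modified between reads")
        with open(self.path, "rb") as handle:  # open only for this call
            handle.seek(self.offset)
            data = handle.read(size)
        self.offset += len(data)
        return data


def demo():
    """Interleave reads over several files with at most one file open at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        readers = []
        for i in range(3):
            # Hypothetical file names, used only for this illustration.
            path = os.path.join(tmp, f"binary{i:03d}.dat")
            with open(path, "wb") as out:
                out.write(bytes([i]) * 8)
            readers.append(ReopeningReader(path))
        first = [r.read_chunk(4) for r in readers]
        second = [r.read_chunk(4) for r in readers]
        return first, second
```

The trade-off is extra open/seek calls per read in exchange for a bounded number of simultaneously open file descriptors, which matters when the number of input binaries grows with the length of the run.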

PR validation:

The previously failing PCL workflow for run 317320 re-generated via:

cmsDriver.py pedeStep --data --conditions 106X_dataRun3_Express_v2 --scenario pp --era Run2_2018 -s ALCAHARVEST:SiPixelAli --dasquery='file dataset=/StreamExpress/Run2018B-PromptCalibProdSiPixelAli-Express-v1/ALCAPROMPT run=317320' -n -1 

with the modifications proposed in this PR + adding the line:

process.SiPixelAliPedeAlignmentProducer.algoConfig.pedeSteerer.pedeCommand = "/afs/cern.ch/user/c/ckleinw/bin/rev183/pede"

in the configuration (equivalent to having V04-06-00 as the default executable) runs to completion (takes about 3h to complete).
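
For readers who want to reproduce a similar setup, the following is a hedged sketch of a small customisation helper. It runs on a plain stand-in object rather than a real CMSSW process. The attribute `pedeCommand` is taken from the line quoted above; the attribute name `options` (a list of pede steering options) and the placeholder entry `"entries 10"` are assumptions made for illustration and should be checked against the actual configuration.

```python
from types import SimpleNamespace


def enable_close_and_reopen(pede_steerer, pede_command=None):
    """Add 'closeandreopen' to the steering options, optionally overriding the executable.

    `pede_steerer` is any object with an `options` sequence (assumed name)
    and a `pedeCommand` attribute (name taken from the PR description).
    """
    options = list(getattr(pede_steerer, "options", []))
    if "closeandreopen" not in options:  # idempotent: never add it twice
        options.append("closeandreopen")
    pede_steerer.options = options
    if pede_command is not None:
        pede_steerer.pedeCommand = pede_command
    return pede_steerer


# Stand-in for process.SiPixelAliPedeAlignmentProducer.algoConfig.pedeSteerer;
# the initial option is a placeholder, not the real default.
steerer = SimpleNamespace(options=["entries 10"], pedeCommand="pede")
enable_close_and_reopen(steerer, "/afs/cern.ch/user/c/ckleinw/bin/rev183/pede")
```

In a real CMSSW configuration the options would presumably be held in a `cms.vstring`, so the helper would need to be adapted accordingly.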


if this PR is a backport please specify the original PR:

This PR is not a backport.
cc:
@adewit @connorpa @dmeuser

…nable the option closeandreopen (from manual: Set flag keepOpen to zero to enable closing and reopening of binary files to limit the number of concurrently open files). The modification dates of the files are monitored to ensure data integrity.
@cmsbuild

The code-checks are being triggered in jenkins.

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28306/12518

  • This PR adds an extra 16KB to repository

@cmsbuild

A new Pull Request was created by @mmusich (Marco Musich) for master.

It involves the following packages:

Alignment/CommonAlignmentProducer

@christopheralanwest, @tocheng, @cmsbuild, @franzoni, @tlampen, @pohsun can you please review it and eventually sign? Thanks.
@pakhotin, @adewit, @tocheng, @tlampen, @mschrode, @mmusich this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@christopheralanwest

please test

@cmsbuild

cmsbuild commented Oct 29, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3230/console Started: 2019/10/29 21:18

@smuzaffar

please test with cms-sw/cmsdist#5309
as @mmusich mentioned, this should be tested with cmsdist PR.

@cmsbuild

cmsbuild commented Oct 29, 2019

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-sw/cmsdist#5309
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3231/console Started: 2019/10/29 21:43

@cmsbuild

Comparison job queued.

@cmsbuild

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0d068f/3231/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2961036
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2960693
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@tlampen

tlampen commented Oct 30, 2019

+1

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos

+1

@cmsbuild cmsbuild merged commit 1fc8c61 into cms-sw:master Oct 30, 2019