Removing global variables from EcalDQMCommonUtils #33200

Merged
merged 5 commits into from
Mar 17, 2021

alejands

PR description:

An issue with the access and storage of database variables in the ECAL DQM code was brought up here: #28858

Inside EcalDQMCommonUtils.cc, four global variables stored pointers obtained from EventSetup and were exposed through global functions. The code attempted to make this thread safe with a mutex.

std::mutex mapMutex;
EcalElectronicsMapping const *electronicsMap(nullptr);
EcalTrigTowerConstituentsMap const *trigtowerMap(nullptr);
CaloGeometry const *geometry(nullptr);
CaloTopology const *topology(nullptr);

There were several issues with this:

1: Functions like getElectronicsMap() do not lock the mutex before reading the global variable. This is a potential data race, and therefore potentially undefined behavior.

2: CMSSW is moving to rely more on multi-threaded processing, and objects such as a mutex lead to blocking and poor multi-threading performance.

3: This approach does not support data that needs to be updated at IOV boundaries, which is a potential issue.

4: This violates rule 7-1 of the CMS coding and style rules: “Do not use mutable global data (no globals).”

5: This created dependencies between the modules that required them to be executed in a specific order that is not easy for developers to understand.
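
To make points 1 and 4 concrete, here is a minimal sketch of the old pattern next to the member-variable replacement. It uses a hypothetical stand-in type (FakeElectronicsMap) and hypothetical namespaces rather than the real CMSSW classes, and it assumes the old setter took the lock while the getter did not; the actual code in EcalDQMCommonUtils.cc may differ in detail.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an EventSetup product such as EcalElectronicsMapping.
struct FakeElectronicsMap {
  int version;
};

namespace before {
  // Mutable global state, only partly guarded by the mutex.
  std::mutex mapMutex;
  FakeElectronicsMap const* electronicsMap = nullptr;

  // Assumption: the setter takes the lock...
  void setElectronicsMap(FakeElectronicsMap const* m) {
    std::lock_guard<std::mutex> lock(mapMutex);
    electronicsMap = m;
  }

  // ...but the getter reads the pointer without it, so a set on one
  // thread concurrent with a get on another is a data race.
  FakeElectronicsMap const* getElectronicsMap() { return electronicsMap; }
}  // namespace before

namespace after {
  // Each module owns its own pointer: no shared mutable state, no lock.
  class Module {
  public:
    void setElectronicsMap(FakeElectronicsMap const* m) { electronicsMap_ = m; }
    FakeElectronicsMap const* getElectronicsMap() const { return electronicsMap_; }

  private:
    FakeElectronicsMap const* electronicsMap_ = nullptr;
  };
}  // namespace after
```

With the member version, setting the pointer in one module has no effect on any other module, which is also what removes the hidden ordering dependency from point 5.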

This PR removes the global variables and instead makes them member variables of each module. Most of this is done by defining the variables inside DQWorker, a class that the majority of ECAL DQM modules inherit from.

There are a few exceptions inside DQM/EcalMonitorDbModule, where some variables had to be defined separately for specific plugins that could not inherit easily from DQWorker.

A side effect of this approach is that a large portion of DQM/EcalCommon needed to be modified. Most (if not all) of the classes defined there relied heavily on these variables being accessible through global functions. Now the variables have to be passed to every function that needs them, as the classes and functions there no longer have free access to these values.

Many of the functions inside MESet and its derived classes required multiple variables in order to work. For neatness, a struct was created, EcalDQMSetupObjects, that can pass all the variables at once, rather than having to pass 2, 3, or 4 additional variables to a given function.
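
As a rough sketch of the idea, the struct can simply bundle the four pointers. The layout below is an assumption based on the four former globals, the CMSSW types are replaced by empty stand-ins so the snippet compiles on its own, and WorkerSketch, getSetupObjects() and hasGeometryAndTopology() are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Empty stand-ins so the sketch compiles on its own; the real classes are CMSSW types.
struct EcalElectronicsMapping {};
struct EcalTrigTowerConstituentsMap {};
struct CaloGeometry {};
struct CaloTopology {};

// Assumed layout: one const pointer per former global variable.
struct EcalDQMSetupObjects {
  EcalElectronicsMapping const* electronicsMap;
  EcalTrigTowerConstituentsMap const* trigtowerMap;
  CaloGeometry const* geometry;
  CaloTopology const* topology;
};

// Hypothetical EcalCommon-style helper: takes the bundle instead of reading globals.
inline bool hasGeometryAndTopology(EcalDQMSetupObjects const& s) {
  return s.geometry != nullptr && s.topology != nullptr;
}

// Hypothetical module that owns the pointers as members and hands them out as one object.
class WorkerSketch {
public:
  void setGeometry(CaloGeometry const* g) { geometry_ = g; }
  void setTopology(CaloTopology const* t) { topology_ = t; }

  EcalDQMSetupObjects getSetupObjects() const {
    return EcalDQMSetupObjects{electronicsMap_, trigtowerMap_, geometry_, topology_};
  }

private:
  EcalElectronicsMapping const* electronicsMap_ = nullptr;
  EcalTrigTowerConstituentsMap const* trigtowerMap_ = nullptr;
  CaloGeometry const* geometry_ = nullptr;
  CaloTopology const* topology_ = nullptr;
};
```

Passing one small struct of pointers keeps the function signatures stable if another setup object is ever needed, at the cost of copying four pointers per call.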

In addition, overloaded functions, such as fill() for each MESet, had some versions that used the old global setup variables and some that didn’t. To avoid the compiler choosing the wrong function during overload resolution, every fill function (and every other overloaded function) is now passed a copy of these variables.

For example, these first two fill functions use the setup variables:

void fill(EcalDQMSetupObjects const, DetId const &, double = 1., double = 0., double = 0.) override;
void fill(EcalDQMSetupObjects const, EcalElectronicsId const &, double = 1., double = 0., double = 0.) override;

This third one does not use them, but they are passed to it anyway:

void fill(EcalDQMSetupObjects const, int, double = 1., double = 1., double = 1.) override;

This is to avoid the compiler implicitly converting a DetId or an EcalElectronicsId into an int. Otherwise, a call such as fill(detid) would compile but lead to undefined behavior and a crash at runtime, with the error not easy to spot.
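
The overload pitfall can be reproduced in isolation. The sketch below uses hypothetical stand-ins (SetupSketch for EcalDQMSetupObjects, IdSketch for an id type that converts implicitly to an unsigned integer, which is assumed here to mirror DetId) and two hypothetical classes: MixedSet, where only the id overload takes the setup objects, and UniformSet, where every overload does.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

struct SetupSketch {};  // hypothetical stand-in for EcalDQMSetupObjects

// Hypothetical id type with an implicit conversion to an integer.
struct IdSketch {
  std::uint32_t raw;
  operator std::uint32_t() const { return raw; }
};

// Only the id overload requires the setup objects.
struct MixedSet {
  int fill(SetupSketch, IdSketch const&, double = 1.) { return 1; }  // id overload
  int fill(int, double = 1.) { return 2; }                           // index overload
};

// Every overload takes the setup objects as its first argument.
struct UniformSet {
  int fill(SetupSketch, IdSketch const&, double = 1.) { return 1; }  // id overload
  int fill(SetupSketch, int, double = 1.) { return 2; }              // index overload
};

// Detects whether a stale call of the form set.fill(id) still compiles.
template <class T, class = void>
struct CanFillWithIdOnly : std::false_type {};

template <class T>
struct CanFillWithIdOnly<T, decltype(void(std::declval<T&>().fill(std::declval<IdSketch const&>())))>
    : std::true_type {};

// In MixedSet the stale call compiles and silently picks the int overload.
static_assert(CanFillWithIdOnly<MixedSet>::value, "stale call is accepted");
// In UniformSet the same call is rejected at compile time.
static_assert(!CanFillWithIdOnly<UniformSet>::value, "stale call is rejected");
```

With the uniform signatures, a call site that was not updated fails to compile instead of treating the raw id as a bin index.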

In the end, each module now contains a copy of the EventSetup pointers and passes them to the EcalCommon functions that require them. As a toy example,

meSet.fill(GetEcalDQMSetupObjects(), detid);

PR validation:

The fix was validated on both physics and calibration ECAL online DQM-like workflows using data from 2018 pp collisions in stable beams. The output was compared to the output of the old code to ensure that there were no changes to the plots. This was checked using a private Online DQM GUI, checking every ECAL layout and ensuring that every plot was identical before and after this fix was implemented.

The fix was also validated on a full offline DQM relval workflow 136.874 using the runTheMatrix script

runTheMatrix.py -l 136.874 --ibeos

The output was compared before and after the fix using a private Offline DQM GUI, again checking every layout and plot and seeing no changes.

The memory usage was also compared before and after using the dqmStoreStats utility https://github.com/cms-sw/cmssw/blob/master/DQMServices/Components/python/DQMStoreStats_cfi.py as well as the SimpleMemoryCheck service https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMTimingAndMemory#SimpleMemoryCheck_service. Peak memory usage increased by 40 kB during processing, which we deemed to be negligible.

@cmsbuild

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33200/21624

  • This PR adds an extra 216KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33200/21625

  • This PR adds an extra 476KB to repository

@cmsbuild

A new Pull Request was created by @alejands (Alejandro Sanchez) for master.

It involves the following packages:

DQM/EcalCommon
DQM/EcalMonitorClient
DQM/EcalMonitorDbModule
DQM/EcalMonitorTasks

@andrius-k, @kmaeshima, @ErnestaP, @ahmad3213, @cmsbuild, @jfernan2, @rvenditti can you please review it and eventually sign? Thanks.
@rchatter, @simonepigazzini, @thomreis, @argiro this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@jfernan2

please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f5a726/13565/summary.html
COMMIT: c9e21b7
CMSSW: CMSSW_11_3_X_2021-03-16-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33200/13565/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2639881
  • DQMHistoTests: Total failures: 14
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2639844
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: no differences found

@jfernan2

+1

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy

qliphy commented Mar 17, 2021

+1
