Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DQM: Restructure QTests #28316

Closed
wants to merge 36 commits into from
Closed

Conversation

schneiml
Copy link
Contributor

PR description:

This PR aims to clean up the way QTests are handled in DQM. Rather than spreading the logic all over the place, it is now all in a single module.

  • QualityTester becomes a plugin in DQMServices/Components, handling all QTest things.
  • The DQMServices/ClientConfig package disappears without replacement.
  • Some useful parts of ClientConfig are rescued over to Components, most prominently DQMGenericClient.
  • Some modules that used to use the XML parsing helpers from ClientConfig are now stubbed out (SiPixelMonitorClient/SiStripMonitorClient/SiStripCommissioningClients). Subsystem contacted to see if they need to be/can be ported.
  • The XML parsing is now done with boost::property_tree rather than xerces. Much less code, however more strict (some of the QTest configuration XML used in production was malformed!) and probably worse error reporting.
  • The QTest implementation remain in DQMServices/Core QTest.[h|cc]. However, the only place that uses them now is QualityTester.
  • QReports (results) remain available via the MonitorElement, but don't contain pointers to QCriterions (the actual tests) any more.
  • QualityTester is now a DQMEDHarvester, rather than a legacy module.
  • The ordering QualityTester vs. other harvesters is not enforced by dependencies. This may need to be fixed.

PR validation:

Runs a simple workflow. Opening this PR for more validation, while some functionality is still missing (XML parsing for tracker clients).

Need detailed comparisons over many workflows to see if all QTests are still applied in the same way There might be XML parsing differences, and the pattern matching is different in the new code (now using POSIX fnmatch rather than classlib).

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28316/12555

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28316/12557

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @schneiml (Marcel Schneider) for master.

It involves the following packages:

CalibMuon/DTCalibration
DQM/BeamMonitor
DQM/CSCMonitorModule
DQM/DTMonitorClient
DQM/DTMonitorModule
DQM/HLTEvF
DQM/HLXMonitor
DQM/HcalTasks
DQM/Integration
DQM/L1TMonitor
DQM/L1TMonitorClient
DQM/RPCMonitorClient
DQM/SiPixelCommon
DQM/SiPixelMonitorClient
DQM/SiPixelPhase1Config
DQM/SiPixelPhase1Summary
DQM/SiStripCommissioningClients
DQM/SiStripMonitorClient
DQM/SiStripMonitorSummary
DQM/TrackingMonitorClient
DQMOffline/CalibMuon
DQMOffline/Ecal
DQMOffline/JetMET
DQMOffline/L1Trigger
DQMOffline/Muon
DQMOffline/Trigger
DQMServices/ClientConfig
DQMServices/Components
DQMServices/Core
DQMServices/Examples
GeneratorInterface/RivetInterface
HLTriggerOffline/Common
HLTriggerOffline/Exotica
HLTriggerOffline/Higgs
HLTriggerOffline/JetMET
HLTriggerOffline/Muon
HLTriggerOffline/SUSYBSM
HLTriggerOffline/Tau
PhysicsTools/NanoAOD
Validation/L1T
Validation/RPCRecHits
Validation/RecoMuon
Validation/RecoTrack

@SiewYan, @andrius-k, @schneiml, @Martin-Grunewald, @rekovic, @fioriNTU, @tlampen, @alberto-sanchez, @pohsun, @santocch, @peruzzim, @cmsbuild, @agrohsje, @fwyzard, @efeyazgan, @tocheng, @jfernan2, @qliphy, @benkrikler, @mkirsano, @kmaeshima, @christopheralanwest, @franzoni, @fgolf can you please review it and eventually sign? Thanks.
@erikbutz, @rappoccio, @felicepantaleo, @jandrea, @missirol, @argiro, @fioriNTU, @alberto-sanchez, @ahinzmann, @threus, @mmusich, @seemasharmafnal, @venturia, @acimmino, @mmarionncern, @kreczko, @hdelanno, @battibass, @makortel, @jhgoh, @jdolen, @HuguesBrun, @agrohsje, @cericeci, @ptcox, @trocino, @rociovilar, @abbiendi, @barvic, @DryRun, @GiacomoSguazzoni, @tocheng, @VinInn, @jdamgov, @bellan, @nhanvtran, @gkasieczka, @rovere, @schoef, @idebruyn, @ebrondol, @mtosi, @dgulhan, @clelange, @mkirsano, @batinkov, @Fedespring, @calderona, @wmtford, @gpetruc, @mariadalfonso, @folguera this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@schneiml
Copy link
Contributor Author

please test

... to see what happens.

For actual integration, I should probably factor out all the configuration changes to get the number of changed files down a bit.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 30, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3271/console Started: 2019/10/30 18:01

@cmsbuild
Copy link
Contributor

-1

Tested at: 2b562a9

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f137dc/3271/summary.html

I found follow errors while testing this PR

Failed tests: RelVals

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
9.0 step4

runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step4_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

25.0 step4
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step4_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

4.53 step4
runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step4_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log

136.731 step4
runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step4_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log

1000.0 step4
runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step4_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

Jenkins tests are aborted.

This is stupid and wrong, but actually works.
@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28316/12657

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

@schneiml
Copy link
Contributor Author

schneiml commented Nov 7, 2019

please test

now we got chances for a pretty clean result.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3379/console Started: 2019/11/07 16:58

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 7, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f137dc/3379/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2936360
  • DQMHistoTests: Total failures: 41
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2935978
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 12.064 KiB( 33 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 0.533 KiB Tracking/TrackParameters
  • DQMHistoSizes: changed ( 1000.0,... ): -0.093 KiB L1T/L1TOccupancy
  • DQMHistoSizes: changed ( 140.56,... ): 0.668 KiB Tracking/TrackParameters
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@schneiml
Copy link
Contributor Author

schneiml commented Nov 8, 2019

And indeed, things look much better. The changes that we now see need further investigation. The additional elements are as far as I remember QTests in Tracking which seem to be quite normal, not sure why they did not run in the past, and I don't think I care too much.

Edit: the 1 missing QTest is on https://tinyurl.com/y6yqoq5t, an L1T plot that I have yet to see filled anywhere. It is booked in harvesting, so this could easily be fixed by some dependencies/moving to endRun, but I don't think it's worth the trouble.

The other remaining changes are SiStrip/EventInfo/CertificationContents (plus probably related things) and a few changes in HCAL that seem to be spurious.

@schneiml
Copy link
Contributor Author

schneiml commented Nov 8, 2019

Ok, SiStripCertificationInfo is pretty much unfixable until we learn more about #28354. I tried moving it back by consumesMany<DQMToken, edm::InRun> like I did for siStripOfflineAnalyser, but it gets scheduled right before siStripOfflineAnalyser, where it probably should go after it to work correctly. I am not sure if there currently is a way to schedule things manually, and if there is, I have no idea how to do it.

Note that implementing #28364 will not fix this problem, here we need manual scheduling of legacy modules. For all I know, this is not a new problem; it just happened to work by chance in the past. Even when I go back to legacy EDAnalyzer for the two modules in question, EDM now orders them exactly the wrong way:

++++++ starting: global end run for module: label = 'siStripQTester' id = 9
++++++ finished: global end run for module: label = 'siStripQTester' id = 9
++++++ starting: global end run for module: label = 'siStripCertificationInfo' id = 65
++++++ finished: global end run for module: label = 'siStripCertificationInfo' id = 65
++++++ starting: global end run for module: label = 'siStripOfflineAnalyser' id = 10
++++++ finished: global end run for module: label = 'siStripOfflineAnalyser' id = 10

Before, they where just ordered by module ID (it seems), which would be perfectly correct.

@schneiml
Copy link
Contributor Author

superseded by #28379 (+ #28375, + another one not opened yet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants