Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pixel local reco: reduce code duplication in SiPixelRawToDigi related code #33884

Merged

Conversation

czangela
Copy link
Contributor

PR description:

This PR should resolve some duplication issues, namely:

  1. from Open issues regarding the Pixel local reconstruction on GPU #32483 reuse existing constants in SiPixelRawToClusterGPUKernel
  2. Avoid code duplication in SiPixelRawToDigi ErrorChecker's #26731 avoid code duplication in SiPixelRawToDigi ErrorChecker's

Accordingly, the changes made can be categorized this way:

  1. Reusing different bit manipulation constants in all the ErrorCheckers, as well as PixelDataFormatter and SiPixelRawToClusterGPUKernel
    As part of this effort I introduced a new namespace with some constexpr variables and functions named sipixelconstants and placed it in DataFormats/SiPixelDigi. It should probably have a different name, and I suppose a more appropriate place would be DataFormats/SiPixelRawToDigi, but I let that the more experienced contributors decide. I just didn't want to do the fifth rebase yet.
  2. Move common functions from ErrorCheckerPhase0 and ErrorChecker (this is phase 1) to ErrorCheckerBase
    This is really not harmful.

PR validation:

Since this PR influences parts of the code which don't contribute to DQM histograms, there was a custom validation that took place.

Basically, I was just printing the FED errors before and after the changes and comparing them.

This is the description of this validation:


Phase 0 - workflow 136.731

  1. Run workflow, modify # of events to 4000
runTheMatrix.py -e -l 136.731 --ibeos -t 4 -j 0 --maxSteps=3
  1. Run step2 and step3
    During step3 log events with FED errors.

  2. Select events with at least one FED error:
    In repository /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase1
    the file runlumievent.txt contains 378 entries of run:lumi:event

  3. With edmPickEventy.py use this file to filter the events

edmPickEvents.py "SinglePhoton/Run2016B-v2/RAW" <filename> --crab
  1. Run with crab
crab submit -c pickevents_crab.py
crab getoutput

Example result file in repository /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase0 the file step2.root

  1. Run on reduced dataset step3 before and after the changes.

Use some custom LogDebug to track error collection. Make sure that no messages get dropped (indicated at the end of some generated log file).

Example of adding LogDebug entry:

diff --git a/EventFilter/SiPixelRawToDigi/src/PixelDataFormatter.cc b/EventFilter/SiPixelRawToDigi/src/PixelDataFormatter.cc
index 5557f806144..02a177018b5 100644
--- a/EventFilter/SiPixelRawToDigi/src/PixelDataFormatter.cc
+++ b/EventFilter/SiPixelRawToDigi/src/PixelDataFormatter.cc
@@ -417,6 +417,11 @@ void PixelDataFormatter::unpackFEDErrors(PixelDataFormatter::Errors const& error
                                          DetErrors& nodeterrors) {
   const uint32_t dummyDetId = 0xffffffff;
   for (const auto& [errorDetId, rawErrorsVec] : errors) {
+	for (auto const& aPixelError : rawErrorsVec) {
+        LogDebug("FedErrors") << "DETID: " << errorDetId << "; FED: " << aPixelError.getFedId()
+                              << ", TYPE: " << aPixelError.getType() << ", WORD: " << aPixelError.getWord32()
+                              << ", MSG: " << aPixelError.getMessage();
+      }
     if (errorDetId == dummyDetId) {  // errors given dummy detId must be sorted by Fed
       nodeterrors.insert(nodeterrors.end(), rawErrorsVec.begin(), rawErrorsVec.end());
     } else {

Example MessageLogger entry for cmsDriver.py python config file:

# Output info to file detailedInfo.log
process.MessageLogger = cms.Service("MessageLogger",
    destinations = cms.untracked.vstring('detailedInfo'),
    categories         = cms.untracked.vstring('FedErrors'),
    debugModules  = cms.untracked.vstring('*'),
        detailedInfo          = cms.untracked.PSet(
        threshold =  cms.untracked.string('DEBUG'),
        default    =  cms.untracked.PSet(limit = cms.untracked.int32(0)),
        INFO       =  cms.untracked.PSet(limit = cms.untracked.int32(0)),
        DEBUG   = cms.untracked.PSet(limit = cms.untracked.int32(0)),
        FedErrors = cms.untracked.PSet(limit = cms.untracked.int32(10000000))
    )
)
  1. Compare the two log files

Extract lines with FED errors

grep DETID detailedInfo.log > errors01.log
Results
Number # of FED errors by type in the validation data
      1 TYPE:30
     15 TYPE:31
    148 TYPE:37
    249 TYPE:40

In the repository /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase0 there are the log files, root file and python config of the validation:

ls -halt /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase0 
total 238M
drwxr-xr-x. 2 aczirkos zh 2,0K máj  28 15.38 .
-rw-r--r--. 1 aczirkos zh  62K máj  28 15.38 step3.py
drwxr-xr-x. 4 aczirkos zh 2,0K máj  28 15.31 ..
-rw-r--r--. 1 aczirkos zh  36K máj  25 16.01 errors_01.log
-rw-r--r--. 1 aczirkos zh  36K máj  25 16.01 errors_02.log
-rw-r--r--. 1 aczirkos zh 7,1K máj  25 16.01 select_events.txt
-rw-r--r--. 1 aczirkos zh 237M máj  25 14.37 step2.root

One can compare the errors (in bash shell) such as

diff <(sort errors_01.log) <(sort errors_02.log)

In this example errors_01.log is before and errors_02.log is after changes.
They should have no differences.


Phase 1 workflow 136.816

  1. Run workflow, only single-threaded
runTheMatrix.py -e -l 136.816 --ibeos -t 1 -j 0 --maxSteps=3

Since FED errors are more common in the phase 1 data, running on 100 events will suffice.

  1. Basically do the same as with phase 0 just without steps 2-5. (since almost all events have errors)
Results
Number # of FED errors by type in the validation data
  21500 TYPE:25
     93 TYPE:30
     55 TYPE:31
   6683 TYPE:37

Now from the repository /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase1

ls -halt /afs/cern.ch/work/a/aczirkos/public/errorchecker_validation/phase1/
total 882M
drwxr-xr-x. 2 aczirkos zh 2,0K máj  28 17.26 .
-rw-r--r--. 1 aczirkos zh 2,7M máj  28 17.25 errors_02.log
-rw-r--r--. 1 aczirkos zh 2,7M máj  28 16.49 errors_01.log
-rw-r--r--. 1 aczirkos zh  20K máj  28 15.38 step3.py
-rw-r--r--. 1 aczirkos zh 876M máj  28 15.36 step2.root
drwxr-xr-x. 4 aczirkos zh 2,0K máj  28 15.31 ..

And comparing the results (again, errors_01.log is before and errors_02.log is after changes)

diff <(sort errors_01.log) <(sort errors_02.log)

Again, no differences should appear.

if this PR is a backport please specify the original PR and why you need to backport that PR:

it is not

czangela added 23 commits May 5, 2021 14:12
…orCheckerBase

 - common functions: setErrorStatus, checkCRC, checkHeader, checkHeader, checkTrailer
 - copy bit manipulation constants to ErrorCheckerBase
…eckerPhase0 to SiPixelDigiConstants

 - remove duplicated typedefs for Word32 and Word64
…elDigiConstants

 - special bits and shifts for phase 1 layer 1 modules -> inline namespace
 - transfer comments from PixelDataFormatter
…it manipulation

 - constexpr functions are also callable from device if compiled with the right flag
 - separate function to determine errorType and to issue the log message
 - add to error collection if the returned errorType is non-zero and includeErrors is true
 - avoid multiple return statements
 - more specifically the default constructors in ErrorChecker and ErrorCheckerPhase0
…iple files

 - changed in ErrorChecker, ErrorCheckerBase and ErrorCheckerPhase0
 - between SiPixelDigiContants.h and SiPixelRawToClusterGPUKernel.h
 - in SiPixelRawToClusterGPUKernel.cu
 - except for setErrorStatus
 - change inheritance to protected in ErrorChecker*
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @czangela for master.

It involves the following packages:

DataFormats/SiPixelDigi
EventFilter/SiPixelRawToDigi
RecoLocalTracker/SiPixelClusterizer

@perrotta, @civanch, @mdhildreth, @cmsbuild, @slava77, @jpata can you please review it and eventually sign? Thanks.
@mtosi, @OzAmram, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @ferencek, @dkotlins, @gpetruc, @threus, @tvami this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@mmusich
Copy link
Contributor

mmusich commented May 30, 2021

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-91f7e5/15427/summary.html
COMMIT: 4911f50
CMSSW: CMSSW_12_0_X_2021-05-29-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33884/15427/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2650486
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2650451
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: no differences found

@civanch
Copy link
Contributor

civanch commented Jun 2, 2021

+1

@slava77
Copy link
Contributor

slava77 commented Jun 3, 2021

enable gpu

@slava77
Copy link
Contributor

slava77 commented Jun 3, 2021

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 3, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-91f7e5/15585/summary.html
COMMIT: 4911f50
CMSSW: CMSSW_12_0_X_2021-06-02-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33884/15585/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 9533
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 9533
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2650486
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2650457
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Jun 4, 2021

+reconstruction

  • technical, no reco changes expected or observed neither in standard or GPU workflows

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2021

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Jun 5, 2021

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants