Large GPU/CPU difference in soft electron reconstruction related to pixel unpacker #41715

silviodonato opened this issue May 18, 2023 · 19 comments

@silviodonato

Dear all,

@gparida recently made a new CPU vs CPU+GPU comparison of the trigger results of the 2023 HLT menu.
The results showed a very large difference in the soft di-electron parking paths.
Basically, HLT_DoubleEleXX_eta1p22_mMax6_v3 has about 25% higher rate when running on GPU. The good news is that almost all events triggered by the CPU are also triggered by the GPU, which means we are not losing signal events at P5.

The di-electron paths were recently updated in CMSHLT-2635 with the usage of triplets, instead of doublets, in the electron reconstruction.

Minor differences were already visible in the old version of the path (based on doublets) here, but in that case the average rates of CPU and GPU were compatible.

I investigated the problem a bit, and I see that the differences are already visible in the pixel matching module (hltDoubleEle4eta1p22PixelMatchFilter), before the GSF tracking.

How to reproduce the problem

I copied to /afs/cern.ch/work/s/sdonato/public/GPU_May23/hlt_onlypixelmatching_dump.py a Python config containing a fake HLT path running the pixel matching filter, and to /afs/cern.ch/work/s/sdonato/public/GPU_May23/skim.root 740 events passing all the filters before it.

Running

CUDA_VISIBLE_DEVICES=0 cmsRun hlt_onlypixelmatching_dump.py >& logGPU &
CUDA_VISIBLE_DEVICES= cmsRun hlt_onlypixelmatching_dump.py >& logCPU &

you can see that:

  • with GPU 82 events are accepted
  • without GPU 73 events are accepted.
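
As a quick cross-check, the accept counts can be read off the framework TrigReport summaries at the end of the two logs; a minimal sketch in Python, assuming the jobs print the standard TrigReport summary at end of job ("logGPU"/"logCPU" as produced by the commands above):

# Print the TrigReport summary lines of both jobs side by side.
for log in ("logGPU", "logCPU"):
    print(f"--- {log} ---")
    with open(log) as f:
        for line in f:
            if line.startswith("TrigReport"):
                print(line.rstrip())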

The differences are still visible if you remove the following cuda branches:

del process.hltEcalDigis.cuda
del process.hltEcalUncalibRecHit.cuda
del process.hltHbhereco.cuda
del process.hltPixelTracksSoA.cuda
del process.hltPixelVerticesSoA.cuda
del process.hltSiPixelRecHits.cuda
del process.hltSiPixelRecHitsSoA.cuda
del process.hltSiPixelClusters.cuda

but they disappear if you remove

del process.hltSiPixelDigis.cuda
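
Such a one-module-at-a-time scan is easy to script; a minimal sketch, assuming it is appended to the dumped config (where process is defined) and that MODULE_ON_CPU is a hypothetical environment variable naming the module whose cuda branch to drop in a given run:

import os

# Hypothetical bisection helper: drop the "cuda" branch of a single
# SwitchProducerCUDA per job, so only that module falls back to its CPU
# implementation; rerun and compare the accept counts.
name = os.environ.get("MODULE_ON_CPU", "")
if name:
    module = getattr(process, name)
    del module.cuda  # the switch then resolves to its "cpu" branch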

This means that the origin of the difference lies somewhere in the pixel unpacker:

process.hltSiPixelClustersGPU = cms.EDProducer("SiPixelRawToClusterCUDA",
    CablingMapLabel = cms.string(''),
    IncludeErrors = cms.bool(True),
    InputLabel = cms.InputTag("rawDataCollector"),
    Regions = cms.PSet(

    ),
    UseQualityInfo = cms.bool(False),
    clusterThreshold_layer1 = cms.int32(4000),
    clusterThreshold_otherLayers = cms.int32(4000),
    isRun2 = cms.bool(False)
)

process.hltSiPixelDigiErrorsSoA = cms.EDProducer("SiPixelDigiErrorsSoAFromCUDA",
    src = cms.InputTag("hltSiPixelClustersGPU")
)

process.hltSiPixelDigisFromSoA = cms.EDProducer("SiPixelDigiErrorsFromSoA",
    CablingMapLabel = cms.string(''),
    ErrorList = cms.vint32(29),
    UsePhase1 = cms.bool(True),
    UserErrorList = cms.vint32(40),
    digiErrorSoASrc = cms.InputTag("hltSiPixelDigiErrorsSoA")
)

process.hltSiPixelDigisLegacy = cms.EDProducer("SiPixelRawToDigi",
    CablingMapLabel = cms.string(''),
    ErrorList = cms.vint32(29),
    IncludeErrors = cms.bool(True),
    InputLabel = cms.InputTag("rawDataCollector"),
    Regions = cms.PSet(

    ),
    SiPixelQualityLabel = cms.string(''),
    UsePhase1 = cms.bool(True),
    UsePilotBlade = cms.bool(False),
    UseQualityInfo = cms.bool(False),
    UserErrorList = cms.vint32()
)

process.hltSiPixelDigis = SwitchProducerCUDA(
    cpu = cms.EDAlias(
        hltSiPixelDigisLegacy = cms.VPSet(
            cms.PSet(
                type = cms.string('DetIdedmEDCollection')
            ),
            cms.PSet(
                type = cms.string('SiPixelRawDataErroredmDetSetVector')
            ),
            cms.PSet(
                type = cms.string('PixelFEDChanneledmNewDetSetVector')
            )
        )
    ),
    cuda = cms.EDAlias(
        hltSiPixelDigisFromSoA = cms.VPSet(cms.PSet(
            type = cms.string('*')
        ))
    )
)
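
Per the config above, the two branches of the switch resolve to quite different producer chains: the cpu branch is the full legacy unpacker, while the cuda branch gets its FED errors from the CUDA raw-to-cluster kernel via two conversion steps. A small sketch of the mapping, read off the InputTag/src references above, just to make the comparison explicit:

# Producer chains behind each branch of hltSiPixelDigis, as read off the
# configuration above.
branches = {
    "cpu": ["hltSiPixelDigisLegacy (SiPixelRawToDigi)"],
    "cuda": [
        "hltSiPixelClustersGPU (SiPixelRawToClusterCUDA)",
        "hltSiPixelDigiErrorsSoA (SiPixelDigiErrorsSoAFromCUDA)",
        "hltSiPixelDigisFromSoA (SiPixelDigiErrorsFromSoA)",
    ],
}
for branch, chain in branches.items():
    print(branch, "->", " -> ".join(chain))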

@cms-sw/hlt-l2 @cms-sw/egamma-pog-l2 @cms-sw/trk-dpg-l2 @cms-sw/heterogeneous-l2

@cmsbuild

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@AdrianoDee

Note: the release is 13_0_5_patch1

@makortel

assign hlt, reconstruction, heterogeneous

@cmsbuild

New categories assigned: heterogeneous,hlt,reconstruction

@mandrenguyen,@missirol,@fwyzard,@clacaputo,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@VinInn

VinInn commented May 18, 2023

If one removes process.hltSiPixelDigis.cuda, how is the rest of the pixel workflow supposed to run on GPU?

@silviodonato

Yes, indeed. What I meant is that you get different results even when running everything on CPU except process.hltSiPixelDigis on GPU, compared to running everything on CPU.

@VinInn

VinInn commented May 20, 2023 via email

@swagata87

Hello Silvio,
Thank you for reporting this.
While electron HLT paths are affected, it looks like the real issue is somewhere upstream.
Let me tag the EGM HLT convenors @RSalvatico and @ravindkv and the EGM Reco convenors @sameasy and @Prasant1993, who can follow up on it.

@silviodonato

silviodonato commented Jun 1, 2023

Hello,
running the pixel local sequence on CPU and on GPU, I noticed that DetIdedmEDCollection_hltSiPixelDigis__HLTX. (edm::EDCollection<DetId> "hltSiPixelDigis") differs.
(CPU)

root [1] Events->Scan("DetIdedmEDCollection_hltSiPixelDigis__HLTX.obj.size()")
************************
*    Row   * DetIdedmE *
************************
*        0 *         0 *
************************

vs
(GPU)

root [1] Events->Scan("DetIdedmEDCollection_hltSiPixelDigis__HLTX.obj.size()")
************************
*    Row   * DetIdedmE *
************************
*        0 *         2 *
************************

Not sure if this can be the cause of the large difference that appears in hltEgammaElectronPixelSeeds:

the hltEgammaElectronPixelSeeds collection size (obj@.size() in the same kind of scan) is 4 on CPU and 36 on GPU.
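
To see which modules are being flagged, and not just how many, one can dump the DetIds with FWLite; a sketch, where outputGPU.root is a hypothetical name for the job output and the process name HLTX matches the branch names in the scans above:

# List the raw DetIds in the hltSiPixelDigis error collection per event.
from DataFormats.FWLite import Events, Handle

events = Events("outputGPU.root")
handle = Handle("edm::EDCollection<DetId>")
label = ("hltSiPixelDigis", "", "HLTX")

for i, event in enumerate(events):
    event.getByLabel(label, handle)
    ids = handle.product()
    print(i, [ids[j].rawId() for j in range(ids.size())])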

@silviodonato

ping @cms-sw/trk-dpg-l2

@VinInn

VinInn commented Jun 6, 2023

UserErrorList = cms.vint32(40)
vs
UserErrorList = cms.vint32()
?

But those are "user" errors, so they should not affect reco
(still, better to make the two configs consistent...).

@AdrianoDee

AdrianoDee commented Jun 7, 2023

I've taken an event where we have differences (the 290th in the file provided). There is no difference in the local reco objects (clusters, digis and hits). The problem seems to be that on GPU we get extra SiPixelDigiErrors. These are stored in hltSiPixelDigis and then passed to process.hltSiStripClusters = cms.EDProducer("MeasurementTrackerEventProducer" ..., where they are used to flag the "bad detectors". This then messes with the hltEgammaElectronPixelSeeds producer because the trajectories are different (the trajectory builder checks whether a module is active, and a bad one is set to be inactive, if I understood correctly).

On the GPU side, adding in RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu something like:

diff --git a/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu b/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
index 293d4422e84..ab06086d150 100644
--- a/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
+++ b/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
@@ -168,11 +169,11 @@ namespace pixelgpudetails {
   template <bool debug = false>
   __device__ uint8_t
   checkROC(uint32_t errorWord, uint8_t fedId, uint32_t link, const SiPixelROCsStatusAndMapping *cablingMap) {
     uint8_t errorType = (errorWord >> sipixelconstants::ROC_shift) & sipixelconstants::ERROR_mask;
     if (errorType < 25)
       return 0;
+    printf("errorType;%d;%d;%d;%d;%d;%d\n",errorType,errorWord,fedId+1200,link,sipixelconstants::ROC_shift,sipixelconstants::ERROR_mask);
     bool errorFound = false;

and on the CPU side, adding in EventFilter/SiPixelRawToDigi/src/ErrorChecker.cc something like:

diff --git a/EventFilter/SiPixelRawToDigi/src/ErrorChecker.cc b/EventFilter/SiPixelRawToDigi/src/ErrorChecker.cc
index 9bde98bef92..10cd0154d6d 100644
--- a/EventFilter/SiPixelRawToDigi/src/ErrorChecker.cc
+++ b/EventFilter/SiPixelRawToDigi/src/ErrorChecker.cc
@@ -27,8 +28,11 @@ bool ErrorChecker::checkROC(bool& errorsInEvent,
                             Word32& errorWord,
                             SiPixelFormatterErrors& errors) const {
   int errorType = (errorWord >> ROC_shift) & ERROR_mask;
+
   if LIKELY (errorType < 25)
     return true;
+  
+  printf("errorType;%d;%d;%d;%d;%d;%d\n",errorType,errorWord,fedId,(errorWord >> LINK_shift) & LINK_mask,ROC_shift,ERROR_mask);

the GPU output has a couple of extra lines:

3358a3359
> errorType;29;1201668113;1208;17
3359a3361
> errorType;29;329252868;1227;4

Why this is happening I still don't know. But the difference is there, and in this event it causes module 303083540 to be inactive when running on GPU, while it is active when running on CPU.
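
For reference, those two extra words can be decoded with the same shifts and masks used in the printouts above; the following sketch reproduces the errorType and link columns of the GPU output (shift/mask values as in sipixelconstants):

# Decode the two extra GPU error words: both come out as errorType 29
# (timeout), on links 17 and 4, matching the printed lines.
LINK_shift, LINK_mask = 26, 0x3F
ROC_shift, ERROR_mask = 21, 0x1F

for word in (1201668113, 329252868):
    error_type = (word >> ROC_shift) & ERROR_mask
    link = (word >> LINK_shift) & LINK_mask
    print(f"word={word:#010x}  errorType={error_type}  link={link}")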

@mmusich

mmusich commented Jun 8, 2023

This change alone (which, if I am not mistaken, would put the FED error zoology treatment on par between GPU and CPU)

diff --git a/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu b/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
index 293d4422e84..2f728fe024e 100644
--- a/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
+++ b/RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
@@ -189,13 +189,13 @@ namespace pixelgpudetails {
       case (26): {
         if constexpr (debug)
           printf("Gap word found (errorType = 26)\n");
-        errorFound = true;
+        errorFound = false;
         break;
       }
       case (27): {
         if constexpr (debug)
           printf("Dummy word found (errorType = 27)\n");
-        errorFound = true;
+        errorFound = false;
         break;
       }
       case (28): {
@@ -208,8 +208,10 @@ namespace pixelgpudetails {
         if constexpr (debug)
           printf("Timeout on a channel (errorType = 29)\n");
         if ((errorWord >> sipixelconstants::OMIT_ERR_shift) & sipixelconstants::OMIT_ERR_mask) {
+          errorFound = false;
           if constexpr (debug)
             printf("...first errorType=29 error, this gets masked out\n");
+          break;
         }
         errorFound = true;
         break;

only brings the GPU count down by 1 (from 82 to 81 passing events).
Incidentally, I realized we're not monitoring the CPU vs GPU differences for the SiPixelDigiErrors in the DQM (which we should, as demonstrated here).
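
For the error types visible in the diff above, the patched decision logic boils down to the following sketch (only types 26, 27 and 29 are shown; the other types keep their existing handling):

# Mirror of the patched GPU decision for the error types shown above: gap (26)
# and dummy (27) words are no longer flagged, and a timeout (29) with the
# OMIT_ERR bit set is masked out, as in the legacy CPU unpacker.
def error_flagged(error_type: int, omit_err_bit_set: bool) -> bool:
    if error_type in (26, 27):
        return False
    if error_type == 29:
        return not omit_err_bit_set
    return True  # simplified: other error types are handled elsewhere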

@gparida

gparida commented Jun 16, 2023

Hi everyone, following up on the comments and using this commit from Marco, I reran the comparison to check the differences between CPU and GPU for the DoubleEle* paths. They are back to the 2-3% level seen before.

For example:
Before implementing the commit (GPU was accepting 20% more events):

1135985 298 360 64 -2 66 22.15 HLT_DoubleEle6p5_eta1p22_mMax6_v3

Now with the change:

1224276 708 702 5 -11 16 2.26 HLT_DoubleEle6p5_eta1p22_mMax6_v4

Similarly for the other DoubleEle* paths.

The details are in this spreadsheet.

@mmusich

mmusich commented Jun 20, 2023

@gparida can you please check the proposed fix at #42010 as well?

@gparida

gparida commented Jun 22, 2023

Hi everyone,
I checked PR #42010 on the same data and release. With the PR, the differences are reduced by 8-9 percentage points.

Before:

1135985 154 190 36 0 36 23.38 HLT_DoubleEle9p5_eta1p22_mMax6_v3

Now with the PR:

915722 221 246 28 -3 31 14.0271 HLT_DoubleEle9p5_eta1p22_mMax6_v4

Similarly for the other DoubleEle* paths. Details are in this spreadsheet.

@mmusich

mmusich commented Oct 10, 2023

For the record, there were two additional fixes:

@mmusich

mmusich commented Oct 14, 2023

PR #42977 was included in CMSSW_13_2_6_patch1, which went online on October 13th, 2023 for HI collision runs starting from run 375083.
In the subsequent fill no. 9254, no further CPU vs GPU pixel FED error mismatches were observed in the online DQM; see e.g. run 375110.

For more information about the effect on the offline pp menu, see #42978 (comment).
