Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix number of floats per hit #321

Merged
merged 1 commit into from
Apr 16, 2019

Conversation

dsperka
Copy link

@dsperka dsperka commented Apr 11, 2019

PR description:

This PR arises from my attempt to run the Pixel Track reconstruction validation on a local machine. The problem comes from the 10824.53 workflow when running with the --tool memcheck option. In the step3-memcheck.log file there is a fatal crash:

11-Apr-2019 11:43:35 EDT Initiating request to open file file:/cms/data/store/relval/CMSSW_10_4_0_pre2/RelValTTbar_13/GEN-SIM-DIGI-RAW/PU25ns_103X_upgrade2018_realistic_v8-v1/10000/29C415
A1-48E9-8445-A19C-49B84D1505ED.root
11-Apr-2019 11:43:36 EDT Successfully opened file file:/cms/data/store/relval/CMSSW_10_4_0_pre2/RelValTTbar_13/GEN-SIM-DIGI-RAW/PU25ns_103X_upgrade2018_realistic_v8-v1/10000/29C415A1-48E9
-8445-A19C-49B84D1505ED.root
Begin processing the 1st record. Run 1, Event 8503, LumiSection 171 on stream 0 at 11-Apr-2019 11:43:44.504 EDT
%MSG-e TkDetLayers: SeedingLayersEDProducer:pixelTracksSeedLayers 11-Apr-2019 11:43:46 EDT Run: 1 Event: 8503
ForwardDiskSectorBuilderFromDet: Trying to build Petal Wedge from Dets at different z positions !! Delta_z = -0.950417
%MSG
/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack106x/tmp/BUILDROOT/4798e156c43b9007e8153f352738cf66/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_0_pre2_Patatrack/src/Reco
PixelVertexing/PixelTrackFitting/plugins/PixelTrackReconstructionGPU.cu, line 204: cudaErrorLaunchFailure: unspecified launch failure

In the cuda-memcheck.log file there is more information:

========= CUDA-MEMCHECK
========= Invalid global read of size 4
========= at 0x000007b0 in /data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack106x/tmp/BUILDROOT/4798e156c43b9007e8153f352738cf66/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW
10_6_0_pre2_Patatrack/src/RecoPixelVertexing/PixelTrackFitting/plugins/PixelTrackReconstructionGPU.cu:42:KernelFastFitAllHits(float*, int, int, float, Rfit::helix_fit*, Eigen::Matrix<doub
le, int=3, int=4, int=0, int=3, int=4>, Eigen::Matrix<float, int=6, int=4, int=0, int=6, int=4>, Rfit::circle_fit*, Eigen::Matrix<double, int=4, int=1, int=0, int=4, int=1>*, Rfit::line

fit*)
========= by thread (23,0,0) in block (11,0,0)
========= Address 0x7f59ff411940 is out of bounds

This appears to me to be a problem with the number floats assigned per hit. The code assumes 12 but there are only 9, 3 for position and 6 instead of 9 for position errors, presumably because the covariance matrix is assumed to be symmetric.

PR validation:

After this change the 10824.53 workflow when running with the --tool memcheck option runs successfully without any crash. I didn't do any comparison of performance before and after, but would be happy to if I could be pointed to which validation plots are the most relevant.

@fwyzard fwyzard added the Pixels Pixels-related developments label Apr 12, 2019
@fwyzard
Copy link

fwyzard commented Apr 16, 2019

Validation summary

Reference release CMSSW_10_6_0_pre2 at 1313262
Development branch CMSSW_10_6_X_Patatrack at 49f2c7f
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_5_0-PU25ns_105X_upgrade2018_realistic_v4_HS-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.52
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.53

/RelValZMM_13/CMSSW_10_5_0-105X_upgrade2018_realistic_v4_HS-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.52
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.53

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_5_0-PU25ns_105X_upgrade2018_realistic_v4_HS-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_5_0-105X_upgrade2018_realistic_v4_HS-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/5251f69ef435e4e15af92706aeda89eec91ddea2/log .

@fwyzard
Copy link

fwyzard commented Apr 16, 2019

@dsperka thanks for the fix.

Indeed, this recovers the correct efficiency for the .53 workflow.
Here once can see how "development-10824.53" is broken for Zmumu, and similarly for TTbar:

  reference-10824.5 development-10824.5 development-10824.53 testing-10824.53
Efficiency 0.5513 0.5513 0.2613 0.5515
Number of TrackingParticles (after cuts) 4103 4103 4103 4103
Number of matched TrackingParticles 2262 2262 1072 2263
Fake rate 0.0066 0.0069 0.0102 0.0076
Duplicate rate 0.0137 0.0112 0.0360 0.0157
Number of tracks 3945 3938 1862 3949
Number of true tracks 3919 3911 1843 3919
Number of fake tracks 26 27 19 30
Number of pileup tracks 0 0 0 0
Number of duplicate tracks 54 44 67 62

@fwyzard
Copy link

fwyzard commented Apr 16, 2019

No observable impact on performance, measured on the 10824.53 workflow with a P100 over TTbar events.

development:

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1725.8 ±   1.9 ev/s (4000 events)
  1719.0 ±   1.5 ev/s (4000 events)
  1720.0 ±   2.2 ev/s (4000 events)
  1739.5 ±   1.7 ev/s (4000 events)
 --------------------
  1726.1 ±   9.4 ev/s

testing:

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
  1737.9 ±   1.4 ev/s (4000 events)
  1734.1 ±   1.5 ev/s (4000 events)
  1726.4 ±   1.3 ev/s (4000 events)
  1707.8 ±   1.9 ev/s (4000 events)
 --------------------
  1726.5 ±  13.4 ev/s

@fwyzard fwyzard merged commit 30f4f4b into cms-patatrack:CMSSW_10_6_X_Patatrack Apr 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pixels Pixels-related developments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants