Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results for Run-1 SIM step are sensitive to order of module execution #34448

Open
civanch opened this issue Jul 12, 2021 · 15 comments
Open

Results for Run-1 SIM step are sensitive to order of module execution #34448

civanch opened this issue Jul 12, 2021 · 15 comments

Comments

@civanch
Copy link
Contributor

civanch commented Jul 12, 2021

When OscarMTProducer was migrated to the new scheme of access to EventSetup (#34338), Run-1 WFs results have different simulation histories even for the 1st event.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 12, 2021

A new Issue was created by @civanch Vladimir Ivantchenko.

@Dr15Jones, @perrotta, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@civanch
Copy link
Contributor Author

civanch commented Jul 12, 2021

assign simulation, core

@cmsbuild
Copy link
Contributor

New categories assigned: core,simulation

@Dr15Jones,@smuzaffar,@civanch,@mdhildreth,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor

The title is misleading because the reason is in the execution order of VolumeBasedMagneticFieldESProducerFromDB and XMLIdealGeometryESProducer.

@civanch civanch changed the title Results for Run-1 SIM step are sensitive to the EventSetup interface Results for Run-1 SIM step are sensitive to order of module execution Jul 12, 2021
@makortel
Copy link
Contributor

Let me copy here #34338 (comment)

I did some experiments in CMSSW_12_0_X_2021-07-07-2300 with this PR.

First I confirmed Vladimir's earlier observation that leaving only MagneticField to be read in the "old way" reproduces the reference results.

Then I added only the esConsumes() for the MagneticField, but still read it from EventSetup in the old way. That results in differences.

Logs from Tracer Service show that the relevant change in the order of ESProducers is that the VolumeBasedMagneticFieldESProducerFromDB (and PoolDBESSource for RunInfoRcd, MagFieldConfigRcd, and MFGeometryFileRcd) moved to be run before XMLIdealGeometryESProducer and MuonGeometryConstantsESModule with the esConsumes.

To check if the order of VolumeBasedMagneticFieldESProducerFromDB and XMLIdealGeometryESProducer would be important, I made VolumeBasedMagneticFieldESProducerFromDB to depend on the DDCompactView produced by XMLIdealGeometryESProducer. That test reproduced the reference results!

These symptoms indicate that something in what VolumeBasedMagneticFieldESProducerFromDB and XMLIdealGeometryESProducer do depends on their order (likely independently of #34338). @cms-sw/geometry-l2

@makortel
Copy link
Contributor

I've done some further experimentation.

I dumped the values of MagneticField by setting process.VolumeBasedMagneticFieldESProducer.debugBuilder = True, but the text dump was identical with both ESProducer orderings.

I also dumped the DDCompactView with GeometryInfoDump in OscarMTMasterThread, but the text dump was identical with both ESProducer orderings (also after changing the dump number format to %0.10e). (I tried to use PerfectGeometryAnalyzer first, but with that the ESProducer were always run in one of the orderings)

@civanch How did you produce the geometry dumps for your #34338 (comment) ?

The only difference found in one box from tracker barrel rotation matrix: in case of this PR

@VinInn
Copy link
Contributor

VinInn commented Jul 13, 2021

valgrind?

@civanch
Copy link
Contributor Author

civanch commented Jul 13, 2021

I use following option:
process.g4SimHits.FileNameGDML = 'cms2010_8July.gdml'
process.g4SimHits.FileNameField = 'cms2010_8JulyMF.txt'

AN observation of different rotation matrix mentioned in #34338 is not justified, because it is a printout - result may depend on precision of cout. Except this rotation matrix everything else was identical. There is a possibility that some modules use random numbers at initialisation (I do not know such cases but it is a possible theory), so this change of order of modules change histories.

@makortel
Copy link
Contributor

@civanch Thanks. I used both options and found zero differences between the two orderings of the ESProducers.

I'll try valgrind next (thanks @VinInn).

@makortel
Copy link
Contributor

Here is a test result that shows differences across the board in 9.0 of a PR that should not have any physics impact #34577 (comment)

@makortel
Copy link
Contributor

I had run valgrind on one of the ordering options, but it didn't reveal anything interesting.

I also collected stack traces with gdb breaking into the random number generation. I had patience to collect logs for 3.4M calls (that was not enough to complete one event), but comparing those showed no difference between the two ESProducer orders.

@civanch
Copy link
Contributor Author

civanch commented Aug 27, 2021

There is some instability in DD4Hep WF #34995 not yet understood.

@makortel
Copy link
Contributor

#37592 (comment) (and two earlier tests) shows lots of differences 9.0 workflow which is a Run 1 MC workflow (the PR itself is technical, but it causes many packages to be built). I'm reporting this mostly for the record, although it would be nice to understand the cause (but it is also incredibly laborious/difficult to investigate).

@makortel
Copy link
Contributor

Workflow 9.0 started to show spurious comparison differences again, I opened a separate issue #43415.

@makortel
Copy link
Contributor

#44271 (comment) showed lots of differences in 9.0 (the PR itself is technical)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants