Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DD4Hep uses a non-const global evaluator #32249

Closed
Dr15Jones opened this issue Nov 23, 2020 · 38 comments
Closed

DD4Hep uses a non-const global evaluator #32249

Dr15Jones opened this issue Nov 23, 2020 · 38 comments

Comments

@Dr15Jones
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor Author

assign geometry, core

@cmsbuild
Copy link
Contributor

New categories assigned: geometry,core

@Dr15Jones,@Dr15Jones,@smuzaffar,@cvuosalo,@mdhildreth,@makortel,@makortel,@ianna,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor

This is effectively a duplicate of #32181

@cvuosalo
Copy link
Contributor

@ianna FYI

@cvuosalo
Copy link
Contributor

@Dr15Jones Thank you for finding this.

@Dr15Jones
Copy link
Contributor Author

So with what I feel is a disappointing 'fix' done in the DD4Hep code of adding a lock to work around the problem (the lock will prohibit cmsRun from efficiently scheduling work on different threads) could we instead remove/reduce the use of the problematic dd4hep calls? The one in question are standalone functions beginning with dd4hep::_

> git grep dd4hep::_
DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc:  registry.specpars[specParName].numpars[name].emplace_back(dd4hep::_toDouble(value));
DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc:        output.emplace_back(dd4hep::_toDouble(string(first, second)));
DetectorDescription/DDCMS/plugins/dd4hep/DDDefinitions2Objects.cc:              result.emplace_back(dd4hep::_toDouble(i));
DetectorDescription/DDCMS/src/DDFilteredView.cc:      result = dd4hep::_toDouble({svalue.data(), svalue.size()});
DetectorDescription/DDCMS/src/DDNamespace.cc:  dd4hep::_toDictionary(n, v, type);
SimG4Core/DD4hepGeometry/test/plugins/DD4hepTestDDDWorld.cc:    double gammacut = convertCmToMm(dd4hep::_toDouble({gammacutStr.data(), gammacutStr.size()}));
SimG4Core/DD4hepGeometry/test/plugins/DD4hepTestDDDWorld.cc:    double electroncut = convertCmToMm(dd4hep::_toDouble({electroncutStr.data(), electroncutStr.size()}));
SimG4Core/DD4hepGeometry/test/plugins/DD4hepTestDDDWorld.cc:    double positroncut = convertCmToMm(dd4hep::_toDouble({positroncutStr.data(), positroncutStr.size()}));
SimG4Core/DD4hepGeometry/test/plugins/DD4hepTestDDDWorld.cc:      protoncut = convertCmToMm(dd4hep::_toDouble({protoncutStr.data(), protoncutStr.size()}));
SimG4Core/Geometry/src/DDG4ProductionCuts.cc:    gammacut = convertCmToMm(dd4hep::_toDouble({gammacutStr.data(), gammacutStr.size()}));
SimG4Core/Geometry/src/DDG4ProductionCuts.cc:    electroncut = convertCmToMm(dd4hep::_toDouble({electroncutStr.data(), electroncutStr.size()}));
SimG4Core/Geometry/src/DDG4ProductionCuts.cc:    positroncut = convertCmToMm(dd4hep::_toDouble({positroncutStr.data(), positroncutStr.size()}));
SimG4Core/Geometry/src/DDG4ProductionCuts.cc:      protoncut = convertCmToMm(dd4hep::_toDouble({protoncutStr.data(), protoncutStr.size()}));

I'm guessing that the one we really need to change is the use in DDFilteredView.cc since that is intended to be called from many different modules and we want that to be thread efficient.

@Dr15Jones
Copy link
Contributor Author

As for any code we can't convert, it would be important to make sure that the thread-inefficient dd4hep calls are only being made from ESProducers and not from any EDProducers and then we can mark all the ESProducer's as have a shared resource which will tell cmsRun not to schedule running any of those modules concurrently.

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

@Dr15Jones and @cvuosalo - I think, I've mentioned it before - we should not be running form the XML files with DD4hep in RECO in production. It's not just the thread safety, but also an excessive performance penalty. We have demonstrated that the RECO geometries are identical. The SIM geometry is different in a way that we have to use DD4hep for both the XML parsing and the G4 geometry creation.

@makortel
Copy link
Contributor

SIM is run multithreaded in production, so it looks to me the a thread-safe and -efficient solution would be needed anyway.

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

SIM is run multithreaded in production, so it looks to me the a thread-safe and -efficient solution would be needed anyway.

Agree.

@dpiparo
Copy link
Contributor

dpiparo commented Nov 27, 2020

@ianna is it easy, in a nutshell, to understand why in sim we need to start from the xml? (I agree with @makortel of course, sim jobs are MT on the grid)

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

@ianna is it easy, in a nutshell, to understand why in sim we need to start from the xml? (I agree with @makortel of course, sim jobs are MT on the grid)

Detector description payload is an XML BLOB and we have no schema for storing a complete G4 geometry. I'm not even sure if a persistent G4 geometry is desirable. We do have RECO geometry schemas though.

@Dr15Jones
Copy link
Contributor Author

I do not believe it matters if we are using xml or not given that the change being made to DD4Hep will require a lock around all calls to dd4hep::_toDouble which will cause thread inefficiencies and we call that function even when the geometry comes from the DB.

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

I do not believe it matters if we are using xml or not given that the change being made to DD4Hep will require a lock around all calls to dd4hep::_toDouble which will cause thread inefficiencies and we call that function even when the geometry comes from the DB.

the latter should not be the case, we should not use DD4hep for processing RECO payloads

@civanch
Copy link
Contributor

civanch commented Nov 27, 2020

Hi all,

we should have a possibility to run both from XML and from DB. XML is needed for Phase-2, XML is needed for debugging and update the geometry. Is it production or not we have to have this capability.

Concerning SIM: for today we cannot expect building geometry in parallel in several streams. The geometry is built from master thread, both DDD and DD4Hep are not capable split the work. So, the initialisation is serial. At run time of SIM step, there is no need to call both DDD and DD4Hep.

Concerning utilities which break thread safety: I would use DDFilteredView, as Chris propose, if it is possible.

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

Ok, we are back to square one :-)

The main culprit: we do not use the same units as DD4hep. And instead of a proper units migration this work around was introduced where the SpecPars explicitly define units:

ProdCutsForGamma = 10*mm
ProdCutsForPositrons = 1*mm
ProdCutsForElectrons = 1*mm

This was a very bad idea since it means that a use of the DD4hep evaluator to resolve its own units is unavoidable. Instead of providing another work-around around this work-around why wouldn't we ask DD4hep to switch to the same units and revert the change in the production cuts?

@Dr15Jones
Copy link
Contributor Author

As a counter, given the triviality of the usage of the dd4hep::_toDouble parser, why not just create our own trivial parser which would be thread safe? Basically one that only allows the pattern [0-9][0-9]**(mm|cm|m)?

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

As a counter, given the triviality of the usage of the dd4hep::_toDouble parser, why not just create our own trivial parser which would be thread safe? Basically one that only allows the pattern [0-9][0-9]**(mm|cm|m)?

we do not know what mm or cm corresponds to :-)

@civanch
Copy link
Contributor

civanch commented Nov 27, 2020

@ianna , we do not have too many variants - only SIM and RECO.

@ianna
Copy link
Contributor

ianna commented Nov 27, 2020

@civanch - why don't we switch to the same units?

@civanch
Copy link
Contributor

civanch commented Nov 27, 2020

I do not think it is possible. Geant4 has huge number of users allover the world, CMSSW has huge number of analysis code. Moreover, I do not think that this is very big problem - only limited number places of unit conversion inside CMSSW.

@cvuosalo
Copy link
Contributor

Be careful in discussing units conventions. DD4hep currently follows the CMS/ROOT units convention, which is cm = 1. In CMSSW, we use the CMS/ROOT units convention for reco. CMS simulation and Geant4 use the Geant4 units convention where mm = 1. CMSSW also uses the Geant4 units convention when storing values in the Conditions DB. In CMSSW, we will always be switching back and forth between units conventions even if DD4hep switches conventions.

@cvuosalo
Copy link
Contributor

I am not understanding how the units convention used by DD4hep is related to thread safety. How would switching the units convention for DD4hep create thread safety?

@civanch
Copy link
Contributor

civanch commented Nov 27, 2020

Unit conversion has nothing to do with thread safety. As I understand, some DD4Hep utility methods may be thread unsafe, so these methods should not be used if this is possible.

@ianna
Copy link
Contributor

ianna commented Nov 30, 2020

Hopefully, fixed in #32339 ;-)

@Dr15Jones
Copy link
Contributor Author

@ianna Thanks for the change. Could we still have a problem if the MagField and the geometry are being constructed simultaneously?

@makortel
Copy link
Contributor

The SIM geometry is different in a way that we have to use DD4hep for both the XML parsing and the G4 geometry creation.

(sorry to sidetrack but I want to understand something related) How much is the current DDD used in the G4 geometry creation?

@Dr15Jones
Copy link
Contributor Author

@ianna Related to this, I wrote a simple thread-safe parser that could handle all to dd4hep::_toDouble work except for the case where it is dealing with a variable (i.e. the string contains [...]). I had originally thought we could use it in conjunction with _toDouble in DDFilteredView where that parser would be called first and then only if it couldn't do the conversion would we fall back to _toDouble.

With your change (which is the better way to go) I don't know if committing that simple parser now makes sense. What do you think?

@ianna
Copy link
Contributor

ianna commented Dec 1, 2020

@ianna Thanks for the change. Could we still have a problem if the MagField and the geometry are being constructed simultaneously?

Hypothetically yes, in reality no. Magnetic field is decoupled from the geometry construction via a DB payload.

@ianna
Copy link
Contributor

ianna commented Dec 1, 2020

The SIM geometry is different in a way that we have to use DD4hep for both the XML parsing and the G4 geometry creation.

(sorry to sidetrack but I want to understand something related) How much is the current DDD used in the G4 geometry creation?

Before DD4hep migration all of it :-) It's needed to parse the XML BLOB and build the G4 geometry.
After DD4hep migration none of it. DD4hep DDCMS interface package is used to parse the XML and DD4hep G4 builder is used to create the G4 geometry.

@ianna
Copy link
Contributor

ianna commented Dec 1, 2020

@ianna Related to this, I wrote a simple thread-safe parser that could handle all to dd4hep::_toDouble work except for the case where it is dealing with a variable (i.e. the string contains [...]). I had originally thought we could use it in conjunction with _toDouble in DDFilteredView where that parser would be called first and then only if it couldn't do the conversion would we fall back to _toDouble.

With your change (which is the better way to go) I don't know if committing that simple parser now makes sense. What do you think?

@Dr15Jones - it would be great if you could push it to a DD4hep repo to replace their parser. They do welcome contributions.

@ianna
Copy link
Contributor

ianna commented Dec 4, 2020

Hi @Dr15Jones - I've mentioned your work on a thread-safe evaluator at a DD4hep developers meeting yesterday. They'd be happy to discuss it with you when you make a PR to their repository: https://github.com/AIDASoft/DD4hep
BTW, there is no rush - it was the last meeting of 2020. There will not be any until mid January 2021.

@Dr15Jones
Copy link
Contributor Author

@ianna Sorry I didn't make what I did clearer, I didn't write a full evaluator, only one that handled the simple cases we tend to have in our XML. As such, it wasn't a useful general replacement. I did later go and modify the DD4hep evaluator and did the pull request.

@cvuosalo
Copy link
Contributor

cvuosalo commented Dec 9, 2020

+1

@cvuosalo
Copy link
Contributor

cvuosalo commented Dec 9, 2020

In CMSSW_11_3_X_2020-12-08-1100, DD4hep workflows 11624 and 11642 run successfully to completion with 1000 events in multi-threaded mode. I think this issue can be closed.

@makortel
Copy link
Contributor

+1

I think this issue can be closed.

I agree

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

@qliphy qliphy closed this as completed Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants