-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DD4Hep uses a non-const global evaluator #32249
Comments
A new Issue was created by @Dr15Jones Chris Jones. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign geometry, core |
New categories assigned: geometry,core @Dr15Jones,@Dr15Jones,@smuzaffar,@cvuosalo,@mdhildreth,@makortel,@makortel,@ianna,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks |
This is effectively a duplicate of #32181 |
@ianna FYI |
@Dr15Jones Thank you for finding this. |
So with what I feel is a disappointing 'fix' done in the DD4Hep code of adding a lock to work around the problem (the lock will prohibit cmsRun from efficiently scheduling work on different threads) could we instead remove/reduce the use of the problematic dd4hep calls? The one in question are standalone functions beginning with
I'm guessing that the one we really need to change is the use in DDFilteredView.cc since that is intended to be called from many different modules and we want that to be thread efficient. |
As for any code we can't convert, it would be important to make sure that the thread-inefficient dd4hep calls are only being made from ESProducers and not from any EDProducers and then we can mark all the ESProducer's as have a shared resource which will tell cmsRun not to schedule running any of those modules concurrently. |
@Dr15Jones and @cvuosalo - I think, I've mentioned it before - we should not be running form the XML files with DD4hep in RECO in production. It's not just the thread safety, but also an excessive performance penalty. We have demonstrated that the RECO geometries are identical. The SIM geometry is different in a way that we have to use DD4hep for both the XML parsing and the G4 geometry creation. |
SIM is run multithreaded in production, so it looks to me the a thread-safe and -efficient solution would be needed anyway. |
Agree. |
Detector description payload is an XML BLOB and we have no schema for storing a complete G4 geometry. I'm not even sure if a persistent G4 geometry is desirable. We do have RECO geometry schemas though. |
I do not believe it matters if we are using xml or not given that the change being made to DD4Hep will require a lock around all calls to dd4hep::_toDouble which will cause thread inefficiencies and we call that function even when the geometry comes from the DB. |
the latter should not be the case, we should not use DD4hep for processing RECO payloads |
Hi all, we should have a possibility to run both from XML and from DB. XML is needed for Phase-2, XML is needed for debugging and update the geometry. Is it production or not we have to have this capability. Concerning SIM: for today we cannot expect building geometry in parallel in several streams. The geometry is built from master thread, both DDD and DD4Hep are not capable split the work. So, the initialisation is serial. At run time of SIM step, there is no need to call both DDD and DD4Hep. Concerning utilities which break thread safety: I would use DDFilteredView, as Chris propose, if it is possible. |
Ok, we are back to square one :-) The main culprit: we do not use the same units as DD4hep. And instead of a proper units migration this work around was introduced where the SpecPars explicitly define units:
This was a very bad idea since it means that a use of the DD4hep evaluator to resolve its own units is unavoidable. Instead of providing another work-around around this work-around why wouldn't we ask DD4hep to switch to the same units and revert the change in the production cuts? |
As a counter, given the triviality of the usage of the dd4hep::_toDouble parser, why not just create our own trivial parser which would be thread safe? Basically one that only allows the pattern [0-9][0-9]**(mm|cm|m)? |
we do not know what |
@ianna , we do not have too many variants - only SIM and RECO. |
@civanch - why don't we switch to the same units? |
I do not think it is possible. Geant4 has huge number of users allover the world, CMSSW has huge number of analysis code. Moreover, I do not think that this is very big problem - only limited number places of unit conversion inside CMSSW. |
Be careful in discussing units conventions. DD4hep currently follows the CMS/ROOT units convention, which is cm = 1. In CMSSW, we use the CMS/ROOT units convention for reco. CMS simulation and Geant4 use the Geant4 units convention where mm = 1. CMSSW also uses the Geant4 units convention when storing values in the Conditions DB. In CMSSW, we will always be switching back and forth between units conventions even if DD4hep switches conventions. |
I am not understanding how the units convention used by DD4hep is related to thread safety. How would switching the units convention for DD4hep create thread safety? |
Unit conversion has nothing to do with thread safety. As I understand, some DD4Hep utility methods may be thread unsafe, so these methods should not be used if this is possible. |
Hopefully, fixed in #32339 ;-) |
@ianna Thanks for the change. Could we still have a problem if the MagField and the geometry are being constructed simultaneously? |
(sorry to sidetrack but I want to understand something related) How much is the current DDD used in the G4 geometry creation? |
@ianna Related to this, I wrote a simple thread-safe parser that could handle all to With your change (which is the better way to go) I don't know if committing that simple parser now makes sense. What do you think? |
Hypothetically yes, in reality no. Magnetic field is decoupled from the geometry construction via a DB payload. |
Before DD4hep migration all of it :-) It's needed to parse the XML BLOB and build the G4 geometry. |
@Dr15Jones - it would be great if you could push it to a DD4hep repo to replace their parser. They do welcome contributions. |
Hi @Dr15Jones - I've mentioned your work on a thread-safe evaluator at a DD4hep developers meeting yesterday. They'd be happy to discuss it with you when you make a PR to their repository: https://github.com/AIDASoft/DD4hep |
@ianna Sorry I didn't make what I did clearer, I didn't write a full evaluator, only one that handled the simple cases we tend to have in our XML. As such, it wasn't a useful general replacement. I did later go and modify the DD4hep evaluator and did the pull request. |
+1 |
In CMSSW_11_3_X_2020-12-08-1100, DD4hep workflows 11624 and 11642 run successfully to completion with 1000 events in multi-threaded mode. I think this issue can be closed. |
+1
I agree |
This issue is fully signed and ready to be closed. |
With the crash seen in https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc820/CMSSW_11_2_X_2020-11-22-2300/pyRelValMatrixLogs/run/11624.911_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step2_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log#/150-150
I went looking throught the DD4Hep code and found that the evaluator that crashed is a non-const global variable.
https://github.com/cms-externals/DD4hep/blob/master/DDCore/src/Handle.cpp#L33
This is very bad.
The text was updated successfully, but these errors were encountered: