Dire performance in CMSSW vs. stand-alone #22214
A new Issue was created by @intrepid42 Markus Seidel. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign generators |
New categories assigned: generators @perrozzi,@efeyazgan you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@smrenna Do you have an idea about this? Or should we update and retest (and cross fingers) with Dire 2.002 once it becomes available? |
I did talk to Stefan about this, and he says life is tough, Dire is slow.
On Mar 26, 2018, at 9:15 AM, Markus Seidel wrote:
@smrenna Do you have an idea about this? Or should we update and retest (and cross fingers) with Dire 2.002 once it becomes available?
|
Hi Steve, yes, Dire is slow, but it is much slower (and even freezing) when running inside CMSSW, that's the problem! |
@alberto-sanchez will add a line to tomorrow's ORP gdoc to draw the attention of CMSSW experts as this seems to be related (in first place) to the CMSSW implementation. |
igprof.org
indeed, without a profile it's hard to know if the issue is in the build of Dire or the generator interface to it
… On May 14, 2018, at 3:18 PM, perrozzi ***@***.***> wrote:
@alberto-sanchez will add a line to tomorrow's ORP gdoc to draw the attention of CMSSW experts as this seems to be related (in first place) to the CMSSW implementation.
@intrepid42 could you run the tool (I don't have details though) to profile the memory/cpu consumption internally and externally to CMSSW? Can you coordinate with Alberto?
|
It crashes with igprof, see attached log. |
is something being forked?
(what is a recipe)?
… On May 14, 2018, at 4:04 PM, Markus Seidel ***@***.***> wrote:
It crashes with igprof, see attached log.
igtest.pp.log
|
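You can easily reproduce the igprof error like this:
cmsrel CMSSW_10_1_3
cd CMSSW_10_1_3/src
cmsenv
wget --no-check-certificate https://raw.githubusercontent.com/cms-sw/cmssw/master/GeneratorInterface/Pythia8Interface/test/pythia8ex14_cfg.py
igprof cmsRun pythia8ex14_cfg.py
Note 1: Also fails with other Pythia test config files.
Note 2: This Dire example is e+e- and sufficiently fast. Maybe there is a drop in performance but it is not notable compared to pp collisions |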
Thanks - what's a good example of a problematically slow configuration?
… On May 14, 2018, at 4:34 PM, Markus Seidel ***@***.***> wrote:
You can easily reproduce the igprof error like this:
cmsrel CMSSW_10_1_3
cd CMSSW_10_1_3/src
cmsenv
wget --no-check-certificate https://raw.githubusercontent.com/cms-sw/cmssw/master/GeneratorInterface/Pythia8Interface/test/pythia8ex14_cfg.py
igprof cmsRun pythia8ex14_cfg.py
Note 1: Also fails with other Pythia test config files.
Note 2: This Dire example is e+e- and sufficiently fast. Maybe there is a drop in performance but it is not notable compared to pp collisions
|
Regarding the use of igprof - you probably want to read the webpage I pointed you to for how to use it.
|
it looks like the performance bottleneck in the example given is
Pythia8::DireTimes::pT2nextQCD_FF
which appears to be limited by lots of string operations
david
… On May 14, 2018, at 5:20 PM, David Lange ***@***.***> wrote:
Regarding the use of igprof - you probably want to read the webpage I pointed you to for how to use it..
|
David, I went to the website, clicked on "Running igprof" and followed the longish command there that produced the log attached some posts ago. It fails with exactly the same error as the short version igprof cmsRun ... in my minimal example above. Could you share with us the command that works, please? I can create a config with ttbar production + rivet, that usually fails... |
interesting - i only saw your minimal but incorrect command -
this works ok for me...
igprof -d -pp -o igprof.pp cmsRun pythia8ex14_cfg.py
… On May 14, 2018, at 5:40 PM, Markus Seidel ***@***.***> wrote:
David, I went to the website, clicked on "Running igprof" and followed the longish command there that produced the log attached some posts ago. It fails with exactly the same error as the short version igprof cmsRun ... in my minimal example above. Could you share with us the command that works, please
I can create a config with ttbar production + rivet, that usually fails...
|
Thank you but same problem with that command :( "A fatal system signal has occurred: bus error" Do I need to run this on a special machine? (Using lxplus right now) As it works for you, could you test the following CMSSW configuration, please? This is Dire dijet events + Rivet analysis. It freezes for me after generating the first event: dire_rivet_cfg.py.txt |
which lxplus? I can try it - nothing special should be needed
as for this workflow - it's 85% in
Pythia8::DireTimes::pTnext
so nothing different - just a lot slower:)
so how do you run the fast version of this workflow?
… On May 14, 2018, at 5:57 PM, Markus Seidel ***@***.***> wrote:
Thank you but same problem with that command :( "A fatal system signal has occurred: bus error" Do I need to run this on a special machine? (Using lxplus right now)
As it works for you, could you test the following CMSSW configuration, please? This is Dire dijet events + Rivet analysis. It freezes for me after generating the first event: dire_rivet_cfg.py.txt
|
I'm on lxplus078... Did the dire_rivet_cfg.py config get past the second event for you? For me it freezes completely, no event generated in an hour. In contrast, when I run without the CMSSW RivetInterface, it continues to (slowly) generate events, please try out this config for that case (pp dijet): dire_norivet_cfg.py.txt -> 100 events in 8 min. For the really fast experience, one needs to get Dire (https://dire.gitlab.io/Downloads/) and run make dire01 && ./dire01 lhc.cmnd in the main directory. -> 100 events in 40 s. From that comparison it seems that Dire is not awfully slow as a stand-alone program but gets slower and slower when it is run inside CMSSW and more modules are added to the path/schedule. |
How much memory are the standalone and cmsRun versions using? You can just use |
On May 14, 2018, at 7:03 PM, Markus Seidel ***@***.***> wrote:
I'm on lxplus078...
Did the dire_rivet_cfg.py config pass by the second event for you? For me it freezes completely, no event generated in an hour.
In contrast, when I run without the CMSSW RivetInterface, it continues to (slowly) generate events, please try out this config for that case (pp dijet): dire_norivet_cfg.py.txt
-> 100 events in 8 min
For the really fast experience, one needs to get Dire (https://dire.gitlab.io/Downloads/) and run make dire01 && ./dire01 lhc.cmnd in the main directory.
-> 100 events in 40 s
From that comparison it seems that Dire is not awfully slow as stand-alone program but gets slower and slower when it is run inside CMSSW and more modules are added to the path/schedule.
0.4 Hz for event generator sounds pretty slow to me, but ok i'm old fashioned.:)
For dire01 - the cpu time is going to the same piece of string-comparison dominated code...
I also used the CMSSW dire library to run dire01 and get the same CPU performance as with the standalone build. So I conclude there is no problem with the CMSSW build of dire. Thus it's most likely a difference between the generator interface for dire and what's done in dire01, or a difference in the physics settings between the two examples.
anyway, I would suggest setting up the same physics problem in your cmssw example and your dire example and do a fair comparison. Naively the two configurations look quite different, so doing an absolute comparison is not necessarily reliable.
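To put the two timings quoted in this thread side by side, here is a small Python sketch (not part of the original exchange; the helper name `rate_hz` is illustrative). It converts "100 events in 8 min" for cmsRun without Rivet and "100 events in 40 s" for stand-alone dire01 into event rates. Since the two runs may not use identical physics settings, the ratio is only indicative.

```python
# Throughput implied by the two timings quoted in the thread.
def rate_hz(n_events, seconds):
    """Events per second for n_events generated in the given wall time."""
    return n_events / seconds

cmssw_hz = rate_hz(100, 8 * 60)   # cmsRun without Rivet: 100 events in 8 min
standalone_hz = rate_hz(100, 40)  # stand-alone dire01: 100 events in 40 s

print(f"CMSSW:       {cmssw_hz:.2f} Hz ({1 / cmssw_hz:.1f} s/event)")
print(f"stand-alone: {standalone_hz:.2f} Hz ({1 / standalone_hz:.1f} s/event)")
print(f"slowdown:    {standalone_hz / cmssw_hz:.0f}x")
```

With these numbers the stand-alone run takes 0.4 s per event (2.5 Hz) and the cmsRun job 4.8 s per event, a factor of about 12.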
|
Hi, thank you for making me check that again! It turns out that the shower cutoff value ( Still, the problem persists with running Dire in a more integrated CMSSW workflow with the standard pgen sequence and Rivet analysis, using exactly the same physics settings (with 0.9 as shower cut-off which should be fast). The resource usage does not look overwhelming:
vs
This is the CMSSW timing information for the first event:
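And there it just hangs within pythia.next(), after the first event was generated fine. @davidlange6 Can you confirm this using the dire_rivet_cfg.py? What could be the reason for the additional modules causing Dire to freeze? |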
How can I do that? (the dire_rivet_cfg.py you sent yesterday runs ok, but I think that is expected)
meanwhile if you run in the debugger, what is the traceback you get after the job hangs?
…
And there it just hangs within pythia.next(), after the first event was generated fine.
@davidlange6 Can you confirm this using the dire_rivet_cfg.py? What could be the reason for the additional modules causing Dire to freeze?
|
No, for me and other people it freezes just after one or a few events! The igprof (crash) log is here: igtest.pp.log |
Yes - I was running the file you sent in this thread
https://github.com/cms-sw/cmssw/files/2001573/dire_rivet_cfg.py.txt
It runs for me in CMSSW_10_2_X_2018-05-13-2300 (up to 20 events)
as for gdb -
gdb cmsRun blah
wait for your program to get stuck
control-c
where
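The interactive gdb recipe above can also be scripted, so that a job that stops making progress leaves a traceback behind without anyone watching it. The following is a minimal Python sketch, assuming gdb is on the PATH and that attaching to the process is permitted on the machine. `run_with_watchdog` and `gdb_backtrace_argv` are hypothetical helper names, not CMSSW tools. It relies on gdb's batch mode (`-p`, `-batch`, `-ex`) to print the stacks of all threads once a time limit is exceeded.

```python
import subprocess

def gdb_backtrace_argv(pid):
    """argv for a non-interactive gdb that attaches to pid and prints all stacks."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def run_with_watchdog(argv, timeout_s):
    """Run argv; if it is still alive after timeout_s, dump a backtrace and kill it.

    Returns (exit_code, backtrace); backtrace is None when the job finished in time.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s), None
    except subprocess.TimeoutExpired:
        try:
            # Equivalent of: attach, control-c, "where" (for every thread).
            bt = subprocess.run(gdb_backtrace_argv(proc.pid),
                                capture_output=True, text=True).stdout
        except FileNotFoundError:
            bt = ""  # gdb is not installed on this machine
        proc.kill()
        return proc.wait(), bt
```

A call such as `run_with_watchdog(["cmsRun", "dire_rivet_cfg.py"], timeout_s=3600)` would then return the backtrace text if the job has not finished within an hour. A timeout cannot tell a hung job from a merely slow one, so the limit has to be chosen with the expected event rate in mind.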
… On May 15, 2018, at 9:40 AM, Markus Seidel ***@***.***> wrote:
How can I do that? (the dire_rivet_cfg.py you sent yesterday runs ok, but I think that is expected)
No, for me and other people it freezes just after one or a few events!
Just to be sure, you are using the file with process.generation_step+=process.rivetAnalyzer as last line, and it runs without problems?
The igprof (crash) log is here: igtest.pp.log
|
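Ok, I can confirm that it works fine with CMSSW_10_2_X_2018-05-13-2300, indeed! What is different with regard to 10_1? This would be needed for 2018 MC production I think... |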
the build of dire is the same. Pythia8 is updated in 102x. Maybe other changes in generator interfaces?
… On May 15, 2018, at 10:45 AM, Markus Seidel ***@***.***> wrote:
Ok, I can confirm that it works fine with CMSSW_10_2_X_2018-05-13-2300, indeed!
What is different with regard to 10_1? This would be needed for 2018 MC production I think...
|
I tried the stand-alone with different Pythia version, there was no change. Could it be something in the scheduling or so? I mean it seems very obvious to me now that Dire within CMSSW 10_1 works fine, unless we put additional modules in the schedule, that should normally not interfere with Dire. Changing to CMSSW 10_2 this interference seems to be gone... |
can you make that traceback?
… On May 15, 2018, at 10:53 AM, Markus Seidel ***@***.***> wrote:
I tried the stand-alone with different Pythia version, there was no change. Could it be something in the scheduling or so? I mean it seems very obvious to me now that Dire within CMSSW 10_1 works fine, unless we put additional modules in the schedule, that should normally not interfere with Dire. Changing to CMSSW 10_2 this interference seems to be gone...
|
Now I am left with gdb prompt again. Anything more I can do here? |
type "where"
… On May 15, 2018, at 10:58 AM, Markus Seidel ***@***.***> wrote:
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 15-May-2018 10:41:45.489 CEST
^C
Thread 1 "cmsRun" received signal SIGINT, Interrupt.
0x00007ffff52293ab in memchr () from /lib64/libc.so.6
(gdb)
Now I am left with gdb prompt again. Anything more I can do here?
|
Ok :)
|
so something deep inside pythia8/dire - the fact that it's influenced by other stuff running or not running suggests memory corruption.
so which 101x release does it not work in? I tried CMSSW_10_1_X_2018-05-13-0000 which works ok. I've seen several memory related patches to pythia8 fly by in recent weeks..
… On May 15, 2018, at 11:04 AM, Markus Seidel ***@***.***> wrote:
Ok :)
(gdb) where
#0 0x00007ffff52293ab in memchr () from /lib64/libc.so.6
#1 0x00007ffff5b13690 in std::char_traits<char>::find ***@***.***: 80 'P', __n=8,
__s=0x7fffd4cd5d5a " \n\t\v\b\r\f\a")
at /mnt/build/davidlt/gcc630/b/BUILD/slc6_amd64_gcc630/external/gcc/6.3.0/gcc-tags_gcc_6_3_0_release-243837/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/char_traits.h:274
#2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::find_first_not_of (this=<optimized out>, __s=0x7fffd4cd5d5a " \n\t\v\b\r\f\a", __pos=0, __n=8)
at /mnt/build/davidlt/gcc630/b/BUILD/slc6_amd64_gcc630/external/gcc/6.3.0/gcc-tags_gcc_6_3_0_release-243837/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:1297
#3 0x00007fffd4b37166 in Pythia8::toLower(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libpythia8.so
#4 0x00007fffd4b7fd2a in Pythia8::Settings::word(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libpythia8.so
#5 0x00007fffd66ed53e in Pythia8::DireTimes::getMass(int, int, double) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libdire.so
#6 0x00007fffd66def25 in Pythia8::DireTimes::getNewSplitting(Pythia8::Event const&, Pythia8::DireTimesEnd*, double, double, double, double, double, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int&, int&, double&, double&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, double> > >&, double&) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libdire.so
#7 0x00007fffd66e67fe in Pythia8::DireTimes::pT2nextQCD_FI(double, double, Pythia8::DireTimesEnd&, Pythia8::Event const&) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libdire.so
#8 0x00007fffd66ea074 in Pythia8::DireTimes::pTnext(Pythia8::Event&, double, double, bool, bool) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libdire.so
#9 0x00007fffd4aabad4 in Pythia8::PartonLevel::next(Pythia8::Event&, Pythia8::Event&) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libpythia8.so
#10 0x00007fffd4b2096b in Pythia8::Pythia::next() ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/external/slc6_amd64_gcc630/lib/libpythia8.so
#11 0x00007fffd767bb9f in Pythia8Hadronizer::generatePartonsAndHadronize() ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/pluginGeneratorInterfacePythia8Filters.so
#12 0x00007fffd76ae6b9 in edm::GeneratorFilter<Pythia8Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/pluginGeneratorInterfacePythia8Filters.so
#13 0x00007ffff7d38711 in edm::one::EDFilterBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#14 0x00007ffff7c87142 in edm::WorkerT<edm::one::EDFilterBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#15 0x00007ffff7bfa72a in decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#17 0x00007ffff7bfac5b in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) ()
from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#18 0x00007ffff7bfcf77 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#19 0x00007ffff7bfd091 in edm::SerialTaskQueue::QueuedTask<void edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#20 0x00007ffff683942c in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7ffff2947200, parent=..., child=<optimized out>) at ../../src/tbb/custom_scheduler.h:509
#21 0x00007ffff7cd6c46 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#22 0x00007ffff7cdbe5f in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so
#23 0x000000000040e7b2 in main::{lambda()#1}::operator()() const ()
#24 0x000000000040d1aa in main ()
|
Ok, scary! It freezes for me in CMSSW_10_1_3. |
running valgrind in 10_1_3 I get this likely cause that needs a fix from Pythia or in our interface to it (maybe we have gotten it recently I haven't followed closely).
I don't see it fixed in 10_1_X, so it's presumably just "luck" whether it works or not. [of course if I run the cfg inside of valgrind it works ok:)]
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 15-May-2018 11:32:27.505 CEST
==26194== Conditional jump or move depends on uninitialised value(s)
==26194== at 0x21ED6BEA: Pythia8::PartonLevel::next(Pythia8::Event&, Pythia8::Event&) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/pythia8/230-omkpbe3/lib/libpythia8.so)
==26194== by 0x21F4B96A: Pythia8::Pythia::next() (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/pythia8/230-omkpbe3/lib/libpythia8.so)
==26194== by 0x1F2C3B9E: Pythia8Hadronizer::generatePartonsAndHadronize() (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/pluginGeneratorInterfacePythia8Filters.so)
==26194== by 0x1F2F66B8: edm::GeneratorFilter<Pythia8Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/pluginGeneratorInterfacePythia8Filters.so)
==26194== by 0x4C88710: edm::one::EDFilterBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4BD7141: edm::WorkerT<edm::one::EDFilterBase>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCallingContext const*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4B4A729: decltype ({parm#1}()) edm::convertException::wrap<bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4B4A8E2: bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4B4AC5A: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::MyPrincipal const&, edm::EventSetup const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4B4CF76: void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x4B4D090: edm::SerialTaskQueue::QueuedTask<void edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() (in /cvmfs/cms.cern.ch/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_1_3/lib/slc6_amd64_gcc630/libFWCoreFramework.so)
==26194== by 0x5FE942B: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) (custom_scheduler.h:509)
==26194==
… On May 15, 2018, at 11:22 AM, Markus Seidel ***@***.***> wrote:
Ok, scary! It freezes for me in CMSSW_10_1_3.
|
@intrepid42 I have compiled pythia8 with the patches made by @kpedro88 at give it a try. |
Hi Alberto, it seemed to work in CMSSW_10_2_X_2018-05-13-2300 already while it was broken in CMSSW_10_1_3. Do you think these patches can fix the problem in the 10_1 series? (needed for MC generation) |
@intrepid42, Hi Markus, yes, that should work. Try to use the same pythia library which I have compiled (defined in pythia8.xml) and see if that works. |
I performed quick checks on the DIRE problem and see some weird things, which CMSSW_10_1_X_2018-05-28-1100 or CMSSW_10_2_X_2018-05-13-2300 which apparently have the same pythia8 version, at least 10_1_X_2018-05-28-1100. I compiled Kevin's patch and included it in the 10_1 and 10_2 dev, without any obvious impact, it was already running, |
Hi Alberto, thanks a lot for checking this! So this seems to have been solved by some other PR in the dev releases? We may wait for the next release then and cross fingers that everything is solved! (I suppose |
(addendum: not sure if |
Is there any more progress on this issue? We would like to run DIRE in CMSSW and it would be very useful to learn the current situation. Thanks! |
The fix I made is included in the 10_2_X release cycle. |
@sensrcn Have you been able to test on 10_2_x upwards? If everything is OK, we can close this issue. |
I tried to run it with CMSSW_10_2_0 for QCD process, however, it was extremely slow and got stuck at a random event. It was ok with a newer release CMSSW_10_3_X_2018-07-30-2300. However, let me try to generate a larger event sample and see if everything goes smoothly. |
Hi Alberto, I tested it with CMSSW_10_3_0_pre1 and it works. Thanks. |
Hi, I am having the hangup problems again with generating ttbar, both with CMSSW_10_1_7 (which worked for me before) and in CMSSW_10_3_0_pre1 (tested just now)... |
Update: I got the problems again because I have to regenerate my Dire sample with
to match the settings of the Pythia samples used for unfolding. It seems that the hangup is provoked by Update 2: Stefan Prestel will have a look at this |
@intrepid42 , do you still see this hangup? |
Hi Malik, I don't think the situation has changed. |
Hi, I hope somebody has an idea on this performance problem: I want to generate+analyze pp events using the Dire shower plugin but the code hangs in pythia::next().
pgen
+ some analysis modules) process.pgen
by doing only process.genparticles
seems to work also but veeery slow. CPU is 100%, memory consumption seems to be low.
The Dire integration was merged here: #22098