Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create infrastructure to allow cmsRun to control an external process #28792

Merged
merged 21 commits into from
Feb 7, 2020

Conversation

Dr15Jones
Copy link
Contributor

PR description:

This adds the FWCore/SharedMemory package which can be used to allow bi-directional communication between a cmsRun module and an external process running on the same node. This includes an example implementation in FWCore/Integration.

PR validation:

The code compiles and all new and existing framework unit tests pass.

This allows testing of memory systems.
The buffer also properly handles growing by alternating between
two different shared memory spaces.
Additional improvements to external process failures were also added.
The WorkerChannel and ControllerChannel encapsulate the inter-process
communication needed.
-Have the ability to also control EDFilters with keepEvents info
-Have external module configuration come from the python config.
The recommended way to have a signal handler hand off dealing with the signal to a different thread is to use a pipe between the signal handler and the thread.
@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

1 similar comment
@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28792/13475

  • This PR adds an extra 76KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2020

Pull request #28792 was updated. @makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please check and sign again.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2020

+1
Tested at: 252c9d9
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b77b1e/4533/summary.html
CMSSW: CMSSW_11_1_X_2020-02-06-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b77b1e/4533/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2698043
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2697696
  • DQMHistoTests: Total skipped: 346
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@makortel
Copy link
Contributor

makortel commented Feb 6, 2020

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @silviodonato, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 7b25496 into cms-sw:master Feb 7, 2020
@silviodonato
Copy link
Contributor

@Dr15Jones a unit test of the newly created FWCore/SharedMemory is crashing
https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc820/CMSSW_11_1_X_2020-02-07-1100/unitTestLogs/FWCore/SharedMemory
Could you have a look?

Looking at the PR tests https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b77b1e/4533/unitTests.log, I see that testFWCoreSharedMemoryMonitorThreadSignals was aborted for some reason.

===== Test "testFWCoreSharedMemoryMonitorThreadSignals" ====
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_1_X_2020-02-06-1100/src/FWCore/SharedMemory/test/test_monitor_thread_signals.sh: line 10: 13491 Aborted                 ${CMSSW_BASE}/test/${SCRAM_ARCH}/testFWCoreSharedMemoryMonitorThread 6 &>testFWCoreSharedMemoryMonitorThread.log
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_1_X_2020-02-06-1100/src/FWCore/SharedMemory/test/test_monitor_thread_signals.sh: line 22: 13551 Segmentation fault      ${CMSSW_BASE}/test/${SCRAM_ARCH}/testFWCoreSharedMemoryMonitorThread 11 &>testFWCoreSharedMemoryMonitorThread.log

---> test testFWCoreSharedMemoryMonitorThreadSignals succeeded
 
^^^^ End Test testFWCoreSharedMemoryMonitorThreadSignals ^^^^

@Dr15Jones
Copy link
Contributor Author

Looking at the PR tests https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b77b1e/4533/unitTests.log, I see that testFWCoreSharedMemoryMonitorThreadSignals was aborted for some reason.

That is exactly what the test is supposed to do. It internally sends an abort (and in another test a SIGSEGV) to be sure the program properly shutsdown. That is why the pull request test also had

---> test testFWCoreSharedMemoryMonitorThreadSignals succeeded

@Dr15Jones
Copy link
Contributor Author

The IB unit test failure has a return code of 127 which means the shell was unable to find the program. Presumably that means it could not find testFWCoreSharedMemoryMonitorThread because it was not built yet.

@smuzaffar what was the proper command to add to the BuildFile.xml directive to let scram know there is a dependency between two test executables?

@smuzaffar
Copy link
Contributor

It is PRE_TEST ( see examples here https://github.com/cms-sw/cmssw/search?q=PRE_TEST&unscoped_q=PRE_TEST )

@Dr15Jones
Copy link
Contributor Author

Turns out the test already does specify PRE_TEST.

@Dr15Jones
Copy link
Contributor Author

@smuzaffar I think the problem is the way I'm trying to find a built test

https://github.com/cms-sw/cmssw/blob/master/FWCore/SharedMemory/test/test_monitor_thread_signals.sh#L10

Do you have a better way to find it?

@smuzaffar
Copy link
Contributor

Note that test/arch is part of PATH during scram build runtests so just calling testFWCoreSharedMemoryMonitorThread should work but if you want test_monitor_thread_signals.sh to work standalone then you can do some thing like

testpath="testFWCoreSharedMemoryMonitorThread"
for dir in $CMSSW_BASE $CMSSW_RELEASE_BASE $CMSSW_FULL_RELEASE_BASE ; do
  if [ -e $dir/test/${SCRAM_ARCH}/testFWCoreSharedMemoryMonitorThread ] ; then
    testpath=$dir/test/${SCRAM_ARCH}/testFWCoreSharedMemoryMonitorThread
    break
  fi
done
$testpath 6 >& testFWCoreSharedMemoryMonitorThread.log
``

We can provide a helper script (which will always in path) to find a cmssw file e.g `cmssw-find test/arch/name` could return full path to `test/arch/name` by looking in to CMSSW_BASE, CMSSW_RELEASE_BASE and CMSSW_FULL_RELEASE_BASE.

@Dr15Jones Dr15Jones deleted the testInterProcess branch February 10, 2020 16:09
colizz added a commit to colizz/cmssw that referenced this pull request Jul 12, 2020
…(106X)

Backport from cms-sw#28792.
Additional changes:
 - FWCore/Integration/bin/interprocess.cc: L108: disable macro CMS_SA_ALLOW.
 - FWCore/Integration/test/TestInterProcessProd.cc: L3: add header "FWCore/Framework/interface/Run.h".
 - FWCore/Integration/test/test_TestInterProcessProd_cfg.py: L35,46: default way to specify process.maxEvents, process.options.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants