-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate file descriptor closure in DQMFileSaverPB fix #35881
Duplicate file descriptor closure in DQMFileSaverPB fix #35881
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35881/26279
|
A new Pull Request was created by @pmandrik for master. It involves the following packages:
@emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
-1 Failed Tests: RelVals-INPUT The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: RelVals-INPUT
Comparison SummarySummary:
|
please test |
-1 Failed Tests: RelVals-INPUT The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: RelVals-INPUT
Comparison SummarySummary:
|
please test |
-1 Failed Tests: RelVals-INPUT RelVals-INPUT
Comparison SummarySummary:
|
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20188/summary.html Comparison SummarySummary:
|
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
From Srecko Morovic mail :
"We have been seeing rare issues with writing output files in HLT with symptoms of the file sometimes being closed prematurely [1]. We saw similar issues in recent DAQ3 tests that used to happen when DQM File Saver is included (but was not investigated in detail at that time).
On inspecting the code, it turns out that in this module there is a protocol buffer file stream that is closed, and after that also the file descriptor [2] gets closed. In PB documentation [3] it's stated that the file is already closed by closing the PB stream, so the other close should not be necessary.
This could cause the race condition we're seeing, e.g. when in a multi-threaded setup some other thread opens the same file descriptor ID between two close calls. Maybe it also is the cause for other problems we have, such as frontier issues we see occasionally, which could be caused by socket fd close called prematurely.
In my private tests I see the empty output file problem disappearing when I remove line #326. Open fd count per process remains constant (so there is no fd leak introduced by this).
Conclusion from this is that line 326 should be removed. Please have a look and cross-check yourself.
[1] http://cmsonline.cern.ch/cms-elog/1127051
[2] https://github.com/cms-sw/cmssw/blob/master/DQMServices/FileIO/plugins/DQMFileSaverPB.cc#L326
[3] https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream_impl#FileOutputStream.Close.details
"
PR validation:
backport tested at p5 DQM playback