Improve IndexIntoFile for concurrent lumis/runs #37532
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37532/29244
A new Pull Request was created by @wddgit (W. David Dagenhart) for master. It involves the following packages:
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-37b6b7/23827/summary.html Comparison Summary
I made it through the first two files (stopped before the first tests)
/// 3. All runs and lumis associated with a run should be processed
/// when the last contiguous sequence of events for that run is processed.
/// If a run has no events, then it is interspersed within that sequence
/// of runs according to its run TTree entry number.
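To make rule 3 concrete, here is a minimal C++ sketch. It assumes the only input is the run number of each event in event TTree entry order, and it ignores lumis and runs without events. The helper `lastContiguousBlockStart` is hypothetical and not part of IndexIntoFile; it only locates, for each run, where the last contiguous sequence of events for that run begins, which is the point at which the comment says the run and its lumis should be processed.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, not part of IndexIntoFile.
// eventRuns holds the run number of each event, in event TTree entry order.
// Returns, for each run, the index of the first event in the LAST contiguous
// block of events belonging to that run.
std::map<unsigned int, std::size_t> lastContiguousBlockStart(const std::vector<unsigned int>& eventRuns) {
  std::map<unsigned int, std::size_t> result;
  for (std::size_t i = 0; i < eventRuns.size(); ++i) {
    // A new contiguous block starts at the first event or whenever the run changes.
    if (i == 0 || eventRuns[i] != eventRuns[i - 1]) {
      // Later blocks for the same run overwrite earlier ones, so the
      // value left at the end is the start of the last block.
      result[eventRuns[i]] = i;
    }
  }
  return result;
}
```

For event runs {1, 1, 2, 1, 1, 2, 2}, run 1's last block starts at index 3 and run 2's at index 5.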
How is a source supposed to tell the framework that all parts of a Lumi or a Run have now been read from the file?
The Framework will know all parts were read if shouldProcessRun() returned true for any Run entry in a contiguous sequence of Run entries for the same Run and then a different Run is encountered (or the end of all input is reached). For a single input file, the Run entries that should be read will always be contiguous (by design), so they can all be read and merged in one contiguous pass. Earlier entries for that Run can occur, but only with shouldProcessRun() returning false.
Most of the time, there will be a single contiguous sequence of Run entries for which shouldProcessRun() returns true, and it returns true only for those Run entries. There is one special case when crossing a file boundary that allows shouldProcessRun() to return false just after the boundary. In that case, the RunPrincipal remembers that it has merged products that still need to be written, but it does not merge the products from the Run entry (or entries) after the file boundary.
I hope that is clear. I know this is complicated.
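The completion rule described in this reply can be modeled with a short C++ sketch. It assumes a single input file, so the file-boundary special case is left out. `RunEntry` and `completedRuns` are hypothetical names, and the `shouldProcess` flag stands in for whatever shouldProcessRun() returns for that entry; this is an illustration of the rule, not the Framework's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one Run entry in a single input file: its run number
// and the value shouldProcessRun() would return for it.
struct RunEntry {
  unsigned int run;
  bool shouldProcess;
};

// Returns runs in the order they become complete. A run is complete when a
// contiguous sequence of its entries contained at least one entry with
// shouldProcess == true, and then a different run (or the end of input) follows.
std::vector<unsigned int> completedRuns(const std::vector<RunEntry>& entries) {
  std::vector<unsigned int> done;
  bool anyProcessed = false;  // any true entry in the current contiguous sequence?
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (i > 0 && entries[i].run != entries[i - 1].run) {
      if (anyProcessed) {
        done.push_back(entries[i - 1].run);  // previous sequence ended: its run is complete
      }
      anyProcessed = false;  // start tracking the new sequence
    }
    anyProcessed = anyProcessed || entries[i].shouldProcess;
  }
  if (!entries.empty() && anyProcessed) {
    done.push_back(entries.back().run);  // end of input also closes the last sequence
  }
  return done;
}
```

For entries (1,false), (2,true), (2,true), (1,true), (1,true), the sketch reports run 2 complete and then run 1; the earlier run 1 entry does not complete anything because its flag is false.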
Trying to clarify so the comment is easier to understand.
Move some into Info class, make const, or change to return output value
ec0a81d to 087e4ea
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-37b6b7/24345/summary.html Comparison Summary
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
@wddgit @makortel @Dr15Jones is any performance impact expected?
I am not expecting any significant performance change, and my first inclination was that profiling was not necessary. I have not run a profiler with CMS code in a while. Does igprof still work, or are we using something else these days? For me it would take some time to figure out how to do this. Is there an established procedure documented somewhere? If you want me to do this, I will do it. If there is already an automated way to run the profiler and compare, or an expert willing to test it, that would be great.

Most of the code changes are only used when noRunLumiSort mode is enabled. I am not aware of any RelVal or runTheMatrix type tests that use that mode; I added some unit tests that run it. noRunLumiSort mode is currently used when merging files when we want fast cloning to occur and concurrent lumis might have scrambled the event order at luminosity block boundaries. Any performance issues would most likely show up with an input file created by skimming that includes a large number of Runs and Lumis, because that kind of file would challenge this new code the most. Even in those cases, I would guess the effect would be small: most of the work occurs once per input file, not in a tight loop that runs many times.

In the normal iteration modes, the changes are small. There are some changes when filling IndexIntoFile, but those might even make it faster by a tiny amount. A significant performance change would probably indicate a bug somewhere, because it was a design goal to perturb the other iteration modes as little as possible.
enable profiling
@cmsbuild, please test
I just launched automated profiling, let's see what it gives (although I agree with David that the impact on regular workflows should be negligible).
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-37b6b7/24453/summary.html Comparison Summary. There are some workflows for which there are errors in the baseline.
+1
PR description:
Modifies the class IndexIntoFile, so that it can handle concurrent runs and better handle concurrent lumis. This PR does not include the implementation of concurrent runs. That will be submitted in a separate PR in the future. There should not be any further changes needed in IndexIntoFile.h or IndexIntoFile.cc in that future PR.
This PR includes heavy edits to the code that supports the "noRunLumiSort" ordering in IndexIntoFile (mostly in the nested class IndexIntoFile::IndexIntoFileItrEntryOrder). That ordering is currently used in some contexts to support file merging with files that have been created with concurrent lumis. That ordering allows fast cloning even when concurrent lumis have scrambled the event order in the files at lumi boundaries. Before concurrent lumis, events from a luminosity block would all be written contiguously into the output file. With concurrent lumis, events from a following lumi can be written before all the events from the preceding lumi are written. This is because the time to process an event can vary. The same thing will also happen when concurrent runs are implemented at run boundaries.
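The scrambling described above can be illustrated with a small C++ check that reports whether the events of every (run, lumi) pair are contiguous in event TTree entry order. `lumisAreContiguous` is a hypothetical helper, not a CMSSW API; it only demonstrates the kind of file layout that concurrent lumis can produce and does not reproduce IndexIntoFileItrEntryOrder.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// (run, lumi) of one event; hypothetical type for illustration only.
using RunLumi = std::pair<unsigned int, unsigned int>;

// Hypothetical check: are all events of each (run, lumi) contiguous in entry
// order? With concurrent lumis this can fail, because an event from a following
// lumi may be written before the last event of the preceding lumi.
bool lumisAreContiguous(const std::vector<RunLumi>& eventOrder) {
  std::set<RunLumi> finished;  // lumis whose contiguous block has already ended
  for (std::size_t i = 1; i < eventOrder.size(); ++i) {
    if (eventOrder[i] != eventOrder[i - 1]) {
      finished.insert(eventOrder[i - 1]);  // the previous block just ended
      if (finished.count(eventOrder[i]) != 0) {
        return false;  // this lumi reappears after its block ended
      }
    }
  }
  return true;
}
```

The entry sequence (1,1), (1,1), (1,2), (1,1), (1,2), where an event from lumi 1 is written after lumi 2 has started, makes the check return false; a sequence sorted by lumi returns true.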
This also modifies the code in IndexIntoFile used by output modules to build the new IndexIntoFile written into a new output file.
There are some minimal changes outside IndexIntoFile needed to deal with the changes in the IndexIntoFile interface and behavior.
One might or might not consider this to be a bug fix. The existing version of the code should be working properly in the contexts where it is currently used. No one has reported any problems, and the problem should be obvious if it occurs. On the other hand, file merging can already create files in which the events of a run are not contiguous. This is similar to what will also happen with concurrent runs. If one is reading in "noRunLumiSort" mode, then these noncontiguous events from different runs can cause an assert failure in the Framework. We might consider backporting this change to 12_2_X and 12_3_X. Before those release series, "noRunLumiSort" mode did not exist and there was not a problem. One might hesitate to backport this PR because it touches a significant number of lines of critical code, and the potential for a new bug is a risk. My inclination would be to not backport it unless problems actually occur, although I will backport it if asked. I am not aware of any conflicts between this PR and those earlier releases.
PR validation:
New unit tests are added in FWCore/Integration/test and DataFormats/Provenance/test to cover these changes.