Add temporary failure element to Framework Job Report #41124

Merged
merged 1 commit into from
Mar 26, 2023

Conversation

Dr15Jones
Contributor

PR description:

Until the job report is finalized, we write a temporary FrameworkError element into it, so that the error is already present in the report if the job ends abruptly. When the job completes successfully or with another error, that temporary element is not written.

The new error was added to edm::errors.
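
To illustrate the idea in the description, here is a minimal sketch of one way a "temporary close" can work on a seekable stream: write a placeholder error plus the closing tag, then move the write position back so the next real write overwrites them. The element text, the `<Report>` root, and the function name `temporarilyClose` are hypothetical and are not taken from the CMSSW sources.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical placeholder content; the real element written by the PR may differ.
constexpr std::string_view kTempError = "<FrameworkError Status=\"temporary\"/>\n";
constexpr std::string_view kEndElement = "</Report>\n";

// Write the placeholder and the closing tag, then rewind the put position.
// If the process dies now, the bytes already written form a closed document
// that contains the error. If it keeps running, the next write starts at the
// remembered position and overwrites the placeholder.
void temporarilyClose(std::ostream& os) {
  auto const pos = os.tellp();
  os << kTempError << kEndElement;
  os.flush();
  os.seekp(pos);
}
```

After a later write, any bytes of the placeholder that were not overwritten are still in the stream, so the final close has to overwrite or pad them.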

PR validation:

Code compiles and framework unit tests pass.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41124/34762

  • This PR adds an extra 36KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • FWCore/MessageLogger (core)
  • FWCore/MessageService (core)
  • FWCore/Utilities (core)
  • IOPool/Common (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @wddgit, @felicepantaleo this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Contributor Author

please test

Comment on lines 305 to 306
static char const* const kJobReportEndElement = "</FrameworkJobReport>\n";
static constexpr int kEndElementSize = 22;
Contributor

Would it be possible and feasible to use std::string_view and take the number of elements from its size() (which, I see, should be constexpr)?
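
A minimal sketch of what this suggestion could look like, assuming C++17 and reusing the two names from the snippet above. `std::string_view::size()` is `constexpr`, so the length no longer has to be maintained by hand.

```cpp
#include <cstddef>
#include <string_view>

// The literal is stored once; its length is derived at compile time.
static constexpr std::string_view kJobReportEndElement = "</FrameworkJobReport>\n";
static constexpr std::size_t kEndElementSize = kJobReportEndElement.size();

// Guards against the literal and the previously hard-coded value drifting apart.
static_assert(kEndElementSize == 22);
```

Note that the type of `kEndElementSize` changes from `int` to `std::size_t` in this sketch, which may matter where it is compared against signed stream offsets.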

if ((endpos - pos) > kEndElementSize) {
  //need to add some padding so use a comment element
  auto padding = (endpos - pos) - (kEndElementSize + kMinSizeOfComment);
  *(impl_->ost_) << "<!--";
Contributor

Could you add a comment here explaining briefly why an XML comment is used instead of white space? (to remind us the next time we need to look at this code)
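
For context, a minimal sketch of padding a region to an exact byte count with an XML comment. The helper `makePaddingComment` and the constant names are hypothetical; the sketch assumes the minimum comment size is the 7 bytes of `<!--` plus `-->`, which is a guess at what `kMinSizeOfComment` means in the PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

constexpr std::string_view kCommentOpen = "<!--";
constexpr std::string_view kCommentClose = "-->";
// Smallest possible comment: the two delimiters with nothing in between.
constexpr std::size_t kMinCommentSize = kCommentOpen.size() + kCommentClose.size();

// Build an XML comment that occupies exactly totalBytes bytes.
// The filler is spaces, so the body can never contain "--".
std::string makePaddingComment(std::size_t totalBytes) {
  if (totalBytes < kMinCommentSize) {
    throw std::invalid_argument("region too small to hold an XML comment");
  }
  std::string out;
  out.reserve(totalBytes);
  out.append(kCommentOpen);
  out.append(totalBytes - kMinCommentSize, ' ');
  out.append(kCommentClose);
  return out;
}
```

For example, `makePaddingComment(7)` yields the empty comment `<!---->`, and larger sizes add spaces between the delimiters.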

@makortel
Contributor

Not really part of this PR, but while we are here, I looked where temporarilyCloseXML() is called, and noticed that

void JobReport::reportPerformanceSummary(std::string const& metricClass,
                                         std::map<std::string, std::string> const& metrics) {
  if (impl_->ost_) {
    std::ostream& msg = *(impl_->ost_);
    msg << "<PerformanceReport>\n"
        << " <PerformanceSummary Metric=\"" << metricClass << "\">\n";
    typedef std::map<std::string, std::string>::const_iterator const_iterator;
    for (const_iterator iter = metrics.begin(), iterEnd = metrics.end(); iter != iterEnd; ++iter) {
      msg << " <Metric Name=\"" << iter->first << "\" "
          << "Value=\"" << iter->second << "\"/>\n";
    }
    msg << " </PerformanceSummary>\n"
        << "</PerformanceReport>\n";
    temporarilyCloseXML();
  }
}

void JobReport::reportPerformanceForModule(std::string const& metricClass,
                                           std::string const& moduleName,
                                           std::map<std::string, std::string> const& metrics) {
  if (impl_->ost_) {
    std::ostream& msg = *(impl_->ost_);
    msg << "<PerformanceReport>\n"
        << " <PerformanceModule Metric=\"" << metricClass << "\" "
        << " Module=\"" << moduleName << "\" >\n";
    typedef std::map<std::string, std::string>::const_iterator const_iterator;
    for (const_iterator iter = metrics.begin(), iterEnd = metrics.end(); iter != iterEnd; ++iter) {
      msg << " <Metric Name=\"" << iter->first << "\" "
          << "Value=\"" << iter->second << "\"/>\n";
    }
    msg << " </PerformanceModule>\n"
        << "</PerformanceReport>\n";
    temporarilyCloseXML();
  }
}

do not lock the mutex. Currently they are called only in ActivityRegistry's postEndJob signal that we seem to be calling serially. Should we nevertheless lock the mutex in these functions?
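
Purely to illustrate what the question proposes, here is a minimal sketch of such a function holding a lock for the duration of the write. `ReportWriter` and its members are hypothetical stand-ins and do not reflect how `JobReport` stores its stream or mutex.

```cpp
#include <map>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the report class.
class ReportWriter {
public:
  explicit ReportWriter(std::ostream& os) : os_(os) {}

  void reportPerformanceSummary(std::string const& metricClass,
                                std::map<std::string, std::string> const& metrics) {
    // Serialize concurrent callers; released automatically at scope exit.
    std::lock_guard<std::mutex> guard(mutex_);
    os_ << "<PerformanceReport>\n"
        << " <PerformanceSummary Metric=\"" << metricClass << "\">\n";
    for (auto const& [name, value] : metrics) {
      os_ << " <Metric Name=\"" << name << "\" "
          << "Value=\"" << value << "\"/>\n";
    }
    os_ << " </PerformanceSummary>\n"
        << "</PerformanceReport>\n";
  }

private:
  std::ostream& os_;
  std::mutex mutex_;
};
```

The lock only matters if the function can be reached from more than one thread at a time; with strictly serial callers it is pure overhead.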

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a928c2/31496/summary.html
COMMIT: fb249dd
CMSSW: CMSSW_13_1_X_2023-03-21-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41124/31496/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a928c2/31496/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a928c2/31496/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test PrimaryVertex had ERRORS
---> test testPVPlotting had ERRORS
---> test trackerMaterialAnalysisPlots had ERRORS
---> test createDBObjecs had ERRORS
and more ...

Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • Reco comparison results: 14 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3550915
  • DQMHistoTests: Total failures: 116
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3550777
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

Another incarnation of #39803 and #39754

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41124/34821

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

Pull request #41124 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@makortel
Contributor

@cmsbuild, please test

@makortel
Contributor

Regarding #41124 (comment), for future reference: we decided not to add a mutex there because the functions are (and should be) called only at the endJob transition, and we do not foresee adding concurrency there.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a928c2/31569/summary.html
COMMIT: 243f7d4
CMSSW: CMSSW_13_1_X_2023-03-23-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41124/31569/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 10 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3552750
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3552728
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

+core

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1
