[TBB] Fix concurrent_[bounded_]queue correctness on weak memory models [13.0.x] #8358

Conversation

fwyzard
Contributor

@fwyzard fwyzard commented Mar 4, 2023

Applied the oneapi-src/oneTBB#782 patch on top of v2021.8.0.
This should fix the testFWCoreUtilities failure in the ARM IBs.

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

type bugfix

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

backport #8355

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test for el8_ppc64le_gcc11

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

please test for el8_aarch64_gcc11

@fwyzard
Contributor Author

fwyzard commented Mar 4, 2023

urgent

@perrotta, @rappoccio, as this is a bug fix, can we have it in 13.0.0?

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-02-2300/el8_aarch64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31070/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31070/git-merge-result

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2023

-1

Failed Tests: UnitTests RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-02-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31071/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31071/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test testONNXRuntime had ERRORS
---> test testFWCoreConcurrency had ERRORS

RelVals

  • 11634.0: 11634.0_TTbar_14TeV+2021/step1_TTbar_14TeV+2021.log
  • 11634.7: 11634.7_TTbar_14TeV+2021_trackingMkFit/step1_TTbar_14TeV+2021_trackingMkFit.log
  • 11634.911: 11634.911_TTbar_14TeV+2021_DD4hep/step1_TTbar_14TeV+2021_DD4hep.log
  • (more RelVal errors not shown here)

@cmsbuild
Contributor

cmsbuild commented Mar 5, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31069/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-04-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8358/31069/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3557934
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3557909
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Mar 7, 2023

please test for el8_ppc64le_gcc11

@cmsbuild
Contributor

cmsbuild commented Mar 8, 2023

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31134/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-05-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31134/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testONNXRuntime had ERRORS
---> test testFWCoreConcurrency had ERRORS

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

please test for el8_aarch64_gcc11

@perrotta
Contributor

@fwyzard I started the build of CMSSW_13_0_1 this morning and forgot to test this PR and, if successful, include it.
I could stop the build if needed. However, as far as I understand, this fix is not strictly necessary, and it could enter a forthcoming 13_0_X release (expected soon, because PPS also needs to include some updates for data taking). If so, I would not delay the ongoing build of 13_0_1, requested by HLT and L1T.
Please let us know.

@fwyzard
Contributor Author

fwyzard commented Mar 23, 2023

Hi @perrotta,
you can go ahead with the ongoing build.

@fwyzard
Contributor Author

fwyzard commented Mar 23, 2023

The long version is: this PR fixes a problem in TBB queues on ARM and Power - but we've lived with it for years now, so empirically it should not break anything.

Given how TBB queues are used in the framework, it may lead to a non-optimal reuse of resources, but it should not introduce any incorrect behaviour.

While there may be other code that benefits from the optimal behaviour, a concurrent_queue cannot really guarantee that it is empty (another thread could push to it right after the check was made), so nothing should actually rely on it.
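
To illustrate the point about the emptiness check, here is a minimal sketch. It assumes a hypothetical mutex-guarded `SketchQueue` as a stand-in; it is not the TBB implementation and says nothing about the actual patch. The `try_pop(T&)` shape mirrors the one offered by `tbb::concurrent_queue`; `drain_with_consumers` is an illustrative helper. The idea is that consumers act on the result of `try_pop` rather than on an earlier `empty()` snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a concurrent queue (NOT the TBB implementation).
template <typename T>
class SketchQueue {
public:
  void push(T value) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push(std::move(value));
  }

  // Snapshot only: another thread may push or pop right after this returns.
  bool empty() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.empty();
  }

  // The emptiness check and the removal happen under the same lock,
  // so the caller never acts on a stale answer.
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop();
    return true;
  }

private:
  mutable std::mutex mutex_;
  std::queue<T> items_;
};

// Illustrative helper: several consumers drain the queue using try_pop alone.
// Returns the total number of items consumed.
std::size_t drain_with_consumers(SketchQueue<int>& queue, unsigned consumers) {
  assert(consumers > 0);
  std::vector<std::size_t> counts(consumers, 0);
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < consumers; ++i) {
    threads.emplace_back([&queue, &counts, i] {
      int item = 0;
      // No "if (!queue.empty())" guard: the pop itself reports success.
      while (queue.try_pop(item)) ++counts[i];
    });
  }
  for (auto& t : threads) t.join();
  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  return total;
}
```

With this pattern, a stale or wrong `empty()` answer can at worst cause a resource to be created instead of reused; it cannot make a consumer pop an item that is not there.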

So, the fix is good to have, but not critical.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-21-2300/el8_aarch64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31539/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31539/git-merge-result

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b1cccf/31541/summary.html
COMMIT: 867a8b5
CMSSW: CMSSW_13_0_X_2023-03-22-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8358/31541/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 8 lines from the logs
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3552993
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3552968
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@smuzaffar
Contributor

merging it for next 13.0.X IB so that it can be part of next 13.0.2 release
