Fix potential data race in ESProductResolver #46330

Merged
1 commit merged into cms-sw:master on Oct 11, 2024

Contributor

@wddgit wddgit commented Oct 9, 2024

PR description:

Fixes a potential data race in ESProductResolver. I noticed it while reading the code for a different development; no one has reported an issue related to it. It is probably not a problem in practice, because concurrently writing the same pointer value to the same memory location is harmless on the CPUs we currently use. Technically, though, it is a data race, and it might become a problem on CPUs used in the future (we think...). The condition should also occur extremely rarely.
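
The kind of race described above can be sketched as follows. This is a minimal illustration and not the actual ESProductResolver code: the class `LazyResolver`, its members, and the helper `allThreadsSee` are hypothetical names chosen for the example. With a plain pointer member, two threads filling the cache at the same time would be a data race under the C++ memory model even though both write the same value; making the member a `std::atomic` removes the race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a resolver that lazily caches a product pointer.
class LazyResolver {
public:
  explicit LazyResolver(int const* product) : product_(product) {
    // nullptr is reserved to mean "not cached yet".
    assert(product != nullptr);
  }

  // May be called concurrently from several threads.
  int const* get() const {
    int const* cached = cache_.load();
    if (cached == nullptr) {
      // Several threads may reach this point together. They all store the
      // same value, which is well defined for std::atomic but would be a
      // data race (undefined behavior) for a plain pointer member.
      cached = compute();
      cache_.store(cached);
    }
    return cached;
  }

private:
  // Placeholder for the real lookup; always yields the same pointer.
  int const* compute() const { return product_; }

  int const* product_;
  mutable std::atomic<int const*> cache_{nullptr};
};

// Calls get() from nThreads threads and reports whether all saw `expected`.
bool allThreadsSee(LazyResolver const& resolver, int const* expected, unsigned nThreads) {
  std::vector<int const*> seen(nThreads, nullptr);
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < nThreads; ++i) {
    // Each thread writes only its own element of `seen`.
    threads.emplace_back([&resolver, &seen, i] { seen[i] = resolver.get(); });
  }
  for (auto& t : threads) {
    t.join();
  }
  for (auto p : seen) {
    if (p != expected) {
      return false;
    }
  }
  return true;
}
```

A caller would construct one `LazyResolver` and invoke `allThreadsSee(resolver, &value, 8)`; every thread should come back with the same pointer regardless of which one filled the cache first.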

I removed the memory order specifications in this change because that is what we usually do now, but I will put them back if anyone thinks it is worth it. I assumed they were a remnant from the early concurrency days, when we used to specify them.
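
For readers unfamiliar with the difference, here is a small sketch of what dropping the memory order arguments means. The function names are hypothetical, and the release/acquire pair is only an assumed example of an explicit specification; the orderings that were actually removed are not shown in this conversation. When the argument is omitted, `std::atomic::store` and `std::atomic::load` use `std::memory_order_seq_cst`, the strongest ordering.

```cpp
#include <atomic>
#include <cassert>

// Explicit orderings: one plausible shape of the older style
// (assumed for illustration, not taken from the PR).
int const* publishExplicit(std::atomic<int const*>& slot, int const* p) {
  assert(p != nullptr);  // nullptr is reserved to mean "not cached yet"
  slot.store(p, std::memory_order_release);
  return slot.load(std::memory_order_acquire);
}

// Default orderings: both calls use std::memory_order_seq_cst.
int const* publishDefault(std::atomic<int const*>& slot, int const* p) {
  assert(p != nullptr);
  slot.store(p);
  return slot.load();
}
```

Both variants are free of data races; the default form asks for stronger ordering guarantees than release/acquire, which is simpler to reason about at a possible small cost on some architectures.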

Also removed an old comment about a global mutex that was removed a long time ago.

PR validation:

Existing unit tests pass. There shouldn't be any change in behavior or output.

@cmsbuild
Contributor

cmsbuild commented Oct 9, 2024

A new Pull Request was created by @wddgit for master.

It involves the following packages:

  • FWCore/Framework (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@makortel, @missirol this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@wddgit
Contributor Author

wddgit commented Oct 9, 2024

please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3fefca/42087/summary.html
COMMIT: 123b6a1
CMSSW: CMSSW_14_2_X_2024-10-09-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46330/42087/install.sh to create a dev area with all the needed externals and cmssw changes.

Comment on lines 87 to 88
cache_.store(getAfterPrefetchImpl());
cache = cache_.load();
Contributor

Suggested change
- cache_.store(getAfterPrefetchImpl());
- cache = cache_.load();
+ cache = cache_ = getAfterPrefetchImpl();

This may allow the compiler to generate more optimized code.

Contributor Author

Done. Thanks, that is better.
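
The suggestion in this thread relies on `std::atomic<T>::operator=` performing a store and returning the value that was stored, so no second atomic read is needed. A minimal sketch comparing the two forms is below; the function names `twoStep` and `chained` are hypothetical, and the `value` parameter stands in for the result of `getAfterPrefetchImpl()`.

```cpp
#include <atomic>
#include <cassert>

// Original form: a store followed by a separate atomic load.
// The load is a second atomic operation and could, in principle,
// observe a value stored by another thread in between.
int const* twoStep(std::atomic<int const*>& cache_, int const* value) {
  assert(value != nullptr);
  cache_.store(value);
  return cache_.load();
}

// Suggested form: the atomic assignment stores `value` and returns it,
// so the local copy is taken without reading the atomic again.
int const* chained(std::atomic<int const*>& cache_, int const* value) {
  assert(value != nullptr);
  int const* cache = nullptr;
  cache = cache_ = value;
  return cache;
}
```

In single-threaded use both functions return the same pointer; the chained form simply avoids the extra load, which is what gives the compiler more room to optimize.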

@wddgit
Contributor Author

wddgit commented Oct 10, 2024

enable threading

@cmsbuild
Contributor

Pull request #46330 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@wddgit
Contributor Author

wddgit commented Oct 10, 2024

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals RelVals-INPUT RelVals-THREADING
Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3fefca/42104/summary.html
COMMIT: 53d2f30
CMSSW: CMSSW_14_2_X_2024-10-09-2300/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46330/42104/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 10-Oct-2024 19:48:18 CEST-----------------------
An exception of category 'LogicalFileNameNotFound' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
Exception Message:
RootFileSequenceBase::initTheFile()
Logical file name '++ echo /store/data/Run2012B/SinglePhoton/RAW/v1/000/194/533/1084D9DA-AEA2-E111-B3AB-001D09F24DA8.root' was not found in the file catalog.
If you wanted a local file, you forgot the 'file:' prefix
before the file name in your configuration file.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 10-Oct-2024 19:48:27 CEST-----------------------
An exception of category 'LogicalFileNameNotFound' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
Exception Message:
RootFileSequenceBase::initTheFile()
Logical file name '++ echo /store/data/Commissioning2021/MinimumBias/RAW/v1/000/346/512/00000/be4e0e99-6d25-4b6f-8648-1adefb79c7bf.root' was not found in the file catalog.
If you wanted a local file, you forgot the 'file:' prefix
before the file name in your configuration file.
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 10-Oct-2024 19:48:27 CEST-----------------------
An exception of category 'LogicalFileNameNotFound' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
Exception Message:
RootFileSequenceBase::initTheFile()
Logical file name '++ echo /store/data/Run2016B/SinglePhoton/RAW/v2/000/274/199/00000/CA646D43-8526-E611-A6D9-02163E014300.root' was not found in the file catalog.
If you wanted a local file, you forgot the 'file:' prefix
before the file name in your configuration file.
----- End Fatal Exception -------------------------------------------------

RelVals-INPUT

  • 4.26_ZMuSkim2011A/step2_ZMuSkim2011A.log
  • 4.17_RunMinBias2011A/step2_RunMinBias2011A.log
  • 4.24_WMuSkim2011A/step2_WMuSkim2011A.log

RelVals-THREADING

  • 8.0_BeamHalo/step2_BeamHalo.log
  • 4.53_RunPhoton2012B/step2_RunPhoton2012B.log
  • 4.22_RunCosmics2011A/step2_RunCosmics2011A.log

@smuzaffar
Contributor

please test

There was a bug in cms-bot, which is fixed now.

@cmsbuild
Contributor

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3fefca/42114/summary.html
COMMIT: 53d2f30
CMSSW: CMSSW_14_2_X_2024-10-10-1100/el8_amd64_gcc12
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46330/42114/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 104 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3331066
  • DQMHistoTests: Total failures: 2917
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3328129
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 43 files compared)
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

Comparison differences are related to #39803

@makortel
Contributor

+core

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Contributor

The 1-7% "increase" in the maxmemory report is likely related to #46359

@mandrenguyen
Contributor

+1

@cmsbuild cmsbuild merged commit 4c50d04 into cms-sw:master Oct 11, 2024
12 checks passed