Instabilities in 11634.911 (DD4Hep) workflow comparisons #35109

makortel · 2021-09-01T16:00:43Z

We've observed differences in the DD4Hep workflow 11634.911 comparisons in tests of a few PRs that should not affect results of the DD4Hep workflow. This issue is to collect pointers to those comparisons.

cmsbuild · 2021-09-01T16:01:02Z

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel · 2021-09-01T16:01:07Z

assign geometry

cmsbuild · 2021-09-01T16:01:24Z

New categories assigned: geometry

@Dr15Jones,@cvuosalo,@civanch,@ianna,@mdhildreth,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel · 2021-09-01T16:01:53Z

Observed in #35068 (comment) and #34995 (comment)

makortel · 2021-09-09T12:54:15Z

Here is another occurrence https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cf3e63/18431/summary.html

civanch · 2021-09-13T09:38:21Z

@cvuosalo, is the problem back, or is it another one?

cvuosalo · 2021-09-13T17:10:10Z

The instability appears to be random and rare. It is strange that wf 11634.912 does not show it. The difference between the two workflows is that 11634.911 runs the algorithms and calculates the reco geometry, while 11634.912 reads the already calculated algorithm results and reco geometry out of the DB.

cvuosalo · 2021-09-22T16:51:54Z

I ran workflow 11634.911 thirty times in CMSSW_12_1_X_2021-09-20-1100 with identical results each time. It appears the instability has gone away.
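A repeat-run check of this kind can be sketched as follows. This is only an illustration, not the procedure actually used above: `run_once` is a hypothetical callable that runs the workflow once and returns the bytes of whatever output is being compared, and the helper names are made up for this example.

```python
import hashlib
from typing import Callable, List


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of one run's output."""
    return hashlib.sha256(payload).hexdigest()


def distinct_digests(run_once: Callable[[], bytes], n_runs: int) -> List[str]:
    """Call run_once n_runs times and return the sorted distinct digests.

    run_once is a hypothetical stand-in for "run the workflow and read
    back the output to compare"; it is not a CMSSW API.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    return sorted({digest(run_once()) for _ in range(n_runs)})


def is_reproducible(run_once: Callable[[], bytes], n_runs: int = 30) -> bool:
    """True if every run produced byte-identical output."""
    return len(distinct_digests(run_once, n_runs)) == 1
```

A clean pass over a few tens of runs only bounds how often the output changes; it cannot rule out a rare nondeterminism.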

cmsbuild · 2021-09-24T22:25:45Z

This issue is fully signed and ready to be closed.

makortel · 2021-09-25T00:31:22Z

On the other hand, the comparison differences have appeared only rarely.

makortel · 2021-11-24T14:05:48Z

Here is another instance #36222 (comment).

Could we re-open the issue (and keep it open for a longer time)?

civanch · 2021-11-24T15:36:11Z

@makortel, I cannot; maybe you can reopen it?

makortel · 2021-11-24T15:41:17Z

I don't have the power. I'm not sure whether @qliphy / @perrotta have it, or whether we need @smuzaffar.

perrotta · 2021-11-24T15:43:36Z

Wow: I have the power!

makortel · 2023-04-10T18:33:39Z

Let's record here that the tests in #41273 (comment) showed 5932 differences in the DQM comparisons of 11634.911 (that being the only phase Run-{1,2,3} workflow showing differences). Running the tests a second time did not show any differences. The differences seemed to be across the board (i.e. not localized to a few subsystems).

makortel · 2023-05-04T19:29:46Z

Let's record here that the tests in #41522 (comment) showed 4822 differences in the DQM comparisons of 23634.911 across the board.

missirol · 2023-05-04T19:40:37Z

For the record, something similar happened in #41533: 47459 differences in the DQM comparisons of wf 23634.911.

makortel · 2023-05-04T19:47:57Z

@cms-sw/geometry-l2 Should we open a new issue to record these instabilities or reopen this one?

missirol · 2023-05-04T19:52:51Z

> For the record, something similar happened in #41533: 47459 differences in the DQM comparisons of wf 23634.911.

It seems strange to me that #41541 (comment) reports exactly the same: 47459 differences in the DQM comparisons of wf 23634.911. I haven't often seen this kind of difference before, but I have seen it twice today.

makortel · 2023-05-04T22:56:07Z

And another one in #41532 (comment), 4822 differences in workflow 23634.911.

perrotta · 2023-05-05T08:19:11Z

One more in #41504 (comment)

makortel · 2023-06-02T19:51:44Z

Another one in #41852 (comment), 5582 differences in workflow 11634.911

makortel · 2023-06-02T19:52:22Z

(reopening the issue)

makortel · 2023-10-18T06:53:29Z

Another one in #43041 (comment), 6123 differences in workflow 11634.911. The CPU model was the same (Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz) for both the reference and the PR test.

makortel · 2023-11-29T17:27:59Z

To note here that #43439 is removing 11634.911 from the short matrix, after which we would not see these instabilities anymore in PR tests.

AdrianoDee · 2023-12-11T11:37:13Z

> To note here that #43439 is removing 11634.911 from the short matrix, after which we would not see these instabilities anymore in PR tests.

Let me know if you think it is preferable to keep it just to have this "constant reminder" of the issue or if it is something that we can leave to IB tests.

civanch · 2023-12-11T14:57:23Z

From my point of view, keeping this issue does not help much, even if we likely do have a problem with 11634.911, which is taken out of everyday testing.

makortel · 2023-12-11T15:12:04Z

> To note here that #43439 is removing 11634.911 from the short matrix, after which we would not see these instabilities anymore in PR tests.
>
> Let me know if you think it is preferable to keep it just to have this "constant reminder" of the issue or if it is something that we can leave to IB tests.

Good question. PR tests (including the short matrix) should be about ensuring the PRs behave as expected, and therefore I think using PR tests to stress-test reproducibility is likely not the best way.

If there is no other use for 11634.911 in the short matrix (@cms-sw/geometry-l2 could you comment?), I'd be in favor of dropping 11634.911 from the short matrix. Unfortunately, IBs themselves don't provide any facilities for inspecting workflow results. @smuzaffar Maybe we should think about something here, at least for select workflows? (Not really optimal, but maybe better than (mis)using PR tests?)

makortel · 2024-03-11T18:53:11Z

Just to note that, in the end, #43439 kept 11634.911.

srimanob · 2024-04-18T07:12:47Z

Hi @makortel
I think this issue is solved, should we close it? Thx.

makortel · 2024-04-18T13:14:06Z

Do we know how the issue got resolved? Or is it just not occurring anymore?

srimanob · 2024-04-18T13:52:36Z

The workflow in question is Run-3, right? Since DD4hep is run by default in Run-3 workflows (.911 = .0 for Run-3), I think we no longer see any instabilities. Am I missing a reason why we should keep investigating the Run-3 DD4hep workflow?

makortel · 2024-04-18T18:03:05Z

From the history the frequency seems to have been one occurrence every 1-4 months (although I suspect not all L2s report those).

Earlier comments suggest that .911 and .0 are different, by .911 reading the geometry from XML and .0 from the DB.

srimanob · 2024-04-20T08:51:42Z

> From the history the frequency seems to have been one occurrence every 1-4 months (although I suspect not all L2s report those).
>
> Earlier comments suggest that .911 and .0 are different, by .911 reading the geometry from XML and .0 from the DB.

Ah, you are right: .911 is the XML version, and .912 (which is now the .0 default) is the DB version. Do we need to monitor XML when we use the DB? I mean, we don't run the Run-1 and Run-2 XML (DDD) anymore, so we never know whether there is an issue there or not.

cmsbuild added the pending-assignment label Sep 1, 2021

makortel mentioned this issue Sep 1, 2021

[UBSAN] DataFormats/CaloTowers initialization fix #35068

Merged

cmsbuild added geometry-pending pending-signatures and removed pending-assignment labels Sep 1, 2021

jpata mentioned this issue Sep 7, 2021

[HGCAL] Introduce SimTracksters linking #35158

Merged

qliphy mentioned this issue Sep 15, 2021

Extend L1T CSC DQM with option for ME234/2 chambers and ME2/1 chamber at B904 #35100

Merged

cmsbuild added fully-signed geometry-approved and removed geometry-pending pending-signatures labels Sep 24, 2021

qliphy closed this as completed Sep 25, 2021

missirol mentioned this issue Nov 24, 2021

update HLT addOnTests for Run-3 Data with 2021 pilot-beam data #36222

Merged

perrotta reopened this Nov 24, 2021

cms-sw deleted a comment from cvuosalo Nov 26, 2021

cmsbuild removed the geometry-approved label Nov 26, 2021

cmsbuild added the geometry-approved label Mar 8, 2022

mmusich mentioned this issue Jun 2, 2022

Sort pixel tracks in the SoA converter #38065

Merged

mmusich mentioned this issue Apr 10, 2023

redefine IT digitizer ToF window + add customize fcn for activating IT signal shape with RelVals #41273

Merged

makortel mentioned this issue May 4, 2023

[CORE] [CLANG] Fix warnings reported by llvm16 in CLANG IBs #41522

Merged

makortel mentioned this issue May 4, 2023

[CORE] [CLANG] Fix warnings reported by llvm16 in CLANG IBs #41532

Merged

perrotta mentioned this issue May 5, 2023

[DB-RECONSTRUCTION] [LLVM16]Fix mismatched bound warnings #41504

Merged

makortel reopened this Jun 2, 2023

makortel mentioned this issue Oct 18, 2023

Add the ASSERT_DEVICE_MATCHES_HOST_COLLECTION macro #43041

Merged

This was referenced Nov 28, 2023

Run3-gex174B Backport the most recent version of Run3 geometries of 2021 and 2023 from #43418 and earlier pull requests for GE21 and HCAL #43421

Closed

Short Matrix Update And Documentation #43439

Merged
