Random failures of testReadWriteOnlineBSFromDB unit test #35670

Closed
makortel opened this issue Oct 14, 2021 · 8 comments · Fixed by #35675

@makortel
Contributor

makortel commented Oct 14, 2021

The testReadWriteOnlineBSFromDB unit test appears to fail randomly with

Running script: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_1_ROOT6_X_2021-10-13-2300/src/CondTools/BeamSpot/test/testReadWriteOnlineBSFromDB.sh
TESTING BeamSpotOnline From DB Read / Write codes ...
TESTING Writing BeamSpotOnlineLegacyObjectsRcd DB object ...\n\n

<snip>

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 BeamSpotOnlineRecord -w BeamSpotOnlineRe                       1        1
    2 BeamSpotOnlineRecord -w BeamSpotOnlineRe                      30       30

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 BeamSpotOnlineRecordsWriter pre-events                        
    2 BeamSpotOnlineRecordsWriter EndJob           EndJob           EndJob

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                31                  31

dropped waiting message count 0


Fatal system signal has occurred during exit
/data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_1_ROOT6_X_2021-10-13-2300/src/CondTools/BeamSpot/test/testReadWriteOnlineBSFromDB.sh: line 20:  5319 Aborted                 cmsRun ${LOCAL_TEST_DIR}/BeamSpotOnlineRecordsWriter_cfg.py unitTest=True inputRecord=BeamSpotOnlineHLTObjectsRcd
Failure writing payload for BeamSpotOnlineHLTObjectsRcd: status 134
status = 34304

---> test testReadWriteOnlineBSFromDB had ERRORS

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_12_1_ROOT6_X_2021-10-13-2300/unitTestLogs/CondTools/BeamSpot#/
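
The two numbers in the log above look like the same failure reported twice: 34304 equals 134 shifted left by 8 bits, and by shell convention an exit code above 128 means the process was killed by signal (code - 128). A minimal sketch of that arithmetic follows; the variable names are illustrative, and reading 34304 as a raw wait status is an assumption based only on the arithmetic.

```shell
#!/bin/sh
# Values copied from the log above.
raw_status=34304

# Assumption: the test harness printed a raw wait status,
# whose high byte carries the shell exit code.
exit_code=$(( raw_status >> 8 ))

# Shell convention: an exit code of 128 + N means "killed by signal N".
signal_number=$(( exit_code - 128 ))

# kill -l translates a signal number into its name.
signal_name=$(kill -l "$signal_number")

printf 'exit code %d, signal %d (%s)\n' "$exit_code" "$signal_number" "$signal_name"
```

Signal 6 is SIGABRT, which matches the "Aborted" line printed for the cmsRun job.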

The problem is the "Fatal system signal has occurred during exit" message. We've seen this kind of failure before, e.g. in #32045, but now it seems to be specific to this unit test. So far I've seen this specific failure in

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

assign alca, db

@cmsbuild
Contributor

New categories assigned: db,alca

@yuanchao,@ggovi,@francescobrivio,@francescobrivio,@malbouis,@malbouis,@tvami,@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks

@tvami
Contributor

tvami commented Oct 14, 2021

I think this might be connected to #35556 as well.
We have collected the files that need to be changed here: cms-AlCaDB/AlCaTools#28 (this is WIP).

@mmusich
Contributor

mmusich commented Oct 14, 2021

I could not reproduce by doing:

cmsrel CMSSW_12_1_ROOT6_X_2021-10-13-2300
cd CMSSW_12_1_ROOT6_X_2021-10-13-2300/src/
git cms-addpkg CondTools/BeamSpot
scram b -j 20

and then running scram b runtests 20 times in a row.
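
For chasing a rare failure like this, the repeated runs can be scripted so that the loop stops at the first failing attempt and reports its exit status. This is only a sketch: run_repeatedly is a hypothetical helper name, not something provided by scram or CMSSW.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, stop at the first failure.
run_repeatedly() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      # Report which attempt failed and propagate its exit status.
      printf 'attempt %d failed with status %d\n' "$i" "$status"
      return "$status"
    fi
    i=$(( i + 1 ))
  done
  printf 'all %d attempts passed\n' "$attempts"
}

# Usage inside a CMSSW work area (not executed here):
#   run_repeatedly 20 scram b runtests
```

Stopping at the first failure keeps the logs of the failing attempt at the end of the output, which makes them easier to find.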
I guess this is somewhat rare. The two payload writers write to the same file; might there be some sort of concurrency issue? I would not expect the second process to start before the first is done, though.
I have a commit decoupling the two here: https://github.com/cms-sw/cmssw/compare/master...mmusich:possible_fix_for_BSReadUnitTest?expand=1
Shall we try that?
Otherwise I am afraid I would need guidance on how to debug (or even reproduce) this.
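
One way to picture the decoupling idea is a test script in which each record's writer job gets its own output file, so the two jobs never touch the same file. The sketch below is not the content of the linked commit: write_payload is a hypothetical stand-in for the cmsRun writer job, and the file naming is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the cmsRun writer job; it only creates a file.
write_payload() {
  record=$1
  outfile=$2
  printf 'payload for %s\n' "$record" > "$outfile"
}

workdir=$(mktemp -d)

for record in BeamSpotOnlineLegacyObjectsRcd BeamSpotOnlineHLTObjectsRcd; do
  # Assumed naming scheme: one output file per record instead of a shared one.
  outfile="$workdir/test_${record}.db"
  write_payload "$record" "$outfile"
  status=$?
  if [ "$status" -ne 0 ]; then
    printf 'Failure writing payload for %s: status %d\n' "$record" "$status"
    exit "$status"
  fi
done
```

With separate files, a failure in one writer cannot be caused by state left behind in the file by the other.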

@tvami
Contributor

tvami commented Oct 14, 2021

Hi @mmusich, I think decoupling them sounds like a good idea, regardless of whether it resolves the issue, so please submit the PR, thanks! (Is printf preferred now over echo?)

@mmusich
Contributor

mmusich commented Oct 14, 2021

I think decoupling them sounds like a good idea, regardless of whether it resolves the issue,

Here it is, but it would be nice to see if it helps... #35675

Is printf preferred now over echo?

That's just to interpret \n as an actual newline and not as a literal.
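
A small sketch of the difference, for reference. printf always expands \n in its format string, whereas echo's handling of backslash sequences depends on the shell and its options (bash's builtin echo prints them literally by default), which would explain the literal \n\n visible in the log at the top of this issue.

```shell
#!/bin/sh
# printf expands \n in its format string, so this prints two lines.
printf 'TESTING Writing DB object ...\nsecond line\n'

# Count the lines to confirm the escape was expanded.
line_count=$(printf 'first\nsecond\n' | wc -l | tr -d ' ')
printf 'printf produced %s lines\n' "$line_count"

# echo is shell-dependent: some shells expand \n, others print it literally.
echo 'first\nsecond'
```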

@mmusich
Contributor

mmusich commented Oct 14, 2021

Well, I didn't mean to close this.
I guess it can stay open until we see whether it is solved for real or not.
