testCondToolsSiStripBuildersReaders fails occasionally #36319

Closed
makortel opened this issue Dec 1, 2021 · 18 comments · Fixed by #36361

@makortel
Contributor

makortel commented Dec 1, 2021

The unit test testCondToolsSiStripBuildersReaders in CondTools/SiStrip appears to fail occasionally with

connectionID=575ae316-5255-11ec-86f9-fa163e323604
Connection to service "SiStripConditionsDBFile.db" with connectionID=575ae316-5255-11ec-86f9-fa163e323604 will be disconnected Info Connection to service "SiStripConditionsDBFile.db" with connectionID=575ae316-5255-11ec-86f9-fa163e323604 will be disconnected
Deleting the ConnectionPool Info Deleting the ConnectionPool
 Done with the readers 


Using Global Tag: 122X_dataRun2_v1
SQLiteStatement::finalize 10 disk I/O error Error SQLiteStatement::finalize 10 disk I/O error
SQLiteStatement::execute 1 disk I/O error Error SQLiteStatement::execute 1 disk I/O error
----- Begin Fatal Exception 01-Dec-2021 04:18:31 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
   [0] Processing  Event run: 303014 lumi: 1 event: 1 stream: 0
   [1] Running path 'p'
   [2] Calling method for module SiStripChannelGainFromDBMiscalibrator/'scaleAndSmearSiStripGains'
Exception Message:
disk I/O error ( CORAL : "SQLiteStatement::execute" from "CORAL/RelationalPlugins/sqlite" ) from PoolDBOutputService::writeOne 
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc10/CMSSW_12_2_X_2021-11-30-2300/unitTestLogs/CondTools/SiStrip#/

So far seen in e.g.

  • CMSSW_12_2_X_2021-11-30-2300 slc7_amd64_gcc10
  • CMSSW_12_2_X_2021-11-29-2300 slc7_aarch64_gcc9
  • CMSSW_12_2_X_2021-11-28-2300 cs8_amd64_gcc11
  • CMSSW_12_2_X_2021-11-26-2300 slc7_amd64_gcc10
  • CMSSW_12_2_ROOT6_X_2021-11-26-2300 slc7_amd64_gcc900
  • CMSSW_12_2_X_2021-11-25-2300 slc7_amd64_gcc10
  • CMSSW_12_2_DBG_X_2021-11-25-2300 slc7_amd64_gcc900
@cmsbuild
Contributor

cmsbuild commented Dec 1, 2021

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Dec 1, 2021

assign db

@cmsbuild
Contributor

cmsbuild commented Dec 1, 2021

New categories assigned: db

@ggovi,@francescobrivio,@malbouis,@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor Author

makortel commented Dec 1, 2021

FYI @cms-sw/trk-dpg-l2

@makortel
Contributor Author

makortel commented Dec 1, 2021

Seems like similar concurrency issue as in #36175, #26741

@mmusich
Contributor

mmusich commented Dec 1, 2021

Seems like similar concurrency issue as in #36175, #26741

As far as I can tell the name of the output sqlite file of the failing test is unique in cmssw

process.CondDB.connect = 'sqlite_file:modifiedGains_'+ process.GlobalTag.globaltag._value+'_IOV_'+str(options.runNumber)+".db"

(see https://github.com/cms-sw/cmssw/search?q=modifiedGains_)
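For reference, the expression above depends only on the Global Tag and the run number. A minimal sketch, assuming the values that appear in the failing log (Global Tag 122X_dataRun2_v1, run 303014) are the ones the test passes in; `modified_gains_connect` is a hypothetical helper, not part of CMSSW:

```python
# Hypothetical helper mirroring the connect-string expression quoted above.
# In the real config the inputs are process.GlobalTag.globaltag._value and
# options.runNumber; here they are plain arguments.
def modified_gains_connect(global_tag: str, run_number: int) -> str:
    """Build the sqlite_file: connect string for the miscalibrated gains."""
    return "sqlite_file:modifiedGains_" + global_tag + "_IOV_" + str(run_number) + ".db"


if __name__ == "__main__":
    # Inputs assumed from the failing log.
    print(modified_gains_connect("122X_dataRun2_v1", 303014))
    # prints sqlite_file:modifiedGains_122X_dataRun2_v1_IOV_303014.db
```

Because the name is built only from these two values, another job would write to the same file only if it used the same Global Tag and the same run number.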

@makortel
Contributor Author

makortel commented Dec 1, 2021

As far as I can tell the name of the output sqlite file of the failing test is unique in cmssw

Thanks for checking; this must be something different, then.

@mmusich
Contributor

mmusich commented Dec 1, 2021

SQLiteStatement::finalize 10 disk I/O error Error SQLiteStatement::finalize 10 disk I/O error
SQLiteStatement::execute 1 disk I/O error Error SQLiteStatement::execute 1 disk I/O error
----- Begin Fatal Exception 01-Dec-2021 04:18:31 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
   [0] Processing  Event run: 303014 lumi: 1 event: 1 stream: 0
   [1] Running path 'p'
   [2] Calling method for module SiStripChannelGainFromDBMiscalibrator/'scaleAndSmearSiStripGains'
Exception Message:
disk I/O error ( CORAL : "SQLiteStatement::execute" from "CORAL/RelationalPlugins/sqlite" ) from PoolDBOutputService::writeOne 

@smuzaffar, is it possible that this is caused by a full disk?
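One way to check the full-disk hypothesis would be to log the free space on the filesystem holding the test's working directory right before the job runs. A minimal sketch using Python's standard `shutil.disk_usage`; `free_space_report` is a hypothetical name, and where to hook it into the unit-test script is left open:

```python
import os
import shutil


def free_space_report(path: str = ".") -> str:
    """Return a one-line summary of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    gib = 1024 ** 3
    # Guard against pseudo-filesystems that report a total size of zero.
    pct = 100.0 * usage.free / usage.total if usage.total else 0.0
    return (f"{os.path.abspath(path)}: {usage.free / gib:.1f} GiB free "
            f"of {usage.total / gib:.1f} GiB ({pct:.1f}%)")


if __name__ == "__main__":
    print(free_space_report("."))
```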

@mmusich
Contributor

mmusich commented Dec 3, 2021

So #36334 (which changed the most obvious things that could go wrong) didn't help.
There's still an error in CMSSW_12_2_X_2021-12-03-1100 unit tests.
I am open for suggestions on what to check next.

@mmusich
Contributor

mmusich commented Dec 3, 2021

From the manual: https://www.sqlite.org/rescode.html#ioerr

The SQLITE_IOERR result code says that the operation could not finish because the operating system reported an I/O error.
A full disk drive will normally give an SQLITE_FULL error rather than an SQLITE_IOERR error.
There are many different extended result codes for I/O errors that identify the specific I/O operation that failed.
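Assuming the number printed after the method name in the CORAL log is the SQLite return code, the log can be read against the result-code table: `10` is the primary code SQLITE_IOERR and `1` is the generic SQLITE_ERROR, whereas a full disk would normally appear as `13` (SQLITE_FULL). Extended I/O codes keep SQLITE_IOERR in the low 8 bits and encode the specific failing operation in the higher bits. A small decoder sketch; the table is deliberately partial and `describe` is a hypothetical helper:

```python
# Primary SQLite result codes (values as listed in https://www.sqlite.org/rescode.html).
PRIMARY = {1: "SQLITE_ERROR", 10: "SQLITE_IOERR", 13: "SQLITE_FULL"}

# A few extended I/O codes; each one equals SQLITE_IOERR | (n << 8).
IOERR_EXTENDED = {
    1: "SQLITE_IOERR_READ",
    2: "SQLITE_IOERR_SHORT_READ",
    3: "SQLITE_IOERR_WRITE",
    4: "SQLITE_IOERR_FSYNC",
    5: "SQLITE_IOERR_DIR_FSYNC",
    6: "SQLITE_IOERR_TRUNCATE",
    7: "SQLITE_IOERR_FSTAT",
}


def describe(code: int) -> str:
    """Map a (possibly extended) SQLite result code to a symbolic name."""
    primary = code & 0xFF   # low 8 bits: primary result code
    sub = code >> 8         # higher bits: extended detail, 0 if none
    name = PRIMARY.get(primary, f"primary code {primary}")
    if primary == 10 and sub:
        return IOERR_EXTENDED.get(sub, f"{name} (extended {sub})")
    return name


if __name__ == "__main__":
    for c in (10, 1, 13, 778):
        print(c, describe(c))
```

SQLite returns extended codes only when they are enabled on the connection (`sqlite3_extended_result_codes`), which may explain why the log shows only the bare `10` rather than the specific I/O operation that failed.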

@mmusich
Contributor

mmusich commented Dec 6, 2021

Dear @cms-sw/db-l2, since #36361 was merged in CMSSW_12_2_X_2021-12-05-0000, I have monitored the IB unit tests and have not observed any failure of testCondToolsSiStripBuildersReaders in:

  • CMSSW_12_2_X_2021-12-05-0000
  • CMSSW_12_2_X_2021-12-05-2300

Shall we wait for the next IB to confirm, and then close this issue?

@tvami
Contributor

tvami commented Dec 6, 2021

@tvami
Contributor

tvami commented Dec 6, 2021

+db

@cmsbuild
Contributor

cmsbuild commented Dec 6, 2021

This issue is fully signed and ready to be closed.

makortel closed this as completed Dec 6, 2021
@makortel
Contributor Author

makortel commented Dec 7, 2021

This failure occurred again in CMSSW_12_3_ROOT624_X_2021-12-06-2300

Using Global Tag: 122X_dataRun2_v1
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 1 	 new gain: 0.565074 	 old gain: 0.845799 	
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 2 	 new gain: 0.497616 	 old gain: 0.83945 	
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 3 	 new gain: 0.568472 	 old gain: 0.874687 	
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 4 	 new gain: 0.54064 	 old gain: 0.817904 	
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 5 	 new gain: 0.555092 	 old gain: 0.880128 	
SiStripChannelGainFromDBMiscalibrator::analyze detid 369120277 	 APV 6 	 new gain: 0.548598 	 old gain: 0.892149 	
SQLiteStatement::finalize 10 disk I/O error Error SQLiteStatement::finalize 10 disk I/O error
SQLiteStatement::execute 1 disk I/O error Error SQLiteStatement::execute 1 disk I/O error
----- Begin Fatal Exception 07-Dec-2021 08:02:35 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
   [0] Processing  Event run: 303014 lumi: 1 event: 1 stream: 0
   [1] Running path 'p'
   [2] Calling method for module SiStripChannelGainFromDBMiscalibrator/'scaleAndSmearSiStripGains'
Exception Message:
disk I/O error ( CORAL : "SQLiteStatement::execute" from "CORAL/RelationalPlugins/sqlite" ) from PoolDBOutputService::createNewIov 
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_12_3_ROOT624_X_2021-12-06-2300/unitTestLogs/CondTools/SiStrip#/

@makortel
Contributor Author

makortel commented Dec 8, 2021

Occurred again in CMSSW_12_3_DEVEL_X_2021-12-07-2300 and CMSSW_12_3_CLANG_X_2021-12-07-2300, and in CMSSW_12_3_X_2021-12-07-2300 for cs8_amd64_gcc10.

@makortel
Contributor Author

makortel commented Dec 8, 2021

The test appears to be failing in some PR tests too. I think we can conclude that there is a correlation with the overall load of the build+test machines. But why is it always this specific test that fails and not others that write to SQLite files?

@qliphy
Contributor

qliphy commented Dec 9, 2021

Also occurred in CMSSW_12_2_X_2021-12-08-1100 and CMSSW_12_2_X_2021-12-08-2300
