-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash on pg_reload_conf() #10
Comments
Here's another crash resulting from the signal handler doing too much:
In the backtrace below, it reentered non-reentrant code while SPI was in
progress.
regards
Andreas
```
Core was generated by `postgres: bgworker: powa snapshot '.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007f8eb41f640a in __GI_abort () at abort.c:89
#2 0x0000558c762b5ec3 in ExceptionalCondition (conditionName=conditionName@entry=0x558c76443689 "!(!InProgressBuf)", errorType=errorType@entry=0x558c762f929d "FailedAssertion", fileName=fileName@entry=0x558c764435d6 "bufmgr.c", lineNumber=lineNumber@entry=3835) at assert.c:54
#3 0x0000558c76186154 in StartBufferIO (buf=buf@entry=0x7f8eb0b03b80, forInput=forInput@entry=1 '\001') at bufmgr.c:3835
#4 0x0000558c761889a3 in BufferAlloc (foundPtr=0x7ffe8234255b "", strategy=0x0, blockNum=0, forkNum=MAIN_FORKNUM, relpersistence=112 'p', smgr=0x558c78ae2958) at bufmgr.c:1326
#5 ReadBuffer_common (smgr=0x558c78ae2958, relpersistence=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=0, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffe8234260f "") at bufmgr.c:745
#6 0x0000558c76188c60 in ReadBufferExtended (reln=reln@entry=0x7f8eb4cc14b0, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=0, mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:664
#7 0x0000558c76188d2e in ReadBuffer (reln=reln@entry=0x7f8eb4cc14b0, blockNum=<optimized out>) at bufmgr.c:596
#8 0x0000558c75f6922e in _bt_getbuf (rel=0x7f8eb4cc14b0, blkno=<optimized out>, access=1) at nbtpage.c:576
#9 0x0000558c75f69aa7 in _bt_getroot (rel=rel@entry=0x7f8eb4cc14b0, access=access@entry=1) at nbtpage.c:158
#10 0x0000558c75f6fe6b in _bt_search (rel=rel@entry=0x7f8eb4cc14b0, keysz=keysz@entry=2, scankey=scankey@entry=0x7ffe82342880, nextkey=nextkey@entry=0 '\000', bufP=bufP@entry=0x7ffe8234277c, access=access@entry=1, snapshot=0x558c78be47d8) at nbtsearch.c:99
#11 0x0000558c75f70bf4 in _bt_first (scan=scan@entry=0x558c78fc58d0, dir=dir@entry=ForwardScanDirection) at nbtsearch.c:983
#12 0x0000558c75f6cbd2 in btgettuple (scan=0x558c78fc58d0, dir=ForwardScanDirection) at nbtree.c:326
#13 0x0000558c75f63d5b in index_getnext_tid (scan=scan@entry=0x558c78fc58d0, direction=direction@entry=ForwardScanDirection) at indexam.c:415
#14 0x0000558c75f6407b in index_getnext (scan=0x558c78fc58d0, direction=direction@entry=ForwardScanDirection) at indexam.c:553
#15 0x0000558c75f62f49 in systable_getnext (sysscan=sysscan@entry=0x558c78f70d80) at genam.c:416
#16 0x0000558c762a1bdf in SearchCatCache (cache=0x558c78aac118, v1=<optimized out>, v2=<optimized out>, v3=<optimized out>, v4=<optimized out>) at catcache.c:1183
#17 0x0000558c762afc06 in GetSysCacheOid (cacheId=cacheId@entry=54, key1=<optimized out>, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1061
#18 0x0000558c75fc37b6 in get_ts_config_oid (names=0x558c78fc5848, missing_ok=missing_ok@entry=1 '\001') at namespace.c:2506
#19 0x0000558c762b5d65 in check_TSCurrentConfig (newval=0x7ffe82343df0, extra=<optimized out>, source=PGC_S_FILE) at ts_cache.c:599
#20 0x0000558c762cc6cf in call_string_check_hook (newval=newval@entry=0x7ffe82343df0, extra=extra@entry=0x7ffe82343df8, source=source@entry=PGC_S_FILE, elevel=elevel@entry=12, conf=<optimized out>, conf=<optimized out>) at guc.c:9799
#21 0x0000558c762cea3c in parse_and_validate_value (record=record@entry=0x558c76756850 <ConfigureNamesString+6992>, name=name@entry=0x558c78f72bc8 "default_text_search_config", value=value@entry=0x558c78f72c00 "pg_catalog.english", source=source@entry=PGC_S_FILE, elevel=elevel@entry=12, newval=newval@entry=0x7ffe82343df0, newextra=0x7ffe82343df8) at guc.c:5730
#22 0x0000558c762cf6cd in set_config_option (name=0x558c78f72bc8 "default_text_search_config", value=0x558c78f72c00 "pg_catalog.english", context=context@entry=PGC_SIGHUP, source=source@entry=PGC_S_FILE, action=action@entry=GUC_ACTION_SET, changeVal=<optimized out>, changeVal@entry=1 '\001', elevel=12, is_reload=0 '\000') at guc.c:6332
#23 0x0000558c762d5280 in ProcessConfigFileInternal (context=context@entry=PGC_SIGHUP, applySettings=applySettings@entry=1 '\001', elevel=elevel@entry=13) at guc-file.l:439
#24 0x0000558c762d5924 in ProcessConfigFile (context=PGC_SIGHUP) at guc-file.l:155
#25 0x00007f8eb3b99dae in powa_sighup (postgres_signal_arg=<optimized out>) at powa.c:342
#26 <signal handler called>
#27 0x00007f8eb429d5a0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#28 0x0000558c7618c367 in FileRead (file=<optimized out>, buffer=buffer@entry=0x7f8eb10f6300 "\001", amount=amount@entry=8192) at fd.c:1623
#29 0x0000558c761b357f in mdread (reln=<optimized out>, forknum=<optimized out>, blocknum=1, buffer=0x7f8eb10f6300 "\001") at md.c:754
#30 0x0000558c761881f4 in ReadBuffer_common (smgr=0x558c78ae2d78, relpersistence=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=1, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffe8234461f "") at bufmgr.c:890
#31 0x0000558c76188c60 in ReadBufferExtended (reln=reln@entry=0x7f8eb06bdc68, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=1, mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:664
#32 0x0000558c76188dc4 in ReadBuffer (blockNum=<optimized out>, reln=0x7f8eb06bdc68) at bufmgr.c:596
#33 ReleaseAndReadBuffer (buffer=buffer@entry=704, relation=relation@entry=0x2c0, blockNum=<optimized out>) at bufmgr.c:1545
#34 0x0000558c75f68d08 in _bt_relandgetbuf (rel=0x2c0, rel@entry=0x7f8eb06bdc68, obuf=704, blkno=blkno@entry=1, access=-1335108504, access@entry=1) at nbtpage.c:707
#35 0x0000558c75f70016 in _bt_search (rel=rel@entry=0x7f8eb06bdc68, keysz=keysz@entry=3, scankey=scankey@entry=0x7ffe82344830, nextkey=nextkey@entry=0 '\000', bufP=bufP@entry=0x7ffe8234472c, access=access@entry=1, snapshot=0x558c78be44b0) at nbtsearch.c:166
#36 0x0000558c75f70bf4 in _bt_first (scan=scan@entry=0x558c78e034c8, dir=dir@entry=ForwardScanDirection) at nbtsearch.c:983
#37 0x0000558c75f6cbd2 in btgettuple (scan=0x558c78e034c8, dir=ForwardScanDirection) at nbtree.c:326
#38 0x0000558c75f63d5b in index_getnext_tid (scan=scan@entry=0x558c78e034c8, direction=direction@entry=ForwardScanDirection) at indexam.c:415
#39 0x0000558c760ac008 in IndexOnlyNext (node=node@entry=0x558c78e00ad8) at nodeIndexonlyscan.c:79
#40 0x0000558c7609c7d3 in ExecScanFetch (recheckMtd=0x558c760abf50 <IndexOnlyRecheck>, accessMtd=0x558c760abf90 <IndexOnlyNext>, node=0x558c78e00ad8) at execScan.c:95
#41 ExecScan (node=node@entry=0x558c78e00ad8, accessMtd=accessMtd@entry=0x558c760abf90 <IndexOnlyNext>, recheckMtd=recheckMtd@entry=0x558c760abf50 <IndexOnlyRecheck>) at execScan.c:145
#42 0x0000558c760ac4ce in ExecIndexOnlyScan (node=node@entry=0x558c78e00ad8) at nodeIndexonlyscan.c:261
#43 0x0000558c76094d88 in ExecProcNode (node=node@entry=0x558c78e00ad8) at execProcnode.c:431
#44 0x0000558c760b1d81 in ExecNestLoop (node=node@entry=0x558c78dfe0a8) at nodeNestloop.c:174
#45 0x0000558c76094ce8 in ExecProcNode (node=node@entry=0x558c78dfe0a8) at execProcnode.c:476
#46 0x0000558c760afeb9 in ExecModifyTable (node=node@entry=0x558c78dfdae8) at nodeModifyTable.c:1392
#47 0x0000558c76094df8 in ExecProcNode (node=node@entry=0x558c78dfdae8) at execProcnode.c:396
#48 0x0000558c76090e93 in ExecPostprocessPlan (estate=0x558c78df03f0) at execMain.c:1421
#49 standard_ExecutorFinish (queryDesc=0x558c78e48520) at execMain.c:407
#50 0x00007f8eb3d9ec75 in pgss_ExecutorFinish (queryDesc=0x558c78e48520) at pg_stat_statements.c:900
#51 0x00007f8eb3992e43 in pgqs_ExecutorFinish (queryDesc=0x558c78e48520) at pg_qualstats.c:592
#52 0x0000558c760c072f in _SPI_pquery (tcount=<optimized out>, fire_triggers=<optimized out>, queryDesc=<optimized out>) at spi.c:2424
#53 _SPI_execute_plan (plan=plan@entry=0x558c78c44df0, paramLI=paramLI@entry=0x558c78d30888, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=0 '\000', fire_triggers=fire_triggers@entry=1 '\001', tcount=<optimized out>) at spi.c:2188
#54 0x0000558c760c0c2a in SPI_execute_plan_with_paramlist (plan=0x558c78c44df0, params=0x558c78d30888, read_only=<optimized out>, tcount=1) at spi.c:446
#55 0x00007f8eb073aad2 in exec_stmt_execsql (estate=estate@entry=0x7ffe82346470, stmt=stmt@entry=0x558c78c0d418) at pl_exec.c:3515
#56 0x00007f8eb073bbc6 in exec_stmt (stmt=0x558c78c0d418, estate=0x7ffe82346470) at pl_exec.c:1503
#57 exec_stmts (estate=0x7ffe82346470, stmts=<optimized out>) at pl_exec.c:1398
#58 0x00007f8eb073e627 in exec_stmt_block (estate=estate@entry=0x7ffe82346470, block=0x558c78c0d868) at pl_exec.c:1336
#59 0x00007f8eb073e845 in plpgsql_exec_function (func=func@entry=0x558c78b24350, fcinfo=fcinfo@entry=0x558c78d2acc8, simple_eval_estate=simple_eval_estate@entry=0x0) at pl_exec.c:434
#60 0x00007f8eb0732f95 in plpgsql_call_handler (fcinfo=0x558c78d2acc8) at pl_handler.c:255
#61 0x0000558c76095d64 in ExecMakeFunctionResultNoSets (fcache=0x558c78d2ac58, econtext=0x558c78d2aa30, isNull=0x558c78d2b608 "\177~\177\177\177\177\177\177\060\r\315x\214U", isDone=<optimized out>) at execQual.c:2041
#62 0x0000558c7609c24e in ExecTargetList (tupdesc=<optimized out>, isDone=0x7ffe823467cc, itemIsDone=0x558c78d2b748, isnull=0x558c78d2b608 "\177~\177\177\177\177\177\177\060\r\315x\214U", values=0x558c78d2b5e8, econtext=0x558c78d2aa30, targetlist=0x558c78d2b710) at execQual.c:5423
#63 ExecProject (projInfo=<optimized out>, isDone=isDone@entry=0x7ffe823467cc) at execQual.c:5647
#64 0x0000558c760b3169 in ExecResult (node=node@entry=0x558c78d2a918) at nodeResult.c:155
#65 0x0000558c76094e08 in ExecProcNode (node=node@entry=0x558c78d2a918) at execProcnode.c:392
#66 0x0000558c76090c8e in ExecutePlan (dest=0x558c76753a60 <spi_printtupDR>, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x558c78d2a918, estate=0x558c78d2a800) at execMain.c:1569
#67 standard_ExecutorRun (queryDesc=0x558c78d28c10, direction=<optimized out>, count=0) at execMain.c:338
#68 0x00007f8eb3d9ed5d in pgss_ExecutorRun (queryDesc=0x558c78d28c10, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:877
#69 0x00007f8eb3992f25 in pgqs_ExecutorRun (queryDesc=0x558c78d28c10, direction=ForwardScanDirection, count=0) at pg_qualstats.c:564
#70 0x0000558c760c087f in _SPI_pquery (tcount=0, fire_triggers=1 '\001', queryDesc=0x558c78d28c10) at spi.c:2412
#71 _SPI_execute_plan (plan=plan@entry=0x7ffe82346bd0, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=0 '\000', fire_triggers=fire_triggers@entry=1 '\001', tcount=<optimized out>) at spi.c:2188
#72 0x0000558c760c0ac3 in SPI_execute (src=src@entry=0x558c78d15508 "SELECT powa_statements_snapshot()", read_only=<optimized out>, tcount=tcount@entry=0) at spi.c:383
#73 0x00007f8eb073d186 in exec_stmt_dynexecute (stmt=0x558c78b5a0e0, estate=0x7ffe823472d0) at pl_exec.c:3707
#74 exec_stmt (stmt=0x558c78b5a0e0, estate=0x7ffe823472d0) at pl_exec.c:1507
#75 exec_stmts (estate=0x7ffe823472d0, stmts=<optimized out>) at pl_exec.c:1398
#76 0x00007f8eb073e4d3 in exec_stmt_block (estate=estate@entry=0x7ffe823472d0, block=block@entry=0x558c78b5b238) at pl_exec.c:1197
#77 0x00007f8eb073bdff in exec_stmt (stmt=0x558c78b5b238, estate=0x7ffe823472d0) at pl_exec.c:1431
#78 exec_stmts (estate=0x7ffe823472d0, stmts=<optimized out>) at pl_exec.c:1398
#79 0x00007f8eb073f6f1 in exec_for_query (estate=estate@entry=0x7ffe823472d0, stmt=stmt@entry=0x558c78b598f8, portal=0x558c78bb6290, prefetch_ok=prefetch_ok@entry=1 '\001') at pl_exec.c:5209
#80 0x00007f8eb073c03e in exec_stmt_fors (stmt=0x558c78b598f8, estate=0x7ffe823472d0) at pl_exec.c:2116
#81 exec_stmt (stmt=0x558c78b598f8, estate=0x7ffe823472d0) at pl_exec.c:1467
#82 exec_stmts (estate=0x7ffe823472d0, stmts=<optimized out>) at pl_exec.c:1398
#83 0x00007f8eb073e627 in exec_stmt_block (estate=estate@entry=0x7ffe823472d0, block=0x558c78b60970) at pl_exec.c:1336
#84 0x00007f8eb073e845 in plpgsql_exec_function (func=func@entry=0x558c78b1cc80, fcinfo=fcinfo@entry=0x558c78b75098, simple_eval_estate=simple_eval_estate@entry=0x0) at pl_exec.c:434
#85 0x00007f8eb0732f95 in plpgsql_call_handler (fcinfo=0x558c78b75098) at pl_handler.c:255
#86 0x0000558c76095d64 in ExecMakeFunctionResultNoSets (fcache=0x558c78b75028, econtext=0x558c78b74e00, isNull=0x558c78b759d8 "\200~rx\214U", isDone=<optimized out>) at execQual.c:2041
#87 0x0000558c7609c24e in ExecTargetList (tupdesc=<optimized out>, isDone=0x7ffe8234762c, itemIsDone=0x558c78b75b18, isnull=0x558c78b759d8 "\200~rx\214U", values=0x558c78b759b8, econtext=0x558c78b74e00, targetlist=0x558c78b75ae0) at execQual.c:5423
#88 ExecProject (projInfo=<optimized out>, isDone=isDone@entry=0x7ffe8234762c) at execQual.c:5647
#89 0x0000558c760b3169 in ExecResult (node=node@entry=0x558c78b74ce8) at nodeResult.c:155
#90 0x0000558c76094e08 in ExecProcNode (node=node@entry=0x558c78b74ce8) at execProcnode.c:392
#91 0x0000558c76090c8e in ExecutePlan (dest=0x558c76753a60 <spi_printtupDR>, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x558c78b74ce8, estate=0x558c78b74bd0) at execMain.c:1569
#92 standard_ExecutorRun (queryDesc=0x558c78d0cb50, direction=<optimized out>, count=0) at execMain.c:338
#93 0x00007f8eb3d9ed5d in pgss_ExecutorRun (queryDesc=0x558c78d0cb50, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:877
#94 0x00007f8eb3992f25 in pgqs_ExecutorRun (queryDesc=0x558c78d0cb50, direction=ForwardScanDirection, count=0) at pg_qualstats.c:564
#95 0x0000558c760c087f in _SPI_pquery (tcount=0, fire_triggers=1 '\001', queryDesc=0x558c78d0cb50) at spi.c:2412
#96 _SPI_execute_plan (plan=plan@entry=0x7ffe82347a30, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=0 '\000', fire_triggers=fire_triggers@entry=1 '\001', tcount=<optimized out>) at spi.c:2188
#97 0x0000558c760c0ac3 in SPI_execute (src=src@entry=0x7f8eb3b9a398 "SELECT powa_take_snapshot()", read_only=read_only@entry=0 '\000', tcount=tcount@entry=0) at spi.c:383
#98 0x00007f8eb3b99fc2 in powa_main (main_arg=<optimized out>) at powa.c:288
#99 0x0000558c7614160b in StartBackgroundWorker () at bgworker.c:742
#100 0x0000558c7614d135 in do_start_bgworker (rw=0x558c786dbba0) at postmaster.c:5612
#101 maybe_start_bgworkers () at postmaster.c:5809
#102 0x0000558c7614d4a3 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:2834
#103 <signal handler called>
#104 0x00007f8eb42a3273 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#105 0x0000558c75f0e9be in ServerLoop () at postmaster.c:1683
#106 0x0000558c7614f153 in PostmasterMain (argc=3, argv=0x558c786b75b0) at postmaster.c:1327
#107 0x0000558c75f1069d in main (argc=3, argv=0x558c786b75b0) at main.c:228
```
|
Thanks again for the report! I definitely agree about the signal handler, I'll refactor the code to use a flag. BTW, I think only elog(ERROR) and above are promoted to PANIC in a critsection (not a good reason to use elog inside such sections though). And thanks also for sqlsmith, great tool as usual :) |
@anse1 do you still have the query(ies) that raised this issue, or should I try to hit it running sqlsmith? |
Julien Rouhaud writes:
@anse1 do you still have the query(ies) that raised this issue, or
should I try to hit it running sqlsmith?
I don't think that the queries themselves are very relevant to the
signal handler problems and sqlsmith might also not be the fastest way
to reproduce it, since it can take quite a while until sqlsmith by
itself calls pg_reload_conf() just in the right moment the powa worker
is doing something.
I guess a better receipe for this issue would be to configure an
aggressive sampling interval for the powa worker on a big, heaviy loaded
database to make sure the powa worker is generally very busy and then
issue a statement like the following to give the handler a high chance
of clobbering something:
| select pg_reload_conf(), pg_sleep(0.05) from generate_series(1,1000);
Alternatively, stopping the powa worker at interesting positions with
gdb and manually issuing a SIGHUP should also be an efficient way to
trigger these problems.
regards,
Andreas
|
Digging some more through the collected core dumps, I also found
|
Just to clarify: I didn't notice your fix on the branch, so my last report was still about a core dump with the old handler. I'll do some testing with the new handler this evening. |
Thanks! Actually I forgot to push it yesterday :/ FTR, @marco44 has also been doing some testing on it, and it looks ok. He couldn't reproduce the same problem as what's reported here though. I hope I'll also be able to do some more testing too soon. |
Looking good: No crashes in powa so far with the new handler of
3e71401 while I had about 50 yesterday with the same amount of
stress testing done.
regards,
andreas
|
Thanks a lot for the confirmation! I'll do some more testing to make sure the frequency is updated as needed in every situation and push the fix. |
Sorry for late replied. I double checked the patch and everything looks goof to me, so I just pushed 3e71401. Thanks again for reporting this issue! |
I just released new version including this bugfix. Thanks! |
Hi,
running sqlsmith on a database with powa installed produced
crashes on pg_reload_conf(). In the backtrace below the handler
fired in a CriticalSection, where AllocSetContextCreate is
forbidden. There are probably a lot more failure modes. For
example, any elog() would be promoted to a PANIC in a Critical
Section.
IMHO powa must do this like the romans do: Don't do anything in
the signal handler besides setting a flag that a reload is
requested and do the actual work outside the handler where it is
safe to do so.
regards,
Andreas
The text was updated successfully, but these errors were encountered: