Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure in cursor_fetcher_rewind #3786

Closed
svenklemm opened this issue Nov 8, 2021 · 3 comments · Fixed by #3917
Closed

Assertion failure in cursor_fetcher_rewind #3786

svenklemm opened this issue Nov 8, 2021 · 3 comments · Fixed by #3917
Assignees
Labels

Comments

@svenklemm
Copy link
Member

Query is in frame 34. Test setup is in #3784. Found by sqlsmith.

#0  0x00007f95eb796d22 in raise () from /usr/lib/libc.so.6
#1  0x00007f95eb780862 in abort () from /usr/lib/libc.so.6
#2  0x0000556e9311b5f3 in ExceptionalCondition (conditionName=0x7f95e22fe360 "cursor->state.data_req != NULL", errorType=0x7f95e22fe1de "FailedAssertion",
    fileName=0x7f95e22fe230 "/home/sven/projects/timescaledb/tsl/src/remote/cursor_fetcher.c", lineNumber=424) at assert.c:69
#3  0x00007f95e22ca761 in cursor_fetcher_rewind (df=0x556e9abe8328) at /home/sven/projects/timescaledb/tsl/src/remote/cursor_fetcher.c:424
#4  0x00007f95e22a473a in fdw_scan_rescan (ss=0x556e9abdaf60, fsstate=0x556e9abdb078) at /home/sven/projects/timescaledb/tsl/src/fdw/scan_exec.c:308
#5  0x00007f95e22adfa5 in data_node_scan_rescan (node=0x556e9abdaf60) at /home/sven/projects/timescaledb/tsl/src/fdw/data_node_scan_exec.c:113
#6  0x0000556e92d15822 in ExecReScanCustomScan (node=0x556e9abdaf60) at nodeCustom.c:136
#7  0x0000556e92cd1e70 in ExecReScan (node=0x556e9abdaf60) at execAmi.c:239
#8  0x0000556e92d10999 in ExecReScanAppend (node=0x556e9abdab70) at nodeAppend.c:454
#9  0x0000556e92cd1ce9 in ExecReScan (node=0x556e9abdab70) at execAmi.c:147
#10 0x00007f95e22aeada in async_append_rescan (node=0x556e9abda560) at /home/sven/projects/timescaledb/tsl/src/nodes/async_append.c:301
#11 0x0000556e92d15822 in ExecReScanCustomScan (node=0x556e9abda560) at nodeCustom.c:136
#12 0x0000556e92cd1e70 in ExecReScan (node=0x556e9abda560) at execAmi.c:239
#13 0x0000556e92d39e7e in ExecReScanNestLoop (node=0x556e9abda3b0) at nodeNestloop.c:401
#14 0x0000556e92cd1e81 in ExecReScan (node=0x556e9abda3b0) at execAmi.c:243
#15 0x0000556e92d2a38c in ExecProcNode (node=0x556e9abda3b0) at ../../../src/include/executor/executor.h:255
#16 0x0000556e92d2a602 in ExecLimit (pstate=0x556e9abda100) at nodeLimit.c:96
#17 0x0000556e92d41b72 in ExecProcNode (node=0x556e9abda100) at ../../../src/include/executor/executor.h:257
#18 0x0000556e92d41b93 in SubqueryNext (node=0x556e9abd9f50) at nodeSubqueryscan.c:53
#19 0x0000556e92cfacb2 in ExecScanFetch (node=0x556e9abd9f50, accessMtd=0x556e92d41b74 <SubqueryNext>, recheckMtd=0x556e92d41b9d <SubqueryRecheck>) at execScan.c:133
#20 0x0000556e92cfad53 in ExecScan (node=0x556e9abd9f50, accessMtd=0x556e92d41b74 <SubqueryNext>, recheckMtd=0x556e92d41b9d <SubqueryRecheck>) at execScan.c:199
#21 0x0000556e92d41bee in ExecSubqueryScan (pstate=0x556e9abd9f50) at nodeSubqueryscan.c:87
#22 0x0000556e92d394d5 in ExecProcNode (node=0x556e9abd9f50) at ../../../src/include/executor/executor.h:257
#23 0x0000556e92d39961 in ExecNestLoop (pstate=0x556e9abd6450) at nodeNestloop.c:160
#24 0x0000556e92cf69a2 in ExecProcNodeFirst (node=0x556e9abd6450) at execProcnode.c:463
#25 0x0000556e92d2a39d in ExecProcNode (node=0x556e9abd6450) at ../../../src/include/executor/executor.h:257
#26 0x0000556e92d2a602 in ExecLimit (pstate=0x556e9abd61a0) at nodeLimit.c:96
#27 0x0000556e92cf69a2 in ExecProcNodeFirst (node=0x556e9abd61a0) at execProcnode.c:463
#28 0x0000556e92cea5d2 in ExecProcNode (node=0x556e9abd61a0) at ../../../src/include/executor/executor.h:257
#29 0x0000556e92ced192 in ExecutePlan (estate=0x556e9926e280, planstate=0x556e9abd61a0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x556e9a61bcb0,
    execute_once=true) at execMain.c:1551
#30 0x0000556e92ceacf9 in standard_ExecutorRun (queryDesc=0x556e96a62280, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#31 0x0000556e92ceaae8 in ExecutorRun (queryDesc=0x556e96a62280, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305
#32 0x0000556e92f650fa in PortalRunSelect (portal=0x556e93bfcac0, forward=true, count=0, dest=0x556e9a61bcb0) at pquery.c:919
#33 0x0000556e92f64d22 in PortalRun (portal=0x556e93bfcac0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x556e9a61bcb0, altdest=0x556e9a61bcb0, qc=0x7ffd073a01b0) at pquery.c:763
#34 0x0000556e92f5dce8 in exec_simple_query (
    query_string=0x556e94146170 "select  \n  case when pg_catalog.pg_switch_wal() = pg_catalog.pg_last_wal_receive_lsn() then 9 else 9 end\n     as c0, \n  ref_0.label as c1, \n  subq_1.c1 as c2, \n  pg_catalog.inet_server_addr() as c3, \n  pg_catalog.overlaps(\n    cast(case when subq_1.c3 is not NULL then pg_catalog.make_time(\n        cast(subq_1.c0 as int4),\n        cast(ref_0.val as int4),\n        cast(cast(coalesce(pg_catalog.pg_stat_get_checkpoint_sync_time(),\n          pg_catalog.pg_stat_get_checkpoint_write_time()) as float8) as float8)) else pg_catalog.make_time(\n        cast(subq_1.c0 as int4),\n        cast(ref_0.val as int4),\n        cast(cast(coalesce(pg_catalog.pg_stat_get_checkpoint_sync_time(),\n          pg_catalog.pg_stat_get_checkpoint_write_time()) as float8) as float8)) end\n       as \"time\"),\n    cast(cast(null as \"interval\") as \"interval\"),\n    cast(pg_catalog.time(\n      cast(pg_catalog.pg_last_xact_replay_timestamp() as timestamptz)) as \"time\"),\n    cast(case when cast(null as circle) @> case when cast(null as oid) < cast(null as oid) then cast(null as point) else cast(null as point) end\n           then cast(coalesce(cast(null as \"time\"),\n        cast(null as \"time\")) as \"time\") else cast(coalesce(cast(null as \"time\"),\n        cast(null as \"time\")) as \"time\") end\n       as \"time\")) as c4, \n  (select id from public.test_hypertable limit 1 offset 6)\n     as c5, \n  subq_1.c0 as c6, \n  ref_0.id as c7, \n  ref_0.label as c8, \n  subq_1.c2 as c9, \n  pg_catalog.money(\n    cast(21 as int4)) as c10, \n  subq_1.c2 as c11, \n  ref_0.label as c12\nfrom \n  public.test_distributed as ref_0,\n  lateral (select  \n        subq_0.c3 as c0, \n        ref_1.id as c1, \n        (select id from public.test_hypertable limit 1 offset 3)\n           as c2, \n        ref_0.val as c3\n      from \n        public.test_distributed as ref_1,\n        lateral (select  \n              (select label from public.test_distributed limit 1 offset 6)\n                 as c0, \n              ref_1.label as c1, \n              ref_0.label as c2, \n              sample_0.val as c3, \n              ref_0.label as c4\n            from \n              public.test_distributed as sample_0 tablesample bernoulli (2.8) \n            where cast(null as int8) >= cast(null as int2)\n            limit 170) as subq_0\n      where (ref_0.id <> subq_0.c2) \n        or (ref_0.id = subq_0.c0)\n      limit 39) as subq_1\nwhere (pg_catalog.pg_stat_get_buf_fsync_backend() < subq_1.c0) \n  or (case when ((cast(null as circle) ~= cast(null as circle)) \n          or (subq_1.c2 is not NULL)) \n        or ((((EXISTS (\n                select  \n                    subq_1.c3 as c0, \n                    ref_0.val as c1, \n                    ref_0.time as c2, \n                    ref_0.val as c3, \n                    sample_1.label as c4\n                  from \n                    public.test_distributed as sample_1 tablesample bernoulli (0.3) \n                  where (cast(null as polygon) @> cast(null as point)) \n                    and (((cast(null as float8) = cast(null as float8)) \n                        or (false >= true)) \n                      or (cast(null as _aclitem) @> cast(null as aclitem))))) \n              and (ref_0.val is not NULL)) \n            or ((cast(null as \"numeric\") = cast(null as \"numeric\")) \n              or ((cast(null as inet) < cast(null as inet)) \n                or (false)))) \n          and (cast(null as uuid) <> cast(null as uuid))) then cast(coalesce(case when (cast(null as circle) |&> cast(null as circle)) \n            or (cast(null as record) *= cast(null as record)) then pg_catalog.make_time(\n            cast(97 as int4),\n            cast(ref_0.val as int4),\n            cast(cast(null as float8) as float8)) else pg_catalog.make_time(\n            cast(97 as int4),\n            cast(ref_0.val as int4),\n            cast(cast(null as float8) as float8)) end\n          ,\n        case when EXISTS (\n            select  \n                subq_1.c1 as c0, \n                sample_2.time as c1\n              from \n                public.test_hypertable as sample_2 tablesample system (2.4) \n              where cast(null as path) > cast(null as path)\n              limit 77) then pg_catalog.time_pl_interval(\n            cast(cast(null as \"time\") as \"time\"),\n            cast(cast(null as \"interval\") as \"interval\")) else pg_catalog.time_pl_interval(\n            cast(cast(null as \"time\") as \"time\"),\n            cast(cast(null as \"interval\") as \"interval\")) end\n          ) as \"time\") else cast(coalesce(case when (cast(null as circle) |&> cast(null as circle)) \n            or (cast(null as record) *= cast(null as record)) then pg_catalog.make_time(\n            cast(97 as int4),\n            cast(ref_0.val as int4),\n            cast(cast(null as float8) as float8)) else pg_catalog.make_time(\n            cast(97 as int4),\n            cast(ref_0.val as int4),\n            cast(cast(null as float8) as float8)) end\n          ,\n        case when EXISTS (\n            select  \n                subq_1.c1 as c0, \n                sample_2.time as c1\n              from \n                public.test_hypertable as sample_2 tablesample system (2.4) \n              where cast(null as path) > cast(null as path)\n              limit 77) then pg_catalog.time_pl_interval(\n            cast(cast(null as \"time\") as \"time\"),\n            cast(cast(null as \"interval\") as \"interval\")) else pg_catalog.time_pl_interval(\n            cast(cast(null as \"time\") as \"time\"),\n            cast(cast(null as \"interval\") as \"interval\")) end\n          ) as \"time\") end\n       <> cast(null as \"time\"))\nlimit 187") at postgres.c:1214
#35 0x0000556e92f62b43 in PostgresMain (argc=1, argv=0x7ffd073a03d0, dbname=0x7f95e24ecca8 "dev", username=0x7f95e24ecc88 "sven") at postgres.c:4486
#36 0x0000556e92e897a9 in BackendRun (port=0x556e93ba2e80) at postmaster.c:4506
#37 0x0000556e92e89010 in BackendStartup (port=0x556e93ba2e80) at postmaster.c:4228
#38 0x0000556e92e84e7b in ServerLoop () at postmaster.c:1745
#39 0x0000556e92e845f0 in PostmasterMain (argc=3, argv=0x556e93b75650) at postmaster.c:1417
#40 0x0000556e92d767f1 in main (argc=3, argv=0x556e93b75650) at main.c:209
@NunoFilipeSantos NunoFilipeSantos added bug segfault Segmentation fault labels Nov 8, 2021
@erimatnor
Copy link
Contributor

erimatnor commented Nov 22, 2021

@svenklemm can you explain what you mean by the test setup being found in #3784? That's a support issue and I cannot find anything sqlsmith-related.

It would be good if you could provide proper ways to reproduce these issues instead of just copy-pasting a stack trace without a way to reproduce it. I appreciate that you've found crashes, but it is very hard to move forward on these issues with this information.

Could you please provide the following:

  1. The schemas, tables, and query used to trigger the bug
  2. ...and/or the corresponding SQLsmith setup in a way that can be easily recreated
  3. The PostgreSQL version used.
  4. The TimescaleDB version used
  5. Any other useful information to trigger the bug

@erimatnor erimatnor self-assigned this Nov 22, 2021
@svenklemm
Copy link
Member Author

@svenklemm can you explain what you mean by the test setup being found in #3784? That's a support issue and I cannot find anything sqlsmith-related.

It would be good if you could provide proper ways to reproduce these issues instead of just copy-pasting a stack trace without a way to reproduce it. I appreciate that you've found crashes, but it is very hard to move forward on these issues with this information.

Could you please provide the following:

  1. The schemas, tables, and query used to trigger the bug
  2. ...and/or the corresponding SQLsmith setup in a way that can be easily recreated
  3. The PostgreSQL version used.
  4. The TimescaleDB version used
  5. Any other useful information to trigger the bug

@erimatnor The schema and tables are those from #3784. The query triggering the bug is in frame 34 of the stacktrace (which is mentioned in the first sentence of the ticket). Latest postgres and timescaledb version. Which extra information do you need? Did you even try to reproduce?

@NunoFilipeSantos
Copy link
Contributor

@svenklemm neither @nikkhils nor @erik were able to reproduce it. If you can provide the exact setup for the to reproduce it, please do it.

svenklemm added a commit to svenklemm/timescaledb that referenced this issue Dec 10, 2021
The code in in cursor_fetcher_rewind asserted that there always
is an associated request which is not true if EOF was reached
already.

Fixes timescale#3786
svenklemm added a commit to svenklemm/timescaledb that referenced this issue Dec 10, 2021
The code in in cursor_fetcher_rewind asserted that there always
is an associated request which is not true if EOF was reached
already.

Fixes timescale#3786
svenklemm added a commit to svenklemm/timescaledb that referenced this issue Dec 11, 2021
The code in in cursor_fetcher_rewind asserted that there always
is an associated request which is not true if EOF was reached
already.

Fixes timescale#3786
svenklemm added a commit to svenklemm/timescaledb that referenced this issue Dec 13, 2021
The code in cursor_fetcher_rewind asserted that there always
is an associated request which is not true if EOF was reached
already. Found by sqlsmith.

Fixes timescale#3786
svenklemm added a commit that referenced this issue Dec 13, 2021
The code in cursor_fetcher_rewind asserted that there always
is an associated request which is not true if EOF was reached
already. Found by sqlsmith.

Fixes #3786
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants