Improve transaction check in refresh_cagg #7566

staticlibs · 2025-01-01T17:28:24Z

Intro: Hi, I was investigating a "portal snapshots (0) did not account for all active snapshots (1)" error inside another Postgres extension (wdb-97, unrelated to Timescale) and stumbled upon issue #6533 in Timescale, which reports the same error. I decided to have a deeper look into it.

This PR makes refresh_continuous_aggregate report clearer error messages when it is called from an atomic context (one where transaction control is not allowed). One of the following messages:

ERROR: refresh_continuous_aggregate() cannot run inside a transaction block
ERROR: refresh_continuous_aggregate() cannot be executed from a function

is now reported instead of: ERROR: portal snapshots (N) did not account for all active snapshots (N+1). There are no other changes to the refresh_continuous_aggregate logic.

Longer description, also included in process_utility.h:

Procedures that use multiple transactions cannot be run in a transaction block (from a function, from dynamic SQL) or in a subtransaction (from a procedure block with an EXCEPTION clause). Such procedures use the PreventInTransactionBlock function to check whether they can be run.

Currently, though, such checks are incomplete, because PreventInTransactionBlock requires the isTopLevel argument in order to throw a consistent error when the call originates from a function. This isTopLevel flag (which is a bit poorly named, see below) is not readily available inside C procedures. Its source of truth, the ProcessUtilityContext parameter, is passed to ProcessUtility hooks but is not passed along with function calls. There is an undocumented SPI_inside_nonatomic_context function that would have been sufficient for the isTopLevel flag, but it currently returns false when the SPI connection is absent (a valid scenario when C procedures are called from top-level SQL instead of PL/pgSQL procedures or DO blocks), so it cannot be used.
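The role of the isTopLevel argument can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: the enum and function names are hypothetical, the three booleans stand in for server state that the real PreventInTransactionBlock inspects itself, the real function raises an ERROR instead of returning a code, and any checks not discussed above are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes; the real function raises an ERROR instead. */
typedef enum SketchCheckResult
{
    SKETCH_OK,
    SKETCH_ERR_TRANSACTION_BLOCK, /* "... cannot run inside a transaction block" */
    SKETCH_ERR_SUBTRANSACTION,    /* e.g. procedure block with an EXCEPTION clause */
    SKETCH_ERR_FROM_FUNCTION      /* "... cannot be executed from a function" */
} SketchCheckResult;

/* Simplified model of the checks; all three inputs are stand-ins. */
SketchCheckResult
sketch_prevent_in_transaction_block(bool in_transaction_block,
                                    bool in_subtransaction,
                                    bool is_top_level)
{
    if (in_transaction_block)
        return SKETCH_ERR_TRANSACTION_BLOCK;
    if (in_subtransaction)
        return SKETCH_ERR_SUBTRANSACTION;
    if (!is_top_level)
        return SKETCH_ERR_FROM_FUNCTION;
    return SKETCH_OK;
}

/* Self-check showing why a wrong is_top_level value lets a call slip through. */
void
sketch_check_demo(void)
{
    /* Call from a function, correct flag: rejected with a clear error. */
    assert(sketch_prevent_in_transaction_block(false, false, false) ==
           SKETCH_ERR_FROM_FUNCTION);
    /* Same call with the flag wrongly assumed true: the check passes. */
    assert(sketch_prevent_in_transaction_block(false, false, true) == SKETCH_OK);
}
```

In this model a call that originates from a function is caught only by the last check, which is why the caller needs a trustworthy value for the flag.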

To work around this, the value of the ProcessUtilityContext parameter is saved when the TS ProcessUtility hook is entered and can be accessed from C procedures using the new ts_process_utility_is_context_nonatomic function. The result is called "non-atomic" instead of "top-level" because the way the isTopLevel flag is derived from the ProcessUtilityContext value in standard_ProcessUtility is insufficient for C procedures: it excludes the PROCESS_UTILITY_QUERY_NONATOMIC value (used when called from a PL/pgSQL procedure without an EXCEPTION clause), which is a valid use case for C procedures with transactions. See details in the description of the ExecuteCallStmt function.
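The difference between "top-level" and "non-atomic" described above can be sketched as two predicates over the context value. The enum below is a stand-in that mirrors the PostgreSQL ProcessUtilityContext values so the example compiles without server headers, and both function names are hypothetical; this is not the code added by the PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's ProcessUtilityContext enum. */
typedef enum SketchUtilityContext
{
    SKETCH_TOPLEVEL,        /* PROCESS_UTILITY_TOPLEVEL */
    SKETCH_QUERY,           /* PROCESS_UTILITY_QUERY */
    SKETCH_QUERY_NONATOMIC, /* PROCESS_UTILITY_QUERY_NONATOMIC */
    SKETCH_SUBCOMMAND       /* PROCESS_UTILITY_SUBCOMMAND */
} SketchUtilityContext;

/* How standard_ProcessUtility derives isTopLevel: TOPLEVEL only. */
bool
sketch_is_top_level(SketchUtilityContext context)
{
    return context == SKETCH_TOPLEVEL;
}

/* The wider "non-atomic" notion: QUERY_NONATOMIC is accepted as well. */
bool
sketch_is_nonatomic(SketchUtilityContext context)
{
    return context == SKETCH_TOPLEVEL || context == SKETCH_QUERY_NONATOMIC;
}

/* Self-check: the two predicates differ only for QUERY_NONATOMIC. */
void
sketch_context_demo(void)
{
    assert(!sketch_is_top_level(SKETCH_QUERY_NONATOMIC));
    assert(sketch_is_nonatomic(SKETCH_QUERY_NONATOMIC));
    assert(sketch_is_top_level(SKETCH_TOPLEVEL) == sketch_is_nonatomic(SKETCH_TOPLEVEL));
    assert(sketch_is_top_level(SKETCH_QUERY) == sketch_is_nonatomic(SKETCH_QUERY));
}
```

Under this reading, a CALL issued from a procedure without an EXCEPTION clause fails the first predicate but passes the second.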

It is expected that calls to C procedures are done with CALL and always pass through the ProcessUtility hook. The ProcessUtilityContext parameter is set to the PROCESS_UTILITY_TOPLEVEL value by default. In the unlikely case that a C procedure is called without passing through the ProcessUtility hook and the call is made in an atomic context, the PreventInTransactionBlock checks will pass, but SPI_commit will fail when checking that all current active snapshots are portal-owned snapshots (the same behaviour that was observed before this change). In an atomic context there will be an additional snapshot set in _SPI_execute_plan; see the description of the snapshot handling invariants in that function.

In the initial version of this PR, the TS ProcessUtility hook resets the saved ProcessUtilityContext value back to PROCESS_UTILITY_TOPLEVEL on normal exit but does NOT reset it on an ereport exit. C procedures can call the ts_process_utility_context_reset function to reset the saved value before running checks that can end in an ereport exit. The scenario in which a more thorough reset would be necessary, where calls following a failed atomic call do not pass through the ProcessUtility hook, seems unlikely.
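The save and reset sequence described above might look roughly like the following self-contained model. All names here are hypothetical stand-ins (the enum mirrors the PostgreSQL ProcessUtilityContext values, and sketch_procedure stands for a C procedure such as the refresh), and the real hook dispatches an actual utility statement rather than calling one fixed function.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's ProcessUtilityContext enum. */
typedef enum SketchUtilityContext
{
    SKETCH_TOPLEVEL,
    SKETCH_QUERY,
    SKETCH_QUERY_NONATOMIC,
    SKETCH_SUBCOMMAND
} SketchUtilityContext;

/* Saved context; TOPLEVEL is the default outside of a hook call. */
static SketchUtilityContext sketch_saved_context = SKETCH_TOPLEVEL;

void
sketch_context_reset(void)
{
    sketch_saved_context = SKETCH_TOPLEVEL;
}

bool
sketch_is_context_nonatomic(void)
{
    return sketch_saved_context == SKETCH_TOPLEVEL ||
           sketch_saved_context == SKETCH_QUERY_NONATOMIC;
}

/* Hypothetical procedure body: returns whether it would be allowed to run. */
bool
sketch_procedure(void)
{
    bool nonatomic = sketch_is_context_nonatomic();

    /* Reset first: an error exit from later checks would skip the hook's reset. */
    sketch_context_reset();
    return nonatomic;
}

/* Hypothetical hook: save the context, run the command, reset on normal exit. */
bool
sketch_hook(SketchUtilityContext context)
{
    bool allowed;

    sketch_saved_context = context;
    allowed = sketch_procedure();
    sketch_saved_context = SKETCH_TOPLEVEL;
    return allowed;
}

void
sketch_hook_demo(void)
{
    assert(sketch_hook(SKETCH_TOPLEVEL));
    assert(!sketch_hook(SKETCH_QUERY));     /* atomic: the procedure must refuse */
    assert(sketch_is_context_nonatomic());  /* saved value is back at the default */
}
```

Resetting inside the procedure before the checks keeps the saved value at its default even if the procedure leaves through an error path.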

Closes #6533.

Disable-check: commit-count

github-actions · 2025-01-01T17:28:44Z

@erimatnor, @gayyappan: please review this pull request.

Powered by pull-review

fabriziomello · 2025-01-02T15:57:59Z

@staticlibs thanks a lot for the PR. I'll have a look at it!

codecov · 2025-01-02T16:14:13Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.24%. Comparing base (59f50f2) to head (2f38d9c).
Report is 700 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7566      +/-   ##
==========================================
+ Coverage   80.06%   82.24%   +2.17%     
==========================================
  Files         190      238      +48     
  Lines       37181    43848    +6667     
  Branches     9450    11011    +1561     
==========================================
+ Hits        29770    36064    +6294     
- Misses       2997     3441     +444     
+ Partials     4414     4343      -71


staticlibs · 2025-01-02T17:05:57Z

@fabriziomello thanks! I've just added a few fixes to format and spelling (no code changes) and added the changelog file to .unreleased dir.

fabriziomello · 2025-01-02T17:31:51Z

@staticlibs BTW very good catch... thanks a lot.


This release contains performance improvements and bug fixes since the 2.17.2 release. We recommend that you upgrade at the next available opportunity.

**Features**
* timescale#6901: Add hypertable support for transition tables.
* timescale#7104: Hypercore table access method.
* timescale#7271: Push down `order by` in real-time continuous aggregate queries.
* timescale#7295: Support `alter table set access method` on hypertable.
* timescale#7341: Vectorized aggregation with grouping by one fixed-size by-value compressed column.
* timescale#7390: Disable custom `hashagg` planner code.
* timescale#7411: Change parameter name to enable hypercore table access method.
* timescale#7412: Add GUC for `hypercore_use_access_method` default.
* timescale#7413: Add GUC for segmentwise recompression.
* timescale#7433: Add support for merging chunks.
* timescale#7436: Add index creation on orderby columns.
* timescale#7443: Add hypercore function and view aliases.
* timescale#7455: Support `drop not null` on compressed hypertables.
* timescale#7458: Support vectorized aggregation with aggregate `filter` clauses that are also vectorizable.
* timescale#7482: Optimize recompression of partially compressed chunks.
* timescale#7486: Prevent building against postgres versions with broken ABI.
* timescale#7521: Add optional `force` argument to `refresh_continuous_aggregate`.
* timescale#7528: Transform sorting on `time_bucket` to sorting on time for compressed chunks in some cases.
* timescale#7565: Add hint when hypertable creation fails.
* timescale#7587: Add `include_tiered_data` parameter to `add_continuous_aggregate_policy` API.

**Bugfixes**
* timescale#7378: Remove obsolete job referencing `policy_job_error_retention`.
* timescale#7409: Update `bgw_job` table when altering procedure.
* timescale#7410: Fix the `aggregated compressed column not found` error on aggregation query.
* timescale#7426: Fix `datetime` parsing error in chunk constraint creation.
* timescale#7432: Verify that the heap tuple is valid before using.
* timescale#7434: Fixes the segfault when internally setting the replica identity for a given chunk.
* timescale#7488: Emit error for transition table trigger on chunks.
* timescale#7514: Fix the error: `invalid child of chunk append`.
* timescale#7517: Fixes performance regression on `cagg_migrate` procedure.
* timescale#7527: Restart scheduler on error.
* timescale#7557: Fix null handling for in-memory tuple filtering.
* timescale#7566: Improve transaction check in CAgg refresh.
* timescale#7584: Fix NaN-handling for vectorized aggregation.

**Thanks**
* @bharrisau for reporting the segfault when creating chunks.
* @k-rus for suggesting the improvement.
* @pgloader for reporting the issue in an internal background job.
* @staticlibs for sending PR to improve transaction check in CAgg refresh.
* @uasiddiqi for reporting the `aggregated compressed column not found` error.


github-actions bot requested review from erimatnor and gayyappan January 1, 2025 17:28

staticlibs mentioned this pull request Jan 1, 2025

[Bug]: {"message":"ERROR: portal snapshots (0) did not account for all active snapshots (1) (SQLSTATE XX000)"} when refreshing a continuous aggregate via a trigger #6533

Closed

fabriziomello assigned staticlibs and fabriziomello and unassigned staticlibs Jan 2, 2025

fabriziomello added continuous_aggregate bug labels Jan 2, 2025

fabriziomello force-pushed the refresh-cagg-transaction-check branch from 5ecfc69 to ac22176 Compare January 2, 2025 16:53

fabriziomello reviewed Jan 2, 2025

tsl/src/continuous_aggs/refresh.c (resolved)

fabriziomello force-pushed the refresh-cagg-transaction-check branch from 5091105 to 6a92da6 Compare January 7, 2025 13:45

fabriziomello reviewed Jan 7, 2025

src/process_utility.c (outdated, resolved)

fabriziomello force-pushed the refresh-cagg-transaction-check branch 2 times, most recently from a8fbd28 to ca80649 Compare January 9, 2025 14:09

fabriziomello approved these changes Jan 9, 2025

fabriziomello added this to the TimescaleDB 2.18.0 milestone Jan 9, 2025

fabriziomello force-pushed the refresh-cagg-transaction-check branch from ca80649 to cabf588 Compare January 10, 2025 14:15

fabriziomello requested a review from mkindahl January 13, 2025 14:06

erimatnor approved these changes Jan 14, 2025

fabriziomello force-pushed the refresh-cagg-transaction-check branch 2 times, most recently from b272ce7 to 4137d41 Compare January 15, 2025 14:46

fabriziomello force-pushed the refresh-cagg-transaction-check branch from 1bd8e0d to 2f38d9c Compare January 15, 2025 14:51

fabriziomello enabled auto-merge (rebase) January 15, 2025 14:55

fabriziomello merged commit 85437c0 into timescale:main Jan 15, 2025
48 checks passed

timescale-automation mentioned this pull request Jan 15, 2025

Backport to 2.17.x: #7566: Improve transaction check in refresh_cagg #7593

Closed

staticlibs deleted the refresh-cagg-transaction-check branch January 15, 2025 18:27

pallavisontakke mentioned this pull request Jan 16, 2025

Release 2.18.0 #7596

Open

pallavisontakke mentioned this pull request Jan 17, 2025

Release 2.18.0 #7599

Open
