[2/2] Cherry-pick ecpglib: call newlocale() once per process etc #802

Merged · 38 commits merged into apache:main from pick_1218 on Dec 24, 2024

Conversation

yjhjstz (Member) commented Dec 19, 2024

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz yjhjstz added the cherry-pick cherry-pick upstream commits label Dec 19, 2024
@yjhjstz yjhjstz force-pushed the pick_1218 branch 4 times, most recently from 2677322 to a8c3e72 Compare December 22, 2024 23:56
@yjhjstz yjhjstz changed the title Cherry-pick ecpglib: call newlocale() once per process etc [2/2] Cherry-pick ecpglib: call newlocale() once per process etc Dec 23, 2024
nmisch and others added 22 commits December 24, 2024 09:45
ecpglib has been calling it once per SQL query and once per EXEC SQL GET
DESCRIPTOR.  Instead, if newlocale() has not succeeded before, call it
while establishing a connection.  This mitigates three problems:
- If newlocale() failed in EXEC SQL GET DESCRIPTOR, the command silently
  proceeded without the intended locale change.
- On AIX, each newlocale()+freelocale() cycle leaked memory.
- newlocale() CPU usage may have been nontrivial.

Fail the connection attempt if newlocale() fails.  Rearrange
ecpg_do_prologue() to validate the connection before its uselocale().

The sort of program that may regress is one running in an environment
where newlocale() fails.  If that program establishes connections
without running SQL statements, it will stop working in response to this
change.  I'm betting against the importance of such an ECPG use case.
Most SQL execution (any using ECPGdo()) has long required newlocale()
success, so there's little a connection could do without newlocale().

Back-patch to v10 (all supported versions).

Reviewed by Tom Lane.  Reported by Guillaume Lelarge.

Discussion: https://postgr.es/m/[email protected]
Long log:
When we write compressed data to a full disk, gpfdist returns a 500 error.
After 5 minutes, the session is removed.
Then, if we write data to the full disk again, gpfdist fails like:
```
gz_file_write_one_chunk: Assertion `ret1 != (-2)' failed.
```
This is behavior that users do not expect.
In fact, gpfdist can still serve other requests.
We think we shouldn't keep the assert(ret1 != Z_STREAM_ERROR);
if the error happens, we need to detect it and return directly, so that gpfdist does not fail.
The log is shown below:
```
2022-06-27 18:10:55 6837 INFO active segids in session: 0 1 2
2022-06-27 18:10:55 6837 WARN [1:6:1:11] handle_post_request, write error: cannot write into file
2022-06-27 18:10:55 6837 WARN [1:6:1:11] HTTP ERROR: 127.0.0.1 - 500 cannot write into file
2022-06-27 18:10:55 6837 WARN [1:6:1:11] gpfdist read unexpected data after shutdown
// .....................
2022-06-27 18:16:49 6837 INFO remove sessions
2022-06-27 18:16:49 6837 INFO remove out-dated session 812-0000000037.25.0.0:/home/gpadmin/core_analysis/load_files/test_file/demo_str.txt.gz
2022-06-27 18:16:49 6837 INFO free session 812-0000000037.25.0.0:/home/gpadmin/core_analysis/load_files/test_file/demo_str.txt.gz
gpfdist: gfile.c:376: gz_file_write_one_chunk: Assertion `ret1 != (-2)' failed.
```
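A hedged sketch of the fix: detect Z_STREAM_ERROR from zlib's deflate() and fail only the current write, instead of asserting (the function shape is illustrative, not the exact gfile.c code):

```c
#include <zlib.h>

/* returns 0 on success, -1 on a zlib stream error */
static int
gz_write_chunk(z_stream *zs, int flush)
{
    int ret1 = deflate(zs, flush);

    if (ret1 == Z_STREAM_ERROR)
        return -1;   /* fail this request; keep gpfdist alive for others */

    return 0;
}
```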

Signed-off-by: Yongtao Huang <[email protected]>
Since commit e7010c "pg_ctl: Add wait option to promote action", pg_ctl
has the -t option to wait the required number of seconds for promotion.
Hence, delete the retry loop with the dbconn.execSQL() logic that checked
for promotion completion via connection acceptance.

Use the -t option with 10 minutes as the timeout, to align the value with
MIRROR_PROMOTION_TIMEOUT used in other parts of the codebase. Having a
consistent timeout value for the same operation is helpful.

Reviewed-by: Nikhil Kak <[email protected]>
Reviewed-by: Soumyadeep Chakraborty <[email protected]>
…rtition table (#13687)

When we create a temporary table, we can use ON COMMIT to control the behavior
of the temporary table at the end of a transaction block; DELETE ROWS, for example,
deletes all rows in the temporary table when the transaction is committed.

But when used on a partitioned table, this is not cascaded to its partitions, so we can just
use ONCOMMIT_NOOP (the default behavior) when we create child partition tables.

If we used the parent's ON COMMIT action, it would lead to inconsistent behavior between
Greenplum and PostgreSQL, which we need to avoid.
When a query spawns a write gang and multiple read gangs on a segment
and idle_in_transaction_session_timeout is set to a non-zero value,
a read gang may still be in progress after the write gang has finished
its execution; the early termination of the write gang can then cause
unexpected errors.

In this fix, idle_in_transaction_session_timeout is disabled on QEs.
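A hypothetical sketch of the QE-side disable (Gp_role/GP_ROLE_EXECUTE are GPDB symbols; the call site and GUC flags shown are assumptions):

```c
/* in QE backend initialization: force the timeout off so a finished
 * write gang is never terminated while sibling read gangs still run */
if (Gp_role == GP_ROLE_EXECUTE)
    SetConfigOption("idle_in_transaction_session_timeout", "0",
                    PGC_SUSET, PGC_S_OVERRIDE);
```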

Co-authored-by: wuchengwen <[email protected]>
Commit 1fe901d removed the retry loop for checking whether promotion
completed for the standby. As a side effect it also deleted the logic to
force a CHECKPOINT after promotion. Bring that logic back, as it is
required to write a CHECKPOINT after pg_ctl promote, mainly to satisfy
tests. pg_ctl promote uses fast promotion, so pg_controlfile gets updated
with status DB_IN_PRODUCTION, but since a CHECKPOINT is not forced the
TLI value is not updated. gpstart uses the TLI value to validate whether
the standby was promoted. Hence, bring back the logic to force a
CHECKPOINT after promotion.
When locking a specific named relation for a FOR [KEY] UPDATE/SHARE
clause, transformLockingClause() finds the relation to lock by
scanning the rangetable for an RTE with a matching eref->aliasname.
However, it failed to account for the visibility rules of a join RTE.

If a join RTE doesn't have a user-supplied alias, it will have a
generated eref->aliasname of "unnamed_join" that is not visible as a
relation name in the parse namespace. Such an RTE needs to be skipped,
otherwise it might be found in preference to a regular base relation
with a user-supplied alias of "unnamed_join", preventing it from being
locked.

In addition, if a join RTE doesn't have a user-supplied alias, but
does have a join_using_alias, then the RTE needs to be matched using
that alias rather than the generated eref->aliasname, otherwise a
misleading "relation not found" error will be reported rather than a
"join cannot be locked" error.

Backpatch all the way, except for the second part which only goes back
to 14, where JOIN USING aliases were added.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCUY_KOBnqxbTSPf=7fz9HWPnZ5Xgb9SwYzZ8rFXe7nb=w@mail.gmail.com

fix join
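A paraphrase of the fixed rangetable scan (backend-internal code, close to but not verbatim parse_clause.c):

```c
/* Find the 1-based rangetable index of the relation named in the
 * locking clause, honoring join-RTE visibility; returns 0 if absent. */
static int
find_lockable_rte(Query *qry, const char *relname)
{
    int         i = 0;
    ListCell   *rt;

    foreach(rt, qry->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(rt);
        const char *rtename = rte->eref->aliasname;

        ++i;
        if (rte->rtekind == RTE_JOIN && rte->alias == NULL)
        {
            /* an unnamed join is not visible by its generated name... */
            if (rte->join_using_alias == NULL)
                continue;
            /* ...but it may be visible via its JOIN USING alias */
            rtename = rte->join_using_alias->aliasname;
        }
        if (strcmp(rtename, relname) == 0)
            return i;
    }
    return 0;
}
```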
When `"${WITH_MIRRORS}" != "true"`, do not create the mirror directories
since no segment will be created in that directory.

Signed-off-by: Junwang Zhao <[email protected]>
When you hit ^C, the terminal driver in Unix-like systems echoes "^C" as
well as sending an interrupt signal (depending on stty settings).  At
least libedit (but maybe also libreadline) is then confused about the
current cursor location, and corrupts the display if you try to scroll
back.  Fix, by moving to a new line before the next prompt is displayed.

Back-patch to all supported releases.

Author: Pavel Stehule <[email protected]>
Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/3278793.1626198638%40sss.pgh.pa.us
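A sketch of the fix in psql's interactive loop (cancel_pressed is psql's existing flag; the exact placement is an assumption):

```c
/* after a cancel, the terminal has echoed "^C" on the current line;
 * emit a newline so libedit/readline redraws the next prompt cleanly */
if (cancel_pressed)
{
    fputc('\n', stdout);
    cancel_pressed = false;
}
```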
dshash.c previously maintained flags to be able to assert that you
didn't hold any partition lock.  These flags could get out of sync with
reality in error scenarios.

Get rid of all that, and make assertions about the locks themselves
instead.  Since LWLockHeldByMe() loops internally, we don't want to put
that inside another loop over all partition locks.  Introduce a new
debugging-only interface LWLockAnyHeldByMe() to avoid that.

This problem was noted by Tom and Andres while reviewing changes to
support the new shared memory stats system, and later showed up in
reality while working on commit 389869af.

Back-patch to 11, where dshash.c arrived.

Reported-by: Tom Lane <[email protected]>
Reported-by: Andres Freund <[email protected]>
Reviewed-by: Kyotaro HORIGUCHI <[email protected]>
Reviewed-by: Zhihong Yu <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CA%2BhUKGJ31Wce6HJ7xnVTKWjFUWQZPBngxfJVx4q0E98pDr3kAw%40mail.gmail.com
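A sketch of the new check: one pass over the backend's held-locks array instead of calling LWLockHeldByMe() in a loop (held_lwlocks/num_held_lwlocks are lwlock.c internals):

```c
/* Is any of the nlocks LWLocks laid out at 'stride' intervals from
 * 'lock' held by this backend? Debugging use only. */
bool
LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride)
{
    char   *begin = (char *) lock;
    char   *end = begin + nlocks * stride;
    int     i;

    for (i = 0; i < num_held_lwlocks; i++)
    {
        char   *addr = (char *) held_lwlocks[i].lock;

        if (addr >= begin && addr < end &&
            (addr - begin) % stride == 0)
            return true;
    }
    return false;
}
```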
The commit https://github.com/greenplum-db/gpdb/commit/ab12230fa11574bf2e7470242489864fe8a63c1b
added some WIN32 ifdefs to fix fe-connect compilation errors.

The commit https://github.com/greenplum-db/gpdb/commit/5d3dc2e81f272396281eddc082bd998ab96fd0da
allowed hashfn.c to be compiled into both frontend and backend code, but
broke the Win32 client compilation with

`src\include\utils\elog.h(86): fatal error C1083: Cannot open include
file: 'pthread-win32.h': No such file or directory (compiling source
file src/common/hashfn.c)`

We carry a dependency on pthread_win32.h in elog.h, which causes
compilation errors when building Windows clients (as elog.h is included by
postgres.h). So use postgres_fe.h instead for this case.
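The usual pattern for sources shared between frontend and backend, as a sketch of the fix:

```c
#ifdef FRONTEND
#include "postgres_fe.h"   /* frontend build: no elog.h/pthread-win32.h chain */
#else
#include "postgres.h"      /* backend build */
#endif
```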

Authored-by: Brent Doil <[email protected]>
On Linux, we call posix_fallocate() on shm_open()'d memory to avoid
later potential SIGBUS (see commit 899bd78).

Based on field reports of systems stuck in an EINTR retry loop there,
we made it possible to break out of that loop via slightly odd coding
where the CHECK_FOR_INTERRUPTS() call was somewhat removed from the loop
(see commit 422952e).

On further reflection, that was not a great choice for at least two
reasons:

1.  If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing
and the EINTR error would be surfaced to the user.

2.  If EINTR was reported but neither QueryCancelPending nor
ProcDiePending was set, then we'd dutifully retry, but with a bit more
understanding of how posix_fallocate() works, it's now clear that you
can get into a loop that never terminates.  posix_fallocate() is not a
function that can do some of the job and tell you about progress if it's
interrupted, it has to undo what it's done so far and report EINTR, and
if signals keep arriving faster than it can complete (cf recovery
conflict signals), you're stuck.

Therefore, for now, we'll simply block most signals to guarantee
progress.  SIGQUIT is not blocked (see InitPostmasterChild()), because
its expected handler doesn't return, and unblockable signals like
SIGCONT are not expected to arrive at a high rate.  For good measure,
we'll include the ftruncate() call in the blocked region, and add a
retry loop.

Back-patch to all supported releases.

Reported-by: Alvaro Herrera <[email protected]>
Reported-by: Nicola Contu <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
This seems like an omission when merging upstream PostgreSQL 12:
some `fprintf` calls were changed to `pg_log_error`, and every change
removed the ending `\n` except here.

This patch fixes issue #13700

Signed-off-by: Junwang Zhao <[email protected]>
Commit 4518c798 intended to block signals in regular backends that
allocate DSM segments, but dsm_impl_resize() is also reached by
dsm_postmaster_startup().  It's not OK to clobber the postmaster's
signal mask, so only manipulate the signal mask when under the
postmaster.

Back-patch to all releases, like 4518c798.

Discussion: https://postgr.es/m/CA%2BhUKGKNpK%3D2OMeea_AZwpLg7Bm4%3DgYWk7eDjZ5F6YbozfOf8w%40mail.gmail.com
Commit 4518c798 blocks signals for a short region of code, but it
assumed that whatever called it had the signal mask set to UnBlockSig on
entry.  That may be true today (or may even not be, in extensions in the
wild), but it would be better not to make that assumption.  We should
save-and-restore the caller's signal mask.

The PG_SETMASK() portability macro couldn't be used for that, which is
why it wasn't done before.  But... considering that commit a65e086
established back in 9.6 that supported POSIX systems have sigprocmask(),
and that this is POSIX-only code, there is no reason not to use standard
sigprocmask() directly to achieve that.

Back-patch to all supported releases, like 4518c798 and 80845b7c.

Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com
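A self-contained sketch combining this commit with the two before it: block (nearly) all signals, retry ftruncate()/posix_fallocate(), and restore the caller's original mask rather than assuming it was UnBlockSig:

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Reserve 'size' bytes for the fd backing a DSM segment.
 * Returns 0 on success, else an errno value. */
static int
reserve_segment(int fd, off_t size)
{
    sigset_t    block_all,
                save_sigmask;
    int         rc;

    sigfillset(&block_all);     /* SIGKILL/SIGSTOP remain unblockable */
    sigprocmask(SIG_SETMASK, &block_all, &save_sigmask);

    do
        rc = (ftruncate(fd, size) < 0) ? errno : 0;
    while (rc == EINTR);

    if (rc == 0)
    {
        /* posix_fallocate() returns the error; it does not set errno */
        do
            rc = posix_fallocate(fd, 0, size);
        while (rc == EINTR);
    }

    sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
    return rc;
}
```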
As AllocSetTransferAccounting() is called during MemoryContextSetParent().

Before updating the memory accounting info, it asserts MemoryContextIsValid(),
which includes AllocSetContext, SlabContext and GenerationContext.

Currently there is no mixed-context situation and the other contexts don't have
the accounting info, so I think it's OK to remove the comment.
The motivation for this is to ensure successful transmission of the
values of constants of regconfig and other reg* types.  The remote
will be reading them with search_path = 'pg_catalog', so schema
qualification is necessary when referencing objects in other schemas.

Per bug #17483 from Emmanuel Quincerot.  Back-patch to all supported
versions.  (There's some other stuff to do here, but it's less
back-patchable.)

Discussion: https://postgr.es/m/[email protected]
Currently, with debug_appendonly_print_read_block set, we can run into
this assertion failure, as AppendOnlyStorageRead->segmentFileName isn't
populated until AppendOnlyStorageRead_FinishOpenFile() is called.

Unexpected internal error (assert.c:44)",
"FailedAssertion(""!(strvalue != ((void *)0))"

1    0x558983f311ea postgres errstart + 0x3cb
2    0x558983f2fe48 postgres ExceptionalCondition + 0x91
3    0x55898404fde1 postgres <symbol not found> + 0x8404fde1
4    0x55898404f5f6 postgres pg_vsnprintf + 0x8f
5    0x55898405b412 postgres pvsnprintf + 0x34
6    0x55898405cf47 postgres appendStringInfoVA + 0x8a
7    0x558983f32112 postgres errmsg_internal + 0x192
8    0x55898400f72a postgres <symbol not found> + 0x8400f72a
9    0x55898400fa5e postgres AppendOnlyStorageRead_TryOpenFile + 0xa7
...

So, use the passed in filePathName argument instead.
This fixes an ABI break introduced by
293f5c5f496cd8ce87c65b393613da675fc0bb8d.

Author: Markus Wanner <[email protected]>
Discussion: https://www.postgresql.org/message-id/[email protected]
Multiple non-exclusive backups can be run concurrently in different
sessions. But, in the same session, only one non-exclusive backup can be
running at a time. If pg_backup_start (pg_start_backup in v14 or before)
is called in the middle of another non-exclusive backup in the same session,
an error is thrown.

However, previously, in logical replication walsender mode, even if that
walsender session had already called pg_backup_start and started
a non-exclusive backup, it could execute the BASE_BACKUP command and
start another non-exclusive backup. This caused subsequent pg_backup_stop
to throw an error, because BASE_BACKUP unexpectedly reset the session state
marked by pg_backup_start.

This commit prevents BASE_BACKUP command in the middle of another
non-exclusive backup in the same session.

Back-patch to all supported branches.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/[email protected]
When a non-exclusive backup is canceled, do_pg_abort_backup() is called
and resets some variables set by pg_backup_start (pg_start_backup in v14
or before). But previously it forgot to reset the session state indicating
whether a non-exclusive backup is in progress or not in this session.

This issue could cause an assertion failure when the session running
BASE_BACKUP is terminated after it executed pg_backup_start and
pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause
a segmentation fault when pg_backup_stop is called after BASE_BACKUP
in the same session is canceled.

This commit fixes the issue by making do_pg_abort_backup reset
that session state.

Back-patch to all supported branches.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/[email protected]
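A sketch of the fix: the abort path clears the same session flag that pg_backup_start sets (sessionBackupState is the backend's existing state variable; surrounding cleanup elided):

```c
static void
do_pg_abort_backup(int code, Datum arg)
{
    /* ... existing cleanup of shared backup counters ... */

    /* forget the session-level backup state, so a later pg_backup_stop
     * or session termination fails cleanly instead of crashing */
    sessionBackupState = SESSION_BACKUP_NONE;
}
```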
DevChattopadhyay and others added 16 commits December 24, 2024 09:45
…ase (#13735)

* Reset GV.retcode before running test for each database using gpcheckcat

Previously, while running gpcheckcat, if the check failed on one database the value of GV.retcode was not reset before running the checks on the next database in the list. Due to this, a check failure on one database caused an AssertionError for the next database in the list even when it passed. This PR adds a change to reset the value of GV.retcode after each database check.
If palloc() fails in ResWaitOnLock due to OOM, it triggers
a PANIC, "Waiting on lock already held", because partitionLock is
not released correctly.

The first attempt was to add a try/catch around this palloc() call
and release the partitionLock before rethrowing the error. But that
is not good either, as ResIncrementRemove is not called.

So use a local char new_status[160]; a length of 160 is enough.
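A sketch of the fixed-buffer approach, mirroring what WaitOnLock does minus the palloc (set_ps_display()'s exact signature varies across versions):

```c
/* no palloc() while holding partitionLock: an OOM there would PANIC
 * with "Waiting on lock already held"; 160 bytes is plenty here */
char        new_status[160];

snprintf(new_status, sizeof(new_status), "%s waiting", old_status);
set_ps_display(new_status);
```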

----

Steps to reproduce:

Add a fault at the palloc() position:
    SIMPLE_FAULT_INJECTOR("reslock_acquire_before_wait");

```sql
-- superuser:

-- test user session 1:
> SELECT pg_sleep(120);

-- test user session 2:
> SELECT version();
PANIC:  Waiting on lock already held! (lwlock.c:668)
server closed the connection unexpectedly
```
We have a few commands that "can't run in a transaction block",
meaning that if they complete their processing but then we fail
to COMMIT, we'll be left with inconsistent on-disk state.
However, the existing defenses for this are only watertight for
simple query protocol.  In extended protocol, we didn't commit
until receiving a Sync message.  Since the client is allowed to
issue another command instead of Sync, we're in trouble if that
command fails or is an explicit ROLLBACK.  In any case, sitting
in an inconsistent state while waiting for a client message
that might not come seems pretty risky.

This case wasn't reachable via libpq before we introduced pipeline
mode, but it's always been an intended aspect of extended query
protocol, and likely there are other clients that could reach it
before.

To fix, set a flag in PreventInTransactionBlock that tells
exec_execute_message to force an immediate commit.  This seems
to be the approach that does least damage to existing working
cases while still preventing the undesirable outcomes.

While here, add some documentation to protocol.sgml that explicitly
says how to use pipelining.  That's latent in the existing docs if
you know what to look for, but it's better to spell it out; and it
provides a place to document this new behavior.

Per bug #17434 from Yugo Nagata.  It's been wrong for ages,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/[email protected]
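A sketch of the mechanism (XACT_FLAGS_NEEDIMMEDIATECOMMIT is the flag's name in recent PostgreSQL; treat the details as approximate):

```c
/* in PreventInTransactionBlock(), once the checks pass: */
MyXactFlags |= XACT_FLAGS_NEEDIMMEDIATECOMMIT;

/* in exec_execute_message(), after the portal runs to completion: */
if (MyXactFlags & XACT_FLAGS_NEEDIMMEDIATECOMMIT)
    finish_xact_command();   /* commit now; don't wait for Sync */
```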
A RowExpr with more than MaxTupleAttributeNumber columns would fail at
execution anyway, since we cannot form a tuple datum with more than that
many columns.  While heap_form_tuple() has a check for too many columns,
it emerges that there are some intermediate bits of code that don't
check and can be driven to failure with sufficiently many columns.
Checking this at parse time seems like the most appropriate place to
install a defense, since we already check SELECT list length there.

While at it, make the SELECT-list-length error use the same errcode
(TOO_MANY_COLUMNS) as heap_form_tuple does, rather than the generic
PROGRAM_LIMIT_EXCEEDED.
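A sketch of the parse-time defense, close to the actual transformRowExpr() check:

```c
/* reject oversize ROW() constructs before execution can crash */
if (list_length(r->args) > MaxTupleAttributeNumber)
    ereport(ERROR,
            (errcode(ERRCODE_TOO_MANY_COLUMNS),
             errmsg("ROW expressions can have at most %d entries",
                    MaxTupleAttributeNumber),
             parser_errposition(pstate, r->location)));
```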

Per bug #17561 from Egor Chindyaskin.  The given test case crashes
in all supported branches (and probably a lot further back),
so patch all.

Discussion: https://postgr.es/m/[email protected]
gp_dist_wait_status used to destroy a gang after
execution, which led to transaction failure.
This fix deletes the related code so that the
transaction won't fail.
errno was not reported correctly after attempting to clone a file,
leading to incorrect error reports.  While scanning through the code, I
have not noticed any similar mistakes.

Error introduced in 3a769d8.

Author: Justin Pryzby
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 12
I thought commit fd96d14d9 had plugged all the holes of this sort,
but no, function RTEs could produce oversize tuples too, either
via long coldeflists or just from multiple functions in one RTE.
(I'm pretty sure the other variants of base RTEs aren't a problem,
because they ultimately refer to either a table or a sub-SELECT,
whose widths are enforced elsewhere.  But we explicitly allow join
RTEs to be overwidth, as long as you don't try to form their
tuple result.)

Per further discussion of bug #17561.  As before, patch all branches.

Discussion: https://postgr.es/m/[email protected]
We've heard a couple of reports of people having trouble with
multi-gigabyte-sized query-texts files.  It occurred to me that on
32-bit platforms, there could be an issue with integer overflow
of calculations associated with the total query text size.
Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX.
The hashtable code will bound it to that anyway unless "long"
is 64 bits.  We still need overflow guards on its use, but
this helps.

2. Add a check to prevent extending the query-texts file to
more than MaxAllocHugeSize.  If it got that big, qtext_load_file
would certainly fail, so there's not much point in allowing it.
Without this, we'd need to consider whether extent, query_offset,
and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit
arithmetic on all platforms.  It appears possible that under duress
those multiplications could overflow 32 bits, yielding a false
conclusion that we need to garbage-collect the texts file, which
could lead to repeatedly garbage-collecting after every hash table
insertion.
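For point 3, a sketch of the widened comparison (names as in pg_stat_statements.c; the 64-bit cast is the essential part):

```c
/* need_gc_qtexts(): don't let mean_query_len * pgss_max * 2 overflow
 * 32-bit arithmetic and trigger pointless garbage collections */
if (extent < 512 * 1024 ||
    extent < (uint64) pgss->mean_query_len * pgss_max * 2)
    return false;       /* texts file is not oversized; no gc needed */
```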

Per report from Bruno da Silva.  I'm not convinced that these
issues fully explain his problem; there may be some other bug that's
contributing to the query-texts file becoming so large in the first
place.  But it did get that big, so #2 is a reasonable defense,
and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs.
The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms,
so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
The sto_using_cursor and sto_using_select tests were coded to exercise
every permutation of their test steps, but AFAICS there is no value in
exercising more than one.  This matters because each permutation costs
about six seconds, thanks to the "pg_sleep(6)".  Perhaps we could
reduce that, but the useless permutations seem worth getting rid of
in any case.  (Note that sto_using_hash_index got it right already.)

While here, clean up some other sloppiness such as an unused table.

This doesn't make too much difference in interactive testing, since the
wasted time is typically masked by parallelization with other tests.
However, the buildfarm runs this as a serial step, which means we can
expect to shave ~40 seconds from every buildfarm run.  That makes it
worth back-patching.

Discussion: https://postgr.es/m/[email protected]
SELECT or other read-only operations can result in setting hint bits on
pages, which in turn results in generating Full Page Image (FPI) WAL
records. These operations are not currently throttled and hence can
generate huge WAL traffic on the primary without waiting for the mirror.
This causes concurrent transactions to suffer from replication lag.

One scenario where this was seen in production was a SELECT after
restoring a large heap table from backup. Ideally, these situations
should be avoided by performing VACUUM on such tables, or via some
other mechanism. But there is still the possibility that this
best-practice guidance is not followed, so to avoid the situation it
is better to throttle the FPI-writing code path as well.

Co-authored-by: Divyesh Vanjare <[email protected]>
Ordinarily the functions called in this loop ought to have plenty
of CFIs themselves; but we've now seen a case where no such CFI is
reached, making the loop uninterruptible.  Even though that's from
a recently-introduced bug, it seems prudent to install a CFI at
the loop level in all branches.
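The pattern, as a generic sketch (perform_step() is a hypothetical stand-in for the loop body):

```c
for (;;)
{
    /* keep the loop interruptible even if no callee reaches its own CFI */
    CHECK_FOR_INTERRUPTS();

    if (perform_step())     /* hypothetical callee that may retry forever */
        break;
}
```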

Per discussion of bug #17558 from Andrew Kesper (an actual fix for
that bug will follow).

Discussion: https://postgr.es/m/[email protected]
According to the comment, when `GetAllAOCSFileSegInfo_pg_aocsseg_rel`
is called from `gp_aoseg_history` after an upgrade, the `state` and
`modcount` might not be set, but `formatversion` will be, so the if
condition should always be true; just remove it to make the code cleaner.

Signed-off-by: Junwang Zhao <[email protected]>
Ensure reorganize keeps the default compresstype, compresslevel, and blocksize table options.

This is needed for append-optimized column-oriented tables with
compresstype, compresslevel, and blocksize options, so that a new column
added without these options after a table reorganize inherits them
from the table.
The `extern` keyword tells the compiler that the function is defined
somewhere else; since this is the definition of `RestoreOidAssignments`,
the `extern` is useless, so remove it.

Signed-off-by: Junwang Zhao <[email protected]>
Previously, if an extension script did CREATE OR REPLACE and there was
an existing object not belonging to the extension, it would overwrite
the object and adopt it into the extension.  This is problematic, first
because the overwrite is probably unintentional, and second because we
didn't change the object's ownership.  Thus a hostile user could create
an object in advance of an expected CREATE EXTENSION command, and would
then have ownership rights on an extension object, which could be
modified for trojan-horse-type attacks.

Hence, forbid CREATE OR REPLACE of an existing object unless it already
belongs to the extension.  (Note that we've always forbidden replacing
an object that belongs to some other extension; only the behavior for
previously-free-standing objects changes here.)

For the same reason, also fail CREATE IF NOT EXISTS when there is
an existing object that doesn't belong to the extension.
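A condensed sketch of the tightened membership check (the real logic lives in the backend's extension-membership handling; the message wording is illustrative):

```c
/* while running an extension script, refuse to adopt a pre-existing
 * object via CREATE OR REPLACE / CREATE IF NOT EXISTS */
if (creating_extension &&
    getExtensionOfObject(classId, objectId) != CurrentExtensionObject)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("existing object is not a member of the extension being created")));
```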

Our thanks to Sven Klemm for reporting this problem.

Security: CVE-2022-2625
Per buildfarm, the output order of \dx+ isn't consistent across
locales.  Apply NO_LOCALE to force C locale.  There might be a
more localized way, but I'm not seeing it offhand, and anyway
there is nothing in this test module that particularly cares
about locales.

Security: CVE-2022-2625
@my-ship-it my-ship-it merged commit b7c0660 into apache:main Dec 24, 2024
15 checks passed
Labels
cherry-pick cherry-pick upstream commits