Cherry pick "Derive Combined Hashed Spec For Outer Joins" #804

Merged
merged 27 commits into apache:main from cherry-pick-orca-in-path-order-2
Dec 31, 2024

Conversation

jiaqizho
Contributor

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@jiaqizho changed the title Cherry pick "Derive Combined Hashed Spec For Outer Joins" → [DNM]Cherry pick "Derive Combined Hashed Spec For Outer Joins" Dec 20, 2024
@my-ship-it added the cherry-pick (cherry-pick upstream commits) label Dec 20, 2024
@jiaqizho force-pushed the cherry-pick-orca-in-path-order-2 branch 3 times, most recently from 6cf4434 to 005dcae December 26, 2024 05:51
@jiaqizho changed the title [DNM]Cherry pick "Derive Combined Hashed Spec For Outer Joins" → Cherry pick "Derive Combined Hashed Spec For Outer Joins" Dec 26, 2024
@jiaqizho force-pushed the cherry-pick-orca-in-path-order-2 branch 4 times, most recently from 5d30d7c to e74f9fc December 30, 2024 08:39
avamingli previously approved these changes Dec 30, 2024
THANATOSLAVA and others added 14 commits December 30, 2024 17:30
Issue: Outer join operations enforce unnecessary data redistribution, causing ORCA plan execution to be much longer than planner execution.

Root cause: Unlike inner join operators, outer join operators only derive a hashed distribution spec from one of the two relations. Child nodes not delivering all the distribution properties led to parent nodes enforcing unnecessary data redistribution.

Solution: To mimic inner join distribution spec derivation, derive a combined hashed spec for outer join operations from both relations. E.g., a 10-relation outer join delivers a combined hashed spec with 10 equivalent specs (including its own). See the sketch after the implementation list.

Implementation:
1. [CPhysicalLeftOuterHashJoin] -- Override PdsDerive (distribution spec derivation) from CPhysicalJoin. Add a case where both the outer and inner relations are hash distributed, and return a combined distribution spec. Since NULLs are only added to unmatched rows, set NullsColocated to false for all equivalent distribution specs of the inner relation.
2. [CPhysicalHashJoin] -- Set NullsColocated to false when requesting or matching the hashed distribution spec.
3. [CDistributionSpecHashed] -- Rewrite the Combine function for hashed distribution specs using linked-list concatenation.
4. [CDistributionSpecHashed] -- Rewrite the Copy function with recursion to ensure a deep copy.
5. [CDistributionSpecHashed] -- Add a Copy function that allows configuring fNullsColocated.
6. [CDistributionSpecHashed] -- Enforce nulls colocation for hash redistribution. This is necessary when the non-null hash distribution request is not met.
7. [CPhysicalFullMergeJoin] -- Fix PdsDerive (distribution spec derivation). In full joins, both tables are outer tables; the join output is hash distributed by non-NULL join keys.
8. [CDistributionSpecTest] -- Add function tests for hash spec combination and copy.
9. [regress] -- Update regression test output. Verified plan equivalency.
10. [minidump] -- MDP plan shape updates: LOJNonNullRejectingPredicates, LOJReorderWithSimplePredicate, Remove-Distinct-From-Subquery. The rest are SpaceSize and scan order changes. Add LeftJoinNullsNotColocated.
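
A hedged SQL sketch of the intended effect, using hypothetical tables (names and schema are illustrative, not taken from this PR):

```sql
-- Each table is distributed by its join key.
CREATE TABLE f  (a int) DISTRIBUTED BY (a);
CREATE TABLE d1 (b int) DISTRIBUTED BY (b);
CREATE TABLE d2 (c int) DISTRIBUTED BY (c);

-- With the combined hashed spec, (f LEFT JOIN d1) is known to be hash
-- distributed on f.a and, equivalently, on d1.b (with NULLs not colocated),
-- so the second left join should not need a Redistribute Motion.
EXPLAIN SELECT *
FROM f
LEFT JOIN d1 ON f.a = d1.b
LEFT JOIN d2 ON d1.b = d2.c;
```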

Co-authored-by: Jingyu Wang <[email protected]>
Following scenario led to crash due to missing statistics:
  ```sql
  CREATE TABLE t1 (c11 varchar, c12 numeric(15,4));
  CREATE TABLE t2 (c2 varchar);
  CREATE TABLE t3 (c3 varchar);

  SET allow_system_table_mods=true;

  UPDATE pg_class SET relpages = 97399::int, reltuples = 9106730.0::real, relallvisible = 0::int WHERE relname = 't1';
  UPDATE pg_class SET relpages = 68553::int, reltuples = 7054520.0::real, relallvisible = 0::int WHERE relname = 't2';

  SET optimizer_join_order=exhaustive;
  SELECT
       (SELECT c11 FROM t1) AS column1,
       (SELECT sum(c12)
          FROM t1
                  INNER JOIN t2 ON c11 = c2
                  INNER JOIN t3 ON c2 = c3
                  INNER JOIN t3 a1 ON a1.c3 = a2.c3
                  LEFT OUTER JOIN t3 a3 ON a1.c3 = a3.c3
                  LEFT OUTER JOIN t3 a4 ON a1.c3 = a4.c3
        ) AS column2
  FROM t3 a2;
  ```

The underlying cause is that derive and reset for group stats were not
symmetric. In the "exhaustive" case, multiple xforms may be run on the
same group, each deriving stats before applying and resetting stats
afterward. Prior to this commit it was possible to have a group with
"dirty" stats, where the child nodes may have been cleaned up but the
group still technically had a stats object. If that group was a duplicate
of another group, it was possible to "trick" the other group into
believing that its stats were already derived. That fake news could lead
to a crash.

Co-authored-by: Jingyu Wang <[email protected]>
During optimization of CTEs for replicated tables, the Sequence operator optimizes the first child with an ANY distribution requirement and computes the distribution request on the other children based on the derived distribution of the first child: if the first child's distribution is Singleton, it requests Singleton on all children; if it is non-Singleton, it requests non-Singleton on all children. When the first child was Replicated/TaintedReplicated we still requested non-Singleton, so the optimizer added a Redistribute Motion on top of the second child, creating a wrong plan and causing the query to hang. We now request non-Singleton without enforcers when the first child is non-Singleton, non-Universal, and Replicated/TaintedReplicated, which avoids adding the Redistribute Motion on top of the second child.
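
A hedged repro sketch, assuming a hypothetical replicated table and a CTE consumed on both sides of a join (the actual schema is not part of this commit message):

```sql
-- DISTRIBUTED REPLICATED makes the Shared Scan's slice
-- Replicated/TaintedReplicated, which is the case this fix targets.
CREATE TABLE testtable (name text, id int) DISTRIBUTED REPLICATED;

WITH w AS (
    SELECT name, max(id) OVER (PARTITION BY name) AS m FROM testtable
)
SELECT *
FROM (SELECT name AS tblnm FROM w) x
LEFT JOIN (SELECT count(*)::text AS tblnm FROM w) y ON x.tblnm = y.tblnm;
```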

Old plan:
                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1 (slice4; segments: 3) (cost=0.00..1293.00 rows=1 width=24)
  -> Sequence (cost=0.00..1293.00 rows=1 width=24)
     -> Shared Scan (share slice:id 4:0) (cost=0.00..431.00 rows=1 width=1)
        -> Materialize (cost=0.00..431.00 rows=1 width=1)
           -> WindowAgg (cost=0.00..431.00 rows=1 width=16)
              Partition By: testtable.name
              -> Sort (cost=0.00..431.00 rows=1 width=5)
                 Sort Key: testtable.name
                 -> Seq Scan on testtable (cost=0.00..431.00 rows=1 width=5)
     -> Redistribute Motion 1:3 (slice3) (cost=0.00..862.00 rows=1 width=24)
        -> Hash Left Join (cost=0.00..862.00 rows=1 width=24)
            Hash Cond: ("outer".tblnm = pg_catalog.textin(unknownout("outer".tblnm), ''::void, (-1)))
           -> Result (cost=0.00..431.00 rows=1 width=8)
              -> Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..431.00 rows=1 width=1)
                 -> Result (cost=0.00..431.00 rows=1 width=1)
                    -> Shared Scan (share slice:id 1:0) (cost=0.00..431.00 rows=1 width=1)
           -> Hash (cost=431.00..431.00 rows=1 width=16)
              -> Result (cost=0.00..431.00 rows=1 width=16)
                 -> Aggregate (cost=0.00..431.00 rows=1 width=8)
                    -> Gather Motion 1:1 (slice2; segments: 1) (cost=0.00..431.00 rows=1 width=1)
                       -> Result (cost=0.00..431.00 rows=1 width=1)
                          -> Shared Scan (share slice:id 2:0) (cost=0.00..431.00 rows=1 width=1)
 Optimizer: Pivotal Optimizer (GPORCA)
(23 rows)

New Plan:
                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Sequence (cost=0.00..1293.00 rows=1 width=24) (actual time=1.120..1.120 rows=0 loops=1)
  -> Shared Scan (share slice:id 0:0) (cost=0.00..431.00 rows=1 width=1) (actual time=0.708..0.708 rows=0 loops=1)
     -> Materialize (cost=0.00..431.00 rows=1 width=1) (actual time=0.706..0.707 rows=0 loops=1)
        -> Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..431.00 rows=1 width=16) (actual time=0.697..0.697 rows=0 loops=1)
           -> WindowAgg (cost=0.00..431.00 rows=1 width=16) (never executed)
              Partition By: testtable.name
              -> Sort (cost=0.00..431.00 rows=1 width=10) (never executed)
                 Sort Key: testtable.name
                 Sort Method: quicksort Memory: 33kB
                 -> Seq Scan on testtable (cost=0.00..431.00 rows=1 width=10) (never executed)
  -> Hash Left Join (cost=0.00..862.00 rows=1 width=24) (actual time=0.410..0.410 rows=0 loops=1)
      Hash Cond: ("outer".tblnm = pg_catalog.textin(unknownout("outer".tblnm), ''::void, (-1)))
     Extra Text: Hash chain length 1.0 avg, 1 max, using 1 of 65536 buckets.
     -> Result (cost=0.00..431.00 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=1)
        -> Shared Scan (share slice:id 0:0) (cost=0.00..431.00 rows=1 width=1) (actual time=0.001..0.001 rows=0 loops=1)
     -> Hash (cost=431.00..431.00 rows=1 width=16) (actual time=0.014..0.014 rows=1 loops=1)
        Buckets: 65536 Batches: 1 Memory Usage: 1kB
        -> Result (cost=0.00..431.00 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=1)
           -> Aggregate (cost=0.00..431.00 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=1)
              -> Shared Scan (share slice:id 0:0) (cost=0.00..431.00 rows=1 width=1) (actual time=0.002..0.002 rows=0 loops=1)
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 1.800 ms

Co-authored-by: Hari krishna Maddileti <[email protected]>
Commit 2d49b616fe updated memo group reset to include resetting a group's
duplicate. Previously, group reset would recursively traverse only the
children. However, by also traversing duplicates it became possible to
form cyclic reset paths in the memo (e.g. a group's child is a duplicate
of the parent group). This can lead to an infinite reset loop.

Admittedly, this patch should only be a temporary solution. The intent of
FResetStats() is to reset only if logical operators were added to any
group reachable from the reset group. In order to do that we must first
search the children before resetting ourselves. Ultimately, we need a way
to properly detect cycles.

Co-authored-by: Jingyu Wang <[email protected]>
Issue: The community reported a regression: post-fc662ea plans had redundant redistribution motions in inner joins.

Root cause: A blanket change of Nulls Colocation to false in computing a matching hashed distribution spec.

Solution: In matching a hashed distribution spec in inner join operations, set Nulls Colocation to true; in matching a hashed distribution spec in outer join operations, set Nulls Colocation to false. This reflects the Nulls Colocation property required for / delivered by the outer relation in hash join operations.

Implementation:
[CPhysicalHashJoin] -- Require Nulls Colocation in spec matching for inner joins, and non-Nulls Colocation for outer joins.
[CPhysicalLeftOuterHashJoin] -- Add a TODO comment: a left outer join should be able to return a combined hash spec even when only one relation is hash distributed.
[minidump] -- Space size change only. Added the user's example to verify that an inner join matches the outer relation's Nulls Colocation.
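
A hedged illustration of the regression, reusing the hypothetical tables from the earlier sketch; an inner join on already-colocated keys should not pick up a motion merely because the matching spec was blanket-marked with NullsColocated = false:

```sql
-- Both sides are hash distributed on the join key with NULLs colocated,
-- so the hashed specs should match and no Redistribute Motion is needed.
EXPLAIN SELECT * FROM f INNER JOIN d1 ON f.a = d1.b;
```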

Co-authored-by: Jingyu Wang <[email protected]>
Postgres commit 578b229 (from Postgres 12 merge) removed WITH OIDS
support. That eliminated the "specialness" of oid columns which
previously were not stored as a normal column, but as part of the tuple
header. Now, in pg_class for example, it is a normal column.

ORCA had a framework in place to handle this "specialness".  During the
Postgres 12 merge the framework was kept in place and hardcoded to false
with a FIXME to remove later. This commit does that.
In Orca, we copy group statistics to avoid costly stats re-deriving.
However, we unintentionally didn't copy the relpages, relallvisible, and
rebinds fields. These fields are used in costing, and in some cases the
wrong rebind value caused us to cost an NLJ improperly low, so Orca
selected a non-optimal plan.
AssertOp is used by ORCA for run-time assertion checking. For example,
it guarantees that the following query will not violate implicit
cardinality constraints (i.e. foo cannot contain more than 1 row):

  ```
  CREATE TABLE foo(a int);
  CREATE TABLE bar(b int);

  SELECT * FROM foo WHERE (SELECT a FROM foo) IN (SELECT b FROM bar);
  ```

PLANNER handles that check in the executor subplan node, where it can
determine whether the subquery is used in an expression sublink that
should return only 1 row. However, this is not sufficient for ORCA, which
may generate a de-correlated plan that contains a join node instead of a
subplan node.

Postgres 12 merge commit 2e653c6e54b disabled this feature in ORCA so
that implementation may be fixed at a later date. This commit does that.
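
A hedged sketch of the re-enabled run-time check (the exact error text may differ by version):

```sql
-- With more than one row in foo, the scalar subquery violates the implicit
-- single-row constraint; the AssertOp should raise a runtime error instead
-- of silently producing a join result.
INSERT INTO foo VALUES (1), (2);
SELECT * FROM foo WHERE (SELECT a FROM foo) IN (SELECT b FROM bar);
-- expected: ERROR, at most one row expected from the expression subquery
```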
* Revert "Derive Combined Hashed Spec For Outer Joins - Patch (#13899)"

This reverts commit 512561fe9920df5be844a60926e612562d782d4a.

* Revert "Derive Combined Hashed Spec For Outer Joins (#13714)"

This reverts commit fc662eadf9d4fcbeecdb32d661deccba72c86f1a.

* Rerun mdp, regress/with_clause

Co-authored-by: Jingyu Wang <[email protected]>
It was observed that, for a DELETE-based DML query where we know the data
resides on a particular segment, Orca should issue the command to that
segment only. However, the command was issued to all the segments. This
behavior was cross-checked with the Legacy Planner, which was found to
send the query to a single segment only.

To correct this behavior, changes were made in the following files:

1. FILE NAME : CTranslatorExprToDXL.cpp

In the function CTranslatorExprToDXL::PdxlnDML, the object for
CDXLDirectDispatchInfo is created by calling the GetDXLDirectDispatchInfo
function.

Before the update, the GetDXLDirectDispatchInfo function returned a null
pointer for any DML command other than INSERT.

Now this condition has been changed to include the DELETE command as
well: for INSERT and DELETE, a null pointer is no longer returned and an
object of the CDXLDirectDispatchInfo class is created.

2. FILE NAME : CTranslatorDXLToPlStmt.cpp

The object created above is checked while creating the planned statement
for the query, to enable the direct dispatch flag in the planned
statement.

Earlier this was allowed for the INSERT command only; changes were made
to enable this logic for the DELETE command as well.
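
A hedged sketch of the intended behavior, with a hypothetical table distributed by its key:

```sql
-- With an equality predicate on the distribution key, the DELETE touches a
-- single segment, so Orca can now direct-dispatch the command instead of
-- issuing it to every segment.
CREATE TABLE orders (id int, note text) DISTRIBUTED BY (id);
EXPLAIN DELETE FROM orders WHERE id = 42;
```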
After the partitioning rework absorbed in the Postgres 12 merge, this GUC
became dead code, as demonstrated in commit baad023.
This was already addressed in commit 49049ee67504, but that commit missed this FIXME.
Commit 3ea20ad added the function PdxlnBitmapIndexProbeForChildPart()
and commit 2826c2098a50 removed its usage. Rather than refactor away
the "specialness", just delete it.
dgkimura and others added 12 commits December 30, 2024 17:30
A function that gets system column name, type, and length from attno
already exists in Postgres. Use that function and remove ORCA version.
Prior to this commit, the preprocessing for a supported ordered-set agg
would split the ordered-set agg into an NLJ between total_count and the
CTEConsumer for the input table, with a gp_percentile_* GbAgg on top.
For a skewed dataset this wasn't performant, as the JOIN would return
all the rows. This commit updates the code to split the ordered-set agg
into an NLJ between the CTEConsumer for deduplicated data and
total_count on that CTE, with a gp_percentile_* GbAgg on top. Since we
deduplicate the data, we also pass along the count of each distinct row
as peer_count to the gp_percentile_* agg.
Below are the input query and the preprocessed output:
Input query:
```
+--CLogicalGbAgg( Global )
   |--CLogicalGet "t" ("t")
   +--CScalarProjectList
      +--CScalarProjectElement "percentile_cont"
         +--CScalarAggFunc (percentile_cont , Distinct: false , Aggregate Stage: Global)
            |--CScalarValuesList
            |  +--CScalarIdent "a" (0)
            |--CScalarValuesList
            |  +--CScalarConst (0.250)
            |--CScalarValuesList
            |  +--CScalarSortGroupClause(tleSortGroupRef:0,eqop:96,sortop:97,nulls_first:false,hashable:true)
            +--CScalarValuesList
```
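
In SQL terms, the input tree above presumably corresponds to something like:

```sql
-- Hedged reading of the CLogicalGbAgg/CScalarAggFunc dump; table t and
-- column a are taken from the tree above.
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY a) FROM t;
```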

Output preprocessed query:
```
Common Table Expressions:
+--CLogicalCTEProducer (0)
   +--CLogicalGbAgg( Global ) Grp Cols: ["a" (10)]
      |--CLogicalGet "t" ("t")
      +--CScalarProjectList
         +--CScalarProjectElement "ColRef_0009" (11)
            +--CScalarAggFunc (count , Distinct: false , Aggregate Stage: Global)
               |--CScalarValuesList
               |  +--CScalarIdent "a" (10)
               |--CScalarValuesList
               |--CScalarValuesList
               +--CScalarValuesList

Algebrized preprocessed query:
+--CLogicalCTEAnchor (0)
   +--CLogicalGbAgg( Global )
      |--CLogicalLimit ( (97,1.0), "a" (0), NULLsLast )  global
      |  |--CLogicalNAryJoin
      |  |  |--CLogicalCTEConsumer (0), Columns: ["a" (0), "ColRef_0009" (9)]
      |  |  |--CLogicalProject
      |  |  |  |--CLogicalGbAgg( Global )
      |  |  |  |  |--CLogicalCTEConsumer (0), Columns: ["a" (19), "ColRef_0009" (20)]
      |  |  |  |  +--CScalarProjectList
      |  |  |  |     +--CScalarProjectElement "ColRef_0035" (35)
      |  |  |  |        +--CScalarAggFunc (sum , Distinct: false , Aggregate Stage: Global)
      |  |  |  |           |--CScalarValuesList
      |  |  |  |           |  +--CScalarIdent "ColRef_0009" (20)
      |  |  |  |           |--CScalarValuesList
      |  |  |  |           |--CScalarValuesList
      |  |  |  |           +--CScalarValuesList
      |  |  |  +--CScalarProjectList
      |  |  |     +--CScalarProjectElement "ColRef_0036" (36)
      |  |  |        +--CScalarFunc (int8)
      |  |  |           +--CScalarIdent "ColRef_0035" (35)
      |  |  +--CScalarConst (1)
      |  |--CScalarConst (0)
      |  +--CScalarConst (null)
      +--CScalarProjectList
         +--CScalarProjectElement "percentile_disc" (8)
            +--CScalarAggFunc (percentile_disc , Distinct: false , Aggregate Stage: Global)
               |--CScalarValuesList
               |  |--CScalarIdent "a" (0)
               |  |--CScalarConst (0.250)
               |  |--CScalarIdent "ColRef_0036" (36)
               |  +--CScalarIdent "ColRef_0009" (9)
               |--CScalarValuesList
               |--CScalarValuesList
               +--CScalarValuesList
```

This also includes updating the C function for gp_percentile.

Since we pass the peer_count value along with the total_count, we need
to consider it while calculating percentile values.
This commit also updates the transition functions to non-strict, because
for a strict transition function `advance_aggregates()` calls
`ExecInterpExpr()`, which initializes the transition value from the first
row in the group as part of the `EEOP_AGG_INIT_TRANS` step. This results
in the transition function being called from the second row onward, with
the first row passed in as the previous state value. This worked fine
previously, since we read all rows and peer_count was always 1; but now
that we need to read peer_count for each row, initializing from the
first row doesn't work.
Since the transition function isn't strict anymore, we explicitly handle
NULL inputs.
… Orca (#13873)

Previously, Orca disallowed all aggregate functions from being executed
on replicated slices. This meant that the results were broadcasted or
gathered on a single segment to ensure consistency and correct results.
This is necessary because some functions such as array_agg and custom
user-created functions are sensitive to the order of data. This can
cause wrong results in some cases.

However, many functions, especially commonly used ones such as sum, avg,
count, min, and max, are not sensitive to the order of data and can be
safely executed. We now make an exception for these common cases,
currently the above agg functions on ints and count(*).

See https://github.com/greenplum-db/gpdb/pull/10978 for previous
discussion.
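
A hedged example of a newly allowed case, assuming a hypothetical replicated table:

```sql
-- sum and count(*) on ints are order-insensitive, so they can now execute
-- on the replicated slice without a Gather or Broadcast motion.
CREATE TABLE r (v int) DISTRIBUTED REPLICATED;
EXPLAIN SELECT sum(v), count(*) FROM r;
```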
ORCA commit f8990fb enables more datatypes in constraint
evaluation. However, it also exposed an issue in the preprocessor step
PexprInferPredicates(), which can cause ORCA to produce a plan with
duplicate casted predicates.

This commit fixes the issue by deduplicating cast equality predicates.

Example: Date-TimeStamp-HashJoin.mdp
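
The named minidump suggests a date/timestamp join; a hedged sketch of the kind of query involved:

```sql
-- Equality across a cast (date vs. timestamp keys) lets
-- PexprInferPredicates() infer an additional casted predicate; before this
-- fix it could be added in duplicate.
CREATE TABLE td (d date);
CREATE TABLE tt (ts timestamp);
EXPLAIN SELECT * FROM td JOIN tt ON td.d = tt.ts;
```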
Prior to this commit, the partition propagation spec stored a partition
info list in an array that needed to stay sorted, in order to compare for
equality against another partition propagation spec where the order of
the stored partition info list is relevant.

With the array implementation we pay O(N log N × N) to insert N entries
(sorting the data on each insert) and at best O(log N) per lookup using
binary search. By contrast, a hash map implementation costs O(N) to build
and O(1) per lookup.

This commit stores the partition info list in a hash map.
* Remove FIXME for signature change

As there are other functions that also take a Node as input, it's better
to keep it as is instead of changing the signature and breaking something
not captured in ICW.

* Remove dead code

This commit addresses the FIXME to remove CountLeafPartTables(). Looking
at the call hierarchy, CountLeafPartTables() was called by
RetrieveNumChildPartitions() -> GenerateStatsForSystemCols() ->
RetrieveColStats() for attno < 0 (system columns). Since we do not
extract/use the stats on system columns, the entire call stack is dead
code. This commit removes that part of the code altogether.

It also previously called RelPartIsNone(); with that call removed, that
function is removed too.

This commit also removes the FIXME for collation.
Postgres commit fc22b66 implemented the SQL-standard feature for
generated columns. This was turned off in ORCA during the merge.

After this commit the following SQL works as expected:
    ```sql
    CREATE TABLE t_gencol(a int, b int GENERATED ALWAYS AS (a * 2) stored);
    EXPLAIN ANALYZE INSERT INTO t_gencol (a) VALUES (1), (2);
    SELECT * FROM t_gencol;

     a | b
    ---+---
     1 | 2
     2 | 4
    (2 rows)
    ```
There were a lot of asserts on NULL != target_list in the translator,
but most of them were unnecessary. Fix ORCA to handle an empty target list.

- Add trace fallback to union testcase
- Fix up CXformDifference2LeftAntiSemiJoin to handle case of empty columns

The following SQL works:
```
EXPLAIN (COSTS OFF) SELECT UNION SELECT;
```
Issue: Outer join operations enforce unnecessary data redistribution, causing ORCA plan execution to be much longer than planner execution.

Root cause: Unlike inner join operators, outer join operators only derive a hashed distribution spec from one of the two relations. Child nodes not delivering all the distribution properties led to parent nodes enforcing unnecessary data redistribution.

Solution: To mimic inner join distribution spec derivation, derive a combined hashed spec for outer join operations from both relations. E.g., a 10-relation outer join delivers a combined hashed spec with 10 equivalent specs (including its own).

Implementation:

1. [CPhysicalLeftOuterHashJoin] -- Override PdsDerive (distribution spec derivation) from CPhysicalJoin. Add a case where both the outer and inner relations are hash distributed, and return a combined distribution spec. Since NULLs are only added to unmatched rows, set NullsColocated to false for all equivalent distribution specs of the inner relation.
2. [CPhysicalHashJoin] -- In matching a hashed distribution spec in inner join operations, set Nulls Colocation to true. In matching a hashed distribution spec in outer join operations, set Nulls Colocation to false only if the join condition isn't null-aware. This reflects the Nulls Colocation property required for / delivered by the outer relation in hash join operations.
3. [CDistributionSpecHashed] -- (1) Rewrite the Combine function for hashed distribution specs using linked-list concatenation. (2) Rewrite the Copy function with recursion to ensure a deep copy. (3) Add a Copy function that allows configuring fNullsColocated. (4) Enforce nulls colocation for hash redistribution; this is necessary when the non-null hash distribution request is not met. (5) Make ComputeEquivHashExprs recursive to compute hash expressions for all equivalent hashed specs. (6) Make FMatchHashedDistribution public.
4. [CPhysicalFullMergeJoin] -- Fix PdsDerive (distribution spec derivation). In full joins, both tables are outer tables; the join output is hash distributed by non-NULL join keys.
5. [CPhysical*Join] -- Add an is_null_aware member to all classes using the AddHashOrMergeJoinAlternative template. If the join is null-aware, nulls colocation has to be set to true in deriving/requesting hash distribution specs; if the join isn't null-aware, nulls colocation can be set to false.
6. [CXformUtils] -- Check whether the join condition is composed of equality predicates only. The result is passed to AddHashOrMergeJoinAlternative to determine the join condition's null-awareness.
7. [CDistributionSpecTest] -- Add function tests for hash spec combination and copy. Replace GPOS_ASSERT with GPOS_RTL_ASSERT.
8. [regress] -- Test hashed distribution spec derivation and motion enforcement in an outer join with an INDF join condition (see the sketch after this list).
9. [minidump] -- MDP plan shape updates: LOJNonNullRejectingPredicates, LOJReorderWithSimplePredicate, Remove-Distinct-From-Subquery. The rest are SpaceSize and scan order changes. Add LeftJoinNullsNotColocated. Added user examples to verify that an inner join matches the outer relation's Nulls Colocation.
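
A hedged sketch of item 8, reusing the hypothetical tables from the earlier sketch:

```sql
-- IS NOT DISTINCT FROM treats NULL = NULL as a match, so the join condition
-- is null-aware: NULLs must stay colocated and the derived hashed spec keeps
-- NullsColocated = true. A plain '=' left join may leave it false.
EXPLAIN SELECT * FROM d1 LEFT JOIN d2 ON d1.b IS NOT DISTINCT FROM d2.c;
```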

Co-authored-by: Jingyu Wang <[email protected]>
GPDB has the same issue. The current issue was introduced by
"Update ordered-set agg preprocess step for skew" (GPDB commit 5280297).

This is because a non-null DATUM(0) is returned during the call to
gp_percentile_disc_transition.
@jiaqizho force-pushed the cherry-pick-orca-in-path-order-2 branch from 424acc8 to 40aa09e December 30, 2024 09:31
- Fixed GPDB incorrect results in bfv_join.sql
- Fixed some plan diff
@jiaqizho force-pushed the cherry-pick-orca-in-path-order-2 branch from 40aa09e to acfc6c5 December 30, 2024 10:24
@jiaqizho requested a review from avamingli December 31, 2024 01:43
@my-ship-it merged commit 7f919d8 into apache:main Dec 31, 2024
22 checks passed