FIX: re-enable the NL-index in ORCA and fix the Join2IndexApplyGeneric #807

Merged
merged 1 commit into from
Dec 30, 2024

Conversation

jiaqizho
Contributor

@jiaqizho jiaqizho commented Dec 22, 2024

Fixes #567

What does this PR do?

The CXformJoin2IndexApplyGeneric xform will create the CPhysicalInnerHashJoin in the ROOT path.

However, when a DynamicGet is in a child node, no checks are performed, which is incorrect.

If the components of the current relationship are inconsistent with the GROUP BY key, the logical transformation is invalid. Once the current logical transformation succeeds, the enforcement phase is not required to process the partial key by default.

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@jiaqizho jiaqizho changed the title [DNM]FIX: re-enable the NL-index in ORCA and fix the Join2IndexApplyG… [DNM]FIX: re-enable the NL-index in ORCA and fix the Join2IndexApplyGeneric Dec 23, 2024
@fanfuxiaoran fanfuxiaoran self-requested a review December 24, 2024 02:41
@jiaqizho jiaqizho force-pushed the fix-orc-nl-index branch 3 times, most recently from 0dc70ff to 2e64e45 Compare December 24, 2024 10:11
@jiaqizho jiaqizho changed the title [DNM]FIX: re-enable the NL-index in ORCA and fix the Join2IndexApplyGeneric FIX: re-enable the NL-index in ORCA and fix the Join2IndexApplyGeneric Dec 24, 2024
my-ship-it
my-ship-it previously approved these changes Dec 26, 2024
@fanfuxiaoran
Contributor

explain SELECT *
FROM (
    SELECT
        tradingday,
        1 AS ins_SpanInsArbitrageRatio
    FROM t_clientinstrumentind2 t
    WHERE t.tradingday BETWEEN '20190715' AND '20190715'
    GROUP BY t.tradingday
) t1
INNER JOIN (
    SELECT
        t.tradingday,
        0.9233716475 AS prod_SpanInsArbitrageRatio
    FROM t_clientproductind2 t
    WHERE t.tradingday BETWEEN '20190715' AND '20190715'
    GROUP BY t.tradingday
) t2 ON t1.tradingday = t2.tradingday;

                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=93.49..100.01 rows=240 width=102)
   ->  Hash Join  (cost=93.49..96.81 rows=80 width=102)
         Hash Cond: (t.tradingday = (t_1.tradingday)::text)
         ->  HashAggregate  (cost=45.71..46.82 rows=111 width=36)
               Group Key: t.tradingday
               ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=7.53..45.43 rows=111 width=32)
                     Hash Key: t.tradingday
                     ->  Bitmap Heap Scan on t_clientinstrumentind_2_prt_p2019 t  (cost=7.53..43.20 rows=111 width=32)
                           Recheck Cond: ((tradingday >= '20190715'::text) AND (tradingday <= '20190715'::text))
                           ->  Bitmap Index Scan on t_clientinstrumentind_2_prt_p2019_pkey  (cost=0.00..7.50 rows=111 width=0)
                                 Index Cond: ((tradingday >= '20190715'::text) AND (tradingday <= '20190715'::text))
         ->  Hash  (cost=46.78..46.78 rows=80 width=66)
               ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=43.58..46.78 rows=80 width=66)
                     Hash Key: t_1.tradingday
                     ->  HashAggregate  (cost=43.58..44.38 rows=80 width=66)
                           Group Key: t_1.tradingday
                           ->  Redistribute Motion 3:3  (slice4; segments: 3)  (cost=6.58..43.38 rows=80 width=34)
                                 Hash Key: t_1.tradingday
                                 ->  Bitmap Heap Scan on t_clientproductind_2_prt_p2019 t_1  (cost=6.58..41.78 rows=80 width=34)
                                       Recheck Cond: (((tradingday)::text >= '20190715'::text) AND ((tradingday)::text <= '20190715'::text))
                                       ->  Bitmap Index Scan on t_clientproductind_2_prt_p2019_pkey  (cost=0.00..6.56 rows=80 width=0)
                                             Index Cond: (((tradingday)::text >= '20190715'::text) AND ((tradingday)::text <= '20190715'::text))
 Optimizer: Postgres query optimizer

I checked the new plan: there are two Redistribute Motions before the join (slice3 and slice4), and they use the same hash key. If I understand correctly, a single Redistribute Motion should be enough.

@jiaqizho
Contributor Author

That's the PG plan. ORCA will still fall back in this case (the same as when there is no index), but the reason for the fallback is not that we disabled the NL-index.
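
For readers who want to confirm which optimizer produced a plan and why ORCA fell back, a minimal sketch follows. It assumes the `optimizer` and `optimizer_trace_fallback` settings behave as they do in Greenplum-derived systems, and it uses only one of the subqueries from this thread as a stand-in; substitute the full join query to reproduce the case discussed here.

```sql
-- Assumption: these settings exist and behave as in Greenplum-derived
-- systems; verify the names on your build before relying on them.
SET optimizer = on;                 -- request ORCA for this session
SET optimizer_trace_fallback = on;  -- report a message when ORCA falls back

-- Stand-in query taken from one side of the join in this thread.
EXPLAIN
SELECT tradingday
FROM t_clientinstrumentind2 t
WHERE t.tradingday BETWEEN '20190715' AND '20190715'
GROUP BY t.tradingday;
```

The last line of the EXPLAIN output names the optimizer that was used; "Optimizer: Postgres query optimizer", as in the plan above, indicates that ORCA did not produce the plan.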

@fanfuxiaoran
Contributor

You are right. We can discuss ORCA's failure on this query in another thread.

@fanfuxiaoran
Contributor

LGTM!

@my-ship-it my-ship-it merged commit 0faf8e1 into apache:main Dec 30, 2024
22 checks passed
Development

Successfully merging this pull request may close these issues.

[Bug] [ORCA] Wrong plan for index nestloop join
4 participants