
[Enhancement] add partition scan num limit when query internal olap table #53747

Merged · 13 commits · Dec 26, 2024

Conversation

MatthewH00
Contributor

@MatthewH00 MatthewH00 commented Dec 10, 2024

Why I'm doing:

Querying a large internal OLAP table with a full table scan, or scanning too many partitions, puts BE/CN nodes under high load and can destabilize the cluster.

What I'm doing:

Add a new FE session variable, scan_olap_partition_num_limit, that limits the number of partitions scanned when querying an internal OLAP table.
(The default value is 0, which means no limit.)
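The guard described above can be sketched as a small standalone class. This is a hypothetical illustration of the check's semantics (0 disables it, a positive limit rejects queries that would scan more partitions), not the actual StarRocks implementation; the class and method names here are invented for the example.

```java
// Hypothetical sketch of the partition-count guard; names are invented.
public class PartitionScanLimitSketch {
    /** 0 means no limit, matching the session variable's default. */
    private final long scanOlapPartitionNumLimit;

    public PartitionScanLimitSketch(long limit) {
        this.scanOlapPartitionNumLimit = limit;
    }

    /** Throws if the query would scan more partitions than allowed. */
    public void checkScanPartitionLimit(int selectedPartitionNum) {
        if (scanOlapPartitionNumLimit > 0 && selectedPartitionNum > scanOlapPartitionNumLimit) {
            throw new IllegalStateException(
                "Exceeded the limit of number of olap table partitions to be scanned. "
                    + "Partitions allowed: " + scanOlapPartitionNumLimit
                    + ", partitions to be scanned: " + selectedPartitionNum);
        }
    }

    public static void main(String[] args) {
        PartitionScanLimitSketch noLimit = new PartitionScanLimitSketch(0);
        noLimit.checkScanPartitionLimit(10_000); // default 0: never rejects

        PartitionScanLimitSketch limited = new PartitionScanLimitSketch(100);
        boolean rejected = false;
        try {
            limited.checkScanPartitionLimit(101); // 101 > 100: rejected
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println(rejected);
    }
}
```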

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
@MatthewH00
Contributor Author

@kevincai Hi, could you please review the PR when you have free time?
The PR adds an FE session variable to limit the number of partitions scanned, to avoid the cluster instability caused by full table scans or scanning too many partitions when querying large internal OLAP tables.

@github-actions github-actions bot added the 3.4 label Dec 11, 2024
Contributor

@kevincai kevincai left a comment


A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

```java
LOG.warn("fail to get variable scan_olap_partition_num_limit, set default value 0, msg: {}", e.getMessage());
}
if (scanOlapPartitionNumLimit > 0 && selectedPartitionNum > scanOlapPartitionNumLimit) {
    String msg = "Exceeded the limit of " + scanOlapPartitionNumLimit + " max scan olap partitions. " +
```

Exceeded the limit of number of olap table partitions to be scanned. Number of partitions allowed: {}, number of partitions to be scanned: {}. Please adjust the SQL or change the limit ...

```java
checkScanPartitionLimit(selectedPartitionIds.size());
} catch (AnalysisException e) {
    LOG.warn("{} queryId: {}", e.getMessage(), DebugUtil.printId(ConnectContext.get().getQueryId()));
    throw new StarRocksPlannerException(e.getMessage(), ErrorType.INTERNAL_ERROR);
```

INTERNAL_ERROR or USER_ERROR?

@MatthewH00
Contributor Author

> A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

Do you mean setting the variable to different values like -1, 0, a, 2? Would adding more different values in SQLTester be OK?

@kevincai
Contributor

> Do you mean setting the variable to different values like -1, 0, a, 2? Would adding more different values in SQLTester be OK?

Direct unit test cases in fe/fe-core/ are preferred.
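The boundary values floated above (-1, 0, a small positive number) can be exercised in a plain standalone check. This is a simplified stand-in for the kind of FE unit test being requested, with the session-variable lookup replaced by a direct parameter; the real test would live in fe/fe-core and go through the session variable.

```java
// Hypothetical boundary-value check; the predicate is a simplified stand-in
// for the session-variable-driven limit in the FE.
public class ScanPartitionLimitBoundaryTest {
    static boolean exceedsLimit(long limit, int selectedPartitions) {
        return limit > 0 && selectedPartitions > limit;
    }

    public static void main(String[] args) {
        // limit <= 0 disables the check entirely
        assertFalse(exceedsLimit(0, 1_000));
        assertFalse(exceedsLimit(-1, 1_000));
        // positive limit: reject only when strictly exceeded
        assertFalse(exceedsLimit(2, 2));
        assertTrue(exceedsLimit(2, 3));
        System.out.println("all boundary checks passed");
    }

    static void assertTrue(boolean b) { if (!b) throw new AssertionError(); }
    static void assertFalse(boolean b) { if (b) throw new AssertionError(); }
}
```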

Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
@MatthewH00
Contributor Author

> A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

@kevincai Please review again. I added a UT in the PR and fixed the problems you raised above.

kevincai
kevincai previously approved these changes Dec 11, 2024
@kevincai
Contributor

> @kevincai Please review again. I added a UT in the PR and fixed the problems you raised above.

Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

@MatthewH00
Contributor Author

> Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

After testing, the function setScanOlapPartitionNumLimit [4191, 4192] is not necessary. But since all the other session variables include a setter function, I kept it for consistency.

@MatthewH00
Contributor Author

> Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

@kevincai Could you please help push the PR forward and find another R&D engineer to review it?

As for whether setScanOlapPartitionNumLimit [4191, 4192] is necessary: I tested another existing session variable and likewise found that the setter function is not needed (set xxx=value takes effect even without one). You could ask the related R&D engineers to confirm.
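The observation that `set xxx=value` works even without a dedicated setter is plausible if the FE resolves session variables reflectively (e.g., by annotated fields) rather than through per-variable setters. A toy illustration of that pattern follows; it is an assumption about the mechanism, not the actual StarRocks code, and the class and method names are invented.

```java
import java.lang.reflect.Field;

// Toy illustration of reflection-driven variable assignment, which makes
// per-variable setter methods optional. Names are hypothetical.
public class ReflectiveSessionVars {
    // the variable under discussion; 0 disables the limit
    public long scanOlapPartitionNumLimit = 0;

    // "SET name = value" handled generically; no setter needed
    void set(String name, String value) throws ReflectiveOperationException {
        Field f = getClass().getField(name);
        if (f.getType() == long.class) {
            f.setLong(this, Long.parseLong(value));
        }
    }

    public static void main(String[] args) throws Exception {
        ReflectiveSessionVars vars = new ReflectiveSessionVars();
        vars.set("scanOlapPartitionNumLimit", "100");
        System.out.println(vars.scanOlapPartitionNumLimit);
    }
}
```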

Seaven
Seaven previously approved these changes Dec 12, 2024
Signed-off-by: hmx <[email protected]>
@MatthewH00 MatthewH00 dismissed stale reviews from Seaven and kevincai via eb5bfcb December 12, 2024 06:53
kevincai
kevincai previously approved these changes Dec 12, 2024
Signed-off-by: hmx <[email protected]>
@Seaven Seaven enabled auto-merge (squash) December 17, 2024 03:50
@yingtingdong
Contributor

> 1. The variable scan_olap_partition_num_limit is better applied per table when the query is a complex join. 2. Like the existing variables query_timeout/query_mem_limit/scan_hive_partition_num_limit, scan_olap_partition_num_limit works at the cluster level and is not set on a resource group. @kevincai you could consider the two PRs and see which one is better.
>
> @kevincai @kaijianding I think retaining both may be good. Users who have not set a resource group can use scan_olap_partition_num_limit at the cluster level, while users with a resource group can set the limit at the resource-group level. Existing limits like query_timeout/query_mem_limit can be set at both the cluster and resource-group levels.

@kaijianding I also tend to keep both. In your PR #53916 you set the scan limit per table, but the parameter name is partition_scan_number_limit_rule. Can the two parameters be unified? That is, one scan limit that takes effect for all partitions instead of being specified separately for each table.

@kaijianding
Contributor

> @kaijianding I also tend to keep both. In your PR #53916 you set the scan limit per table, but the parameter name is partition_scan_number_limit_rule. Can the two parameters be unified? That is, one scan limit that takes effect for all partitions instead of being specified separately for each table.

I think every table should have its own limit. In a complex query, a bigger table should have a smaller limit, and a smaller table may not need to be limited at all.

@yingtingdong
Contributor

> I think every table should have its own limit. In a complex query, a bigger table should have a smaller limit, and a smaller table may not need to be limited at all.

The row limit of large tables should be larger than that of small tables, so is it okay to use just one value directly? Setting different rules for different tables seems uncommon. Moreover, table data is dynamic: if a different value is set for each table, does the user need to adjust the rules frequently? If we only consider limiting resource usage, it is more reasonable to set the same threshold for all tables.

@kaijianding
Contributor

kaijianding commented Dec 20, 2024

> The row limit of large tables should be larger than that of small tables, so is it okay to use just one value directly? Setting different rules for different tables seems uncommon. Moreover, table data is dynamic: if a different value is set for each table, does the user need to adjust the rules frequently? If we only consider limiting resource usage, it is more reasonable to set the same threshold for all tables.

This rule limits the partition scan number; it is not a row limit.

In my production environment, this rule has not been adjusted since its creation, because we knew from the beginning which big tables should have their partition scan number limited.

@yingtingdong
Contributor

> This rule limits the partition scan number; it is not a row limit.
>
> In my production environment, this rule has not been adjusted since its creation, because we knew from the beginning which big tables should have their partition scan number limited.

I think the resource group is bound to computing resources rather than to a table, let alone a partition. The partition limit seems applicable only when users clearly know the size of their tables; it is difficult to set this value when the size cannot be clearly estimated.

@MatthewH00
Contributor Author

@kevincai The PR passed code review last week. Could you help merge it to the main branch when you have free time?

@kaijianding
Contributor

kaijianding commented Dec 25, 2024

> I think the resource group is bound to computing resources rather than to a table, let alone a partition. The partition limit seems applicable only when users clearly know the size of their tables; it is difficult to set this value when the size cannot be clearly estimated.

Users can modify this rule after their table has data, according to their query needs. It is easy to learn the size of a table or a partition via SHOW DATA or SHOW PARTITIONS.

Yes, I think the purpose of limiting the partition scan number is that computing resources are limited; a query should be rejected if it would occupy too many resources.

@alvin-celerdata alvin-celerdata merged commit a0a25b4 into StarRocks:main Dec 26, 2024
49 of 51 checks passed

@Mergifyio backport branch-3.4

@github-actions github-actions bot removed the 3.4 label Dec 26, 2024

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Dec 26, 2024
Contributor

mergify bot commented Dec 26, 2024

backport branch-3.4

✅ Backports have been created

Contributor

mergify bot commented Dec 26, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 26, 2024
…able (#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
(cherry picked from commit a0a25b4)
mergify bot pushed a commit that referenced this pull request Dec 26, 2024
…able (#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
(cherry picked from commit a0a25b4)

# Conflicts:
#	fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java
wanpengfei-git pushed a commit that referenced this pull request Dec 26, 2024
wanpengfei-git pushed a commit that referenced this pull request Dec 26, 2024
…able (backport #53747) (#54353)

Signed-off-by: Kevin Xiaohua Cai <[email protected]>
Co-authored-by: hmx <[email protected]>
Co-authored-by: Kevin Xiaohua Cai <[email protected]>
kevincai added a commit to kevincai/starrocks that referenced this pull request Dec 30, 2024
maggie-zhu pushed a commit to maggie-zhu/starrocks that referenced this pull request Jan 6, 2025
…able (StarRocks#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
zhangyifan27 pushed a commit to zhangyifan27/starrocks that referenced this pull request Feb 10, 2025
…able (backport StarRocks#53747) (StarRocks#54353)

Signed-off-by: Kevin Xiaohua Cai <[email protected]>
Co-authored-by: hmx <[email protected]>
Co-authored-by: Kevin Xiaohua Cai <[email protected]>
7 participants