
[Enhancement] add partition scan num limit when query internal olap table #53747

Merged · 13 commits · Dec 26, 2024

Conversation

MatthewH00
Contributor

@MatthewH00 MatthewH00 commented Dec 10, 2024

Why I'm doing:

Querying a large internal OLAP table with a full table scan, or scanning too many partitions, puts BE/CN nodes under high load and can destabilize the cluster.

What I'm doing:

Add a new FE session variable, scan_olap_partition_num_limit, that limits the number of partitions scanned when querying an internal OLAP table.
(The default value is 0, which means no limit.)
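The guard described above can be sketched as a small standalone class. This is a hypothetical illustration of the check's semantics (0 disables it, a positive limit rejects queries that would scan more partitions), not the actual StarRocks implementation; the class and method names here are invented for the example.

```java
// Hypothetical sketch of the partition-count guard; names are invented.
public class PartitionScanLimitSketch {
    /** 0 means no limit, matching the session variable's default. */
    private final long scanOlapPartitionNumLimit;

    public PartitionScanLimitSketch(long limit) {
        this.scanOlapPartitionNumLimit = limit;
    }

    /** Throws if the query would scan more partitions than allowed. */
    public void checkScanPartitionLimit(int selectedPartitionNum) {
        if (scanOlapPartitionNumLimit > 0 && selectedPartitionNum > scanOlapPartitionNumLimit) {
            throw new IllegalStateException(
                "Exceeded the limit of number of olap table partitions to be scanned. "
                    + "Partitions allowed: " + scanOlapPartitionNumLimit
                    + ", partitions to be scanned: " + selectedPartitionNum);
        }
    }

    public static void main(String[] args) {
        PartitionScanLimitSketch noLimit = new PartitionScanLimitSketch(0);
        noLimit.checkScanPartitionLimit(10_000); // default 0: never rejects

        PartitionScanLimitSketch limited = new PartitionScanLimitSketch(100);
        boolean rejected = false;
        try {
            limited.checkScanPartitionLimit(101); // 101 > 100: rejected
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println(rejected);
    }
}
```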

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
@MatthewH00
Contributor Author

@kevincai Hi, could you please review the PR when you have free time?
The PR adds an FE session variable to limit the number of partitions scanned, to avoid the cluster instability caused by full table scans or scanning too many partitions when querying large internal OLAP tables.

@github-actions github-actions bot added the 3.4 label Dec 11, 2024
Contributor

@kevincai kevincai left a comment


A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

```java
LOG.warn("fail to get variable scan_olap_partition_num_limit, set default value 0, msg: {}", e.getMessage());
}
if (scanOlapPartitionNumLimit > 0 && selectedPartitionNum > scanOlapPartitionNumLimit) {
    String msg = "Exceeded the limit of " + scanOlapPartitionNumLimit + " max scan olap partitions. " +
```

Exceeded the limit of number of olap table partitions to be scanned. Number of partitions allowed: {}, number of partitions to be scanned: {}. Please adjust the SQL or change the limit ...

```java
checkScanPartitionLimit(selectedPartitionIds.size());
} catch (AnalysisException e) {
    LOG.warn("{} queryId: {}", e.getMessage(), DebugUtil.printId(ConnectContext.get().getQueryId()));
    throw new StarRocksPlannerException(e.getMessage(), ErrorType.INTERNAL_ERROR);
```

INTERNAL_ERROR or USER_ERROR?

@MatthewH00
Contributor Author

> A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

Do you mean setting the variable to different values like -1, 0, a, 2? Would adding more different values in SQLTester be OK?

@kevincai
Contributor

> Do you mean setting the variable to different values like -1, 0, a, 2? Would adding more different values in SQLTester be OK?

Direct unit test cases in fe/fe-core/ are preferred.
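The boundary values floated above (-1, 0, a small positive number) can be exercised in a plain standalone check. This is a simplified stand-in for the kind of FE unit test being requested, with the session-variable lookup replaced by a direct parameter; the real test would live in fe/fe-core and go through the session variable.

```java
// Hypothetical boundary-value check; the predicate is a simplified stand-in
// for the session-variable-driven limit in the FE.
public class ScanPartitionLimitBoundaryTest {
    static boolean exceedsLimit(long limit, int selectedPartitions) {
        return limit > 0 && selectedPartitions > limit;
    }

    public static void main(String[] args) {
        // limit <= 0 disables the check entirely
        assertFalse(exceedsLimit(0, 1_000));
        assertFalse(exceedsLimit(-1, 1_000));
        // positive limit: reject only when strictly exceeded
        assertFalse(exceedsLimit(2, 2));
        assertTrue(exceedsLimit(2, 3));
        System.out.println("all boundary checks passed");
    }

    static void assertTrue(boolean b) { if (!b) throw new AssertionError(); }
    static void assertFalse(boolean b) { if (b) throw new AssertionError(); }
}
```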

Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
Signed-off-by: hmx <[email protected]>
@MatthewH00
Contributor Author

> A UT should be added to enforce the behavior of the session variable. SQLTester is an add-on test in case it is too complicated to cover the code path in a UT.

@kevincai Please review again. I added a UT in the PR and fixed the problems you raised above.

kevincai
kevincai previously approved these changes Dec 11, 2024
@kevincai
Contributor

> @kevincai Please review again. I added a UT in the PR and fixed the problems you raised above.

Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

@MatthewH00
Contributor Author

> Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

After testing, the function setScanOlapPartitionNumLimit [4191, 4192] is not necessary. But since all the other session variables include a setter function, I kept it for consistency.

@MatthewH00
Contributor Author

> Wondering why [4191, 4192] are not covered by the tests. Is the interface needed at all?

@kevincai Could you please help push the PR forward and find another R&D engineer to review it?

As for whether setScanOlapPartitionNumLimit [4191, 4192] is necessary: I tested another existing session variable and likewise found that the setter function is not needed (set xxx=value takes effect even without one). You could ask the related R&D engineers to confirm.
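The observation that `set xxx=value` works even without a dedicated setter is plausible if the FE resolves session variables reflectively (e.g., by annotated fields) rather than through per-variable setters. A toy illustration of that pattern follows; it is an assumption about the mechanism, not the actual StarRocks code, and the class and method names are invented.

```java
import java.lang.reflect.Field;

// Toy illustration of reflection-driven variable assignment, which makes
// per-variable setter methods optional. Names are hypothetical.
public class ReflectiveSessionVars {
    // the variable under discussion; 0 disables the limit
    public long scanOlapPartitionNumLimit = 0;

    // "SET name = value" handled generically; no setter needed
    void set(String name, String value) throws ReflectiveOperationException {
        Field f = getClass().getField(name);
        if (f.getType() == long.class) {
            f.setLong(this, Long.parseLong(value));
        }
    }

    public static void main(String[] args) throws Exception {
        ReflectiveSessionVars vars = new ReflectiveSessionVars();
        vars.set("scanOlapPartitionNumLimit", "100");
        System.out.println(vars.scanOlapPartitionNumLimit);
    }
}
```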

Seaven
Seaven previously approved these changes Dec 12, 2024
Signed-off-by: hmx <[email protected]>
@MatthewH00 MatthewH00 dismissed stale reviews from Seaven and kevincai via eb5bfcb December 12, 2024 06:53
kevincai
kevincai previously approved these changes Dec 12, 2024
Signed-off-by: hmx <[email protected]>
@Seaven Seaven enabled auto-merge (squash) December 17, 2024 03:50
@yingtingdong
Contributor

> 1. The variable scan_olap_partition_num_limit is better applied per table when the query is a complex join. 2. Like the existing variables query_timeout/query_mem_limit/scan_hive_partition_num_limit, scan_olap_partition_num_limit works at the cluster level and is not set on a resource group. @kevincai you could consider the two PRs and see which one is better.
>
> @kevincai @kaijianding I think retaining both may be good. Users who have not set a resource group can use scan_olap_partition_num_limit at the cluster level, while users with a resource group can set the limit at the resource-group level. Existing limits like query_timeout/query_mem_limit can be set at both the cluster and resource-group levels.

@kaijianding I also tend to keep both. In your PR #53916 you set the scan limit per table, but the parameter name is partition_scan_number_limit_rule. Can the two parameters be unified? That is, one scan limit that takes effect for all partitions instead of being specified separately for each table.

@kaijianding
Contributor

> @kaijianding I also tend to keep both. In your PR #53916 you set the scan limit per table, but the parameter name is partition_scan_number_limit_rule. Can the two parameters be unified? That is, one scan limit that takes effect for all partitions instead of being specified separately for each table.

I think every table should have its own limit. In a complex query, a bigger table should have a smaller limit, and a smaller table may not need to be limited at all.

@yingtingdong
Contributor

> I think every table should have its own limit. In a complex query, a bigger table should have a smaller limit, and a smaller table may not need to be limited at all.

The row limit of large tables should be larger than that of small tables, so is it okay to use just one value directly? Setting different rules for different tables seems uncommon. Moreover, table data is dynamic: if a different value is set for each table, does the user need to adjust the rules frequently? If we only consider limiting resource usage, it is more reasonable to set the same threshold for all tables.

@kaijianding
Contributor

kaijianding commented Dec 20, 2024

> The row limit of large tables should be larger than that of small tables, so is it okay to use just one value directly? Setting different rules for different tables seems uncommon. Moreover, table data is dynamic: if a different value is set for each table, does the user need to adjust the rules frequently? If we only consider limiting resource usage, it is more reasonable to set the same threshold for all tables.

This rule limits the partition scan number; it is not a row limit.

In my production environment, this rule has not been adjusted since its creation, because we knew from the beginning which big tables should have their partition scan number limited.

@yingtingdong
Contributor

> This rule limits the partition scan number; it is not a row limit.
>
> In my production environment, this rule has not been adjusted since its creation, because we knew from the beginning which big tables should have their partition scan number limited.

I think the resource group is bound to computing resources rather than to a table, let alone a partition. The partition limit seems applicable only when users clearly know the size of their tables; it is difficult to set this value when the size cannot be clearly estimated.

@MatthewH00
Contributor Author

@kevincai The PR passed code review last week. Could you help merge it to the main branch when you have free time?

@kaijianding
Contributor

kaijianding commented Dec 25, 2024

> I think the resource group is bound to computing resources rather than to a table, let alone a partition. The partition limit seems applicable only when users clearly know the size of their tables; it is difficult to set this value when the size cannot be clearly estimated.

Users can modify this rule after their table has data, according to their query needs. It is easy to learn the size of a table or a partition via SHOW DATA or SHOW PARTITIONS.

Yes, I think the purpose of limiting the partition scan number is that computing resources are limited; a query should be rejected if it would occupy too many resources.

@alvin-celerdata alvin-celerdata merged commit a0a25b4 into StarRocks:main Dec 26, 2024
49 of 51 checks passed

@Mergifyio backport branch-3.4

@github-actions github-actions bot removed the 3.4 label Dec 26, 2024

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Dec 26, 2024
Contributor

mergify bot commented Dec 26, 2024

backport branch-3.4

✅ Backports have been created

Contributor

mergify bot commented Dec 26, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 26, 2024
…able (#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
(cherry picked from commit a0a25b4)
mergify bot pushed a commit that referenced this pull request Dec 26, 2024
…able (#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
(cherry picked from commit a0a25b4)

# Conflicts:
#	fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java
wanpengfei-git pushed a commit that referenced this pull request Dec 26, 2024
wanpengfei-git pushed a commit that referenced this pull request Dec 26, 2024
…able (backport #53747) (#54353)

Signed-off-by: Kevin Xiaohua Cai <[email protected]>
Co-authored-by: hmx <[email protected]>
Co-authored-by: Kevin Xiaohua Cai <[email protected]>
kevincai added a commit to kevincai/starrocks that referenced this pull request Dec 30, 2024
maggie-zhu pushed a commit to maggie-zhu/starrocks that referenced this pull request Jan 6, 2025
…able (StarRocks#53747)


Signed-off-by: MatthewH00 <[email protected]>
Signed-off-by: hmx <[email protected]>
zhangyifan27 pushed a commit to zhangyifan27/starrocks that referenced this pull request Feb 10, 2025
…able (backport StarRocks#53747) (StarRocks#54353)

Signed-off-by: Kevin Xiaohua Cai <[email protected]>
Co-authored-by: hmx <[email protected]>
Co-authored-by: Kevin Xiaohua Cai <[email protected]>
7 participants