Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] check partition scan number limit in resource group #53916

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kaijianding
Copy link
Contributor

@kaijianding kaijianding commented Dec 13, 2024

Why I'm doing:

Want to limit the number of scanned partitions to prevent starrocks from heavy load

What I'm doing:

Fixes #issue

Add a limit rule in resource group.
eg :
image

partition_scan_number_limit_rule: {"db_1.tbl_1":7, "db_1.tbl_2":10}

If table scan in a single sub-query exceeds the limit, the query will be rejected.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

if (partitionScanNumberLimitRule != null) {
twg.setPartition_scan_number_limit_rule(partitionScanNumberLimitRule);
}

twg.setExclusive_cpu_cores(getNormalizedExclusiveCpuCores());

twg.setVersion(version);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:

The method getPartitionScanNumberLimitRule() may return null, which can lead to a NullPointerException if the returned value is used without a null check. This risk arises when clients of this method assume that the rule is never null and don't implement proper error handling.

You can modify the code like this:

public String getPartitionScanNumberLimitRule() {
    return partitionScanNumberLimitRule != null ? partitionScanNumberLimitRule : "";
}

This modification ensures that the method returns an empty string instead of null, reducing the chance of a NullPointerException.

public void setCheckPartitionScanNumberLimitWhenExplain(boolean checkPartitionScanNumberLimitWhenExplain) {
this.checkPartitionScanNumberLimitWhenExplain = checkPartitionScanNumberLimitWhenExplain;
}

// Serialize to thrift object
// used for rest api
public TQueryOptions toThrift() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Missing serialization for new variables checkPartitionScanNumberLimit and checkPartitionScanNumberLimitWhenExplain in toThrift() method.

You can modify the code like this:

// Serialize to thrift object
// used for rest api
public TQueryOptions toThrift() {
    TQueryOptions queryOptions = new TQueryOptions();
    
    // Existing serialization logic here
    
    // Add serialization for new variables
    queryOptions.setCheckPartitionScanNumberLimit(checkPartitionScanNumberLimit);
    queryOptions.setCheckPartitionScanNumberLimitWhenExplain(checkPartitionScanNumberLimitWhenExplain);

    return queryOptions;
}

" group " + workGroup.getName());
}
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Potential NullPointerException when accessing workGroup.getPartition_scan_number_limit_rule().

You can modify the code like this:

private void checkPartitionScanNumberLimit() throws StarRocksException {
    if (connectContext.isExplain()
            && !connectContext.getSessionVariable().isCheckPartitionScanNumberLimitWhenExplain()) {
        return;
    }
    if (!connectContext.getSessionVariable().isCheckPartitionScanNumberLimit()) {
        return;
    }
    TWorkGroup workGroup = jobSpec.getResourceGroup();
    if (workGroup == null || workGroup.getPartition_scan_number_limit_rule() == null) {
        return;
    }
    Map<String, Integer> rule = GsonUtils.GSON.fromJson(workGroup.getPartition_scan_number_limit_rule(),
            new TypeToken<Map<String, Integer>>() {
            }.getType());
    for (ScanNode scanNode : jobSpec.getScanNodes()) {
        if (!(scanNode instanceof OlapScanNode)) {
            continue;
        }
        TableName tableName =
                connectContext.getResolvedTables().get(((OlapScanNode) scanNode).getOlapTable().getId());
        if (tableName == null) {
            continue;
        }
        String tblName = tableName.getDb() + "." + tableName.getTbl();
        Integer limit = rule.get(tblName);
        if (limit == null) {
            continue;
        }
        if (((OlapScanNode) scanNode).getSelectedPartitionIds().size() > limit) {
            throw new StarRocksException(tblName + " scans more than " + limit +
                    " partition(s), which violates the limit defined in partition_scan_number_limit_rule in resource" +
                    " group " + workGroup.getName());
        }
    }
}

@kaijianding kaijianding changed the title [Feature] check partition scan number limit in resource group [Enhancement] check partition scan number limit in resource group Dec 13, 2024
@kevincai
Copy link
Contributor

kevincai commented Dec 13, 2024

saw a similar PR to this partition scan limit in #53747

might be good to put together and discuss how this feature can be designed.

Copy link

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

[FE Incremental Coverage Report]

pass : 59 / 67 (88.06%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/sql/ast/AlterResourceGroupStmt.java 0 1 00.00% [88]
🔵 com/starrocks/catalog/ResourceGroupMgr.java 2 3 66.67% [423]
🔵 com/starrocks/qe/SessionVariable.java 6 8 75.00% [4478, 4479]
🔵 com/starrocks/qe/DefaultCoordinator.java 26 30 86.67% [1354, 1368, 1373, 1378]
🔵 com/starrocks/sql/analyzer/QueryAnalyzer.java 1 1 100.00% []
🔵 com/starrocks/qe/StmtExecutor.java 1 1 100.00% []
🔵 com/starrocks/qe/ConnectContext.java 8 8 100.00% []
🔵 com/starrocks/catalog/ResourceGroup.java 7 7 100.00% []
🔵 com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java 8 8 100.00% []

Copy link

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants