-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Add cluster idle HTTP api #53850
Conversation
if (warehouse == null) { | ||
throw ErrorReportException.report(ErrorCode.ERR_UNKNOWN_WAREHOUSE, String.format("name: %s", warehouseName)); | ||
} | ||
Warehouse warehouse = getWarehouse(warehouseName); | ||
|
||
try { | ||
long workerGroupId = selectWorkerGroupInternal(warehouse.getId()).orElse(StarOSAgent.DEFAULT_WORKER_GROUP_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Potential null pointer dereference due to the use of getWarehouse
which might return null
.
You can modify the code like this:
public Warehouse getWarehouse(String warehouseName) {
Warehouse warehouse = nameToWh.get(warehouseName);
if (warehouse == null) {
throw ErrorReportException.report(ErrorCode.ERR_UNKNOWN_WAREHOUSE, String.format("name: %s", warehouseName));
}
return warehouse;
}
public Warehouse getWarehouse(long warehouseId) {
Warehouse warehouse = idToWh.get(warehouseId);
if (warehouse == null) {
throw ErrorReportException.report(ErrorCode.ERR_UNKNOWN_WAREHOUSE, String.format("id: %d", warehouseId));
}
return warehouse;
}
This ensures that Warehouse
is not null
before proceeding with operations on it, thus preventing potential null pointer exceptions.
fe/fe-core/src/main/java/com/starrocks/warehouse/WarehouseIdleChecker.java
Show resolved
Hide resolved
40a1c79
to
3294a0a
Compare
) Signed-off-by: shuming.li <[email protected]>
930162c
to
f9df7a8
Compare
@@ -396,6 +397,7 @@ protected void runRunningJob() throws AlterCancelException { | |||
this.finishedTimeMs = System.currentTimeMillis(); | |||
|
|||
GlobalStateMgr.getCurrentState().getEditLog().logAlterJob(this); | |||
WarehouseIdleChecker.updateJobLastFinishTime(warehouseId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a better choice to put it in AlterJobV2::run? Then you don't need to write each job separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
import com.starrocks.warehouse.IdleStatus; | ||
import io.netty.handler.codec.http.HttpMethod; | ||
|
||
public class IdleAction extends RestBaseAction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please indicate what the Idle content is. The name of this class cannot express the meaning of the content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -146,10 +151,7 @@ public List<Long> getAllComputeNodeIds(long warehouseId) { | |||
} | |||
|
|||
private List<Long> getAllComputeNodeIds(long warehouseId, long workerGroupId) { | |||
Warehouse warehouse = idToWh.get(warehouseId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to delete the existence check of warehouse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is check in the getWarehouse function
Signed-off-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]>
@@ -2755,6 +2755,9 @@ public class Config extends ConfigBase { | |||
@ConfField(mutable = true) | |||
public static int lake_warehouse_max_compute_replica = 3; | |||
|
|||
@ConfField(mutable = true, comment = "time interval to check whether warehouse is idle") | |||
public static long warehouse_idle_check_interval_seconds = 60; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider set this to 0 in open source version to disable the check by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new config warehouse_idle_check_enable
, because the warehouse_idle_check_interval_seconds
is also used to check the last job finish time.
} | ||
|
||
@Override | ||
public void execute(BaseRequest request, BaseResponse response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this api endpoint be protected by authentication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not need, there is no secret info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a security concern but I would defer it to you guys decision.
if (runningSQL.get() == 0 | ||
&& runningStreamLoad == 0 | ||
&& runningBrokerSparkLoad == 0 | ||
&& runningRoutineLoad == 0 | ||
&& runningBackupRestore == 0 | ||
&& runningAlterJob == 0 | ||
&& runningTask == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sum them , check the sum == 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]>
Quality Gate passedIssues Measures |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]❌ fail : 123 / 228 (53.95%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
@Mergifyio backport branch-3.4 |
@Mergifyio backport branch-3.3 |
@Mergifyio backport branch-3.2 |
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
(cherry picked from commit 6cd9fbc) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/common/Config.java # fe/fe-core/src/main/java/com/starrocks/statistic/HyperStatisticsCollectJob.java
(cherry picked from commit 6cd9fbc) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/common/Config.java # fe/fe-core/src/main/java/com/starrocks/qe/feedback/PlanAdvisorExecutor.java # fe/fe-core/src/main/java/com/starrocks/server/GlobalStateMgr.java # fe/fe-core/src/main/java/com/starrocks/statistic/HyperStatisticsCollectJob.java
(cherry picked from commit 6cd9fbc) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/alter/LakeTableSchemaChangeJob.java # fe/fe-core/src/main/java/com/starrocks/alter/OnlineOptimizeJobV2.java # fe/fe-core/src/main/java/com/starrocks/alter/RollupJobV2.java # fe/fe-core/src/main/java/com/starrocks/alter/SchemaChangeJobV2.java # fe/fe-core/src/main/java/com/starrocks/backup/BackupJob.java # fe/fe-core/src/main/java/com/starrocks/backup/RestoreJob.java # fe/fe-core/src/main/java/com/starrocks/common/Config.java # fe/fe-core/src/main/java/com/starrocks/connector/metadata/MetadataExecutor.java # fe/fe-core/src/main/java/com/starrocks/datacache/DataCacheSelectExecutor.java # fe/fe-core/src/main/java/com/starrocks/load/routineload/RoutineLoadJob.java # fe/fe-core/src/main/java/com/starrocks/load/streamload/StreamLoadMgr.java # fe/fe-core/src/main/java/com/starrocks/load/streamload/StreamLoadTask.java # fe/fe-core/src/main/java/com/starrocks/qe/ConnectContext.java # fe/fe-core/src/main/java/com/starrocks/qe/StmtExecutor.java # fe/fe-core/src/main/java/com/starrocks/qe/feedback/PlanAdvisorExecutor.java # fe/fe-core/src/main/java/com/starrocks/server/GlobalStateMgr.java # fe/fe-core/src/main/java/com/starrocks/server/WarehouseManager.java # fe/fe-core/src/main/java/com/starrocks/sql/util/CustomizedQueryExecutor.java # fe/fe-core/src/main/java/com/starrocks/statistic/HyperStatisticsCollectJob.java # fe/fe-core/src/test/java/com/starrocks/scheduler/MVRefreshTestBase.java # fe/fe-core/src/test/java/com/starrocks/sql/optimizer/rule/transformation/materialization/MvRewriteTestBase.java
Signed-off-by: gengjun-git <[email protected]> Co-authored-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]> Co-authored-by: gengjun-git <[email protected]>
ignore backport check: 3.2.14 |
Why I'm doing:
Add cluster idle api to help judge the cluster status.
What I'm doing:
Add a new daemon thread WarehouseIdleChecker to count all the tasks being executed and the time when the last task ended. If all the tasks being executed are 0 and the time when the last task ended has exceeded
Config.warehouse_idle_check_interval_seconds
, the system is considered idle.The entry point for all SQL execution in the system is StmtExecutor. If it is synchronous SQL, we only need to count the SQL executed in StmtExecutor. If it is asynchronous SQL, such as broker load, we also need to count these asynchronously executed tasks: Stream Load, Broker Load, Spark Load, Routine Load, Backup/Restore, Schema Change.
In addition, for SQL executed within the system, such as statistics collection, we do not need to count them. Add the
isInternal
field in StmtExecutor to determine whether the SQL is initiated internally or by the user.api request
/api/idle_status
api response
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: