Enable TaskMonitor warnings of slow tasks by default #278

Open
lfrancke opened this issue Nov 22, 2022 · 0 comments

HBase has a TaskMonitor that can warn on any tasks that take too long.
https://github.com/apache/hbase/blob/47996d6c2128815e45bb8bdb6e3a470bfddd6106/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L49

By default it does not warn, because the setting hbase.taskmonitor.rpc.warn.time defaults to 0, which disables the check. A warning like this is definitely useful to have, though.

At a customer we saw tasks that were stuck for a week, and this went unnoticed. As I'm not 100% sure which tasks are monitored (for example, are Procedures included?), I suggest a rather high threshold.
Maybe 1 hour? (Most tasks should finish within seconds, though.)
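
As a rough sketch of the suggested threshold: assuming the value of hbase.taskmonitor.rpc.warn.time is interpreted in milliseconds (worth verifying against the TaskMonitor source for the HBase version in use), a one-hour threshold in hbase-site.xml could look like the fragment below. How this setting is passed through by the operator is not covered here.

```xml
<!-- Sketch only: assumes the value is in milliseconds, so 1 hour = 3600000 -->
<property>
  <name>hbase.taskmonitor.rpc.warn.time</name>
  <value>3600000</value>
</property>
```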

Later, we should extract this warning from the logs and turn it into an alert, or at least show it on a dashboard (in Grafana or OpenSearch Dashboards).
The log message will start with: "Task may be stuck"
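
To illustrate the log extraction, here is a minimal Python sketch that filters log lines for the marker text. The function name and the sample lines are hypothetical and only there for illustration; real HBase log lines carry a timestamp and logger prefix, which is why the sketch matches the marker anywhere in the line rather than at the start.

```python
from typing import Iterable, List

# Marker text taken from the issue description above.
STUCK_MARKER = "Task may be stuck"


def find_stuck_task_warnings(lines: Iterable[str]) -> List[str]:
    """Return all log lines that contain the stuck-task marker.

    Trailing newlines are stripped so the result can be printed or
    forwarded to an alerting system as-is.
    """
    return [line.rstrip("\n") for line in lines if STUCK_MARKER in line]


if __name__ == "__main__":
    # Hypothetical sample lines; the layout is made up for illustration.
    sample = [
        "INFO something unrelated\n",
        "WARN Task may be stuck: <task details>\n",
    ]
    for hit in find_stuck_task_warnings(sample):
        print(hit)
```

In practice the same substring match would more likely be configured in the log aggregation stack feeding the dashboard, but the filter logic is the same.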

Also note that older versions have a bug that can cause incorrect warnings: https://issues.apache.org/jira/browse/HBASE-22935
