New liveness probe to check for deadlocked threads #11388

Open
wants to merge 24 commits into
base: 4.8.x

@abrenk abrenk commented Nov 28, 2024

This is a new liveness probe that uses the ThreadMXBean to check for deadlocked threads.

It is currently enabled by default like the other probes in this module, as long as endpoints.health.deadlocked-threads.enabled is not set to false. There is no other configuration.

As one would expect, the health status is set to DOWN if any deadlocked threads are found, and their ThreadInfo, including a formatted stack trace, is given in the details.
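
For readers unfamiliar with the API, the check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `DeadlockCheckSketch` and the helpers `status` and `deadlocked` are hypothetical, and only the `ThreadMXBean` calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch class; not part of the PR.
public class DeadlockCheckSketch {

    /** Returns "DOWN" if the JVM reports deadlocked threads, otherwise "UP". */
    static String status(ThreadMXBean threads) {
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        return (ids == null || ids.length == 0) ? "UP" : "DOWN";
    }

    /** Returns the ThreadInfo of every deadlocked thread, or an empty array. */
    static ThreadInfo[] deadlocked(ThreadMXBean threads) {
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return new ThreadInfo[0];
        }
        // true, true: also report locked monitors and ownable synchronizers.
        return threads.getThreadInfo(ids, true, true);
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println(status(threads) + ", deadlocked threads: "
                + deadlocked(threads).length);
    }
}
```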

@CLAassistant

CLAassistant commented Nov 28, 2024

CLA assistant check
All committers have signed the CLA.

@abrenk
Author

abrenk commented Nov 28, 2024

This is an example of the details reported for a deadlock:

{
    "name": "example-app",
    "status": "DOWN",
    "details": {
        "deadlockedThreads": {
            "name": "example-app",
            "status": "DOWN",
            "details": [
                {
                    "threadId": "60",
                    "threadName": "Thread-0",
                    "threadState": "BLOCKED",
                    "daemon": "false",
                    "priority": "5",
                    "suspended": "false",
                    "inNative": "false",
                    "lockName": "java.lang.Object@7d10b1ca",
                    "lockOwnerName": "Thread-1",
                    "lockOwnerId": "61",
                    "lockedSynchronizers": [],
                    "stackTrace": "app//com.example.Deadlock.lambda$new$0(Deadlock.java:27)\n-  blocked on java.lang.Object@7d10b1ca\n-  locked java.lang.Object@4505ea74\napp//com.example.Deadlock$$Lambda/0x000001906948b360.run(Unknown Source)\njava.base@21/java.lang.Thread.runWith(Thread.java:1596)\njava.base@21/java.lang.Thread.run(Thread.java:1583)\n"
                },
                {
                    "threadId": "61",
                    "threadName": "Thread-1",
                    "threadState": "BLOCKED",
                    "daemon": "false",
                    "priority": "5",
                    "suspended": "false",
                    "inNative": "false",
                    "lockName": "java.lang.Object@4505ea74",
                    "lockOwnerName": "Thread-0",
                    "lockOwnerId": "60",
                    "lockedSynchronizers": [],
                    "stackTrace": "app//com.example.Deadlock.lambda$new$1(Deadlock.java:43)\n-  blocked on java.lang.Object@4505ea74\n-  locked java.lang.Object@7d10b1ca\napp//com.example.Deadlock$$Lambda/0x000001906948b580.run(Unknown Source)\njava.base@21/java.lang.Thread.runWith(Thread.java:1596)\njava.base@21/java.lang.Thread.run(Thread.java:1583)\n"
                }
            ]
        }
    }
}
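
A deadlock like the one in this output can be reproduced with two threads that take two monitors in opposite order. The sketch below is an assumption-laden illustration, not the `com.example.Deadlock` class from the stack traces: the class `DeadlockDetailsSketch`, its helper names, and the thread names `sketch-0` and `sketch-1` are hypothetical. It maps only a subset of the fields shown above, using the same key names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch class; not part of the PR.
public class DeadlockDetailsSketch {

    /** Starts two daemon threads that acquire two monitors in opposite order. */
    static void startDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t0 = new Thread(() -> lockInOrder(a, b, bothHoldFirstLock), "sketch-0");
        Thread t1 = new Thread(() -> lockInOrder(b, a, bothHoldFirstLock), "sketch-1");
        t0.setDaemon(true); // daemon threads do not keep the JVM alive
        t1.setDaemon(true);
        t0.start();
        t1.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds this monitor.
            }
        }
    }

    /** Polls until the JVM reports a deadlock or the timeout elapses. */
    static ThreadInfo[] awaitDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return threads.getThreadInfo(ids, true, true);
            }
            Thread.sleep(20);
        }
        return new ThreadInfo[0];
    }

    /** Maps a ThreadInfo to a subset of the detail keys shown in the example. */
    static Map<String, String> details(ThreadInfo info) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("threadId", String.valueOf(info.getThreadId()));
        d.put("threadName", info.getThreadName());
        d.put("threadState", info.getThreadState().name());
        d.put("suspended", String.valueOf(info.isSuspended()));
        d.put("inNative", String.valueOf(info.isInNative()));
        d.put("lockName", info.getLockName());
        d.put("lockOwnerName", info.getLockOwnerName());
        d.put("lockOwnerId", String.valueOf(info.getLockOwnerId()));
        return d;
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlock();
        for (ThreadInfo info : awaitDeadlock(5000)) {
            System.out.println(details(info));
        }
    }
}
```

Each thread should report the other as its lock owner, mirroring the `lockOwnerName` and `lockOwnerId` cross-references in the JSON above.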

@abrenk abrenk force-pushed the liveness-probe-deadlocked-threads branch from 9c3c995 to 5e6a158 Compare November 28, 2024 17:00
@graemerocher
Contributor

Thanks, can you target the 4.8.x branch?

@abrenk abrenk force-pushed the liveness-probe-deadlocked-threads branch from 5e6a158 to 51b2b7e Compare November 28, 2024 18:37
@abrenk abrenk changed the base branch from 4.7.x to 4.8.x November 28, 2024 18:38
@abrenk abrenk force-pushed the liveness-probe-deadlocked-threads branch from 51b2b7e to 905204e Compare November 28, 2024 18:40
@abrenk
Author

abrenk commented Nov 28, 2024

I have rebased onto the 4.8.x branch.

@abrenk
Author

abrenk commented Nov 28, 2024

I will add some documentation to healthEndpoint.adoc tomorrow.

@abrenk abrenk force-pushed the liveness-probe-deadlocked-threads branch from 0eb3cac to efc9a60 Compare November 28, 2024 20:34
@graemerocher graemerocher added the type: enhancement New feature or request label Nov 29, 2024
@abrenk
Author

abrenk commented Dec 2, 2024

I have adapted the HealthAggregator unit tests to the new indicator because the ordered list of messages now contains one more element.

Contributor

@sdelamo sdelamo left a comment

@abrenk Thanks for your contribution. I have added a functional test.

@graemerocher
Contributor

It seems to be failing the native tests; not sure whether this API behaves differently with Native Image.

@sdelamo
Contributor

sdelamo commented Dec 3, 2024

ThreadMXBean implementation is not supported in Native Image

I have used official Graal SDK ImageInfo::inImageCode to load the health indicator only on JVM.

However, @yawkat pointed me to Jackson Databind's NativeImageUtil as an alternative way to check whether we are running on Native Image. Personally, I think it is best to use the official API, which also leaves us one less thing to maintain. A drawback is that I added org.graalvm.sdk:nativeimage as a transitive dependency of management.

@graemerocher what do you think?
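
For context on the two options, a dependency-free sketch of such a guard is shown below. It rests on an assumption worth verifying: that `ImageInfo::inImageCode` reports true when the system property `org.graalvm.nativeimage.imagecode` is set. The class `NativeImageGuardSketch` and its method names are hypothetical, and this is not the change made in this PR, which calls the Graal SDK directly.

```java
// Hypothetical sketch class; not part of the PR.
public class NativeImageGuardSketch {

    // Assumption: GraalVM sets this property in image code, both while
    // building a native image and when running one.
    static final String IMAGE_CODE_PROPERTY = "org.graalvm.nativeimage.imagecode";

    /** Returns true when the image-code property is present. */
    static boolean inImageCode() {
        return System.getProperty(IMAGE_CODE_PROPERTY) != null;
    }

    /** The deadlock indicator would only be loaded on a regular JVM. */
    static boolean shouldLoadDeadlockIndicator() {
        return !inImageCode();
    }

    public static void main(String[] args) {
        System.out.println("load deadlock indicator: " + shouldLoadDeadlockIndicator());
    }
}
```

Reading the property directly avoids the extra dependency, at the cost of relying on a detail that the official API otherwise hides.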

@sdelamo sdelamo requested a review from yawkat December 3, 2024 15:37
Labels
type: enhancement New feature or request
Projects
Status: In Progress
5 participants