-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add docker healthchecks to server and ml #9583
Conversation
I think this is a great idea. Currently I run healthchecks by using a static curl binary mounted inside the container. A built in healthcheck would be great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation! I like the idea of using scripts to avoid adding dependencies.
immich-microservices
doesn't have a separate codebase; it shares the same codebase as the server. It's also just been removed, so no need to worry about it :)
server/src/utils/healthcheck.ts
Outdated
const http = require("http"); | ||
|
||
const options = { | ||
host: "localhost", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use the /api/server-info/ping
endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and we should probably also assert on the response body, not just the status code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected value
I first tested with a wget call for a successful call to /api/server-info/ping
gave. It responded with this:
2024-05-19 20:47:35 URL: http://localhost:3001/api/server-info/ping 200 OK
Next I attempted to read that data inside the healthcheck.ts file
What I tried
I'm having some trouble using node.js
http.request()
function for the purpose of getting the response body. Looking at their docs. It seems something like this should work. When placed inside the callback function of http.request()
:
res.on('data', (chunk) => {
console.log(`BODY: ${chunk}`);
});
but this never gets triggered. Not sure what I'm doing wrong here. If anyone is familiar with http.request(), I need some help on reading the body. 🙇
Alternative
I didn't see any http requesting packages. So, I would like to make http.request() work here.
This only other useful piece of data I saw in the response was statusMessage
. Which has the value of "OK". This seems to be like it could be used as a replacement for the body. Any thoughts on this idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use fetch instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nice fetch does work. Seems I have to do a weird signal
thing to implement a timeout 😮💨
But I was able to read the body this way.
machine-learning/app/healthcheck.py
Outdated
port = os.getenv("IMMICH_PORT", 3003) | ||
|
||
try: | ||
response = requests.get(f"http://localhost:{port}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly /ping
here?
machine-learning/app/healthcheck.py
Outdated
port = os.getenv("IMMICH_PORT", 3003) | ||
|
||
try: | ||
response = requests.get(f"http://localhost:{port}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Node.js health check has a timeout set, but this one doesn't. They should probably be the same.
Looks like you need to format the node script through Prettier and the Python script through ruff. |
I have drafted some documentation. According to the PG docs pg_isready can return values other than 0 or 1. Can we add a I see some threads saying redis-cli may have the same issue, may be smart to add the same there. I personally end all of my healthchecks (not just Immich) with |
I believe you can fix the |
May as well add it to the Python script too for consistency (except |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rebase on main to fix the status checks?
I'm also thinking we should probably document the existence of these healthchecks somewhere, though I'm not sure where would be a good place at the moment 🤔
Dockerfiles have a HEALTHCHECK instruction, we should add these endpoints into there so that by default everyone gets healthchecks. I don't see a reason anyone wouldn't want them and if they specifically didn't they could override and disable them. @CodaBool |
@zackpollard I agree with adding it to the |
I have added an enhanced PG healthcheck in #9590 that uses database checksums. My preference would still be to modify that PR into one that just adds the checks to the docker-compose.yml |
I'm happy either way |
Happy to add that into the docker compose and have our own ones directly in the dockerfiles |
server/src/utils/healthcheck.ts
Outdated
if (response.ok) { | ||
const body = await response.json(); | ||
if (body.res === 'pong') { | ||
console.log('Server is running'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this write to the docker logs every time the HC is run (default 30 seconds)? Is that desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing a watch on the immich_server
container for 2 minutes and nothing showed up (healthchecks should by default happen every 30 seconds). There is no other logs than the container's stdout right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, interesting. I guess that should be fine then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo just remove it anyway, if it doesn't get written anywhere then there's no point having it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed 2 of the 3 console calls. I left the last one in because there is a very minor usefulness to them. If healthchecks are failing when they shouldn't be you could exec in and run the file with node. Then look at the output for debugging.
Let me know what you think. I think either way is fine. Docker is likely redirecting stdout to dev/null anyways but that's speculation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for making all these changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really important here, but for future reference, healthcheck logs are captured separately by docker and listed in the container status next to healthcheck status changes.
🐋 What is a docker healthcheck
the docker spec has a way to determine if a service is healthy. This is useful for servers. The command
docker ps
can be used to get a quick summary of if a container is responding positively.Platforms like AWS have built in integration for this spec and show the health status in their UI when running containers in ECS Fargate.
There probably are other benefits but I only know of those. I personally like to use the
lazydocker
cli to manage my docker services. Which also displays if they are passing healthchecks🏗️ My solution
One thing to know about the docker healthcheck is it uses CLI on the image to perform it. This can be tricky to accomplish in an era of distroless images. The images that Immich puts out seem to be in that category (which is a good thing). The following common CLI that allow docker healthchecks to easily be done are absent on the image for both
immich_server
andimmich_machine_learning
curl
wget
netcat
ornc
pgrep
This means to support docker healthchecks you have to go one of two routes. Add wget from the busybox image. Or use a programming language to script a healthcheck command.
If the end image has
node
orpython
, like is the case here, then the later is the most slim and secure way to do this.So, that's what I've added for the two images
🧪 Tests
Lint
I ran
npm run lint
but got the following errorHealthchecks do need to follow the Docker spec and return exactly what Docker wants.
So, since using
process.exit()
is a requirement. I added this to the ignore patterns of eslint instead of resolving it.Tests
Something I haven't tested
Healthchecks by default will run every 30 seconds. This in some apps could cause an excess of logs. I did not check for this. If anyone wants to give that concern a look. That would be appreciated.
😶🌫️Keep Optional
It would be possible to add these to the installation
docker-compose.yml
file. But I don't think it would be good to force this out (by adding it to/docker/docker-compose.yml
). It can always be something that people optionally add onto their compose file if they want it (it's more for people who know what they are doing anyways). For example postgres and redis have a built in way to do healthchecks. Which I'm adding to my docker compose file.I guess something in the docs would be useful for this. But curious what the maintainers think on this.