Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add docker healthchecks to server and ml #9583

Merged
merged 12 commits into from
May 22, 2024

Conversation

CodaBool
Copy link
Contributor

@CodaBool CodaBool commented May 18, 2024

🐋 What is a docker healthcheck

the docker spec has a way to determine if a service is healthy. This is useful for servers. The command docker ps can be used to get a quick summary of if a container is responding positively.

Platforms like AWS have built in integration for this spec and show the health status in their UI when running containers in ECS Fargate.

There probably are other benefits but I only know of those. I personally like to use the lazydocker cli to manage my docker services. Which also displays if they are passing healthchecks

lazy

🏗️ My solution

One thing to know about the docker healthcheck is it uses CLI on the image to perform it. This can be tricky to accomplish in an era of distroless images. The images that Immich puts out seem to be in that category (which is a good thing). The following common CLI that allow docker healthchecks to easily be done are absent on the image for both immich_server and immich_machine_learning

common commands used for Docker healthchecks

  • curl
  • wget
  • netcat or nc
  • pgrep

This means to support docker healthchecks you have to go one of two routes. Add wget from the busybox image. Or use a programming language to script a healthcheck command.

If the end image has node or python, like is the case here, then the later is the most slim and secure way to do this.

So, that's what I've added for the two images

🧪 Tests

Lint

I ran npm run lint but got the following error

ignore_lint

Healthchecks do need to follow the Docker spec and return exactly what Docker wants.

0: success - the container is healthy and ready for use
1: unhealthy - the container isn't working correctly
2: reserved - don't use this exit code

So, since using process.exit() is a requirement. I added this to the ignore patterns of eslint instead of resolving it.

Tests

passing
pass

Something I haven't tested

Healthchecks by default will run every 30 seconds. This in some apps could cause an excess of logs. I did not check for this. If anyone wants to give that concern a look. That would be appreciated.

😶‍🌫️Keep Optional

It would be possible to add these to the installation docker-compose.yml file. But I don't think it would be good to force this out (by adding it to /docker/docker-compose.yml). It can always be something that people optionally add onto their compose file if they want it (it's more for people who know what they are doing anyways). For example postgres and redis have a built in way to do healthchecks. Which I'm adding to my docker compose file.

I guess something in the docs would be useful for this. But curious what the maintainers think on this.

An example of what my compose file looks like with healthchecks

services:
  immich-server:
    ...
    healthcheck:
      test: npm run healthcheck

  immich-microservices:
    ...
      # I don't even know where this code is. So, I didn't add a healthcheck

  machine-learning:
    ...
    healthcheck:
      test: python3 healthcheck.py

  redis:
    ...
    healthcheck:
      test: redis-cli ping

  database:
    ...
    healthcheck:
      test: ["CMD-SHELL", "pg_isready", "-d", "${DB_DATABASE_NAME}"]

@mmomjian
Copy link
Contributor

I think this is a great idea. Currently I run healthchecks by using a static curl binary mounted inside the container. A built in healthcheck would be great.

Copy link
Contributor

@mertalev mertalev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation! I like the idea of using scripts to avoid adding dependencies.

immich-microservices doesn't have a separate codebase; it shares the same codebase as the server. It's also just been removed, so no need to worry about it :)

const http = require("http");

const options = {
host: "localhost",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use the /api/server-info/ping endpoint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we should probably also assert on the response body, not just the status code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected value

I first tested with a wget call for a successful call to /api/server-info/ping gave. It responded with this:

2024-05-19 20:47:35 URL: http://localhost:3001/api/server-info/ping 200 OK

Next I attempted to read that data inside the healthcheck.ts file

What I tried

I'm having some trouble using node.js http.request() function for the purpose of getting the response body. Looking at their docs. It seems something like this should work. When placed inside the callback function of http.request():

res.on('data', (chunk) => {
  console.log(`BODY: ${chunk}`);
});

but this never gets triggered. Not sure what I'm doing wrong here. If anyone is familiar with http.request(), I need some help on reading the body. 🙇

Alternative

I didn't see any http requesting packages. So, I would like to make http.request() work here.

This only other useful piece of data I saw in the response was statusMessage. Which has the value of "OK". This seems to be like it could be used as a replacement for the body. Any thoughts on this idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use fetch instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice fetch does work. Seems I have to do a weird signal thing to implement a timeout 😮‍💨

But I was able to read the body this way.

port = os.getenv("IMMICH_PORT", 3003)

try:
response = requests.get(f"http://localhost:{port}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly /ping here?

port = os.getenv("IMMICH_PORT", 3003)

try:
response = requests.get(f"http://localhost:{port}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Node.js health check has a timeout set, but this one doesn't. They should probably be the same.

@mertalev
Copy link
Contributor

Looks like you need to format the node script through Prettier and the Python script through ruff.

@mmomjian
Copy link
Contributor

mmomjian commented May 19, 2024

I have drafted some documentation. According to the PG docs pg_isready can return values other than 0 or 1. Can we add a || exit 1 to that test?

I see some threads saying redis-cli may have the same issue, may be smart to add the same there.

I personally end all of my healthchecks (not just Immich) with || exit 1.

server/.eslintrc.js Outdated Show resolved Hide resolved
server/package.json Outdated Show resolved Hide resolved
server/src/utils/healthcheck.ts Show resolved Hide resolved
server/src/utils/healthcheck.ts Outdated Show resolved Hide resolved
@CodaBool
Copy link
Contributor Author

CodaBool commented May 19, 2024

Thanks for the help everyone. I have implemented your suggestions. With the exception of @bo0tzz. I was not able to find a way to read the response body with node.js http.request() callback. I left a reply to your comment with more details

@mertalev
Copy link
Contributor

I believe you can fix the process.exit linting error by adding #!/usr/bin/env node to the top of the file.

@mertalev
Copy link
Contributor

May as well add it to the Python script too for consistency (except python3 instead of node).

bo0tzz
bo0tzz previously requested changes May 21, 2024
Copy link
Member

@bo0tzz bo0tzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase on main to fix the status checks?

I'm also thinking we should probably document the existence of these healthchecks somewhere, though I'm not sure where would be a good place at the moment 🤔

@zackpollard
Copy link
Contributor

zackpollard commented May 21, 2024

Can you rebase on main to fix the status checks?

I'm also thinking we should probably document the existence of these healthchecks somewhere, though I'm not sure where would be a good place at the moment 🤔

Dockerfiles have a HEALTHCHECK instruction, we should add these endpoints into there so that by default everyone gets healthchecks. I don't see a reason anyone wouldn't want them and if they specifically didn't they could override and disable them. @CodaBool

@jrasm91
Copy link
Contributor

jrasm91 commented May 21, 2024

@zackpollard @bo0tzz #9590

@mmomjian
Copy link
Contributor

@zackpollard I agree with adding it to the Dockerfile. That won't cover Redis or Postgres. One option (my preference) is to add it to the docker-compose.yml for those two services (non-breaking). Thoughts?

@mmomjian
Copy link
Contributor

mmomjian commented May 21, 2024

I have added an enhanced PG healthcheck in #9590 that uses database checksums. My preference would still be to modify that PR into one that just adds the checks to the docker-compose.yml

@CodaBool
Copy link
Contributor Author

One option is to add it to the docker-compose.yml

I'm happy either way

@zackpollard
Copy link
Contributor

I have added an enhanced PG healthcheck in #9590 that uses database checksums. My preference would still be to modify that PR into one that just adds the checks to the docker-compose.yml

Happy to add that into the docker compose and have our own ones directly in the dockerfiles

@mmomjian
Copy link
Contributor

I have added an enhanced PG healthcheck in #9590 that uses database checksums. My preference would still be to modify that PR into one that just adds the checks to the docker-compose.yml

Happy to add that into the docker compose and have our own ones directly in the dockerfiles

#9590 updated.

if (response.ok) {
const body = await response.json();
if (body.res === 'pong') {
console.log('Server is running');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this write to the docker logs every time the HC is run (default 30 seconds)? Is that desired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a watch on the immich_server container for 2 minutes and nothing showed up (healthchecks should by default happen every 30 seconds). There is no other logs than the container's stdout right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, interesting. I guess that should be fine then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo just remove it anyway, if it doesn't get written anywhere then there's no point having it

Copy link
Contributor Author

@CodaBool CodaBool May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 2 of the 3 console calls. I left the last one in because there is a very minor usefulness to them. If healthchecks are failing when they shouldn't be you could exec in and run the file with node. Then look at the output for debugging.

Let me know what you think. I think either way is fine. Docker is likely redirecting stdout to dev/null anyways but that's speculation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for making all these changes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really important here, but for future reference, healthcheck logs are captured separately by docker and listed in the container status next to healthcheck status changes.

@zackpollard zackpollard dismissed bo0tzz’s stale review May 22, 2024 16:34

changes were made

@zackpollard zackpollard changed the title feat(server, ml) add ability to docker healthcheck feat(server, ml): add ability to docker healthcheck May 22, 2024
@zackpollard zackpollard changed the title feat(server, ml): add ability to docker healthcheck feat: add ability to docker healthcheck May 22, 2024
@zackpollard zackpollard changed the title feat: add ability to docker healthcheck feat: add docker healthchecks to server and ml May 22, 2024
@zackpollard zackpollard enabled auto-merge (squash) May 22, 2024 16:48
@zackpollard zackpollard merged commit 6a4c2e9 into immich-app:main May 22, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants