"Healthcheck" for image config #749

Open
vbatts opened this issue Aug 28, 2018 · 34 comments

@vbatts
Member

vbatts commented Aug 28, 2018

In https://github.com/opencontainers/image-spec/blob/master/config.md#properties
there is no healthcheck property carrying over from Docker's HEALTHCHECK.
It is commonly seen, and I've now been asked at a couple of conferences how come the OCI image spec does not account for it.
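
For context, a rough Go sketch of the Healthcheck object that Docker's image config carries, based on my reading of Docker's image spec v1.2; field names and semantics here are illustrative, not normative for OCI:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// HealthcheckConfig mirrors the "Healthcheck" object described in Docker's
// image spec v1.2. Treat this as a sketch of the shape, not a definition.
type HealthcheckConfig struct {
	// Test selects the check to run. The first element picks the form:
	//   []                     : inherit the healthcheck from the base image
	//   ["NONE"]               : disable any inherited healthcheck
	//   ["CMD", arg1, arg2]    : exec the arguments directly
	//   ["CMD-SHELL", command] : run the command with the container's shell
	Test []string `json:"Test,omitempty"`
	// Durations are serialized as integer nanoseconds in the JSON config.
	Interval time.Duration `json:"Interval,omitempty"`
	Timeout  time.Duration `json:"Timeout,omitempty"`
	// Retries is the number of consecutive failures needed to report "unhealthy".
	Retries int `json:"Retries,omitempty"`
}

func main() {
	hc := HealthcheckConfig{
		Test:     []string{"CMD-SHELL", "pg_isready || exit 1"},
		Interval: 10 * time.Second,
		Timeout:  5 * time.Second,
		Retries:  5,
	}
	out, _ := json.MarshalIndent(hc, "", "  ")
	fmt.Println(string(out)) // the JSON that ends up under "Healthcheck" in the config
}
```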

@runcom
Member

runcom commented Aug 28, 2018

I believe it was because many people thought it was incorrect to put application-specific health checks in the image itself; rather, they wanted them defined elsewhere. There must be a Docker issue discussing it (maybe @thaJeztah knows).

@thaJeztah
Member

Hm, not sure about an issue discussing health checks, but I can see this being application-specific, so not part of the specs for that reason. Possibly @stevvooe knows more about that

@hairyhenderson

To play devil's advocate - how is Healthcheck application-specific, but ExposedPorts not application-specific?

What does "application-specific" mean in this context?

@runcom
Member

runcom commented Aug 28, 2018

To play devil's advocate - how is Healthcheck application-specific, but ExposedPorts not application-specific?

The thing was: let's not add another application-specific thing to the spec; no one said ExposedPorts is not application-specific :)

@hairyhenderson

@runcom ah - a nuance that was unclear 😉

@vbatts
Member Author

vbatts commented Aug 29, 2018

It seems like a generally useful check. Liveness is just that the pid is up, but a health check could be applied across most container runtime use-cases.

@mgoltzsche

mgoltzsche commented Oct 4, 2018

+1 for healthcheck. An image provides enough information to derive a runnable bundle from it, with additional defaults supplied by the runtime. I like this approach because it follows the convention-over-configuration principle. A healthcheck in an OCI image would be a useful optional default value for a bundle as well.
At least for services, the requirement to configure a healthcheck is pretty common. A healthcheck is indeed application-specific, but if you don't support it, users have to provide the same healthcheck in other configuration again and again because they cannot provide it with the image.
Also, an OCI image cannot be fully converted from a Docker image without the healthcheck information.

@stevvooe
Contributor

While healthcheck is a part of docker images, I don't think the design has proven to be very effective. We've had a number of bugs in docker due to the design of healthcheck. Also, since healthchecks are executed so close to the running container, the actual utility is very low.

If we are looking for terminology to divide this, the port is specific to the running service (a bound port on a process), while the behavior of the healthcheck is going to be dependent on the infrastructure (network, disks, etc.). In general, we can guarantee that a port gets exposed with much higher certainty than disks or network are available to run a service. Even with that argument, there could be an argument that ports shouldn't live in the image, either.

In practice, the methodology used by Kubernetes has proven that healthchecks belong with the orchestration or infrastructure environment. In the case of Kubernetes, they integrate with the load balancing system to ensure liveness/readiness when serving traffic. The healthchecks there are designed with the semantics of that system and work well in that context.

However, if we want to adopt healthchecks from Docker images, the impact is minimal. It can be done in a backwards compatible way and doesn't require a major version change to OCI. Hence, we don't have to rev the mediatype. While I am not a huge fan of the Docker image healthchecks, I don't see harm in importing the design if that is what people want to use.

@cyphar
Member

cyphar commented Oct 11, 2018

I agree with @stevvooe on both points -- that the utility of healthcheck information in an image is quite low, but that importing it is acceptable if that's what people want. However, I would like any language that adds a healthcheck annotation or section to Config to explain the problems with it (to ensure implementors can decide whether they want to support it and whether they should look at better solutions to the problem).

@reitzig

reitzig commented Sep 19, 2019

I'm not sure about Docker's implementation of "health checks"; I would agree that they are less useful than it might seem.

What I miss most often is some form of "ready" flag. Take your average microservice: between the container being "up" and the application running and being able to serve requests -- let's call this state "ready" -- significant amounts of time can pass. I'm usually interested in waiting for "ready", be it manually, in docker-compose settings (depends_on), or provisioning containers with Ansible and such.

While "health checks" as implemented in Docker can be used to do that (if implemented with consistent semantics), none of the tools I've used include using it, which may just have to do with it being a non-standard feature.

@bademux

bademux commented Feb 11, 2020

Nice to have when using ECS.

@vbatts
Member Author

vbatts commented Apr 2, 2021

I'm closing this, as the discussion properly captures the backstory on why healthchecks were not carried forward.

@gvenzl

gvenzl commented Sep 3, 2021

I agree with @reitzig that an application readiness command instruction or similar would be very useful and is sometimes necessary to guarantee the correct order of orchestration. Not knowing the history or the intent of the Docker HEALTHCHECK instruction myself, I can still say that it is often necessary to know whether the application inside a container is (still) ready to service requests. A container is up and running relatively quickly, but as @reitzig says, between that and the actual application inside the container being ready to service requests, a lot of time can pass.

I am the author of images that provide a free Oracle Database XE inside containers: https://hub.docker.com/r/gvenzl/oracle-xe
I can tell you that I get asked all the time how somebody can tell whether the database is already (or still) up and running, especially in CI/CD environments where the container is used for regression testing and those tests cannot be kicked off before the database is ready.

Even GitHub Actions leverages Docker health checks to determine whether a container is ready to service requests before executing the rest of the pipeline; see for example the Postgres service container:

```yaml
  # Set health checks to wait until postgres has started
  options: >-
    --health-cmd pg_isready
    --health-interval 10s
    --health-timeout 5s
    --health-retries 5
```

Perhaps health checks were never intended to be used for such a purpose, but they certainly are abused for that. However, right now there is no way to accomplish something like that with OCI-based images. It would be great if an image provider (like myself) could also provide a way to tell the user (and orchestration layer) whether the software inside the container is (still) ready to service requests.

@reitzig

reitzig commented Sep 3, 2021

In practice, the methodology used by Kubernetes has proven that healthchecks belong with the orchestration or infrastructure environment.

I don't know about you, but I find that liveness/readiness checks require knowledge about what's running in the container, to the point of very tight coupling. So while the semantics and use of the checks are rightfully defined by Kubernetes, I usually find the requests just parrot something the containerized application exposes. A standardized interface in between, if a feasible one exists, would make things simpler (as in, less coupled).

@gvenzl

gvenzl commented Sep 3, 2021

If we are looking for terminology to divide this, the port is specific to the running service (a bound port on a process), while the behavior of the healthcheck is going to be dependent on the infrastructure (network, disks, etc.). In general, we can guarantee that a port gets exposed with much higher certainty than disks or network are available to run a service. Even with that argument, there could be an argument that ports shouldn't live in the image, either.

Actually, reading up on the Docker documentation, I don't think that Docker classifies healthcheck as an infrastructure check either:

The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running.

@kiview

kiview commented Sep 20, 2021

I'd like to add my point of view as a maintainer of the testcontainers-java library (https://github.com/testcontainers/testcontainers-java/), a library for instrumenting containers as part of integration tests, in which we support using HEALTHCHECK to wait for a container to be in a usable state. Please note that this is my personal opinion and should not be considered the official opinion of the Testcontainers project.

While a lot of the argumentation in this issue is very reasonable, I have a feeling it is mostly informed by considering the operation of containers, specifically by orchestrators such as k8s. However, the use case outlined by @gvenzl, e.g. starting dependent applications for development, testing, and CI/CD pipelines, is widespread in the developer community.

With Testcontainers, we provide a feature that allows users to define, as part of our API, the healthcheck that indicates the readiness of the application inside the container (we call them WaitStrategies). While this is easily possible if you are the developer of the image and know the implementation of the application and stable indicators of its readiness, it becomes very hard if you are simply a user of a readily available image, e.g. any of the official database images out there (like Oracle or PostgreSQL, as mentioned by @gvenzl). In my opinion, the developers of those images are in a much better position to formulate and implement a healthcheck than the user.

Also note that while Testcontainers supports user-defined WaitStrategies, we also support using the Docker HEALTHCHECK as the wait strategy. This means it might work for users who test with Docker while failing for users on different container engines, thereby reducing interoperability and in general making it much harder for other vendors to develop "drop-in replacements" for Docker.
In addition, a HEALTHCHECK maintained by the image authors is a very stable integration point that can be updated automatically as part of image updates, something that is not easily done if users have to formulate their own custom checks; those can end up as very brittle integration points, e.g. binding themselves to easily observable indicators of readiness such as certain log messages.

I will completely respect the decision of the specification authors, but I hope I could add some additional context for the usefulness of HEALTHCHECK in non-ops use cases.

@gvenzl

gvenzl commented Sep 30, 2021

Hey @vbatts, with the additional context given, could we get this issue reopened and continue the discussion?

@vbatts
Member Author

vbatts commented Sep 30, 2021

fair!
I'm personally not against it. I dislike the format that the one from Docker ended up with: the healthcheck effectively must exist as a CMD or CMD-SHELL inside the container, rather than being, say, an ["HTTP", "<port>", "/healthcheck", 200] or similar that the inspector/runtime/orchestrator can call against the container to see if it is running. But that could evolve.
Please consider opening a PR.

@thaJeztah @justincormack might we work from the definition in https://github.com/moby/moby/blob/master/image/spec/v1.2.md ?
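
To make that contrast concrete, here is a small hypothetical sketch of an HTTP-style probe in the spirit of `["HTTP", "<port>", "/healthcheck", 200]`, run by the inspector/runtime/orchestrator from outside the container; the type and behaviour are invented for illustration and are not part of any spec or implementation:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// httpCheck is a hypothetical probe: instead of exec'ing a command inside the
// container (CMD / CMD-SHELL), the runtime calls the container over the network.
type httpCheck struct {
	Port     int    // container port to probe
	Path     string // e.g. "/healthcheck"
	Expected int    // expected HTTP status code, e.g. 200
}

func (c httpCheck) run(host string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s:%d%s", host, c.Port, c.Path))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != c.Expected {
		return fmt.Errorf("got status %d, want %d", resp.StatusCode, c.Expected)
	}
	return nil
}

func main() {
	check := httpCheck{Port: 8080, Path: "/healthcheck", Expected: 200}
	if err := check.run("127.0.0.1"); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}
```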

@vbatts vbatts reopened this Sep 30, 2021
@justincormack

Yeah, tend to agree that shell was not a great idea as it requires an exec into the container. We could add an http method into Docker. Interop is clearly an issue as nothing except Docker supports this right now.

@blargism

blargism commented Oct 1, 2021

TL;DR: Yes, HEALTHCHECK is inelegant, but it helps in situations where doing things "properly" and "elegantly" is not possible or simple.

So in practical terms, and I'm probably "doing it wrong", it is helpful when I have something that wants to run as a systemd process. It is simple to have something that runs periodically to make sure that thing is still up and going.

While this isn't an optimal situation, it's also a good tool to throw something into a container that wasn't specifically designed to be in a container. It lets me know that everything is working properly in an un-opinionated way. I can write a shell script that checks exactly what I need it to check from the confines of the container. I can search logs, run a loopback request to the service, whatever the specific application calls for.

It's not an elegant solution, but it's a solution that allows me to deal with inelegant situations.

@jonjohnsonjr
Contributor

I'll add another weak argument in favor: it would be nice to have a minimal-dependency implementation of the healthcheck parser for things like bazelbuild/rules_docker#1742

@gvenzl

gvenzl commented Oct 19, 2021

Yeah, tend to agree that shell was not a great idea as it requires an exec into the container. We could add an http method into Docker. Interop is clearly an issue as nothing except Docker supports this right now.

Perhaps a good compromise could be to allow methods other than CMD to be passed on as a parameter, as @vbatts outlined above (["HTTP", "<port>", "/healthcheck", 200]), but still default to CMD for interoperability with Docker?

@gvenzl

gvenzl commented Oct 17, 2022

Hey @vbatts et al,

It has been pretty much exactly a year since the last update on this issue, so I wanted to bounce it up again.

I would still love to have such healthcheck functionality that I can use to signal when my containers are ready to use.
Hopefully, such a feature will make it into the spec and downstream container runtimes soon! :)

Thanks,

@reitzig

reitzig commented Dec 28, 2022

Question: Should we first talk about extending the runtime-spec?
Meaning: maybe the actual problem is that status: running is not really fine-grained enough to represent application state.

That said, I would understand if the answer was: "that would muddle layers".

Still, it seems to me that container states and health/readiness checks are linked on a conceptual level.

@gvenzl

gvenzl commented Feb 5, 2023

I'm fine either way, as long as this issue starts to be looked at. I still think that just providing an infrastructure-like check of "the container is up" is pretty useless to the ones wanting to consume the container. It's like having a website and saying the server is up, but not checking whether the webserver itself is servicing HTTP requests.
How do you propose to proceed?

@gvenzl

gvenzl commented Jul 9, 2023

Hey @reitzig, @vbatts, et al,

Just wanted to check in on this again.
I still think it would be great to have some way of registering and running a healthcheck that reports the state of the software inside the container.

@vbatts
Member Author

vbatts commented Jul 10, 2023 via email

@tianon
Member

tianon commented Jul 10, 2023

It not being implemented by any other runtime/orchestrator (and k8s having several separate but related concepts) still makes me wonder if it's really appropriate to include in the specification documenting interoperability. 😅

@tianon
Member

tianon commented Jul 10, 2023

In other words: I'm (softly, not blocking) opposed, both because it hasn't been implemented by anything but Docker after all these years and because I think that's probably precisely because it was found not to be sufficient for the use case of checking container state (hence k8s having several related concepts to track this intent more granularly instead of using this one).

@gvenzl

gvenzl commented Jul 22, 2023

Hey @vbatts,

Thanks a lot for your response!

I'm happy to try, but I am a total beginner in this space, just a consumer of the tech and not anything like an expert when it comes to the internals. So I'd have to kindly ask for your guidance in that matter; right now I don't even know where to start with the PR.

As to @tianon's comments, I dug around for K8s specifically, and yes, K8s uses liveness probes to accomplish the same thing, but there was also a discussion about supporting the Docker HEALTHCHECK feature specifically (see kubernetes/kubernetes#25829), which seems to have just gone stale and was eventually closed. Reading between the lines, K8s had liveness probes before Docker introduced HEALTHCHECK; if it had been the other way around, they may have settled just for that, but that's pure speculation at this point.

There is one more issue (kubernetes/kubernetes#50703) where they actively deactivated the Docker HEALTHCHECK so it would not interfere with the liveness probes, but note their comment expressing the desire to support it eventually:

kubernetes/kubernetes#50703 (comment)

However, I couldn't find any issue where they actually state the desire for adding the support, so I guess for K8s folks this use case is solved with liveness probes.

Perhaps HEALTHCHECK isn't the right solution here but there still seems to be a desire for container runtimes to provide some sort of mechanism to verify the health of a container, whether that's k8s via liveness probes or Docker with HEALTHCHECK.

One could argue that having such a mechanism also defined for opencontainers may be worthwhile and valuable to end users, even if it ends up being different from what Docker did.

@Trim

Trim commented Mar 24, 2024

As a container image author, I find it useful to be able to inform users about the default health check method, just as I'm able to provide them the default command to run the container's service.

This is especially visible when I define the health check in the HEALTHCHECK CMD ["myapp", "run", "some", "test"] style right alongside CMD ["myapp", "run"].

Yeah, tend to agree that shell was not a great idea as it requires an exec into the container. We could add an http method into Docker. Interop is clearly an issue as nothing except Docker supports this right now.

Perhaps a good compromise could be to allow methods other than CMD to be passed on as a parameter, as @vbatts outlined above (["HTTP", "<port>", "/healthcheck", 200]), but still default to CMD for interoperability with Docker?

I like this idea, as it would help inform users how to check the application state without having to integrate the tooling inside the container itself. For the "HTTP" idea above, it would be better to also be able to define the "<http method>", to send for example a HEAD request and close the connection when the HTTP headers are received.

mguaylam added a commit to mguaylam/communautofinder_telegrambot that referenced this issue Jul 12, 2024
OCI does not support HEALTHCHECK yet : opencontainers/image-spec#749 might need to try docker format : containers/podman#18904
mguaylam pushed a commit to mguaylam/communautofinder_telegrambot that referenced this issue Jul 14, 2024
- Change dependencies.
- Change library.
- Adding logging.
- Add healthcheck.
- Provide environment thru os env.
- Create Buildah pipeline : OCI does not support HEALTHCHECK yet : opencontainers/image-spec#749 forced to docker format : containers/podman#18904
- Update README.
@AlexBaranowski

With startup, readiness and liveness probes being much more fine-grained and granular, HEALTHCHECK seems like a relic of the past.

Users could be informed of potential probes with a simple README that most popular registries support.

Helm and other solutions also support adding the probes out of the box.

Also, the documentation label can be used to inform users where to find documentation for the image and how to write probes/healthchecks.

Finally, since even Docker builds OCI images by default, it is literally obsolete and IMO should be marked as such.

@sudo-bmitch
Contributor

sudo-bmitch commented Aug 15, 2024

With startup, readiness and liveness probes being much more fine-grained and granular, HEALTHCHECK seems like a relic of the past.

Docker still supports the healthcheck, and the other Kubernetes-specific probes are not part of the image config today.

Users could be informed of potential probes with a simple README that most popular registries support.

Helm and other solutions also support adding the probes out of the box.

That's necessary because Kubernetes doesn't have a way to read the values from the image today.

Also, the documentation label can be used to inform users where to find documentation for the image and how to write probes/healthchecks.

Finally, since even Docker builds OCI images by default, it is literally obsolete and IMO should be marked as such.

Docker's OCI images will include the healthcheck if defined. Reserving and documenting a field that is in active use by a popular implementation avoids having it treated as an undefined field that other implementations could use in an incompatible way.

Note that since the original issue was opened, #817 added this as a reserved field.

In the long run, I think it would be useful to define these sorts of checks in the OCI image-spec rather than having competing implementation specific approaches.

@thaJeztah
Member

thaJeztah commented Aug 15, 2024

Slightly off-topic, but definitely related, and to expand on @sudo-bmitch's comment above (apologies upfront for the lengthy comment).

As some here know, the OCI image specification was based on Docker's image "specification" and most parts were adopted by the OCI specs.

Some parts were either too specific to Docker, or not necessary for interoperability, and for that reason were not included in the OCI specification. Backward (and forward) compatibility was essential for the OCI specs to succeed, so instead of including those parts in the OCI specification, the spec allows implementations to have extensions; healthchecks are one of those.

I know that historically there's been confusion around "docker images are not OCI images", which I would consider both "correct" and "incorrect"; the Docker image specification defines a superset of OCI images, and Docker images are OCI images that make use of the OCI's option to use extension fields (those that the OCI chose not to adopt).

Most of those extensions are not essential, which means that for implementations that decided not to implement them, Docker images are OCI images (in some cases with a different mediatype).

However, for implementations that do want to implement the extension fields, Docker's image "specification" was hard to find; it was captured in a subdirectory of the docker repository (https://github.com/moby/moby/tree/v20.10.8/image/spec), and the (Golang) implementation was an integral part of the code base without a canonical definition (fun fact: at some point the actual implementation lived in the BuildKit repository).

I also intentionally put "specification" in quotes; while there were attempts to define the specification (linked above), those documents being part of the Docker code base itself meant that things evolved organically. Large parts were ill-defined beyond "the implementation IS the specification", and, tracing back through history, not all changes found their way into the spec itself.

As specifications go, all credit to the OCI maintainers (past and current) for transforming the above into an actual specification.

To cut a long story short; we recently published Docker's image spec as a separate repository (and Go module); https://github.com/moby/docker-image-spec

The repository was extracted from the Docker code base; we tried to preserve history where possible; some trickery was needed to amend / backfill omissions (implementation changes not reflected in the specs).

In addition to the specs markdown, we also added a Go implementation, mirroring the same structure as the OCI image specs (which could (potentially) be used as a drop-in(ish) replacement - names of types differ though); https://github.com/moby/docker-image-spec/tree/v1.3.1/specs-go

The module is implemented using the OCI types; it uses Go's ability to compose types, using wrapper structs to compose the OCI types and the extension fields; https://github.com/moby/docker-image-spec/blob/v1.3.1/specs-go/v1/image.go#L12
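
To illustrate that composition approach, here is a minimal sketch of the embedding pattern; the type and field names are mine for illustration (not the actual definitions in moby/docker-image-spec), and it assumes the OCI image-spec Go module is available:

```go
package main

import (
	"encoding/json"
	"fmt"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// HealthcheckConfig is a stand-in for the extension payload discussed in this thread.
type HealthcheckConfig struct {
	Test     []string `json:"Test,omitempty"`
	Interval int64    `json:"Interval,omitempty"` // nanoseconds
	Timeout  int64    `json:"Timeout,omitempty"`  // nanoseconds
	Retries  int      `json:"Retries,omitempty"`
}

// ExtendedImageConfig embeds the OCI config and adds only the extension fields.
type ExtendedImageConfig struct {
	ocispec.ImageConfig
	Healthcheck *HealthcheckConfig `json:"Healthcheck,omitempty"`
}

// ExtendedImage wraps the full OCI image config document the same way;
// the outer Config field shadows the embedded one when (un)marshaling.
type ExtendedImage struct {
	ocispec.Image
	Config ExtendedImageConfig `json:"config"`
}

func main() {
	raw := []byte(`{"config":{"Cmd":["myapp","run"],"Healthcheck":{"Test":["CMD","myapp","ping"],"Retries":3}}}`)
	var img ExtendedImage
	if err := json.Unmarshal(raw, &img); err != nil {
		panic(err)
	}
	fmt.Printf("cmd=%v healthcheck=%+v\n", img.Config.Cmd, img.Config.Healthcheck)
}
```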

It's not perfect (far from it), but it's a best effort to make the content discoverable in a more consumable way, and I hope it helps those looking to implement the parts that were not included in the OCI specs.
