Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for startup healthchecks #13909

Merged
merged 1 commit into from
Nov 30, 2022

Conversation

mheon
Copy link
Member

@mheon mheon commented Apr 18, 2022

TODO: Needs to be wired into podman inspect, needs tests written, needs manpages

Add support for startup healthchecks. A startup healthcheck is a healthcheck that runs before the main healthcheck, to accomodate slow-starting containers; inspiration is mostly from K8S startup probes, and we'll be using this as a backend for supporting those in podman play kube once this merges.

Containers can now have startup healthchecks, allowing a command to be run to ensure the container is fully started before the regular healthcheck is activated.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 18, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 18, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2022
@mheon mheon force-pushed the startup_probe branch 2 times, most recently from 69a3442 to 27b97a1 Compare April 18, 2022 18:12
@edsantiago
Copy link
Member

@mheon this isn't even going to get to tests until you add the new options to man pages (they can be stubs) and add tests or the magic no-new-tests flag

@mheon
Copy link
Member Author

mheon commented Apr 18, 2022

@edsantiago I'm still trying to get it working fully working locally, will get to manpages once that's done

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 27, 2022
@mheon mheon force-pushed the startup_probe branch 4 times, most recently from adf94ab to 3551aaa Compare June 27, 2022 19:53
@mheon mheon removed the stale-pr label Jun 27, 2022
@mheon mheon changed the title [WIP] Add support for startup healthchecks Add support for startup healthchecks Jun 27, 2022
@openshift-ci openshift-ci bot added do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 27, 2022
@mheon
Copy link
Member Author

mheon commented Jun 27, 2022

Want to add a few more tests but otherwise fundamentally complete

@mheon
Copy link
Member Author

mheon commented Jun 27, 2022

@edsantiago Can you tell what the documentation errors are saying? I think I've added the relevant bits to the manpages

@@ -451,6 +451,38 @@ The number of retries allowed before a healthcheck is considered to be unhealthy
The initialization time needed for a container to bootstrap. The value can be expressed in time format like
**2m3s**. The default value is **0s**.

#### **-health-startup-cmd**=*"command"* | *'["command", "arg1", ...]'*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing dash: should be --health-etc

A value of **0** means that any success will begin the regular healthcheck.
The default is **0**.

#### **--health-startup-timeout=*timeout*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing asterisks: should be timeout** (timeout, star, star, equals)

@@ -408,6 +408,38 @@ The number of retries allowed before a healthcheck is considered to be unhealthy
The initialization time needed for a container to bootstrap. The value can be expressed in time format like
`2m3s`. The default value is `0s`

#### **-health-startup-cmd**=*"command"* | *'["command", "arg1", ...]'*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

A value of **0** means that any success will begin the regular healthcheck.
The default is **0**.

#### **--health-startup-timeout=*timeout*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@mheon mheon force-pushed the startup_probe branch 4 times, most recently from d0649bc to b964eee Compare November 16, 2022 04:35
@mheon
Copy link
Member Author

mheon commented Nov 16, 2022

Tests finally green. @containers/podman-maintainers PTAL

@rhatdan
Copy link
Member

rhatdan commented Nov 16, 2022

LGTM
@umohnani8 @vrothberg @baude @flouthoc PTAL

@openshift-ci openshift-ci bot added release-note and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Nov 17, 2022
@mheon
Copy link
Member Author

mheon commented Nov 17, 2022

@vrothberg PTAL

@vrothberg
Copy link
Member

@vrothberg PTAL

I only find time reviewing small PRs at the moment as I am OOO. If you want my pair of eyes, I'd be happy to take a look on Tuesday.

@mheon
Copy link
Member Author

mheon commented Nov 17, 2022

Fine by me

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

Would love to see a test or two in the system tests as they are excecuted in gating.

cmd/podman/common/create.go Outdated Show resolved Hide resolved
// consecutive successes.
func (c *Container) incrementStartupHCSuccessCounter(ctx context.Context) {
if !c.batched {
c.lock.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Shouldn't we lock irrespective of if it is batched or not ? We are incrementing SuccessCounter below without any lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A batched command holds a single lock for multiple operations (for performance reasons... guarantees no changes to the container while we're doing a series of operations, so no need to hit the DB to check for updates). As such, locking during batching is guaranteed to deadlock as the lock is already being held.

@rhatdan
Copy link
Member

rhatdan commented Nov 28, 2022

@mheon Sadly this needs a rebase.

Startup healthchecks are similar to K8S startup probes, in that
they are a separate check from the regular healthcheck that runs
before it. If the startup healthcheck fails repeatedly, the
associated container is restarted.

Signed-off-by: Matthew Heon <[email protected]>
@mheon
Copy link
Member Author

mheon commented Nov 28, 2022

Rebased

@mheon
Copy link
Member Author

mheon commented Nov 28, 2022

I think this is ready. Once CI passes, at least.

@rhatdan
Copy link
Member

rhatdan commented Nov 30, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 30, 2022
@openshift-merge-robot openshift-merge-robot merged commit c00d8a2 into containers:main Nov 30, 2022
dfr added a commit to dfr/podman that referenced this pull request Dec 2, 2022
Extra function arguments were added in containers#13909.

[NO NEW TESTS NEEDED]

Signed-off-by: Doug Rabson <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants