Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automatic shutdown after inactivity #1590

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mgjm
Copy link
Contributor

@mgjm mgjm commented Aug 15, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This adds a new option to enable automatic shutdown: After a configurable period of inactivity the server automatically shuts itself down.

The following activity is tracked and prevents shutdown:

  • A connected RPC client
  • A running ReapableChild of the child reaper (= a running container)
  • A running cleanup process

The new FD socket server also shuts itself down after 3 seconds of inactivity (= no connected clients).

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

The FD socket inactivity period is hardcoded as 3 seconds. This is enough time for the client to connect to the FD socket path, after the client invoked the StartFdSocket RPC method. This could also be configurable.

Does this PR introduce a user-facing change?

Add an option to enable automatic shutdown after a configurable period of inactivity

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgjm
Once this PR has been reviewed and has the lgtm label, please assign haircommander for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


use tokio::sync::{futures::Notified, Notify};

/// Track activity and reacto to inactivity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Track activity and reacto to inactivity.
/// Track activity and react to inactivity.

value_name("DELAY"),
)]
/// Automatically stop conmon-rs after DELAY seconds of inactivity. (0 = disabled)
shutdown_delay: f64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this something like Option<humantime::Duration>

Copy link
Contributor Author

@mgjm mgjm Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... the string representation of a go time.Duration is not compatible with the humantime::Duration parser. But we could write a custom "to string" method in go. Do we expect the conmon server to be invoked by humans or only by the go client?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should only support go client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And even for humans it is still relatively ergonomic to specify the duration in (fractional) seconds.
1.5 vs 1s 500ms
Especially considering that I expect the most common durations to be in the range of a few full seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is specifying milliseconds really a use case? I'm also fine with using seconds only.

@@ -95,6 +95,11 @@ type ConmonServerConfig struct {

// Tracing can be used to enable OpenTelemetry tracing.
Tracing *Tracing

// ShutdownDelay an be used to automatically stop the conmonrs server after DELAY seconds of inactivity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document what inactivity means? Like, which actions are considering conmon-rs to be active?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what inactivity means.

What use cases is this feature trying to solve?

@mgjm
Copy link
Contributor Author

mgjm commented Aug 15, 2023

I just realised that the go client does not prevent the server from shutting itself down if no containers are running.

The go client creates a new connection for each invoked method. Therefore in between method calls no connection is kept open and the server could shut itself down if no containers are running. A ConmonClient would then start to return errors in future method calls.

This won't happen with podman (podman itself is short lived) and CRI-O manages the conmon lifetime itself (does not need/use the automatic shutdown).

But nevertheless I would expect that the ConmonClient keeps the server alive.

Is there a reason we open a new connection for each RPC call? If not, the solution would be open a connection in the ConmonClient constructor and always use that connection.

@saschagrunert
Copy link
Member

Is there a reason we open a new connection for each RPC call? If not, the solution would be open a connection in the ConmonClient constructor and always use that connection.

I think the main intention was for simplicity. If the connection dies then the client would have to create a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants