This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

shutdown and startup scripts #8

Closed
rmkraus opened this issue May 6, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@rmkraus
Member

rmkraus commented May 6, 2020

Make the cluster easier to safely start and stop.

Reference:
https://www.openshift.com/blog/enabling-openshift-4-clusters-to-stop-and-resume-cluster-vms

rmkraus added the enhancement label May 6, 2020
rmkraus added a commit that referenced this issue May 7, 2020
1) The control plane nodes are powered on.
2) Wait 3 minutes for the nodes to come up.
3) Approve any pending CSRs.

This addresses #8
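
Step 3 of the startup sequence above can be sketched in Python. This is a minimal illustration, not the code from the referenced commit: the function names `pending_csr_names` and `approve_pending_csrs` are hypothetical, and the sketch assumes the `oc` CLI is on the PATH and already logged in to the cluster. It treats a CSR as pending when its status carries no conditions.

```python
from __future__ import annotations

import json
import subprocess


def pending_csr_names(csr_list: dict) -> list[str]:
    """Return the names of CSRs that have been neither approved nor denied."""
    names = []
    for item in csr_list.get("items", []):
        # A CSR without any status conditions is still waiting for a decision.
        conditions = item.get("status", {}).get("conditions") or []
        if not conditions:
            names.append(item["metadata"]["name"])
    return names


def approve_pending_csrs() -> list[str]:
    """Approve every pending CSR through the oc CLI and return their names."""
    raw = subprocess.run(
        ["oc", "get", "csr", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    names = pending_csr_names(json.loads(raw))
    if names:
        subprocess.run(["oc", "adm", "certificate", "approve", *names], check=True)
    return names
```

Nodes often submit a second round of CSRs after the first is approved, so a caller would probably need to run `approve_pending_csrs()` more than once until it returns an empty list.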
rmkraus added a commit that referenced this issue May 8, 2020
1) The bootstrap certs are restored on app nodes.
2) The Cert CA is regenerated on the control plane.
3) The nodes are powered off.

This addresses #8
rmkraus added a commit that referenced this issue May 8, 2020
rmkraus added a commit that referenced this issue May 9, 2020
Debugged cluster install process.
Added startup and shutdown commands.

* Fixed permission issues in container image.

The .ssh config file permissions were bad. The Dockerfile now ensures
that they are valid.

* Fixed builder files.

Dockerfile - fixed permissions settings for root ssh keys.
Makefile - Added publish_dev action.

* Bumped version

* Fixed issues with hypervisor role.

* Made documentation more explicit during the install process.

* Added cluster create command to CLI.

* Removed dedicated check for API to come up.

The check was unreliable as the API can return different status codes
at different times. The openshift-install wait-for command will
check for the API anyway.

* Removed unneeded commands from farosctl script.

Context, forget, and run were all relics from another project that
are much less useful for faros. Removing them so they do not
cause chaos.

This addresses issue #2.

* Made the bastion bootstrap script more user friendly.

Added better messaging and alerting when there is a failure.
Added a success message if no failures.

This addresses issue #1

* Added startup cli command.

1) The control plane nodes are powered on.
2) Wait 3 minutes for the nodes to come up.
3) Approve any pending CSRs.

This addresses #8

* Added shutdown cli command.

1) The bootstrap certs are restored on app nodes.
2) The Cert CA is regenerated on the control plane.
3) The nodes are powered off.

This addresses #8

* Bumped version.

* Fixed some issues with startup and shutdown scripts.

This addresses #8

* Updated README with shutdown and startup docs.

* Fixes to shutdown roles.

The daemonset to restore bootstrap certs should be removed when it
is done.

* Increased idle timeout on load balancer to 10 minutes.

Was 30 seconds, but rsh sessions were getting disconnected far
too quickly.

* Defaulted back to dense output.
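
The SSH permission fix mentioned in the commit above can be illustrated with a short Python sketch. The commit made the change in the Dockerfile, so this is only an equivalent illustration under assumed names (`fix_ssh_permissions` and `is_owner_only` are hypothetical). It relies on the fact that OpenSSH refuses a config file or private key that group or others can access.

```python
import os
import stat
from pathlib import Path


def fix_ssh_permissions(ssh_dir: Path) -> None:
    """Restrict an .ssh directory to its owner: 0700 on the directory, 0600 on its files."""
    os.chmod(ssh_dir, 0o700)
    for entry in ssh_dir.iterdir():
        if entry.is_file():
            os.chmod(entry, 0o600)


def is_owner_only(path: Path) -> bool:
    """Return True when neither group nor others hold any permission bit."""
    return (stat.S_IMODE(path.stat().st_mode) & 0o077) == 0
```

Setting public keys to 0600 is stricter than OpenSSH requires, but it keeps the sketch to a single rule.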
rmkraus added a commit that referenced this issue May 19, 2020
* Changed the startup operator heal timeout to 15 minutes from 5.

Seems like more time is occasionally necessary.

* Added poweron and poweroff aliases.

* Fixed verbosity issues with dense callback plugin.

* Added VIP configuration for load balancer.

Adding a VIP with keepalived allows the IP address to be moved to the
cluster when the cluster is done booting.

* Configured Cluster DNS entries to point to the loadbalancer VIP.

Also squashed a bug where wildcard entries were not properly updated
if they already existed.

* Added recipe for hosted-loadbalancer installation.

This addresses issue #3.

* Added code for deploy cli command.

Also removed /app/deploy.sh as its logic was folded into the
cli command.

This addresses issue #3

* Added KUBECONFIG to default profile.

* Bumped version, renamed cookbook directory.

* Fixed permissions issue with hosted loadbalancer operator.
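
The operator heal timeout raised to 15 minutes in the commit above suggests a poll-until-healthy loop. The following is a rough sketch of that idea, not the project's Ansible code: `unhealthy_operators` and `wait_for_operators` are hypothetical names, and `fetch` is assumed to return the parsed JSON of `oc get clusteroperators -o json`.

```python
from __future__ import annotations

import time
from typing import Callable


def unhealthy_operators(co_list: dict) -> list[str]:
    """Return names of cluster operators that are not Available or are Degraded."""
    bad = []
    for item in co_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        state = {c["type"]: c["status"] for c in conditions}
        if state.get("Available") != "True" or state.get("Degraded") == "True":
            bad.append(item["metadata"]["name"])
    return bad


def wait_for_operators(
    fetch: Callable[[], dict],
    timeout_s: float = 15 * 60,  # the 15-minute heal timeout from the commit
    interval_s: float = 30,  # polling interval, an assumption
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every operator is healthy; return False once the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if not unhealthy_operators(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without waiting in real time.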
rmkraus closed this as completed May 20, 2020
rmkraus added a commit that referenced this issue May 25, 2020
* Added deploy container-storage command.

1) Added the logic to the deploy container storage. This addresses
   issue #4
2) Rearranged deploy folder to be more flexible
3) Added ability for ansible to save stats to a yaml file

* Bumped version number.

* Initial pass at cluster version pinning.

This addresses #18

* Squashing some bugs.

* Cleaned up my_dense stdout callback plugin.

1) broke out the post message logic to a new plugin
2) broke out the save stats logic to a new plugin
3) fixed the error handling
4) cleaned up the runtime messages

This addresses issue #21

* Resolving issue with bootstrap node not being shutdown.

This addresses issue #22

* Resolved issue with errors shutting down.

The K8s API would briefly drop out of service, which caused
Ansible to crash. The code is now more tolerant of this.

This addresses issue #20

* Fixed issue with status messages not being correct.

This addresses issue #21

* Final touches on container storage deployment recipe.

This addresses issue #4

* Added verbosity to a failed install because of cert age.

This addresses issue #23
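
The shutdown fix for issue #20 in the commit above, tolerating brief K8s API outages, comes down to retrying a call instead of failing on the first error. The sketch below shows that pattern in Python. It is not the project's implementation; in Ansible the same effect is usually reached with the `retries`, `delay`, and `until` task keywords. The helper name `retry` and its default values are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 5,  # assumed default, not taken from the commit
    delay_s: float = 5.0,  # assumed pause between attempts
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, retrying only on the listed transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            # Re-raise the last failure so the caller sees the real error.
            if attempt == attempts:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Only connection-level errors are retried here, so a genuine failure such as a rejected request still surfaces immediately.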