There are many reasons to stop and resume OpenShift cluster VMs:

  • Save money on cloud hosting costs
  • Use a cluster only during daytime hours, for example for
    exploratory or development work. A cluster used by just one
    person does not need to be running while that person is not
    using it.

  • Deploy a few clusters for students ahead of time when teaching a
    workshop or class, making sure that the clusters have all
    prerequisites installed before the class starts.

Background

When an OpenShift 4 cluster is installed, a bootstrap certificate is
created that is used on the master nodes to create certificate signing
requests (CSRs) for the kubelet client certificates (one per kubelet)
that identify each kubelet in the cluster.

Because certificates can not be revoked, this bootstrap certificate is
created with a short lifetime: 24 hours after cluster installation it
can no longer be used. All nodes other than the master nodes instead
use a service account token, which is revocable. After the initial
24-hour rotation, the kubelet client certificates are rotated again
every 30 days.

If the master kubelets have not yet received a 30-day client
certificate (the first one only lasts 24 hours), missing the kubelet
client certificate refresh window renders the cluster unusable, because
the bootstrap credential can not be used when the cluster is woken back
up. In practice this means an OpenShift 4 cluster has to run for at
least 25 hours after installation before it can be shut down.
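
To see how much time is left on the current kubelet client certificate
of a master node, you can inspect the certificate directly. The
following check is a sketch: it assumes the standard RHCOS kubelet
certificate path (/var/lib/kubelet/pki/kubelet-client-current.pem) and
that openssl is available on the host.

oc debug node/<master-node> -- chroot /host openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate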

The following process enables cluster shutdown right after installation.
It also enables cluster resume at any time in the next 30 days.

Note that this process only works until the 30-day certificate
rotation. For most test, development, or classroom clusters this is a
usable approach, because these types of clusters are usually rather
short-lived.

Preparing the Cluster to support stopping of VMs

These steps will enable a successful restart of a cluster after its VMs
have been stopped. This process has been tested on OpenShift 4.1.11 and
higher - including developer preview builds of OpenShift 4.2.

  1. From the VM that you ran the OpenShift installer on, create the
    following DaemonSet manifest. This DaemonSet pulls down the same
    service account token bootstrap credential that is used on all
    non-master nodes in the cluster.
cat << 'EOF' >$HOME/kubelet-bootstrap-cred-manager-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-bootstrap-cred-manager
  namespace: openshift-machine-config-operator
  labels:
    k8s-app: kubelet-bootstrap-cred-manager
spec:
  selector:
    matchLabels:
      k8s-app: kubelet-bootstrap-cred-manager
  template:
    metadata:
      labels:
        k8s-app: kubelet-bootstrap-cred-manager
    spec:
      containers:
      - name: kubelet-bootstrap-cred-manager
        image: quay.io/openshift/origin-cli:v4.0
        command: ['/bin/bash', '-ec']
        args:
          - |
            #!/bin/bash

            set -eoux pipefail

            while true; do
              unset KUBECONFIG

              echo "---------------------------------"
              echo "Gather info..."
              echo "---------------------------------"
              # context
              intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")
              context="$(oc --config=/etc/kubernetes/kubeconfig config current-context)"
              # cluster
              cluster="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
              server="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
              # token
              ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
              namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.namespace}" | base64 --decode)"
              token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"

              echo "---------------------------------"
              echo "Generate kubeconfig"
              echo "---------------------------------"

              export KUBECONFIG="$(mktemp)"
              kubectl config set-credentials "kubelet" --token="$token" >/dev/null
              ca_crt="$(mktemp)"; echo "$ca_crt_data" > $ca_crt
              kubectl config set-cluster $cluster --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
              kubectl config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
              kubectl config use-context kubelet >/dev/null

              echo "---------------------------------"
              echo "Print kubeconfig"
              echo "---------------------------------"
              cat "$KUBECONFIG"

              echo "---------------------------------"
              echo "Whoami?"
              echo "---------------------------------"
              oc whoami
              whoami

              echo "---------------------------------"
              echo "Moving to real kubeconfig"
              echo "---------------------------------"
              cp /etc/kubernetes/kubeconfig /etc/kubernetes/kubeconfig.prev
              chown root:root ${KUBECONFIG}
              chmod 0644 ${KUBECONFIG}
              mv "${KUBECONFIG}" /etc/kubernetes/kubeconfig

              echo "---------------------------------"
              echo "Sleep 60 seconds..."
              echo "---------------------------------"
              sleep 60
            done
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
          - mountPath: /etc/kubernetes/
            name: kubelet-dir
      # Run only on master nodes
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: "system-cluster-critical"
      restartPolicy: Always
      securityContext:
        runAsUser: 0
      # Tolerate the master taint and short node outages
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 120
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 120
      volumes:
        - hostPath:
            path: /etc/kubernetes/
            type: Directory
          name: kubelet-dir
EOF
  2. Deploy the DaemonSet to your cluster.
oc apply -f $HOME/kubelet-bootstrap-cred-manager-ds.yaml
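
Before continuing, verify that the DaemonSet pods reach the Running
state on every master node. The following check is a sketch that simply
filters on the label from the manifest above:

oc get pods -n openshift-machine-config-operator -l k8s-app=kubelet-bootstrap-cred-manager -o wide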
  3. Delete the secrets csr-signer-signer and csr-signer from the
     openshift-kube-controller-manager-operator namespace.
oc delete secrets/csr-signer-signer secrets/csr-signer -n openshift-kube-controller-manager-operator

This will trigger the Cluster Operators to re-create the CSR signer
secrets which are used when the cluster starts back up to sign the
kubelet client certificate CSRs. You can watch as various operators
switch from Progressing=False to Progressing=True and back to
Progressing=False. The operators that will cycle are
kube-apiserver, openshift-controller-manager,
kube-controller-manager and monitoring.

watch oc get clusteroperators

Sample Output

NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.14    True        False         False      8d
cloud-credential                           4.2.14    True        False         False      8d
cluster-autoscaler                         4.2.14    True        False         False      8d
console                                    4.2.14    True        False         False      25m
dns                                        4.2.14    True        False         False      8d
image-registry                             4.2.14    True        False         False      45m
ingress                                    4.2.14    True        False         False      8d
insights                                   4.2.14    True        False         False      8d
kube-apiserver                             4.2.14    True        True          False      8d
kube-controller-manager                    4.2.14    True        False         False      8d
kube-scheduler                             4.2.14    True        False         False      8d
machine-api                                4.2.14    True        False         False      8d
machine-config                             4.2.14    True        False         False      17h
marketplace                                4.2.14    True        False         False      24m
monitoring                                 4.2.14    True        False         False      48m
network                                    4.2.14    True        False         False      8d
node-tuning                                4.2.14    True        False         False      24m
openshift-apiserver                        4.2.14    True        False         False      30m
openshift-controller-manager               4.2.14    True        False         False      8d
openshift-samples                          4.2.14    True        False         False      52m
operator-lifecycle-manager                 4.2.14    True        False         False      8d
operator-lifecycle-manager-catalog         4.2.14    True        False         False      8d
operator-lifecycle-manager-packageserver   4.2.14    True        False         False      25m
service-ca                                 4.2.14    True        False         False      8d
service-catalog-apiserver                  4.2.14    True        False         False      8d
service-catalog-controller-manager         4.2.14    True        False         False      8d
storage                                    4.2.14    True        False         False      58m

Once all Cluster Operators show Available=True,
Progressing=False and Degraded=False the cluster is ready
for shutdown.
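
If you prefer a scripted check over watching the output, the following
sketch (it assumes jq is installed) prints the name of any Cluster
Operator that is still progressing or degraded; empty output means the
cluster is safe to shut down:

oc get clusteroperators -o json | jq -r '.items[] | select([.status.conditions[] | select((.type=="Progressing" or .type=="Degraded") and .status=="True")] | length > 0) | .metadata.name'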

Workaround for Monitoring Component not noticing the new certificate

In current versions of OpenShift there is a bug where some monitoring
components do not pick up the newly rotated certificate. These
components break a few hours after the rotation.

To force the monitoring components to rotate their certificates, add an
empty line at the end of one of the certificates in the config map
extension-apiserver-authentication in the kube-system project.

The following Ansible playbook will do that (pass user and cluster_name to the playbook):

---
- name: Force Monitoring Cert rotation
  hosts: localhost
  become: no
  gather_facts: false
  environment:
    KUBECONFIG: /home/{{ user }}/{{ cluster_name }}/auth/kubeconfig
  tasks:
  - name: Get Config Map Definition
    shell: oc get configmap extension-apiserver-authentication -n kube-system -o yaml >/tmp/extension-apiserver-authentication.yaml
  - name: Add an empty line to config map file
    lineinfile:
      path: /tmp/extension-apiserver-authentication.yaml
      firstmatch: true
      insertafter: '-----END CERTIFICATE-----'
      line: ''
  - name: Update Config Map with new file
    k8s:
      state: present
      src: /tmp/extension-apiserver-authentication.yaml
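
To apply the workaround, run the playbook and pass the two variables on
the command line. The playbook file name and variable values below are
just examples:

ansible-playbook force-monitoring-cert-rotation.yaml -e user=myuser -e cluster_name=cluster-${GUID}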

Stopping the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to shut down the VMs.

The following command will shut down the VMs that make up a cluster on
Amazon Web Services.

Prerequisites:

  • The Amazon Web Services Command Line Interface, awscli, is
    installed.
  • $HOME/.aws/credentials has the proper AWS credentials available to
    execute the command.

  • REGION points to the region your VMs are deployed in.

  • CLUSTERNAME is set to the Cluster Name you used during
    installation. For example cluster-${GUID}.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 stop-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=running" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)
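
To confirm that the instances are shutting down, you can reuse the same
tag filter. This is an optional check, sketched with the same
assumptions as the command above:

aws ec2 describe-instances --region ${REGION} --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" --query "Reservations[*].Instances[*].[InstanceId,State.Name]" --output text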

Starting the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to start the VMs.

The following commands will start the cluster VMs in Amazon Web
Services.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 start-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=stopped" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)
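
If you want to block until all cluster instances report running before
working with the cluster again, the EC2 waiter can be used with the
same tag filter. This is an optional sketch:

aws ec2 wait instance-running --region ${REGION} --filters "Name=tag:Name,Values=${CLUSTERNAME}-*"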

Recovering the cluster

If the cluster missed the initial 24-hour certificate rotation, some
nodes in the cluster may be in the NotReady state. Check whether any
nodes are NotReady. Note that immediately after waking up the cluster
the nodes may show Ready, but they will switch to NotReady within a few
seconds.

oc get nodes

Sample Output.

NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    NotReady   worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   NotReady   worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   NotReady   worker   18h   v1.14.0+b985ea310

If some nodes show NotReady, those nodes will start issuing certificate
signing requests (CSRs). Repeat the following command until you see a
CSR with Pending in the Condition column for each NotReady node in the
cluster.

oc get csr

Once the CSRs appear, they need to be approved. The following command
approves all outstanding CSRs.

oc get csr -oname | xargs oc adm certificate approve
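
If you prefer to approve only the CSRs that are still pending, the
following variant is a sketch; it assumes jq and GNU xargs are
available and relies on a pending CSR having an empty status object:

oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs --no-run-if-empty oc adm certificate approve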

When you double-check the CSRs (using oc get csr) you should see that
they have been Approved and Issued (again in the Condition column).

Double check that all nodes now show Ready. Note that this may take a
few seconds after approving the CSRs.

oc get nodes

Sample Output.

NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310

Your cluster is now fully ready to be used again.

Ansible Playbook to recover cluster

The following Ansible playbook recovers a cluster after wake-up. Note
the three-minute pause to give the nodes enough time to settle, start
all pods, and issue CSRs.

Prerequisites:

  • Ansible installed
  • Current user either has a .kube/config that grants cluster-admin
    permissions or a KUBECONFIG environment variable set that points
    to a kube config file with cluster-admin permissions.

  • OpenShift Command Line interface (oc) in the current user’s PATH.

- name: Run cluster recover actions
  hosts: localhost
  connection: local
  gather_facts: False
  become: no
  tasks:
  - name: Wait 3 minutes for Nodes to settle and pods to start
    pause:
      minutes: 3

  - name: Get CSRs that need to be approved
    k8s_facts:
      api_version: certificates.k8s.io/v1beta1
      kind: CertificateSigningRequest
    register: r_csrs

  - name: Approve all Pending CSRs
    command: "oc adm certificate approve {{ item.metadata.name }}"
    # when: item.status.conditions[0].type == "Pending"
    when: r_csrs.resources | length > 0
    loop: "{{ r_csrs.resources }}"

Summary

Following this process enables you to stop OpenShift 4 cluster VMs right
after installation without having to wait for the 24-hour certificate
rotation to occur.

It also enables you to resume cluster VMs that were stopped across the
time when the 24-hour certificate rotation would have occurred.

The most recent updates to this process (and properly formatted YAML) can be found at https://github.com/redhat-cop/openshift-lab-origin/blob/master/OpenShift4/Stopping_and_Resuming_OCP4_Clusters.adoc