Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs

There are a lot of reasons to stop and resume OpenShift Cluster VMs:

  • Save money on cloud hosting costs

  • Use a cluster only during daytime hours – for example, for
    exploratory or development work. A cluster used by just one person
    does not need to be running when that person is not using it.

  • Deploy a few clusters for students ahead of time when teaching a
    workshop or class, making sure that the clusters have all
    prerequisites installed before the class starts.

Background

When an OpenShift 4 cluster is installed, a bootstrap certificate is
created. This certificate is used on the master nodes to create
certificate signing requests (CSRs) for the kubelet client certificates
– one per kubelet – that identify each kubelet in the cluster.

Because certificates cannot be revoked, this bootstrap certificate is
created with a short lifetime: 24 hours after cluster installation it
can no longer be used. All nodes other than the master nodes instead
use a service account token, which is revocable. The kubelet client
certificates created from the bootstrap certificate are rotated 24
hours after cluster installation, and every 30 days after that.

If the master kubelets do not yet have a 30 day client certificate (the
first one only lasts 24 hours), missing the kubelet client certificate
refresh window renders the cluster unusable, because the bootstrap
credential cannot be used when the cluster is woken back up.
Practically, this requires an OpenShift 4 cluster to be running for at
least 25 hours after installation before it can be shut down.

The following process enables cluster shutdown right after installation.
It also enables cluster resume at any time in the next 30 days.

Note that this process only works until the 30 day certificate
rotation. But for most test, development, and classroom clusters this
is a usable approach, because these types of clusters are usually
rather short lived.

Preparing the Cluster to support stopping of VMs

These steps will enable a successful restart of a cluster after its VMs
have been stopped. This process has been tested on OpenShift 4.1.11 and
higher – including developer preview builds of OpenShift 4.2.

  1. From the VM that you ran the OpenShift installer on, create the
    following DaemonSet manifest. This DaemonSet pulls down the same
    service account token bootstrap credential that is used on all the
    non-master nodes in the cluster.
cat << 'EOF' >$HOME/kubelet-bootstrap-cred-manager-ds.yaml.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-bootstrap-cred-manager
  namespace: openshift-machine-config-operator
  labels:
    k8s-app: kubelet-bootstrap-cred-manager
spec:
  selector:
    matchLabels:
      k8s-app: kubelet-bootstrap-cred-manager
  template:
    metadata:
      labels:
        k8s-app: kubelet-bootstrap-cred-manager
    spec:
      containers:
      - name: kubelet-bootstrap-cred-manager
        image: quay.io/openshift/origin-cli:v4.0
        command: ['/bin/bash', '-ec']
        args:
          - |
            #!/bin/bash

            set -eoux pipefail

            while true; do
              unset KUBECONFIG

              echo "----------------------------------------------------------------------"
              echo "Gather info..."
              echo "----------------------------------------------------------------------"
              # context
              intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")
              context="$(oc --config=/etc/kubernetes/kubeconfig config current-context)"
              # cluster
              cluster="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
              server="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
              # token
              ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
              namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.namespace}" | base64 --decode)"
              token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"

              echo "----------------------------------------------------------------------"
              echo "Generate kubeconfig"
              echo "----------------------------------------------------------------------"

              export KUBECONFIG="$(mktemp)"
              kubectl config set-credentials "kubelet" --token="$token" >/dev/null
              ca_crt="$(mktemp)"; echo "$ca_crt_data" > "$ca_crt"
              kubectl config set-cluster "$cluster" --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
              kubectl config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
              kubectl config use-context kubelet >/dev/null

              echo "----------------------------------------------------------------------"
              echo "Print kubeconfig"
              echo "----------------------------------------------------------------------"
              cat "$KUBECONFIG"

              echo "----------------------------------------------------------------------"
              echo "Whoami?"
              echo "----------------------------------------------------------------------"
              oc whoami
              whoami

              echo "----------------------------------------------------------------------"
              echo "Moving to real kubeconfig"
              echo "----------------------------------------------------------------------"
              cp /etc/kubernetes/kubeconfig /etc/kubernetes/kubeconfig.prev
              chown root:root ${KUBECONFIG}
              chmod 0644 ${KUBECONFIG}
              mv "${KUBECONFIG}" /etc/kubernetes/kubeconfig

              echo "----------------------------------------------------------------------"
              echo "Sleep 60 seconds..."
              echo "----------------------------------------------------------------------"
              sleep 60
            done
        securityContext:
          privileged: true
          runAsUser: 0
        volumeMounts:
          - mountPath: /etc/kubernetes/
            name: kubelet-dir
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: "system-cluster-critical"
      restartPolicy: Always
      securityContext:
        runAsUser: 0
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 120
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 120
      volumes:
        - hostPath:
            path: /etc/kubernetes/
            type: Directory
          name: kubelet-dir
EOF
  2. Deploy the DaemonSet to your cluster.
oc apply -f $HOME/kubelet-bootstrap-cred-manager-ds.yaml.yaml
  3. Delete the secrets csr-signer-signer and csr-signer from the
    openshift-kube-controller-manager-operator namespace.
oc delete secrets/csr-signer-signer secrets/csr-signer -n openshift-kube-controller-manager-operator

This will trigger the Cluster Operators to re-create the CSR signer
secrets which are used when the cluster starts back up to sign the
kubelet client certificate CSRs. You can watch as various operators
switch from Progressing=False to Progressing=True and back to
Progressing=False. The operators that will cycle are
kube-apiserver, openshift-controller-manager,
kube-controller-manager and monitoring.

watch oc get clusteroperators

Sample Output.

NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
cloud-credential                           4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
cluster-autoscaler                         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
console                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
dns                                        4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
image-registry                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
ingress                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
insights                                   4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
kube-apiserver                             4.2.0-0.nightly-2019-08-27-072819   True        True          False      18h
kube-controller-manager                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
kube-scheduler                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
machine-api                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
machine-config                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
marketplace                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
monitoring                                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h45m
network                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
node-tuning                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
openshift-apiserver                        4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
openshift-controller-manager               4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
openshift-samples                          4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
service-ca                                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
storage                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h

Once all Cluster Operators show Available=True,
Progressing=False and Degraded=False the cluster is ready
for shutdown.
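The wait can also be scripted instead of watched. The following sketch
polls until no Cluster Operator reports Progressing=True (in the
default oc get clusteroperators output, PROGRESSING is the fourth
column; this assumes oc is logged in with cluster-admin permissions):

```shell
# Poll until the PROGRESSING column (field 4) reads False for every operator.
while oc get clusteroperators --no-headers | awk '$4 == "True"' | grep -q .; do
  echo "Operators still progressing, waiting 10 seconds..."
  sleep 10
done
echo "All Cluster Operators have settled."
```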

Stopping the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to shut down the VMs.

The following command will shut down the VMs that make up a cluster on
Amazon Web Services.

Prerequisites:

  • The Amazon Web Services Command Line Interface, awscli, is
    installed.

  • $HOME/.aws/credentials has the proper AWS credentials available to
    execute the command.

  • REGION points to the region your VMs are deployed in.

  • CLUSTERNAME is set to the Cluster Name you used during
    installation. For example cluster-${GUID}.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 stop-instances --region ${REGION} --instance-ids \
  $(aws ec2 describe-instances \
      --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=running" \
      --query Reservations[*].Instances[*].InstanceId \
      --region ${REGION} --output text)

Starting the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to start the VMs.

The following commands will start the cluster VMs in Amazon Web
Services.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 start-instances --region ${REGION} --instance-ids \
  $(aws ec2 describe-instances \
      --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=stopped" \
      --query Reservations[*].Instances[*].InstanceId \
      --region ${REGION} --output text)

Recovering the cluster

If the cluster missed the initial 24 hour certificate rotation, some
nodes in the cluster may be in NotReady state. Validate whether any
nodes are NotReady. Note that immediately after the cluster wakes up
the nodes may show Ready – but they will switch to NotReady within a
few seconds.

oc get nodes

Sample Output.

NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    NotReady   worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   NotReady   worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   NotReady   master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   NotReady   worker   18h   v1.14.0+b985ea310
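To list only the affected nodes, the STATUS column (the second field of
the default oc get nodes output) can be filtered:

```shell
# Print the names of all nodes whose STATUS column reads NotReady.
oc get nodes --no-headers | awk '$2 == "NotReady" {print $1}'
```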

If some nodes show NotReady, they will start issuing certificate
signing requests (CSRs). Repeat the following command until you see a
CSR with Pending in the Condition column for each NotReady node in
the cluster.

oc get csr

Once you see the CSRs they need to be approved. The following command
approves all outstanding CSRs.

oc get csr -oname | xargs oc adm certificate approve
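If you prefer to approve only the CSRs that are still Pending, you can
select CSRs whose status is still empty before approving. A sketch
using a go-template; --no-run-if-empty (GNU xargs) keeps oc from being
invoked with no arguments when nothing is pending:

```shell
# Select CSRs that have no status yet (i.e. Pending ones) and approve them.
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
```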

When you double check the CSRs (using oc get csr) you should see that
they have been Approved and Issued (again in the Condition column).

Double check that all nodes now show Ready. Note that this may take a
few seconds after approving the CSRs.

oc get nodes

Sample Output.

NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310

Your cluster is now fully ready to be used again.

Ansible Playbook to recover cluster

The following Ansible Playbook recovers a cluster after wake up. Note
the 5 minute pause, which gives the nodes enough time to settle, start
all pods, and issue CSRs.

Prerequisites:

  • Ansible installed

  • Current user either has a .kube/config that grants cluster-admin
    permissions or a KUBECONFIG environment variable set that points
    to a kube config file with cluster-admin permissions.

  • OpenShift Command Line interface (oc) in the current user’s PATH.

- name: Run cluster recover actions
  hosts: localhost
  connection: local
  gather_facts: False
  become: no
  tasks:
  - name: Wait 5 minutes for Nodes to settle and pods to start
    pause:
      minutes: 5

  - name: Get CSRs that need to be approved
    command: oc get csr -oname
    register: r_csrs
    changed_when: false

  - name: Approve all Pending CSRs
    when: r_csrs.stdout_lines | length > 0
    command: "oc adm certificate approve {{ item }}"
    loop: "{{ r_csrs.stdout_lines }}"

Summary

Following this process enables you to stop OpenShift 4 Cluster VMs right
after installation without having to wait for the 24h certificate
rotation to occur.

It also enables you to resume cluster VMs that were stopped across the
window in which the 24h certificate rotation would have occurred.
