Enabling OpenShift 4 Clusters to Stop and Resume Cluster VMs

There are many reasons to stop and resume OpenShift 4 cluster VMs:

  • Save money on cloud hosting costs

  • Use a cluster only during working hours, for example for
    exploratory or development work. A cluster that serves just one
    person does not need to be running while that person is not
    using it.

  • Deploy a few clusters for students ahead of time when teaching a
    workshop or class, and make sure that the clusters have all
    prerequisites installed before the class starts.

Background

When an OpenShift 4 cluster is installed, a bootstrap certificate is
created. The master nodes use this certificate to create certificate
signing requests (CSRs) for the kubelet client certificates (one for
each kubelet) that are used to identify each kubelet on any node.

Because certificates cannot be revoked, this certificate is created
with a short lifetime: it expires 24 hours after cluster installation
and cannot be used after that. All nodes other than the master nodes
use a service account token instead, which is revocable. After the
initial 24 hour rotation, the kubelet client certificates are rotated
again every 30 days.

If the master kubelets do not yet have a 30 day client certificate
(the first one only lasts 24 hours), missing the kubelet client
certificate refresh window renders the cluster unusable: the expired
bootstrap credential can no longer be used when the cluster is woken
back up. In practice this means an OpenShift 4 cluster has to run for
at least 25 hours after installation before it can be shut down.
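
If you want to see where a cluster is in this rotation cycle, one
option is to inspect the kubelet client certificate on a master node.
The command below is only a sketch: it assumes SSH access to a master
node as the core user, <master-node> is a placeholder, and the
certificate path is the default kubelet certificate rotation location.

ssh core@<master-node> "sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem"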

The following process enables cluster shutdown right after installation.
It also enables cluster resume at any time in the next 30 days.

Note that this process only works up until the 30 day certificate
rotation. For most test, development, and classroom clusters this is a
usable approach because these types of clusters are usually rather
short lived.
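
If you are curious, you can inspect the expiry date of the current CSR
signer certificate (the same csr-signer secret that is deleted in step
3 below). This is a sketch; it assumes the certificate is stored under
the standard tls.crt key of the secret.

oc get secret csr-signer -n openshift-kube-controller-manager-operator -o jsonpath='{.data.tls\.crt}' | base64 --decode | openssl x509 -noout -enddate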

Preparing the Cluster to Support Stopping its VMs

These steps will enable a successful restart of a cluster after its VMs
have been stopped. This process has been tested on OpenShift 4.1.11 and
higher – including developer preview builds of OpenShift 4.2.
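
You can check which version your cluster is running with the following
command:

oc get clusterversion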

  1. On the VM from which you ran the OpenShift installer, create the
    following DaemonSet manifest. This DaemonSet pulls down the same
    service account token bootstrap credential that is used on all
    non-master nodes in the cluster.

    cat << EOF >$HOME/kubelet-bootstrap-cred-manager-ds.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: kubelet-bootstrap-cred-manager
      namespace: openshift-machine-config-operator
      labels:
        k8s-app: kubelet-bootstrap-cred-manager
    spec:
      selector:
        matchLabels:
          k8s-app: kubelet-bootstrap-cred-manager
      template:
        metadata:
          labels:
            k8s-app: kubelet-bootstrap-cred-manager
        spec:
          containers:
          - name: kubelet-bootstrap-cred-manager
            image: quay.io/openshift/origin-cli:v4.0
            command: ['/bin/bash', '-ec']
            args:
              - |
                #!/bin/bash
    
                set -eoux pipefail
    
                while true; do
                  unset KUBECONFIG
    
                  echo "----------------------------------------------------------------------"
                  echo "Gather info..."
                  echo "----------------------------------------------------------------------"
                  # context
                  intapi=$(oc get infrastructures.config.openshift.io cluster -o "jsonpath={.status.apiServerInternalURI}")
                  context="$(oc --config=/etc/kubernetes/kubeconfig config current-context)"
                  # cluster
                  cluster="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.contexts[?(@.name==\"$context\")].context.cluster}")"
                  server="$(oc --config=/etc/kubernetes/kubeconfig config view -o "jsonpath={.clusters[?(@.name==\"$cluster\")].cluster.server}")"
                  # token
                  ca_crt_data="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.ca\.crt}" | base64 --decode)"
                  namespace="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token  -o "jsonpath={.data.namespace}" | base64 --decode)"
                  token="$(oc get secret -n openshift-machine-config-operator node-bootstrapper-token -o "jsonpath={.data.token}" | base64 --decode)"
    
                  echo "----------------------------------------------------------------------"
                  echo "Generate kubeconfig"
                  echo "----------------------------------------------------------------------"
    
                  export KUBECONFIG="$(mktemp)"
                  kubectl config set-credentials "kubelet" --token="$token" >/dev/null
                  ca_crt="$(mktemp)"; echo "$ca_crt_data" > $ca_crt
                  kubectl config set-cluster $cluster --server="$intapi" --certificate-authority="$ca_crt" --embed-certs >/dev/null
                  kubectl config set-context kubelet --cluster="$cluster" --user="kubelet" >/dev/null
                  kubectl config use-context kubelet >/dev/null
    
                  echo "----------------------------------------------------------------------"
                  echo "Print kubeconfig"
                  echo "----------------------------------------------------------------------"
                  cat "$KUBECONFIG"
    
                  echo "----------------------------------------------------------------------"
                  echo "Whoami?"
                  echo "----------------------------------------------------------------------"
                  oc whoami
                  whoami
    
                  echo "----------------------------------------------------------------------"
                  echo "Moving to real kubeconfig"
                  echo "----------------------------------------------------------------------"
                  cp /etc/kubernetes/kubeconfig /etc/kubernetes/kubeconfig.prev
                  chown root:root ${KUBECONFIG}
                  chmod 0644 ${KUBECONFIG}
                  mv "${KUBECONFIG}" /etc/kubernetes/kubeconfig
    
                  echo "----------------------------------------------------------------------"
                  echo "Sleep 60 seconds..."
                  echo "----------------------------------------------------------------------"
                  sleep 60
                done
            securityContext:
              privileged: true
              runAsUser: 0
            volumeMounts:
              - mountPath: /etc/kubernetes/
                name: kubelet-dir
          nodeSelector:
            node-role.kubernetes.io/master: ""
          priorityClassName: "system-cluster-critical"
          restartPolicy: Always
          securityContext:
            runAsUser: 0
          tolerations:
          - key: "node-role.kubernetes.io/master"
            operator: "Exists"
            effect: "NoSchedule"
          - key: "node.kubernetes.io/unreachable"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 120
          - key: "node.kubernetes.io/not-ready"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 120
          volumes:
            - hostPath:
                path: /etc/kubernetes/
                type: Directory
              name: kubelet-dir
    EOF
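
    The manifest above reads the node-bootstrapper-token secret that
    holds the bootstrap service account token. Optionally, confirm that
    the secret exists before deploying the DaemonSet:

    oc get secret node-bootstrapper-token -n openshift-machine-config-operator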
    
  2. Deploy the DaemonSet to your cluster.
    oc apply -f $HOME/kubelet-bootstrap-cred-manager-ds.yaml
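
    Before moving on, you can check that the DaemonSet pods are running
    on all master nodes (the label below matches the manifest created
    above):

    oc get pods -n openshift-machine-config-operator -l k8s-app=kubelet-bootstrap-cred-manager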
    
  3. Delete the secrets csr-signer-signer and csr-signer from the
    openshift-kube-controller-manager-operator namespace.

    oc delete secrets/csr-signer-signer secrets/csr-signer -n openshift-kube-controller-manager-operator
    

    This will trigger the Cluster Operators to re-create the CSR signer
    secrets, which are used to sign the kubelet client certificate CSRs
    when the cluster starts back up. You can watch as various operators
    switch from Progressing=False to Progressing=True and back to
    Progressing=False. The operators that will cycle are
    kube-apiserver, openshift-controller-manager,
    kube-controller-manager and monitoring.

    watch oc get clusteroperators
    

    Sample Output.

    NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    cloud-credential                           4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    cluster-autoscaler                         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    console                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    dns                                        4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    image-registry                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    ingress                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
    insights                                   4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    kube-apiserver                             4.2.0-0.nightly-2019-08-27-072819   True        True          False      18h
    kube-controller-manager                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    kube-scheduler                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    machine-api                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    machine-config                             4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    marketplace                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
    monitoring                                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h45m
    network                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    node-tuning                                4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
    openshift-apiserver                        4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    openshift-controller-manager               4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    openshift-samples                          4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-27-072819   True        False         False      3h46m
    service-ca                                 4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    service-catalog-apiserver                  4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    service-catalog-controller-manager         4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    storage                                    4.2.0-0.nightly-2019-08-27-072819   True        False         False      18h
    

    Once all Cluster Operators show Available=True,
    Progressing=False and Degraded=False the cluster is ready
    for shutdown.
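
    If you prefer to script this wait instead of watching the output, a
    small loop like the following can work. It is only a sketch and
    relies on the column order shown in the sample output above
    (AVAILABLE, PROGRESSING and DEGRADED in columns 3 to 5):

    while oc get clusteroperators --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"' | grep -q .; do
      echo "Waiting for all cluster operators to settle..."
      sleep 10
    done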

Stopping the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to shut down the VMs.

The following command will shut down the VMs that make up a cluster on
Amazon Web Services.

Prerequisites:

  • The Amazon Web Services Command Line Interface, awscli, is
    installed.

  • $HOME/.aws/credentials has the proper AWS credentials available to
    execute the command.

  • REGION points to the region your VMs are deployed in.

  • CLUSTERNAME is set to the Cluster Name you used during
    installation. For example cluster-${GUID}.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 stop-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=running" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)
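
To confirm that all cluster VMs have actually stopped, you can list the
instances and their states with a command like this:

aws ec2 describe-instances --region ${REGION} --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" --query "Reservations[*].Instances[*].[InstanceId,State.Name]" --output text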

Starting the cluster VMs

Use the tools native to the cloud environment that your cluster is
running on to start the VMs.

The following commands will start the cluster VMs in Amazon Web
Services.

export REGION=us-east-2
export CLUSTERNAME=cluster-${GUID}

aws ec2 start-instances --region ${REGION} --instance-ids $(aws ec2 describe-instances --filters "Name=tag:Name,Values=${CLUSTERNAME}-*" "Name=instance-state-name,Values=stopped" --query Reservations[*].Instances[*].InstanceId --region ${REGION} --output text)
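
If you want to block until AWS reports all of the cluster instances as
running, the aws ec2 wait command can be used with the same filter:

aws ec2 wait instance-running --region ${REGION} --filters "Name=tag:Name,Values=${CLUSTERNAME}-*"

Keep in mind that the OpenShift API typically needs a few more minutes
after the VMs are running before it responds.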

Recovering the cluster

If the cluster missed the initial 24 hour certificate rotation, some
nodes in the cluster may be in NotReady state. Validate whether any
nodes are NotReady. Note that immediately after waking up the cluster
the nodes may show Ready, but they will switch to NotReady within a
few seconds.

oc get nodes

Sample Output.

NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    NotReady worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   NotReady master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   NotReady master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   NotReady worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   NotReady master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   NotReady worker   18h   v1.14.0+b985ea310

If some nodes show NotReady, those nodes will start issuing Certificate
Signing Requests (CSRs). Repeat the following command until you see a
CSR for each NotReady node in the cluster with Pending in the
Condition column.

oc get csr

Once the CSRs appear they need to be approved. The following command
approves all outstanding CSRs.

oc get csr -oname | xargs oc adm certificate approve
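
If you would rather approve only the CSRs that are still Pending, a
variant like the following can be used. It is a sketch that relies on
the Condition being the last column of the oc get csr output; the -r
flag (GNU xargs) avoids running the command when nothing is pending.

oc get csr --no-headers | awk '$NF == "Pending" {print $1}' | xargs -r oc adm certificate approve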

When you double check the CSRs (using oc get csr) you should see that
they have now been Approved and Issued (again in the Condition
column).

Double check that all nodes now show Ready. Note that this may take a
few seconds after approving the CSRs.

oc get nodes

Sample Output.

NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-132-82.us-east-2.compute.internal    Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-134-223.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-147-233.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-154-126.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310
ip-10-0-162-210.us-east-2.compute.internal   Ready    master   19h   v1.14.0+b985ea310
ip-10-0-172-133.us-east-2.compute.internal   Ready    worker   18h   v1.14.0+b985ea310

Your cluster is now fully ready to be used again.

Ansible Playbook to recover cluster

The following Ansible Playbook should recover a cluster after it wakes
up. Note the 5 minute pause that gives the nodes enough time to settle,
start all pods, and issue CSRs.

Prerequisites:

  • Ansible installed

  • Current user either has a .kube/config that grants cluster-admin
    permissions or a KUBECONFIG environment variable set that points
    to a kube config file with cluster-admin permissions.

  • OpenShift Command Line Interface (oc) in the current user’s PATH.

- name: Run cluster recover actions
  hosts: localhost
  connection: local
  gather_facts: False
  become: no
  tasks:
  - name: Wait 5 minutes for Nodes to settle and pods to start
    pause:
      minutes: 5

  - name: Get CSRs that need to be approved
    command: oc get csr -oname
    register: r_csrs
    changed_when: false

  - name: Approve all Pending CSRs
    when: r_csrs.stdout_lines | length > 0
    command: "oc adm certificate approve {{ item }}"
    loop: "{{ r_csrs.stdout_lines }}"
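
To run it, save the playbook to a file and execute it with
ansible-playbook. The file name below is just an example:

ansible-playbook recover-cluster.yaml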

Summary

Following this process enables you to stop OpenShift 4 cluster VMs
right after installation without having to wait for the 24 hour
certificate rotation to occur.

It also enables you to resume cluster VMs that were stopped while the
24 hour certificate rotation would have occurred.
