Long Live the App: Maintenance and Upgrades in OpenShift 3.6+ with PodDisruptionBudgets

As we work harder to automate cluster administration activities like OpenShift upgrades and OS patching, it becomes more difficult to ensure that applications' availability requirements are met. In large clusters, the Ops team may not have a detailed understanding of which pods make up an application, nor be able to guarantee that its minimum capacity requirements are maintained. Without that knowledge, a simple rolling restart during server maintenance can inadvertently bring down or degrade multiple applications.

Introduced as Tech Preview in OpenShift 3.4 and now fully supported in OpenShift 3.6, PodDisruptionBudgets (henceforth PDBs) provide a concise way for the application team to communicate enforceable operating requirements to the cluster. Simply put, a PDB allows the application owner to define a minimum number of pods that must be available for the application to operate in a stable manner. Any action that leverages the eviction API (such as drain) will respect that minimum at any given time.

Let’s take a look at how to create a PDB and what enforcement looks like from inside OpenShift.

The PodDisruptionBudget Object

To illustrate this, we will use an OpenShift router as our example pod. The object below creates a PDB called router-pdb that uses a selector to match pods with the label router: router and ensures that at least one of those pods is available at all times.

# cat router-pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: router-pdb
spec:
  selector:
    matchLabels:
      router: router
  minAvailable: 1

To create the PDB object, we need two pieces of information:

  1. The selector
  2. An appropriate minimum

Note: minAvailable can be expressed as an integer or as a percentage of total pods. If an application had 2 replicas, minAvailable: 1 and minAvailable: 50% would achieve the same goal.
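To see why an integer and a percentage can be equivalent, here is a minimal sketch of how a percentage resolves to a pod count. It assumes the rounding behavior Kubernetes documents for minAvailable percentages (rounded up, so the budget is never looser than the percentage implies); the function name is ours, not an API.

```python
import math

def resolve_min_available(min_available, total_pods):
    """Resolve a minAvailable value (int or percentage string) to a pod count.

    Percentages are rounded up, matching the documented Kubernetes behavior
    for minAvailable, so "50%" of 3 pods means 2 must stay available.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available.rstrip("%"))
        return math.ceil(total_pods * percent / 100)
    return int(min_available)

# With 2 replicas, the integer and percentage forms express the same budget:
print(resolve_min_available(1, 2))      # 1
print(resolve_min_available("50%", 2))  # 1
```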

Take a look at the router DeploymentConfig for that information:

# oc describe deploymentconfig router
Name: router
Namespace: default
Labels: router=router
Selector: router=router
Replicas: 2
---
---

Here we find our label and that the current number of replicas is two. Setting minAvailable to one gives us a disruption budget of one. That means only one of the two pods can be unavailable at any given time.
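The budget arithmetic is simple: allowed disruptions equal the currently healthy pods minus the minimum available. A small sketch of that calculation (our own helper, mirroring the PDB status fields rather than calling any API):

```python
def allowed_disruptions(current_healthy, min_available):
    # The controller never reports a negative budget.
    return max(current_healthy - min_available, 0)

# Two healthy router pods with minAvailable: 1 leave room for one disruption:
print(allowed_disruptions(2, 1))  # 1
# After one pod is evicted, the budget drops to zero:
print(allowed_disruptions(1, 1))  # 0
```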

PodDisruptionBudgets in Practice

The first step is to create the PDB using the YAML we created above. Make sure you create the PDB in the project where the pods run:

# oc create -f router-pdb.yaml
poddisruptionbudget "router-pdb" created

Looking at the PDB, we can see that, as noted above, the allowed disruptions and the minimum available are both one.

# oc get poddisruptionbudget
NAME         MIN-AVAILABLE   ALLOWED-DISRUPTIONS   AGE
router-pdb   1               1                     13m

In more detail, the created PDB object looks like this:

# oc describe poddisruptionbudget router-pdb
Name: router-pdb
Min available: 1
Selector: router=router
Status:
Allowed disruptions: 1
Current: 2
Desired: 1
Total: 2

Next, let’s drain a node and see what happens:

# oc adm drain mrinfra1.example.com --grace-period=10 --timeout=10s
node "mrinfra1.example.com" cordoned
pod "router-2-t0z9g" evicted
node "mrinfra1.example.com" drained

Our infra node was successfully cordoned, the router pod was evicted, and the drain completed successfully.

Viewing the pods, we can see that one router is still running and one is pending.

# oc get pods
NAME             READY   STATUS    RESTARTS  AGE
router-2-kjs96   1/1     Running   0         42d
router-2-lbjbk   0/1     Pending   0         <invalid>

The second router is pending because mrinfra1.example.com is still SchedulingDisabled from the drain.

# oc get nodes
NAME                    STATUS                    AGE
mrmaster1.example.com   Ready,SchedulingDisabled  54d
mrmaster2.example.com   Ready,SchedulingDisabled  54d
mrmaster3.example.com   Ready,SchedulingDisabled  54d
mrinfra1.example.com    Ready,SchedulingDisabled  54d
mrinfra2.example.com    Ready                     54d
mrnode1.example.com     Ready                     54d
mrnode2.example.com     Ready                     54d
mrnode3.example.com     Ready                     54d
mrnode4.example.com     Ready                     54d

Inspecting the PDB, we can see that our allowed disruptions have gone from one to zero, indicating the application or service can no longer tolerate additional pods being down.

# oc get poddisruptionbudget
NAME         MIN-AVAILABLE   ALLOWED-DISRUPTIONS  AGE
router-pdb   1               0                    23m

What happens if we try to drain our other infrastructure node? I have added a grace period and timeout to show the failure:

# oc adm drain mrinfra2.example.com --grace-period=10 --timeout=10s
node "mrinfra2.example.com" cordoned
There are pending pods when an error occurred: Drain did not complete within 10s
pod/router-2-kjs96
error: Drain did not complete within 10s

The drain operation failed because there was no room left in the PDB. Looking at the logs, you would see the eviction request return an HTTP 429 Too Many Requests, which in the case of PDBs means the request failed but may be retried and succeed at another time.
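This retryable failure mode is why drain can simply poll until the budget opens up. Below is a hedged sketch of that loop; evict is a hypothetical callable standing in for a POST to the pod's eviction subresource, and we assume the usual REST status codes (201 on an accepted eviction, 429 when the budget blocks it).

```python
import time

def drain_pod(evict, interval=5, timeout=None):
    """Keep requesting eviction until the PodDisruptionBudget allows it.

    `evict` is a hypothetical callable returning an HTTP status code.
    429 means the budget currently has no room, so we wait and retry;
    any other non-success code is treated as fatal.
    """
    start = time.monotonic()
    while True:
        status = evict()
        if status == 201:        # eviction accepted
            return True
        if status != 429:        # any other error is fatal
            raise RuntimeError(f"eviction failed with HTTP {status}")
        if timeout is not None and time.monotonic() - start >= timeout:
            return False         # budget never opened up in time
        time.sleep(interval)

# Simulate a budget that opens up on the third attempt:
responses = iter([429, 429, 201])
print(drain_pod(lambda: next(responses), interval=0))  # True
```

With no timeout this mirrors the indefinite wait shown below; with a timeout it mirrors the --timeout=10s failure above.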

# journalctl -u atomic-openshift-node.service | grep 'router-8-1zm7k'
---
I0830 11:00:36.593260   12112 panics.go:76] POST /api/v1/namespaces/default/pods/router-8-1zm7k/eviction: (11.416163ms) 429
---

Running the same drain again with no timeout, you would see it wait indefinitely, retrying until it can complete:

# oc adm drain mrinfra2.example.com
node "mrinfra2.example.com" cordoned

<WAITING>

pod "router-2-kjs96" evicted
node "mrinfra2.example.com" drained

While the above drain is waiting, make mrinfra1.example.com schedulable again:

# oc adm manage-node mrinfra1.example.com --schedulable
NAME                   STATUS   AGE
mrinfra1.example.com   Ready    54d

Watching the pods as that happens, you see that router-2-kjs96 is still running. Then router-2-lbjbk goes from Pending to ContainerCreating to Running. Once the new pod is Running, the allowed disruptions return to one and the drain terminates router-2-kjs96. When that pod terminates successfully, the drain completes. Once mrinfra2.example.com is marked schedulable again, the second router replica redeploys as well.
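The sequencing above can be modeled as a tiny simulation of the invariant the PDB enforces: an eviction is admitted only if the count of healthy pods after the eviction still meets minAvailable. The helper below is illustrative only, not part of any API.

```python
def can_evict(healthy_pods, min_available):
    """An eviction is admitted only if the budget stays satisfied afterwards."""
    return healthy_pods - 1 >= min_available

# The drain of mrinfra2 is blocked while only one router pod is healthy ...
healthy, min_available = 1, 1
print(can_evict(healthy, min_available))  # False

# ... and proceeds once the replacement pod on mrinfra1 becomes Ready.
healthy += 1
print(can_evict(healthy, min_available))  # True
```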

# oc get pods -o wide -w
NAME             READY    STATUS             RESTARTS   AGE   NODE
router-2-kjs96   1/1      Running            0          42d   mrinfra2.example.com
router-2-lbjbk   0/1      Pending            0          5m    <none>
router-2-lbjbk   0/1      ContainerCreating  0          6m    mrinfra1.example.com
router-2-lbjbk   0/1      Running            0          6m    mrinfra1.example.com
router-2-lbjbk   1/1      Running            0          6m    mrinfra1.example.com
router-2-kjs96   1/1      Terminating        0          42d   mrinfra2.example.com
router-2-gqhh6   0/1      Pending            0          0s    <none>
router-2-kjs96   0/1      Terminating        0          42d   mrinfra2.example.com
router-2-gqhh6   0/1      Pending            0          31s   <none>
router-2-gqhh6   0/1      ContainerCreating  0          31s   mrinfra2.example.com
router-2-gqhh6   0/1      Running            0          37s   mrinfra2.example.com
router-2-gqhh6   1/1      Running            0          51s   mrinfra2.example.com

As clusters continue to grow, PDBs offer an elegant way to define the needs of the application as a first class citizen. Now is a great time to start the discussion with your development teams!

Categories
OpenShift Container Platform, Products