A Kubernetes DaemonSet ensures that an instance of a specific pod is running on all (or a selection of) nodes in a cluster. It creates pods on each node, and garbage collects pods when nodes are removed from the cluster.

The simplest use case is deploying a daemon on every node. However, you might want to split that up into multiple DaemonSets. For example, if your cluster has nodes with varying hardware, each group of nodes might need different memory and/or CPU requests for the daemon.

Because our approach fits this use case, we decided to create a DaemonSet that deploys pods running netperf’s netserver server-side binary in the background. We thought this might be useful for analyzing networking performance within the OpenShift Container Platform (OCP) cluster.

This post shows how we constructed a netperf DaemonSet from scratch.

Dockerfile

First of all, we need to create a custom docker image that will run the netserver binary.

FROM fedora:27
LABEL maintainer="josgonza@redhat.com"

RUN \
    dnf install -y http://people.redhat.com/mcroce/packages/netperf-2.7.1-3.x86_64.rpm && \
    dnf clean all

USER 1001

ENTRYPOINT ["/usr/bin/netserver", "-D"]
EXPOSE 12865

NOTE: This container doesn’t need privileged rights, so you won’t have to grant them (see “Enable Container Images that Require Root”).
This Dockerfile is only for testing purposes and is kept as simple as possible, but we strongly recommend following best practices when you create your containers:
- Container Image Guidelines
- 10 things to avoid in docker containers

To avoid the complexity of building the binaries from source, we used the RPM netperf-2.7.1-3.x86_64.rpm, courtesy of Matteo Croce (formerly available from the Fedora COPR repository teknoraver/netperf).

Once you have the Dockerfile, you only need to build the image, for example: docker build -t netperf-fedora .
For testing purposes, you could run it and connect to the container:

docker run -d --name netperf netperf-fedora
docker exec -ti netperf /bin/bash

Finally, tag and push the image to your image registry.
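
As a minimal sketch of that last step, the function below tags the locally built image and pushes it. The registry host registry.example.com is a placeholder and the function name tag_and_push is hypothetical; neither comes from the original setup.

```shell
# Placeholder registry host (an assumption): replace it with your own registry.
REGISTRY="${REGISTRY:-registry.example.com}"
IMAGE="netperf-fedora"

# Tag the locally built image for the registry, then push it.
# The push only runs if the tag step succeeds.
tag_and_push() {
    docker tag "${IMAGE}" "${REGISTRY}/${IMAGE}:latest" &&
        docker push "${REGISTRY}/${IMAGE}:latest"
}
```

Call tag_and_push once you are logged in to the registry (docker login).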

DaemonSet Manifest

Create a DaemonSet manifest with the following contents:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: netperf
  namespace: <-your_project->
spec:
  selector:
    matchLabels:
      name: netperf
  template:
    metadata:
      labels:
        name: netperf
        app-name: netperf
    spec:
      nodeSelector:
        type: NODE
        stage: NON_PRODUCTION
      containers:
      - image: <-your_registry->/netperf-fedora:latest
        imagePullPolicy: Always
        name: netperf
        ports:
        - containerPort: 12865
          protocol: TCP
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
      terminationGracePeriodSeconds: 10

Note the .spec.template.spec.nodeSelector labels. We decided to use non-production compute nodes (not masters or infra nodes) to avoid any impact on production workloads while still deploying inside the OCP cluster. Check the DaemonSet docs for details about DaemonSet manifests.

Deploy the DaemonSet in OCP

Once you have created the YAML for the DaemonSet manifest, log in with permissions to modify the selected project (.metadata.namespace in the manifest). Then you can:

  • Create/deploy the DaemonSet
    oc create -f netperf-daemonset.yml
  • Monitor it
    oc get daemonset
    oc get event --sort-by='.lastTimestamp'
  • Delete/Undeploy it
    oc delete daemonset netperf --cascade
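
To go a step beyond watching oc get daemonset by eye, the sketch below compares two counters from the DaemonSet status, desiredNumberScheduled and numberReady. The function name daemonset_ready is hypothetical, and the sketch assumes the DaemonSet lives in the currently selected project.

```shell
# Print "ready" when every scheduled netperf pod reports ready; otherwise
# print the current counts and return a non-zero status.
daemonset_ready() {
    # Both counters are fields of the DaemonSet status object.
    counts=$(oc get daemonset netperf \
        -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 1
    desired=${counts% *}   # text before the space
    ready=${counts#* }     # text after the space
    desired=${desired:-0}
    ready=${ready:-0}
    if [ "${desired}" -gt 0 ] && [ "${desired}" = "${ready}" ]; then
        echo "ready"
    else
        echo "waiting: ${ready}/${desired} pods ready"
        return 1
    fi
}
```

For example, until daemonset_ready; do sleep 5; done blocks until all pods are up.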

Automation of the netperf tests

Now that you’ve deployed a netperf DaemonSet and its pods are running the netserver daemon, you can execute your netperf client tests from any point of the infrastructure within your OCP cluster.
This bash snippet loops through a list of nodes to collect statistics from the netserver daemon pod on each of them:

TSEC=30
ITERATIONS=5

for HOST in $(oc get nodes -o jsonpath='{range .items[?(.metadata.labels.stage=="NON_PRODUCTION")]}{.metadata.name}{"\n"}{end}')
do

  ...

  for iteration in $(seq ${ITERATIONS})
  do
    yes | ssh $HOST "./netperf -t TCP_STREAM -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_STREAM.log
    yes | ssh $HOST "./netperf -t TCP_MAERTS -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_MAERTS.log
    yes | ssh $HOST "./netperf -t TCP_RR -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_RR.log
    yes | ssh $HOST "./netperf -t TCP_CRR -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_CRR.log
  done

  ...

done

...

NOTE: For the outer loop, filter the OCP nodes, at least to discard the nodes where the DaemonSet has not been deployed. Because the jsonpath option has limited filtering functionality, you can use awk instead if you want a subset of nodes.
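
One possible shape for such an awk filter is sketched below. It assumes the node list comes from oc get nodes --show-labels --no-headers, where the second column is the status and the last column is the comma-separated label list; the function name filter_nodes is hypothetical.

```shell
# Read `oc get nodes --show-labels --no-headers` output on stdin and print
# the names of nodes whose status is exactly "Ready" and that carry the
# label stage=NON_PRODUCTION.
filter_nodes() {
    awk '$2 == "Ready" && $NF ~ /(^|,)stage=NON_PRODUCTION(,|$)/ { print $1 }'
}

# Usage (needs a logged-in oc session):
#   oc get nodes --show-labels --no-headers | filter_nodes
```

Matching the status exactly also drops nodes marked Ready,SchedulingDisabled, which the jsonpath filter alone does not do.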

Variables

  • TSEC (30): Duration in seconds of each test run (passed to netperf’s -l option).
  • ITERATIONS (5): Number of iterations.
  • HOST: IP/FQDN of the host from which you want to execute the tests (the netperf binary must exist there, or the script has to copy it beforehand, for example with scp).
  • POD_IP: Destination IP of the pod running the netserver binary, which listens for client requests.

See the Netperf documentation for more netperf options and features.
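
The script above leaves open how POD_IP gets its value. One option, shown here as a sketch rather than as the original script's method, is to look up the netperf pod scheduled on a given node. The function name pod_ip_on_node is hypothetical; the label name=netperf comes from the DaemonSet template.

```shell
# Print the IP of the netperf pod scheduled on the node given as $1.
# Prints nothing if no netperf pod runs on that node.
pod_ip_on_node() {
    oc get pods -l name=netperf \
        -o custom-columns=IP:.status.podIP,NODE:.spec.nodeName |
        awk -v node="$1" '$2 == node { print $1 }'
}

# Usage (needs a logged-in oc session):
#   POD_IP=$(pod_ip_on_node "${HOST}")
```

This targets the pod on the same node as HOST; pass a different node name to test traffic between nodes.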

Here’s a quick way to parse the results files:

for i in *_TCP_MAERTS.log; do echo "$i"; awk '/Throughput/,/^[0-9]/{print $5}' "$i" | grep -Ev "[a-zA-Z]" | sed '/^$/d'; done
for i in *_TCP_STREAM.log; do echo "$i"; awk '/Throughput/,/^[0-9]/{print $5}' "$i" | grep -Ev "[a-zA-Z]" | sed '/^$/d'; done
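
Since each log accumulates one throughput value per test run, a small helper can reduce those values to an average. The sketch below assumes one number per line on stdin; the function name mean is hypothetical.

```shell
# Read one number per line on stdin and print the mean with two decimals.
# Lines that are not plain numbers (file names, blank lines) are ignored.
mean() {
    awk '/^[0-9]+(\.[0-9]+)?$/ { sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage:
#   awk '/Throughput/,/^[0-9]/{print $5}' 1_TCP_STREAM.log | mean
```

Because mean skips non-numeric lines itself, the grep and sed stages are not needed in front of it.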

Recommended Usage

I recommend having a bastion host with access to the entire OCP infrastructure, and using Ansible to automate the tests.

If you want a random selection of pods for each test rather than a static list, I suggest one of two approaches:

  1. Using OpenShift’s oc command line client and some classic UNIX CLI filter programs:
    POD=$(oc get po -o wide | grep netperf | awk '{print $6}' | shuf -n1)
  2. Using endpoints, so you need to create the netperf service:
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app-name: netperf
      name: netperf
      namespace: <-your_project->
    spec:
      ports:
      - port: 12865
        protocol: TCP
        targetPort: 12865
      selector:
        app-name: netperf
      sessionAffinity: ClientIP
      type: ClusterIP

    And then: POD=$(oc export -n <-your_project-> ep/netperf | grep ip | awk '{print $3}' | shuf -n1)

NOTE: tested with oc v3.6.0
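
Newer oc clients no longer ship oc export, so on those versions the same selection can be sketched with oc get endpoints and a jsonpath expression. This variant was not part of the tests described here, and the function name random_endpoint_ip is hypothetical.

```shell
# Print one randomly chosen endpoint IP of the netperf Service in the
# project given as $1.
random_endpoint_ip() {
    # jsonpath prints all endpoint IPs on one line, separated by spaces.
    oc get endpoints netperf -n "$1" \
        -o jsonpath='{.subsets[*].addresses[*].ip}' |
        tr ' ' '\n' | shuf -n1
}

# Usage (needs a logged-in oc session):
#   POD=$(random_endpoint_ip "<-your_project->")
```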

I recommend the second approach, creating a Service, because:

  • You can use the endpoints to choose OCP nodes. This is quite helpful when you want to test from the same node where the netperf pod is running:
    NODE=$(oc export -n <-your_project-> ep/netperf | grep -A1 "${POD_IP}" | grep 'nodeName:' | awk '{print $2}')
  • I tried to launch the test through the Service but could not make it work (probably because of TCP headers, NAT, or a combination of both).
    ./netperf -t TCP_STREAM -cC -l 30 -H ${ClusterIP}  #Failed with timeout

    Any thoughts about how to fix this would be much appreciated.

Other interesting tests would be:

  • From one master (in a multi-master environment) to a netperf pod IP:
    MASTER=$(oc get nodes -l type=MASTER --no-headers | awk '$2 == "Ready,SchedulingDisabled" {print $1}' | shuf -n1)
  • From one node in the OCP cluster to a netperf pod IP.
  • From any other host deployed in the OCP cluster (such as bastion hosts or monitoring hosts).

Conclusion

With Kubernetes at its core, OpenShift is a powerful platform that lets you deploy complex or tedious systems/applications in an easy way.

You can use DaemonSets to create shared storage, to run a logging pod on every node in a cluster, or to deploy a monitoring agent on every node, such as Dynatrace.

DaemonSets on OpenShift are also great because they provide useful abstractions for:
- Monitoring and managing logs for daemons in the same way as applications.
- Configuring daemons with the same formats and tools as applications, e.g., Pod templates.
- Running daemons in containers with resource limits to increase isolation between daemons and app containers.