Managing the Lifecycle of OpenShift Clusters: Vetting OpenShift Installations

Andy Block is an architect in Red Hat Consulting. Mike Fiedler is on the OpenShift system test team.

Introduction

Whether you are installing a new release of a software package or just applying an update (such as a bug fix), it is wise to run tests against the newly installed software to confirm that it performs correctly in the target environment. This is especially true with OpenShift, since it contains a number of open source components and can be deployed to a variety of environments, such as an on-premise datacenter or a public or private cloud.

Both OpenShift and the underlying Kubernetes project contain an ecosystem of source code repositories. While the majority of these repositories contain the core functionality driving each product, several have been created specifically for the purpose of testing and validating cluster health. This post introduces a tool that was developed by the OpenShift system test team and validated in actual OpenShift customer environments by Red Hat’s Consulting team, and that any OpenShift administrator can use to validate the functionality of their own clusters.

The cluster-loader Tool

The OpenShift SVT repository on GitHub is one of the repositories created specifically to validate the health of an OpenShift cluster. It contains a collection of tools and utilities to facilitate system testing of the online and enterprise versions of OpenShift Container Platform. One such utility found in this repository is cluster-loader, a Python-based tool that gives OpenShift administrators the ability to automatically load a cluster with a variety of components, such as builds, services, routes, and pods.

Any resource type supported by OpenShift can be created by the cluster-loader tool, within a single project or across multiple projects. Aside from loading resources, cluster-loader can also simulate traffic against pods once they are running, to verify that various components within the cluster are performing adequately.

Like OpenShift and Kubernetes, cluster-loader uses a declarative YAML configuration file to define the projects and resources it should create. The following is a simple example of a basic cluster-loader configuration file:

projects:
  - num: 1
    basename: testproject
    tuning: default
    quota: default
    users:
      - num: 2
        role: admin
        basename: demo
        password: demo
        userpassfile: /etc/origin/openshift-passwd
    pods:
      - total: 10
      - num: 40
        image: openshift/hello-openshift:v1.0.6
        basename: hellopods
        file: default
        storage:
          - type: none
      - num: 60
        image: rhscl/python-34-rhel7:latest
        basename: pyrhelpods
        file: default

quotas:
  - name: default
    file: default

tuningsets:
  - name: default
    pods:
      stepping:
        stepsize: 5
        pause: 10 s
      rate_limit:
        delay: 250 ms

This configuration creates one project with 2 users and 10 pods: 40% of the pods run the hello-openshift image and 60% run Python. cluster-loader pauses for 10 seconds after every 5 pods are created.
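
To make the distribution concrete, the arithmetic behind the `total` and per-definition `num` fields can be sketched as follows (a minimal illustration; the field values mirror the configuration above, but the helper function itself is hypothetical and not part of cluster-loader):

```python
# Hypothetical helper illustrating how cluster-loader's "total" pod count and
# per-definition "num" percentages combine into concrete pod counts.

def pod_counts(total, percentages):
    """Split a total pod count according to percentage weights."""
    return [total * pct // 100 for pct in percentages]

# From the configuration above: 10 pods total, 40% hello-openshift, 60% python.
hello, python = pod_counts(10, [40, 60])
print(hello, python)  # 4 hello-openshift pods, 6 python pods
```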

The OpenShift system test team sometimes tries to push the limits of Kubernetes and OpenShift in order to validate that the product can handle some of the most intensive workloads. The following example simulates the creation of 1000 OpenShift projects, each running a replication controller for 2 “hello-openshift” pods, 5 build configurations, 20 secrets, and 2 routes. The corresponding cluster-loader configuration file is as follows:

projects:
- num: 1000
  basename: testproject
  tuning: default
  rcs:
    - num: 1
      replicas: 2
      file: default
      basename: testrc
      image: openshift/hello-openshift:v1.0.6
  templates:
    - num: 5
      file: ./content/build-config-template.json
    - num: 20
      file: ./content/ssh-secret-template.json
    - num: 2
      file: ./content/route-template.json
tuningsets:
- name: default
  pods:
    stepping:
      stepsize: 5
      pause: 10 s
    rate_limit:
      delay: 250 ms
quotas:
- name: default

The “templates” section refers to predefined OpenShift templates, located relative to the cluster-loader tool, containing resources that should be deployed to the cluster. These templates can be customized by modifying their parameters, or replaced entirely with a separate set of templates.
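
As a rough sketch of what such a customization might look like, the snippet below overrides a parameter value in a template’s `parameters` list before the template is handed to cluster-loader. The dictionary layout follows the standard OpenShift template format, but the parameter name and helper function are illustrative, not part of the tool:

```python
import json

def override_parameter(template, name, value):
    """Set the value of a named parameter in an OpenShift template dict."""
    for param in template.get("parameters", []):
        if param["name"] == name:
            param["value"] = value
    return template

# Illustrative template fragment in the standard OpenShift template format.
template = {
    "kind": "Template",
    "parameters": [
        {"name": "SOURCE_REPOSITORY_URL",
         "value": "https://github.com/example/app.git"},
    ],
}
override_parameter(template, "SOURCE_REPOSITORY_URL",
                   "https://github.com/example/fork.git")
print(json.dumps(template["parameters"]))
```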

To run either of these examples in an OpenShift environment, execute the following command:

./cluster-loader.py -f <yaml file>

Partial output from the execution of the first configuration file is shown below:

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.

namespace "cakephp-mysql0" labeled

templates:  [{'num': 1, 'file': './content/quickstarts/cakephp/cakephp-mysql.json'}]
service "cakephp-mysql-example" created
route "cakephp-mysql-example" created
imagestream "cakephp-mysql-example" created
buildconfig "cakephp-mysql-example" created
deploymentconfig "cakephp-mysql-example" created
service "mysql" created
deploymentconfig "mysql" created

Parameter 'ifexists' not specified. Using 'default' value.
forking dancer-mysql0
Now using project "dancer-mysql0" on server "https://example.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.

namespace "dancer-mysql0" labeled

templates:  [{'num': 1, 'file': './content/quickstarts/dancer/dancer-mysql.json'}]
service "dancer-mysql-example" created
route "dancer-mysql-example" created
imagestream "dancer-mysql-example" created
buildconfig "dancer-mysql-example" created
deploymentconfig "dancer-mysql-example" created
service "database" created
deploymentconfig "database" created

Once the resources have been created within the OpenShift cluster, support is also available to automatically generate traffic against running applications that expose routes, verifying that they can be accessed successfully and measuring application performance. This is accomplished by executing the cluster-loader tool a second time with the traffic generation option. Behind the scenes, the generator uses JMeter to measure performance metrics such as success rate, hits per second, and the amount of data transferred.
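
To illustrate the kind of metrics involved, here is a minimal, self-contained sketch of a traffic generator reporting success rate, hits per second, and bytes transferred. This is emphatically not the JMeter-based generator that cluster-loader uses; it only shows the shape of the measurement:

```python
# Minimal sketch of the metrics a traffic generator reports: success rate,
# hits/second, and bytes transferred. NOT the JMeter-based generator used
# by cluster-loader; a self-contained illustration only.
import time
import urllib.request

def generate_traffic(url, hits):
    """Issue `hits` GET requests against `url` and summarize the results."""
    start = time.time()
    successes = 0
    total_bytes = 0
    for _ in range(hits):
        try:
            with urllib.request.urlopen(url) as resp:
                total_bytes += len(resp.read())
                if resp.status == 200:
                    successes += 1
        except OSError:
            pass  # network or HTTP failure counts against the success rate
    elapsed = time.time() - start
    return {
        "success_rate": successes / hits,
        "hits_per_second": hits / elapsed,
        "bytes": total_bytes,
    }
```

In practice the URL would be one of the routes created by cluster-loader, such as the `cakephp-mysql-example` route shown in the output above.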

Once testing is complete, all components can be removed by executing the following command:

oc delete project -l purpose=test

A Practical Use of the cluster-loader Tool

The flexibility of cluster-loader makes it a tool that can be used to verify real-world OpenShift installations, whether in a sandbox, pre-production, or even a production environment. Recently, an OpenShift customer was looking for just such a solution to leverage in their own environment. This particular customer had a large OpenShift deployment that spread across multiple clusters and contained applications critical to the day-to-day operation of the organization.

Given the size of the deployment, they were looking for a process to confirm the state of the environment through mechanisms similar to normal user interaction, such as building applications and deploying resources. cluster-loader seemed like an ideal candidate for this task. However, since the tool is typically executed on one of the OpenShift masters and, as a Python application, requires additional packages and libraries to be installed on the host, the customer was apprehensive about potential side effects that could compromise the overall health of the OpenShift environment.

An on-site team from Red Hat Consulting worked with the customer to address these concerns and to develop new functionality in support of the cluster-loader application. As mentioned previously, the customer’s primary concern was how cluster-loader would be executed in their environment. Since OpenShift itself uses containers to execute workloads in isolation on host machines within the cluster, the natural decision was to containerize the cluster-loader application. This required creating a new Docker image that includes all of the libraries and dependencies required by the tool.

FROM registry.access.redhat.com/rhel7/rhel:7.2

MAINTAINER Andrew Block <andrew_block@optum.com>

ENV SVT_GIT=https://github.com/openshift/svt.git \
    SVT_GIT_BRANCH=master

RUN curl -o /tmp/epel-release-latest-7.noarch.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
    yum clean all && \
    yum install -y /tmp/epel-release-latest-7.noarch.rpm && \
    yum-config-manager --enable rhel-7-server-rpms || : && \
    yum-config-manager --enable rhel-7-server-optional-rpms || : && \
    yum-config-manager --enable rhel-7-server-extras-rpms || : && \
    yum-config-manager --enable rhel-7-server-ose-3.4-rpms || : && \
    INSTALL_PKGS="atomic-openshift-tests git python2-boto3 python-rbd python-flask PyYAML iproute" && \
    yum install -y $INSTALL_PKGS && \
    rpm -V $INSTALL_PKGS && \
    yum-config-manager --disable epel >/dev/null && \
    yum clean all && \
    mkdir -p /root/svt-git && \
    git clone $SVT_GIT -b $SVT_GIT_BRANCH /root/svt-git/svt


ADD bin/start.sh /root/
ADD lib/validation.py /root/svt-git/svt/openshift_scalability/

WORKDIR /root/svt-git/svt/openshift_scalability

ENTRYPOINT [ "/root/start.sh" ]

NOTE: This example enables the OpenShift 3.4 repository. Adjust the OpenShift version accordingly.

With a containerized environment available to execute cluster-loader, work could begin on how the tool would be put into practice. The initial goal was to make use of the tool’s two primary features: provisioning resources and performing load analysis. Even though cluster-loader can load an OpenShift cluster with resources, there was no functionality in place to validate the state of the environment after the resources were created. Did builds complete successfully? Did applications deploy? Could an application be accessed once it was running?

Each of these elements is crucial to determining whether an OpenShift cluster is healthy. The Red Hat Consulting team, working in conjunction with the customer’s own development team, created a validation tool that could be executed alongside cluster-loader to assess whether the resources were successfully generated and whether any resulting applications were healthy and could be accessed by external consumers.
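
The validation tool built for the customer is not shown in this post, but the core idea can be sketched: inspect the JSON that a command like `oc get builds -o json` returns and confirm that every build reached the Complete phase. The field layout below follows the standard OpenShift build API, but the helper function itself is hypothetical:

```python
# Hypothetical validation check: did every build in the project complete?
# `build_list` mirrors the JSON structure returned by `oc get builds -o json`.

def builds_complete(build_list):
    """Return True if every build in the list reached the Complete phase."""
    return all(
        item.get("status", {}).get("phase") == "Complete"
        for item in build_list.get("items", [])
    )

# Example structure mirroring `oc get builds -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "app-1"}, "status": {"phase": "Complete"}},
        {"metadata": {"name": "app-2"}, "status": {"phase": "Failed"}},
    ],
}
print(builds_complete(sample))  # False: app-2 failed
```

Similar checks can be written against deployment configurations and routes to answer the other two questions above.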

With all of the pieces now available (a tool to load an OpenShift cluster with resources, a utility to validate the state of the execution, and a tool to perform a detailed assessment of any running application), the final step was to orchestrate and coordinate the execution of these tasks within the newly created container. A wrapper script was developed and set as the entrypoint so that any time the image was started, the entire end-to-end process would be carried out.
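
The actual wrapper is the shell script (start.sh) referenced in the Dockerfile above and is not shown here, but its control flow amounts to running each stage in order and stopping at the first failure. A minimal sketch of that idea, with illustrative stage names:

```python
# Hypothetical sketch of the wrapper's control flow: run each stage in order,
# stop at the first failure. The real entrypoint is a shell script (start.sh);
# the stage names and commands here are illustrative only.
import subprocess

def run_pipeline(stages):
    """Run each (name, command) stage; return the first failing stage's name,
    or None if every stage succeeded."""
    for name, cmd in stages:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return name
    return None
```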

After extensive testing of the new containerized cluster-loader solution, the customer was satisfied with the results. Their final ask was to streamline how and when the solution would be executed. During the testing phase, the configuration file driving cluster-loader was manually created and maintained on each host, and the container itself was manually instantiated. The customer wanted a way to manage the configuration file, which could differ depending on the target environment, and to take a hands-off approach to container execution.

Finally, they wanted to be alerted if any failures occurred at any point. Ansible was already present in the customer’s environment as the utility for automating host configuration, along with Ansible Tower for centralized management of Ansible Playbooks. Incorporating Ansible to manage the cluster-loader configuration file on a per-host and per-environment basis, together with the scheduling and notification functionality inherent in Ansible Tower, gave the customer a tailored solution they could manage regardless of how large their OpenShift footprint grew.

Conclusion

Good open source projects contain tests in addition to the normal project code. On top of the usual low-level unit tests, the Kubernetes and OpenShift projects provide tests that users can run in their own test environments to verify functionality and validate that their installation is performing correctly. These tests, along with tools like Ansible, help provide repeatable test procedures that can be used as part of the change management process.

Categories
OpenShift Container Platform, OpenShift Origin, Products
  • Dan Yocum

    Andrew,

    This is a good utility for load testing a newly deployed cluster. It’s useful to see if pods can be launched.

    However, it’s missing more basic tests from a cluster administrator point of view.

    When we deploy OpenShift Dedicated Clusters for our customers, we test the following things (specific to AWS right now):

    check_cloud – is it AWS?
    check_region – is it in the region we expect?
    check_vm_size – m4.xlarge for compute & master, r4.xlarge for infra
    check_elbs – at least 3
    check_vpc_peering – no or yes and active?
    check_vpn – no or yes and active?
    check_registry_pods – at least 2?
    check_router_pods – at least 2?
    check_metrics – running?
    check_logging – running?
    check_sdn – correct?
    check_endpoints – console, api, logging, metrics for internal and external facing URLs.
    check_registry – accessible?
    check_image_streams – at least 20 (this is a “reasonable” value).

    HTH!
    Dan
    dyocum@redhat.com