Configuration Drift Prevention in OpenShift: Resource Locker Operator

Introduction

There are times when we must be absolutely sure that a set of Red Hat OpenShift configurations “stay in place,” lest an application, or potentially the entire cluster, become unstable.
The standard mechanism for preventing configuration drift in OpenShift is Role Based Access Control (RBAC): if only subjects with legitimate access can change the configuration, one should be certain that the configuration does not differ from the expected state. RBAC works well for this purpose, but it has some limitations:

  1. Complex and fine-grained RBAC configurations are difficult to create, especially now that, with operators and Custom Resource Definitions (CRDs), the API surface is ever expanding. Few organizations have the discipline to create a comprehensive RBAC strategy for an OpenShift deployment.
  2. RBAC cannot prevent human error.

In this post, a different and complementary-to-RBAC approach to configuration drift prevention will be presented.

Resource Locker Operator

The purpose of the Resource Locker Operator is to prevent configuration drift on a set of resources. It achieves that goal not by preventing access to those resources, but by watching them and restoring any undesired change.
While the operator does not replace RBAC, it complements it to provide a more holistic solution.
With the Resource Locker Operator, one can lock-in two types of configurations:

  1. Resources: any kind of resource can be defined, instructing the operator to create that resource and prevent configuration drift on it.
  2. Patches: patches on pre-existing resources can be defined. The Resource Locker Operator will enforce the patch, while allowing the rest of the resource to change.

The Resource Locker Operator introduces the ResourceLocker CRD which allows a user to express the desire to lock a configuration. A representation is found below:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: ResourceLocker
metadata:
  name: test-simple-resource
spec:
  resources:
  - object:
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: small-size
        namespace: resource-locker-test
      spec:
        hard:
          requests.cpu: "4"
          requests.memory: "2Gi"
  serviceAccountRef:
    name: default

This ResourceLocker configuration defines a ResourceQuota object.
Let’s look in more detail at the different types of configurations that can be applied.

Locking Resource Configurations

The example shown in the previous section illustrated how to define resources within a ResourceLocker object. The resource can be of any type available in the cluster. Given that the resources field is an array, multiple resources can be defined in a single ResourceLocker CR.
Kubernetes resources have fields that legitimately need to change. By default, the metadata and the status fields of all resources are subject to change by the controllers that monitor that specific resource. The spec portion of the resource, by contrast, is normally used to specify a desired state, which is what we want to lock down. However, there are instances where portions of the spec field have a legitimate reason to change. For example, it is considered acceptable for the spec.replicas field of some resource types to be modified.
The Resource Locker Operator provides a generic mechanism to specify which fields to exclude from consideration. These fields will be free to change, while changes on the other fields will be considered configuration drift and will be reset by the operator. Fields to be excluded from the watch can be specified in the excludedPaths field of the CR. In addition, metadata, status and spec.replicas are always added by default as excluded fields. This default configuration is designed to work without issue in most situations. Given these defaults, the previous example becomes:

spec:
  resources:
  - excludedPaths:
    - .metadata
    - .status
    - .spec.replicas
    object:
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: small-size
        namespace: resource-locker-test
      spec:
        hard:
          requests.cpu: '4'
          requests.memory: 2Gi

Locking Patch Configurations

Patches are useful when we need to modify an object that we don’t own. For example, node objects are generally owned by the cluster installer or the Machine API Operator, yet it is not uncommon for a cluster administrator to add labels to nodes.
This type of situation is common in sophisticated Kubernetes distributions, such as OpenShift, in which the install operator (the Cluster Version Operator, in OpenShift’s case) pre-populates several resources. The administrator can then modify those resources to configure the cluster if a specific need arises (day two configuration). In this situation, the administrator does not actually own those resources; the cluster does.
When the cluster or a specific operator upgrades, it needs to be able to change those resources. When the upgrade completes, the administrator might have to reapply the day two configurations.
By locking in a patch, we avoid having to reapply the day two configurations: as soon as the operator that owns the resource finishes applying the upgrades, the Resource Locker Operator reapplies the configured patch.
A patch can be represented by the following example:

patches:
- targetObjectRef:
    apiVersion: v1
    kind: ServiceAccount
    name: test
    namespace: resource-locker-test
  patchTemplate: |
    metadata:
      annotations:
        hello: bye

Here, we are patching the target object, in this case a service account, with a constant annotation.
We can also patch resources based on values present in other resources:

- targetObjectRef:
    apiVersion: v1
    kind: ServiceAccount
    name: test
    namespace: resource-locker-test
  patchTemplate: |
    metadata:
      annotations:
        {{ (index . 0).metadata.name }}: {{ (index . 1).metadata.name }}
  patchType: application/strategic-merge-patch+json
  sourceObjectRefs:
  - apiVersion: v1
    kind: Namespace
    name: resource-locker-test
  - apiVersion: v1
    kind: ServiceAccount
    name: default
    namespace: resource-locker-test

In the example above, we use the sourceObjectRefs field to specify a list of references to other resources that will become the parameters of our patchTemplate. The patchTemplate field is then processed as a Go template, receiving the array of referenced objects as input. The result of the processed template is treated as a patch, which is then applied to the target object.
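To make the template mechanics concrete: in the example above, (index . 0) resolves to the first source object (the Namespace named resource-locker-test) and (index . 1) to the second (the ServiceAccount named default). A sketch of the patch the operator would render and apply to the target service account:

```yaml
# Rendered result of the patchTemplate above (illustrative):
# the annotation key comes from the Namespace name, the value
# from the ServiceAccount name.
metadata:
  annotations:
    resource-locker-test: default
```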
Additional options are available for patches and can be found in the Resource Locker Operator GitHub repository.

Multitenancy and Security

The Resource Locker Operator runs with very limited privileges (it can only manipulate ResourceLocker resources), yet it still needs to be able to enforce resources and patches on any object type. How is that achieved?
When creating a ResourceLocker, the user must also pass a reference to a service account located in the same namespace as the ResourceLocker CR. This service account will be used to manage the configurations as defined by the given ResourceLocker CR. With this approach, security escalations can be prevented since a user can only enforce actions on resource types for which they were previously granted permissions. This functionality also provides the ability to run a single Resource Locker Operator instance to control an entire cluster (as opposed to one per namespace).
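To illustrate this model, the referenced service account must itself hold permissions over the locked resource types. A minimal sketch, assuming the default service account from the earlier example should manage ResourceQuota objects (the Role and RoleBinding names here are illustrative):

```yaml
# Hypothetical Role granting the referenced service account
# permission to manage ResourceQuotas in the target namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: quota-manager
  namespace: resource-locker-test
rules:
- apiGroups: [""]
  resources: ["resourcequotas"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
# Binding that Role to the service account named in serviceAccountRef.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: quota-manager-binding
  namespace: resource-locker-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: quota-manager
subjects:
- kind: ServiceAccount
  name: default
  namespace: resource-locker-test
```

If the service account lacks these permissions, the operator cannot enforce the configuration on its behalf, which is precisely what prevents privilege escalation.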

Installation

It’s possible to install the Resource Locker Operator from the OperatorHub or via a Helm chart.
Details on both installation approaches can be found on the Resource Locker Operator GitHub repository.

Conclusion

The Resource Locker Operator provides mechanisms to prevent configuration drift that are complementary to RBAC. It partly overlaps in functionality with a GitOps operator (such as ArgoCD, for example). In fact, a GitOps operator is designed to detect and correct drift. However, based on my observations, GitOps operators are good at enforcing the resources they own, but they do not provide the capability to manage patches.
As with any Open Source project, end user feedback is welcome in order to enhance the overall functionality of the Resource Locker Operator.
