Collecting debugging information from a large set of nodes (such as when creating SOS reports) can be time-consuming when done manually. Additionally, in the context of Red Hat OpenShift 4.x and Kubernetes, it is considered bad practice to SSH into a node and perform debugging actions. To better accomplish this type of operation, OpenShift Container Platform 4 introduces a new command, oc adm must-gather, which collects debugging information across the entire cluster (nodes and control plane). More detailed information on the must-gather command can be found in the platform documentation.
While using the must-gather command is fairly straightforward, the full end-to-end process can still be time-consuming. It involves issuing the command, waiting for the associated tasks to complete, and then uploading the resulting information to the Red Hat case management system.
A way to further streamline the process is to automate these actions.
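To make the manual steps concrete, here is a minimal Python sketch of the first two (run the command, then package the output). It assumes the oc client is installed and logged in as a cluster administrator. The function names and the must-gather.tar.gz file name are hypothetical, and the upload to the case management system is deliberately left out because its API is not described here.

```python
import subprocess
import tarfile
from pathlib import Path


def must_gather_command(dest_dir, images=()):
    """Build the argument list for a manual `oc adm must-gather` run."""
    cmd = ["oc", "adm", "must-gather", f"--dest-dir={dest_dir}"]
    # Each additional must-gather compatible image is passed with --image.
    cmd += [f"--image={image}" for image in images]
    return cmd


def compress(src_dir, archive_path):
    """Pack the collected output into a single .tar.gz ready for upload."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
    return archive_path


def gather_and_compress(work_dir):
    """Run must-gather (blocks until it finishes), then compress the result."""
    dest = Path(work_dir) / "must-gather"
    subprocess.run(must_gather_command(dest), check=True)
    # Hypothetical archive name; uploading it is a separate, manual step here.
    return compress(dest, Path(work_dir) / "must-gather.tar.gz")
```

Calling gather_and_compress("/tmp/case") would leave the archive in /tmp/case. An administrator would still have to attach it to the support case by hand, and that remaining manual work is what the operator described next takes over.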
The must-gather operator streamlines running the must-gather command and uploading the results to the Red Hat case management system. The must-gather operator is intended to be used only by the cluster administrator as it requires elevated permissions on the cluster. A must-gather run can be started by creating a MustGather custom resource (CR) similar to the following:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example
spec:
  caseID: 'XXXXXXXX'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
Within the MustGather CR, three parameters can be defined:
- caseID: Red Hat Support case to which the resulting output will be attached.
- caseManagementAccountSecretRef: secret containing the credentials needed to log in and upload files to the Red Hat case management system.
- serviceAccountRef: service account with the cluster-admin role that is used to run the must-gather command. Running as a cluster-admin is a must-gather requirement.
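For teams that generate these resources from scripts, the three parameters can be assembled programmatically. The following is a sketch only: must_gather_manifest is a hypothetical helper, and the values simply mirror the example CR above.

```python
import json


def must_gather_manifest(name, case_id, secret_name, service_account):
    """Return a MustGather custom resource as a plain dict (hypothetical helper)."""
    fields = {
        "name": name,
        "caseID": case_id,
        "caseManagementAccountSecretRef": secret_name,
        "serviceAccountRef": service_account,
    }
    for label, value in fields.items():
        if not value:
            raise ValueError(f"{label} must not be empty")
    return {
        "apiVersion": "redhatcop.redhat.io/v1alpha1",
        "kind": "MustGather",
        "metadata": {"name": name},
        "spec": {
            # Keep the case ID a string so leading zeros survive serialization.
            "caseID": str(case_id),
            "caseManagementAccountSecretRef": {"name": secret_name},
            "serviceAccountRef": {"name": service_account},
        },
    }


if __name__ == "__main__":
    manifest = must_gather_manifest(
        "example", "XXXXXXXX", "case-management-creds", "must-gather-admin"
    )
    # Kubernetes clients accept JSON manifests as well as YAML ones.
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to oc apply -f - in the namespace where the operator is deployed.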
When this CR is created, the operator creates a job that runs the must-gather operations and uploads the resulting information as a compressed file.
The must-gather operator watches only the namespace in which it is deployed. This makes it easier for a cluster administrator to restrict access to that namespace. Restricting access is recommended because the namespace must contain a service account with cluster-admin privileges (a must-gather requirement, as noted above) and therefore needs to be properly protected.
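As an illustration of what that namespace has to contain, the sketch below builds a service account and the cluster role binding that grants it cluster-admin. The namespace must-gather-operator and the binding name are assumptions made for this example, and admin_service_account is a hypothetical helper; the project documentation remains the reference for the actual setup.

```python
import json


def admin_service_account(name, namespace):
    """Return (ServiceAccount, ClusterRoleBinding) manifests as plain dicts."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        # Hypothetical binding name, derived from the service account name.
        "metadata": {"name": f"{name}-cluster-admin"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
    }
    return service_account, binding


if __name__ == "__main__":
    # Assumed namespace; use the one the operator is actually deployed in.
    for manifest in admin_service_account("must-gather-admin", "must-gather-operator"):
        print(json.dumps(manifest, indent=2))
```

Because this service account can do anything on the cluster, anyone who can create pods or read service account tokens in the namespace effectively holds cluster-admin, which is why access to it should stay limited to cluster administrators.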
Running Additional Must-Gather Images
The must-gather command can run multiple must-gather compatible images to collect additional information. Such images are typically provided by OpenShift add-ons, such as KubeVirt and OpenShift Container Storage (OCS). The must-gather operator supports this functionality by allowing these images to be specified, as in the following example:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example-more-images
spec:
  caseID: 'XXXXXXX'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  mustGatherImages:
  - quay.io/kubevirt/must-gather:latest
  - quay.io/ocs-dev/ocs-must-gather
As you can see, the mustGatherImages property is an array of strings, each naming an image. When it is added to a MustGather CR, all of the specified images are run in addition to the default must-gather image.
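Since mustGatherImages is a plain list of strings, adding images to an existing manifest is a small transformation. The sketch below uses a hypothetical helper, with_extra_images, that returns a modified copy and drops duplicate entries. The deduplication is a convenience of this example, not documented operator behavior.

```python
import copy
import json


def with_extra_images(manifest, images):
    """Return a copy of a MustGather manifest with mustGatherImages set."""
    result = copy.deepcopy(manifest)
    # dict.fromkeys removes duplicates while preserving the original order.
    result["spec"]["mustGatherImages"] = list(dict.fromkeys(images))
    return result


if __name__ == "__main__":
    base = {
        "apiVersion": "redhatcop.redhat.io/v1alpha1",
        "kind": "MustGather",
        "metadata": {"name": "example-more-images"},
        "spec": {
            "caseID": "XXXXXXX",
            "caseManagementAccountSecretRef": {"name": "case-management-creds"},
            "serviceAccountRef": {"name": "must-gather-admin"},
        },
    }
    extended = with_extra_images(
        base,
        ["quay.io/kubevirt/must-gather:latest", "quay.io/ocs-dev/ocs-must-gather"],
    )
    print(json.dumps(extended, indent=2))
```

The default must-gather image does not need to be listed, because the operator runs it in addition to whatever the list contains.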
The project GitHub repository contains detailed information on how to install the must-gather operator.
Being able to provide diagnostic information in a consistent fashion makes it easier for Red Hat support to aid in the resolution of issues. A more streamlined and automated collection process makes it more likely that the customer can provide timely debugging information to Red Hat support. The must-gather operator aims to help in this space.