Using Kubernetes Operators to Manage Let’s Encrypt SSL/TLS Certificates for Red Hat OpenShift Dedicated

Overview

Red Hat OpenShift Dedicated is an enterprise Kubernetes application platform hosted on public cloud providers and managed by Red Hat Site Reliability Engineering (SRE). OpenShift Dedicated enables companies to implement a flexible, hybrid cloud IT strategy by connecting to their datacenter with minimal infrastructure and operating expenses.

Valid SSL certificates are part of the OpenShift Dedicated product offering. Each OpenShift Dedicated cluster is configured with valid SSL certificates to support HTTPS for various endpoints such as the kube API server, OpenShift web console, logging, metrics dashboards, and other hosted applications. Red Hat SREs can secure all the default endpoints with two wildcard certificates.

In this post we dive into work our SRE team has done under the hood to make it easier for OpenShift Dedicated to run secure sites that enable secure browsing via a free, automated, and open certificate authority called Let’s Encrypt, brought to you by the non-profit Internet Security Research Group (ISRG). By using a more automated method for managing digital certificates with a Kubernetes Operator, called the Certman Operator, we can automatically manage SSL/TLS certificates for all OpenShift Dedicated clusters.

The Problem at Hand

Red Hat SREs faced three pressing certificate-related challenges as we scaled our offering:

  1. Toil: The most common solution for certificate management is to purchase new certificates and apply them to each cluster. However, manually purchasing and managing certificates does not scale beyond a handful of clusters.
  2. Cost and vendor lock-in: Certificates, even domain validated certificates, can be expensive, and not all certificate authorities provide APIs to manage certificates without additional costs or long-term contracts. This locked us in to bespoke implementations against proprietary APIs, making retooling that much harder.
  3. Lack of short-lived certificates: Most certificate authorities issue certificates valid for a minimum of one year. Red Hat had to either absorb the cost of certificates or pass the cost on to customers wanting to spin up short-lived proof-of-concept and demo clusters. The result was either lost opportunities or wasted money.

The above challenges, along with our desire to centrally manage our certificates, led us to build a certificate management Operator. We just had to find a certificate authority that would allow us to automate the certificate lifecycle in a standardized and more cost-effective fashion.

Let’s Encrypt to the rescue

In December 2015, Internet Security Research Group (ISRG), which includes partners from the Mozilla Foundation and the Electronic Frontier Foundation, launched a new certificate authority named Let’s Encrypt.

ISRG has also designed a communication protocol, Automated Certificate Management Environment (ACME), that covers the process of issuing and renewing certificates for TLS/SSL termination. Let’s Encrypt provides its service for free, and exclusively via an API that implements the ACME protocol.

Let’s Encrypt enabled support for wildcard certificates in March 2018. A wildcard certificate secures all direct subdomains of a domain with a single certificate. In the case of OpenShift Dedicated clusters, a single wildcard certificate would secure all the applications exposed by the default route and a second certificate would secure the kube API server.
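
To make the wildcard coverage concrete, here is a minimal Go sketch that uses the standard library's hostname matching. The domain names are hypothetical and are not the names OpenShift Dedicated uses. The helper covers is introduced only for this illustration. It shows that a wildcard matches exactly one label, which is one reason a separate certificate is needed for an endpoint outside the wildcard's subdomain.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// covers reports whether a certificate carrying the given DNS names
// would be accepted for host, using the standard library's matching rules.
// (Hypothetical helper for illustration only.)
func covers(dnsNames []string, host string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host) == nil
}

func main() {
	// Hypothetical wildcard name for applications behind the default router.
	wildcard := []string{"*.apps.example.com"}

	hosts := []string{
		"console.apps.example.com", // one label under the wildcard
		"a.b.apps.example.com",     // two labels: a wildcard does not span dots
		"api.example.com",          // outside the wildcard's parent domain
	}
	for _, host := range hosts {
		fmt.Printf("%-26s covered=%v\n", host, covers(wildcard, host))
	}
}
```

In this sketch only the first host is reported as covered. The other two would need their own certificate or an additional name on the certificate.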

Let’s Encrypt uses an automated system that verifies you control the domain. Unlike regular certificates, which can last for years, Let’s Encrypt certificates are only valid for 90 days. This encourages automation and can limit possible damage from key compromise or mis-issuance. It also helps make certificate renewal a “non-event” for SRE teams.
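
Domain control is proven through an ACME challenge. For the DNS-based challenge type defined in RFC 8555 (DNS-01), which is the type Let’s Encrypt requires for wildcard certificates, the client publishes a TXT record whose value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The Go sketch below computes the record name and value. The token, thumbprint, and domain are placeholders, and the function names are chosen for this example rather than taken from any ACME library.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// challengeRecordName returns the DNS name where the DNS-01 TXT record
// is published. A wildcard name is validated against its base domain,
// so a leading "*." is dropped.
func challengeRecordName(domain string) string {
	return "_acme-challenge." + strings.TrimPrefix(domain, "*.")
}

// dns01TXTValue returns the TXT record value for a key authorization:
// base64url (no padding) of its SHA-256 digest, per RFC 8555 section 8.4.
func dns01TXTValue(keyAuthorization string) string {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Placeholder values: a real key authorization is the challenge token
	// from the ACME server joined by "." to the account key's JWK thumbprint.
	keyAuth := "example-token" + "." + "example-account-key-thumbprint"

	name := challengeRecordName("*.apps.example.com")
	fmt.Printf("%s. IN TXT %q\n", name, dns01TXTValue(keyAuth))
}
```

An ACME client library normally performs this computation internally. It is shown here only to make clear what ends up in the DNS zone.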

Our Solution

The Red Hat SRE team chose to write an Operator called Certman to solve our challenges and scale OpenShift Dedicated. An Operator is a method of packaging, deploying, and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling. The Certman Operator encapsulates our operational logic and uses an open source ACME client library written in Go. Certman Operator is designed to run on our management cluster and to centrally manage SSL/TLS certificates for all OpenShift Dedicated clusters. Certificates issued by Certman Operator are used to secure the kube API endpoint and default router only. To manage SSL/TLS certificates for your custom routers, we recommend using either openshift-acme or cert-manager.

How the Operator Works

  1. A Red Hat customer requests a new OpenShift Dedicated cluster.
  2. Certman Operator monitors the cluster installation progress. When the cluster status indicates that installation is complete, a CertificateRequest resource is created for that cluster.
  3. Certman Operator requests new certificates from Let’s Encrypt based on the domains configured for the cluster.
  4. Let’s Encrypt requires a “challenge” before issuing a certificate. Certman answers this challenge by adding entries to the cluster’s DNS zone with a TTL of 1 minute so that entries can be updated in the future and changes propagate quickly.
  5. DNS propagation is verified using a DNS over HTTPS service.
  6. After DNS change propagation is verified, the challenge is answered so Let’s Encrypt can verify that you are in control of the domain’s DNS.
  7. Let’s Encrypt issues certificates once the challenge is successfully completed.
  8. Certificates are stored in a Kubernetes Secret on the cluster management system.  
  9. The cluster management system copies the Secret to the OpenShift Dedicated cluster.
  10. Certman Operator reconciles all CertificateRequest resources. During the reconciliation loop, the Operator checks the validity of existing certificates. When a certificate is set to expire in 45 days or less, the certificate is renewed and the Secret is updated. Renewing certificates early helps us avoid getting email notifications about certificate expiry from Let’s Encrypt.
  11. An update to a Secret on certificate renewal triggers the management system’s reconciliation, which copies the updated Secret to the OpenShift Dedicated cluster. OpenShift detects that the Secret has changed and applies the new certificates to the cluster.
  12. When an OpenShift Dedicated cluster is decommissioned, all valid certificates are first revoked and then the Secret is deleted on the management cluster. The cluster management system then continues with deleting other cluster resources.
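
The renewal decision in step 10 can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the Certman Operator's actual code. The names renewBefore, needsRenewal, certExpiringIn, and exampleNow are invented for this example, and the stand-in certificates carry only an expiry time.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// renewBefore mirrors the 45-day renewal window described in step 10.
const renewBefore = 45 * 24 * time.Hour

// exampleNow is a fixed, arbitrary reference time so the demo is repeatable.
var exampleNow = time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC)

// needsRenewal reports whether cert expires in renewBefore or less,
// measured from now. Already expired certificates also return true.
func needsRenewal(cert *x509.Certificate, now time.Time) bool {
	return !cert.NotAfter.After(now.Add(renewBefore))
}

// certExpiringIn builds a stand-in certificate that expires the given
// number of days after now (illustration only, not a usable certificate).
func certExpiringIn(days int, now time.Time) *x509.Certificate {
	return &x509.Certificate{NotAfter: now.Add(time.Duration(days) * 24 * time.Hour)}
}

func main() {
	for _, days := range []int{90, 46, 45, 10} {
		cert := certExpiringIn(days, exampleNow)
		fmt.Printf("%2d days left: renew=%v\n", days, needsRenewal(cert, exampleNow))
	}
}
```

With a 90-day certificate lifetime, a 45-day threshold means renewal is attempted about halfway through the validity period, which leaves a long window for retries if a renewal attempt fails. A real reconciler would obtain the certificate by decoding the PEM data stored in the Secret (for example with encoding/pem and x509.ParseCertificate) rather than constructing it by hand.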

How OpenShift Dedicated Customers Benefit

If you use OpenShift Dedicated, the digital certificates are handled by our SRE team on your behalf. By moving to Let’s Encrypt and automating the management around digital certificates, clusters are provisioned faster. This automation under the hood can help OpenShift Dedicated be a more cost-effective solution for our customers.

OpenShift Container Platform customers with multiple clusters can take advantage of this work and other tooling built by the Red Hat SRE team. We recommend using either openshift-acme or cert-manager to manage SSL/TLS certificates for a single cluster. You can also learn about using acme.sh for requesting and installing Let’s Encrypt certificates for OpenShift 4 from our previous post.

Future Work

We have come a long way from managing certificates manually. We now have a Kubernetes Operator that can automatically manage SSL/TLS certificates for all OpenShift Dedicated clusters. Certman Operator allows us to manage certificates with little maintenance or human intervention. In the future, we plan to add capabilities to the Operator to gather metrics on our Let’s Encrypt usage and alert us on rate-limited items such as the number of new certificates issued in a week, the number of attempts made to renew certificates, and the number of pending authorizations.

Supporting Let’s Encrypt

At Red Hat, we believe that the work Let’s Encrypt is doing is important to the industry, and they have made it easier to run secure sites. We sponsor Let’s Encrypt because we support their mission, and recognize their work and the value that we have received from it.
