OpenShift 4.1 UPI environment deployment on Microsoft Azure Cloud

Red Hat released Red Hat OpenShift Container Platform 4.1 (OCP4) earlier this year, introducing the installer provisioned infrastructure and user provisioned infrastructure approaches on Amazon Web Services (AWS). The installer provisioned infrastructure method is quick and only requires AWS credentials, access to Red Hat telemetry, and a domain name. With the user provisioned approach, OCP4 is deployed by leveraging CloudFormation templates and using the same installer to generate Ignition configuration files.

Since Microsoft Azure is getting more and more business attention, a natural question is: when is OCP4 going to be released on Azure Cloud? At the time of writing, OCP4 on Azure using the installer is in developer preview.

One of the main challenges with running OCP4 on Azure with the installer provisioned infrastructure method is setting up custom Ingress infrastructure (e.g. custom Network Security Groups or a custom Load Balancer for routers). By default, the Cluster Ingress Operator creates a public facing Azure Load Balancer to serve routers, and once the cluster is deployed, the Ingress Controller type cannot be changed.

If the default Ingress Controller object is deleted, or the OpenShift router Service type is changed, the Cluster Ingress Operator will reconcile and recreate the default controller object.

Trying to alter Network Security Groups by whitelisting allowed IP ranges will cause Kubernetes to reconcile the configuration back to its desired state.

One way around this is to deploy OCP4 on Azure Cloud by creating the objects manually with the user provisioned infrastructure approach, and then recreating the default Ingress Controller object just after the control plane is deployed.

OpenShift Container Platform 4.1 components

Our cluster consists of 3 master and 2 compute nodes. Master nodes are fronted with 2 Load Balancers, 1 Public facing for external API calls, and 1 Private for internal cluster communication. Compute nodes are using the same Public facing Load Balancer as the masters, but if needed they can each have their own Load Balancer.

Figure 1. OCP 4.1 design diagram with user provisioned infrastructure on Azure Cloud

Instance sizes

The OpenShift Container Platform 4.1 environment has some minimum hardware requirements.

Instance type   Bootstrap   Control plane   Compute nodes
D2s_v3                                      X
D4s_v3          X           X               X

The VM sizes above might change once OpenShift Container Platform 4.1 is officially released for Azure.

Azure Cloud preparation for OCP 4.1 installation

The preparation steps here are the same as for Installer Provisioned Infrastructure. You need to complete these steps:

NOTE: A Free Trial account is not sufficient. A Pay-As-You-Go subscription with an increased vCPU quota is recommended.
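To get a feel for how much vCPU quota to request, the arithmetic can be sketched as below. Standard_D4s_v3 has 4 vCPUs and Standard_D2s_v3 has 2. The counts follow the cluster described in this post (1 bootstrap, 3 masters, 2 compute nodes). Running the compute nodes on Standard_D2s_v3 is an assumption made for this illustration.

```shell
# vCPUs per VM size: Standard_D4s_v3 has 4, Standard_D2s_v3 has 2.
d4s_v3_vcpus=4
d2s_v3_vcpus=2

bootstrap_count=1   # temporary VM, destroyed once bootstrapping is done
master_count=3
worker_count=2      # assumed to run on Standard_D2s_v3 in this sketch

# Bootstrap and masters use D4s_v3, workers use D2s_v3.
total_vcpus=$(( (bootstrap_count + master_count) * d4s_v3_vcpus + worker_count * d2s_v3_vcpus ))
echo "vCPUs needed during installation: $total_vcpus"
```

The regional quota of the subscription needs to cover at least this total, plus whatever else already runs in the same region.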

 

User Provisioned Infrastructure based OCP 4.1 installation

When using this method, you can:

  • Specify the number of masters and workers you want to provision
  • Change Network Security Group rules in order to lock down the ingress access to the cluster 
  • Change Infrastructure component names
  • Add tags

This Terraform based approach splits VMs across 3 Azure Availability Zones and uses 2 Zone Redundant Load Balancers (1 public facing to serve OCP routers and api, and 1 private to serve api-int).
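As an illustration of what splitting VMs across 3 zones means, a round-robin assignment can be sketched as follows. This is only an assumption about how such a spread could work, not a description of what the Terraform scripts in the repository actually do. The function name and the master-N labels are hypothetical.

```shell
# Map a 0-based VM index to an Azure Availability Zone number (1, 2 or 3)
# in round-robin order. Hypothetical helper for illustration only.
zone_for_index() {
  echo $(( $1 % 3 + 1 ))
}

# Show where three masters would land under this scheme.
for i in 0 1 2; do
  echo "master-$i -> zone $(zone_for_index "$i")"
done
```

With 3 masters, each zone gets exactly one, so losing a single zone still leaves a majority of the control plane running.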

Deployment can be split into 4 steps:

  • Create the Control Plane (masters) and Surrounding Infrastructure (LB, DNS, VNET, etc.)
  • Set the default Ingress controller to type “HostNetwork”
  • Destroy Bootstrap VM
  • Create Compute (worker) nodes

This method uses the following tools:

  • terraform >= 0.12
  • openshift-cli
  • git
  • jq (optional)

Prerequisites

We will deploy Red Hat OpenShift Container Platform v4.1 on Microsoft Azure Cloud by using Terraform, since it is one of the most popular Infrastructure-as-Code tools.

Clone the Git repository containing the Terraform scripts:

git clone https://github.com/JuozasA/ocp4-azure-upi.git

cd ocp4-azure-upi

Download the openshift-install binary and get the pull secret. Both can be downloaded by following this link.

 

Copy openshift-install binary to /usr/local/bin directory:

 cp openshift-install /usr/local/bin/

 

Generate install config files:

./openshift-install create install-config --dir=ignition-files
? SSH Public Key /home/user_id/.ssh/id_rsa.pub
? Platform azure
? azure subscription id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure tenant id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? Region <Azure region>
? Base Domain example.com
? Cluster Name <cluster name. this will be used to create subdomain, e.g. test.example.com>
? Pull Secret [? for help]

Edit the install-config.yaml file to set the number of compute, or worker, replicas to 0:

compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 0

Generate the Kubernetes manifests that define the objects the bootstrap node will have to create initially:

 openshift-install create manifests --dir=ignition-files

Remove the files that define the control plane machines and worker machinesets:

rm -f ignition-files/openshift/99_openshift-cluster-api_master-machines-*
rm -f ignition-files/openshift/99_openshift-cluster-api_worker-machineset-*

Because you create and manage the worker machines yourself, you do not need to initialize these machines.

Obtain the Ignition config files. More about Ignition utility here

openshift-install create ignition-configs --dir=ignition-files

Extract the infrastructure name from the Ignition config file metadata:

 jq -r .infraID ignition-files/metadata.json
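Since jq is listed as optional, the same value can be pulled out with sed. The sketch below is a minimal fallback, assuming the infraID value contains no escaped quotes. The function name infra_id is introduced here for illustration.

```shell
# Print the infraID value from a metadata.json file without using jq.
# Assumes the value is a plain string without escaped quotes.
infra_id() {
  sed -n 's/.*"infraID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage:
#   infra_id ignition-files/metadata.json
```

Where jq is available it remains the more robust choice, because it actually parses the JSON instead of pattern matching on it.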

Open terraform.tfvars file and fill in the variables:

azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_bootstrap_vm_type = "Standard_D4s_v3" # Size of the bootstrap VM
azure_master_vm_type = "Standard_D4s_v3" # Size of the master VMs
azure_master_root_volume_size = 64 # Disk size for master VMs
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage" # Location of the CoreOS image
azure_region = "uksouth" # Azure region (the one you selected when creating install-config)
azure_base_domain_resource_group_name = "ocp-cluster" # Resource group for the base domain and the RHCOS VHD blob
cluster_id = "openshift-lnkh2" # infraID parameter extracted from metadata.json
base_domain = "example.com"
machine_cidr = "10.0.0.0/16" # Address range which will be used for VMs
master_count = 3 # Number of masters

Open worker/terraform.tfvars and fill in information there as well. 

Start OCP v4.1 Deployment

Initialize Terraform directory:

terraform init

Run Terraform Plan and check what resources will be provisioned:

terraform plan

Once ready, run Terraform apply to provision Control plane resources:

terraform apply

Once the Terraform job is finished, run openshift-install to wait until bootstrapping is complete:

openshift-install wait-for bootstrap-complete --dir=ignition-files

Once the bootstrapping is finished, export the KUBECONFIG environment variable and replace the default Ingress Controller object with one whose endpointPublishingStrategy is of type HostNetwork. This disables the creation of the public facing Azure Load Balancer and allows you to have custom Network Security Rules which won’t be overwritten by Kubernetes.

export KUBECONFIG=$(pwd)/ignition-files/auth/kubeconfig

oc delete ingresscontroller default -n openshift-ingress-operator

oc create -f ingresscontroller-default.yaml
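For reference, a manifest of this kind might look like the sketch below. The ingresscontroller-default.yaml shipped in the repository may contain additional fields, so treat this as an illustration of the HostNetwork publishing strategy rather than a copy of that file. The scratch file name is hypothetical and chosen so that the repository copy is not overwritten.

```shell
# Write an example IngressController manifest to a scratch file.
# The file name is hypothetical; the repository already ships its own manifest.
manifest="${TMPDIR:-/tmp}/ingresscontroller-hostnetwork-example.yaml"

cat > "$manifest" <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: HostNetwork
EOF

echo "wrote $manifest"
```

With HostNetwork, the router pods bind directly to ports on the nodes they run on, so no Service of type LoadBalancer is requested from Azure.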

Since we don’t need the bootstrap VM anymore, we can remove it:

terraform destroy -target=module.bootstrap

Now we can continue with provisioning the Compute nodes:

cd worker
terraform init
terraform plan
terraform apply
cd ../

Since we are provisioning Compute nodes manually, we need to approve kubelet CSRs:

# Read the expected number of workers from the worker variables file
worker_count=$(awk '/worker_count/ {print $3}' worker/terraform.tfvars)
# Keep approving pending CSRs until every worker has an approved one
while [ "$(oc get csr | grep worker | grep -c Approved)" -ne "$worker_count" ]; do
  oc get csr -o json | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
  sleep 3
done

Check openshift-ingress service type (it should be type: ClusterIP):

oc get svc -n openshift-ingress

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                   AGE
router-internal-default   ClusterIP   172.30.72.53   <none>        80/TCP,443/TCP,1936/TCP   37m
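If you want to script this check, one possible sketch is a small filter over the table that oc get svc prints. The function name is hypothetical, and it relies only on TYPE being the second column of that output.

```shell
# Read the table printed by `oc get svc` on stdin and succeed only if
# no Service of type LoadBalancer is listed (TYPE is the second column).
no_loadbalancer_service() {
  awk '$2 == "LoadBalancer" { found = 1 } END { exit found }'
}

# Usage:
#   oc get svc -n openshift-ingress | no_loadbalancer_service \
#     && echo "no public load balancer requested"
```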

Wait for installation to be completed. Run the openshift-install command: 

openshift-install wait-for install-complete --dir=ignition-files

The last command outputs the cluster console URL and the kubeadmin username and password.

Scale Up

To add more worker nodes, we use the Terraform scripts in the scaleup directory. Fill in the values in its Terraform variables file:

azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_worker_vm_type = "Standard_D2s_v3"
azure_worker_root_volume_size = 64
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage"
azure_region = "uksouth"
cluster_id = "openshift-lnkh2"

Run terraform init and the script:

cd scaleup
terraform init
terraform apply

It will ask you to provide the Azure Availability Zone number where you would like to deploy the new node, and the worker node number. Indexing starts from 0 rather than 1, so for the 4th node the number is 3.
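The off-by-one conversion is easy to get wrong when scripting repeated scale-ups, so it can be captured in a tiny helper. The function name is hypothetical.

```shell
# Convert a 1-based node ordinal (4 for the 4th worker) into the
# 0-based worker node number that the scaleup prompt expects.
worker_index() {
  echo $(( $1 - 1 ))
}

worker_index 4   # prints 3
```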

Approving server certificates for nodes

To allow the API server to communicate with the kubelet running on the nodes, you need to approve the CSR generated by each kubelet.

You can approve all pending CSRs using:

oc get csr -o json | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

Conclusion

OpenShift Container Platform 4.1 Internet ingress access can be restricted by changing Network Security Groups on Azure Cloud if we inform the Ingress controller not to create a public facing Load balancer. Since we are using Terraform to provision infrastructure, multiple infrastructure elements are changeable and the whole OpenShift Container Platform 4.1 infrastructure provisioning can be added to the wider infrastructure provisioning pipeline, e.g. Azure DevOps.

It is worth mentioning that at the time of writing, Red Hat OpenShift Container Platform 4.1 deployed on user provisioned infrastructure is not yet supported on Microsoft Azure Cloud and some of the features might not work as you expect, e.g. the Internal Image Registry is ephemeral and all images will be gone if the image registry pod gets restarted.

 
