Software-Defined Storage: The Next Killer App for Cloud

 

It’s never going to be possible to completely disconnect software from hardware. Indeed, hardware development is having a bit of a rebirth as young developers rediscover things like the 6502, homebrew computing, and 8-bit assembly languages. If this keeps going, in 20 years developers will reminisce fondly and build hobby projects in early IoT platforms, using 2007-era cloud APIs with old refrigerator-sized storage arrays.

In my experience, storage hardware has remained something of a legacy boat anchor in many enterprises: you don’t mess around when it comes to storing your company’s long-term data or selecting storage providers for your lights-on, business-critical applications. Governments demand that data be retained, and data scientists are increasingly building new algorithms based on giant old datasets. For a time after the cloud revolution began in the late 2000s, it seemed that storage hardware wouldn’t be moving to x86 cloud-based virtual machines, much less Linux containers, anytime soon.

Back in 2011, when we acquired Gluster, we had a simple go-to-market plan. I would tell our customers “bring your own hardware,” as long as it is on the Red Hat Enterprise Linux Hardware Certification List (HCL). You buy the storage software from Red Hat, use the discounts you already have in place with your x86 server vendors, and create the storage solution either with Red Hat Consulting’s help or on your own.

I would ask customers to install Gluster on Red Hat Enterprise Linux on servers that were built for handling compute workloads. While this was an interesting idea, customers were still looking for some features they could only get from hardware vendors: disk replacement services and the ability to deliver predictable performance for the workloads they cared about. Back then, the term “software-defined storage” did not even exist in the enterprise.

Fast forward 2 to 3 years. By 2015, the idea of “software-defined storage” was beginning to gather buzz. x86 server vendors saw this as an opportunity and started building servers that were optimized for serving out storage, especially for archival use cases. I recall many calls with x86 vendor product managers who were designing server hardware optimized for running storage software. They were always looking for a special secret sauce that would differentiate their servers from others while running our software. But I stayed neutral, and the team did not design storage software that would run better on any specific x86 server.

This was a breakthrough that helped us create reference architectures, as well as performance and sizing guides, for x86 servers with Gluster and then Ceph (after our Inktank acquisition in 2014). We created a software-defined storage solution with Red Hat’s storage bits and industry-standard commercial off-the-shelf (COTS) x86 servers for a variety of use cases.

This post illustrates the library of reference architectures we have created over the years. With the advent of x86 servers that are optimized for storage serving, and our library of reference architectures, customer installations felt more concrete and implementation-ready. Despite generic, commodity hardware underneath, we were able to offer performance results for workloads our customers cared about.

In the meantime, another interesting thing started happening. We saw customer interest in running our software-defined storage software in virtual machines (VMs), which could serve out raw block storage from Storage Area Network (SAN) LUNs or legacy storage arrays via a common mountpoint. I found that customers often had SANs or disk arrays that were not being used, either because they did not have a file sharing capability or for other reasons. They wanted to run our storage software in VMs and serve out unused SAN LUNs to application teams that needed shared storage functionality. This was not possible with traditional storage gear.

I asked our quality engineering team to start testing our storage software stacks in virtual machines backed by SAN LUNs. Soon, we productized the instantiation of our software bits in VMs backed by SAN LUNs.

This improved time-to-value: it allowed for faster storage provisioning and enabled re-use of traditional storage arrays that were unused or did not offer a file sharing persona. VMs became an important deployment method for running select workloads on our storage software, giving software-defined storage a flexibility advantage. We expanded on this capability and launched Red Hat Gluster Storage in public clouds, including AWS, Azure and Google Cloud Platform. This added more environments in which users could run our storage software.

The ability to run software-defined storage functionality in these environments culminated in 2016 with Kubernetes and containers. Kubernetes and containers provide a cluster commoditization technology with a sophisticated scheduler that can be used in both public and private clouds to run containerized applications and microservices.

Applications that ran on Kubernetes and needed performant and stable persistent storage would also need it in every environment where Kubernetes and Red Hat OpenShift were supported. This was a challenge for both traditional storage and public cloud native storage offerings. Neither could provide a consistent storage consumption and management experience in all environments where an enterprise customer would want to run Red Hat OpenShift. The traditional storage vendors could not ship their storage gear to AWS, and cloud storage systems like EBS and S3 are not available on-premises.

Software-defined storage solutions from Red Hat, like Gluster and Ceph, offered an answer. Already used in active production deployments for several years, they were tested and ready for use. We have found what I believe to be the proverbial “killer app”: persistent storage for containers. You can run our storage software everywhere OpenShift runs, and it can provide a persistent storage management fabric with a consistent storage consumption and management experience across those environments.

Customers can consume raw block storage from multiple hosting environments and serve that out to containerized applications wherever they have an OpenShift cluster.
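
To make that consumption model a little more concrete, here is a minimal sketch of how a containerized application might ask for storage through a Kubernetes PersistentVolumeClaim. It only builds and prints the manifest; it does not talk to a cluster. The claim name `app-data` and the storage class name `example-sds-storage` are hypothetical placeholders chosen for illustration, not names taken from any Red Hat product.

```python
import json


def build_pvc(name, size_gi, storage_class, access_mode="ReadWriteOnce"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    The structure follows the core/v1 PersistentVolumeClaim schema.
    The values passed in (claim name, storage class) are illustrative only.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            # Hypothetical storage class; a real cluster defines its own names.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Request 10 GiB of shared storage from the hypothetical storage class.
    claim = build_pvc("app-data", 10, "example-sds-storage", "ReadWriteMany")
    print(json.dumps(claim, indent=2))
```

The point of the sketch is that the application only names a storage class and a size. Assuming a storage class with the same name exists in each cluster, the same claim could be submitted unchanged whether that cluster runs on-premises or in a public cloud.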

We even containerized our storage software stacks and now run them on top of OpenShift. Storage has become another podified application that is deployed, managed, scaled and upgraded using Kubernetes primitives. Once you have learned how to implement Red Hat’s OpenShift Container Storage, you can implement it anywhere you run OpenShift. Software-defined storage has changed, found a sweet spot, and can be a leading storage provider for Kubernetes. Now that workloads are beginning to migrate to Kubernetes, the next layer of problems is beginning to surface in production environments. The more these problems are related to storage, the more compelling software-defined storage can become as a solution to the problems of storing data in dynamically provisioned environments.
