Guide to Kubernetes Storage
Congratulations! You are ready to deploy your applications, from simple internal websites to business-critical systems, on the best platform available, Equinix Metal.
Have you thought about storage?
Actually, what do we even mean by "storage"?
In this article, we will give an overview of storage in general, then storage for Kubernetes, and finally your options for storage for Kubernetes applications on Equinix Metal. While we won't quite make you a world-class expert, we will give you enough information to make good decisions, and know how and when to reach out for assistance from the world-class Equinix Metal support team.
Why do we need application storage? Somewhere down the line, your application will need to store data. Of course, your app might be completely stateless, without any permanently stored information. If that is the case, you don't really need to care about storage... probably.
However, even if your app is stateless, it sends its data to a stateful application, possibly a database, which, in turn, needs to store its data.
Your application is going to need persistent storage.
While it may seem straightforward, there actually are two categories of storage, at least as far as your app is concerned.
- File Storage
- Object Storage
Let's look at each in turn.
File storage doesn't actually have to be on a disk. From your application's perspective, however, it appears as an operating system directory. This is the crucial characteristic of this kind of storage.
Your application expects the operating system to provide it with a directory.
The operating system can make this directory available in any one of myriad ways. As a few examples, it can:
- Use a local directory
- Mount a block device, i.e. a locally attached disk
- Attach a disk image file as a loopback block device, create a partition on it, create a filesystem on that partition, and mount it
- Mount a remote block device via a network block storage protocol, such as iSCSI or NVMe-over-TCP
- Mount a remote directory via a network file storage protocol, such as NFS or CIFS/SMB
There are many more options. The key characteristic of all of them is that the application is 100% ignorant of the source of the directory, and where the actual storage takes place. It relies on the operating platform to provide it with a directory which it uses.
Object storage stores objects, like files, in buckets. This is most commonly associated with, and popularized by, AWS Simple Storage Service (S3), but is provided by many cloud providers, as well as several run-it-yourself open-source and commercial closed-source versions. Almost all of them hew to the S3 protocol, which has become the standard.
While buckets look and feel like disk volumes, and objects look and feel like files, they often are radically different under the covers. Your application is acutely aware that it is using object storage: it accesses the storage over a network protocol, normally HTTP(S), usually via an object storage SDK. This is the crucial characteristic of this kind of storage.
Your application connects with the storage directly over the network.
The platform on which your application is running has no responsibility for providing the storage to your application; indeed, it is likely to be unaware that your application is using storage at all.
As an aside, it is possible for the platform to take an object storage provider and make it look like a directory to your application, for example via FUSE-based tools such as s3fs, although this is not a common use case, and often not recommended for production applications unless necessary.
If your application expects to find directory space available to it, and for the data it places in there to be persistent, you are looking for your operating platform to make file storage available.
If your application knows about object storage and expects to communicate with it directly, you are looking for object storage.
With the basics understood, Kubernetes storage is not all that different. We will look at our two options in reverse order, object and then file.
The simplest case is object storage. Since your application knows about the storage and how to connect to it over the network, Kubernetes doesn't have much involvement, at least as far as the application is concerned.
However, as your application is connecting over the network to a storage provider, whether running on the same Kubernetes cluster, on another cluster, or possibly even another cloud provider, there are a few Kubernetes-specific things to keep in mind:
- Is your NetworkPolicy set up to allow your application to communicate to the object storage provider?
- Is DNS set up to enable you to resolve the hostname(s) to the storage provider?
In general, with object storage, your application initiates the connection to the object storage provider, and not the other way around, so issues like Ingress are not relevant. You do, however, need to ensure that your application pod can reach the storage provider, and that it can find and resolve the IP address of that provider.
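As a sketch of the first point, a NetworkPolicy along these lines would allow an application pod egress to an external object storage endpoint, plus the DNS lookups it needs. The labels, CIDR, and ports here are illustrative assumptions, not values from any particular provider:

```yaml
# Illustrative NetworkPolicy: allow the app pod to reach an external
# object storage endpoint over HTTPS, and to resolve DNS.
# The pod label and CIDR are assumptions for this example.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-object-storage-egress
spec:
  podSelector:
    matchLabels:
      app: my-app                  # assumed pod label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # example: object storage endpoint range
      ports:
        - protocol: TCP
          port: 443
    - ports:                       # allow DNS lookups
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Note that once any egress rule selects a pod, all other egress from that pod is denied, which is why the DNS rule must be listed explicitly.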
Let's turn now to file storage.
As discussed above, when an application expects file storage, it really expects its operating system, or more correctly its operating platform to make the directory backed by the storage available. Kubernetes has a role to play here.
Kubernetes provides persistent storage to pods using Persistent Volumes. These volumes, in turn, are created from a Storage Class. The storage class represents the underlying storage mechanism. For example, if I expose three different types of storage to applications on my Kubernetes cluster - high performance over iSCSI, low performance over iSCSI, and local SSD - then I would expose three Storage Classes. The application, when deployed, requests a Persistent Volume from one of those classes, via a Persistent Volume Claim, for each volume it needs.
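As a sketch, the "high performance over iSCSI" tier above might be exposed as a Storage Class like this. The provisioner name and parameters are hypothetical placeholders; the real values depend entirely on whichever CSI driver backs your storage:

```yaml
# Hypothetical Storage Class for a "high performance over iSCSI" tier.
# The provisioner and parameters are placeholders for your CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: iscsi-high-performance
provisioner: iscsi.example.com     # placeholder CSI driver name
parameters:
  tier: high-performance
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```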
From the application's perspective, the steps are straightforward:
- Check what Storage Class your cluster administrator has made available
- Request a Persistent Volume from that Storage Class
Here is an example of an application requesting a persistent volume from the Storage Class named `cloud-standard`, using a Persistent Volume Claim:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc1
spec:
  storageClassName: cloud-standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```
In addition to declaring the Storage Class, you need to declare how big a volume you want, if it is accessible in parallel to multiple consumers, and possibly some Storage Class-specific options.
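Once the claim is bound, a pod mounts it as an ordinary directory. A minimal sketch, assuming a claim named `pvc1` exists in the same namespace; the image and mount path are illustrative:

```yaml
# Minimal Pod consuming a Persistent Volume Claim as a directory.
# The claim name, image, and mount path are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: nginx                     # any image that reads/writes files
      volumeMounts:
        - name: data
          mountPath: /var/lib/data     # appears as a plain directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc1
```

From the container's point of view, `/var/lib/data` is just a directory; whether it is backed by iSCSI, NFS, or a local disk is invisible to the application.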
So which kind of storage do you need? This, surprisingly, is the simplest question of all to answer: it depends on your app.
If your app expects to find its persistent storage as a directory, where it can place and find files, then you need file storage, and you need your cluster administrator to enable it for you. If, on the other hand, your app knows about and can communicate directly with object storage, then you need object storage over the network, and need nothing from your cluster administrator.
And what if you are building your own app? Which path should you choose?
Unfortunately, that is the hardest question to answer, because it depends not on your Kubernetes cluster or your cloud provider, but on the specific requirements and behaviors needed from your very specific application. Our advice? Consult a really good architect.
Having spent some decent time and ink exploring how to consume storage in general, and specifically on Kubernetes, we now turn to how to deploy storage, so that your apps can consume it.
In general, you have three options when it comes to deploying storage, whether it is object storage or file storage:
- Consume a storage service that someone else manages
- Run your own standalone storage service
- Run a storage service inside Kubernetes itself
With a service, you don't think about how to manage the storage; you simply consume one that a storage service provider makes available to you.
Storage providers normally offer a Container Storage Interface (CSI) driver that connects the storage to your cluster and creates the Storage Class, referenced above, for you to use.
A storage service CSI will:
- Create the volumes,
- Attach them to the node your Kubernetes pod is running on,
- Create a filesystem, if necessary,
- Connect it to your workload pod.
In the case of object storage, you create the object storage bucket, and then provide the credentials and URL to your application pod.
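One common way to hand those credentials and the endpoint URL to the pod is via a Secret and environment variables. Every name and value below is an illustrative assumption, not a value from any real provider:

```yaml
# Illustrative Secret plus Pod: the credentials, endpoint, bucket, and
# image are all placeholders to be replaced with your provider's values.
apiVersion: v1
kind: Secret
metadata:
  name: object-store-credentials
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_ME
  AWS_SECRET_ACCESS_KEY: REPLACE_ME
---
apiVersion: v1
kind: Pod
metadata:
  name: uploader
spec:
  containers:
    - name: app
      image: example/my-app:latest           # placeholder image
      envFrom:
        - secretRef:
            name: object-store-credentials
      env:
        - name: S3_ENDPOINT
          value: https://objects.example.com # placeholder endpoint
        - name: S3_BUCKET
          value: my-bucket                   # placeholder bucket
```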
If you need or want to run your own standalone storage service, do the following:
- Pick what kind of storage you are going to run, and what software is going to provide it. While the application doesn't care if a directory is provided by NFS, iSCSI or carrier pigeon, you will,
- Deploy the storage nodes and software,
- In the case of file storage:
- deploy a Container Storage Interface driver for your particular type of storage
- deploy the appropriate Storage Classes,
- Deploy your workloads.
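For example, if you ran your own NFS server on your storage nodes, the file storage steps might look like installing the open-source NFS CSI driver (`nfs.csi.k8s.io`) and then creating a Storage Class such as the sketch below; the server hostname and export path are assumptions:

```yaml
# Storage Class for a self-managed NFS server, assuming the NFS CSI
# driver (nfs.csi.k8s.io) is installed on the cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: self-managed-nfs
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs.example.internal   # your storage node (assumed hostname)
  share: /exports/k8s            # exported directory (assumed path)
reclaimPolicy: Delete
```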
Don't forget, when running your own storage service, all of the data sits on the local disks of the storage nodes. Ensure you have rock-solid replication and a reliable, tested backup and restore strategy.
Finally, you can deploy your storage service in Kubernetes itself. Admittedly, this only partially solves your problem. After all, if you deploy your storage provider to Kubernetes, it, in turn, must store its data on disks... somewhere.
Yet, it can help in multiple ways. First, it can abstract the problem of storage to applications from the problem of storage software itself. Second, it can simplify - greatly - the challenge of deploying the storage service software itself. Third, it can make consuming the storage by applications easier, since discovery of the storage service and the connection between consuming workload application and providing storage service happens right there in the same cluster.
Many types of storage provider software can be deployed to Kubernetes, whether serving to application workloads in the same cluster, or exposed to workloads from outside that specific cluster.
The following is hardly exhaustive, but lists some typical use cases and what kind of storage to use.
- Database: File storage. Almost every database system requires operating-system-provided file storage.
- Large documents: File storage or object storage, depending on your use case.
- Logging: Neither. Send your logs to stdout/stderr, have a per-node collector, like fluentd, gather them, and forward them to a log service.