Local storage: choosing the right solution for your cluster
This blog post and white paper were written by Stuart Caine, Customer Reliability Engineer at Weaveworks.
When it comes to storage, Kubernetes works comfortably with distributed file systems such as NFS or GlusterFS. But if you need to run stateful workloads in your cluster, the best solution is usually to go local. With so many local storage options out there, however, choosing the right one is hardly straightforward.
For this reason, we’ve just finished putting some of the leading solutions to the test, and we’ve documented the results in a new white paper, which you can download for free here.
The paper explores the performance of several local storage solutions against a range of benchmarks – and crucially, it includes the actual output of the tests, so you can evaluate whether the results are going to be pertinent for you.
The solutions we explored at the outset included Linstor, StorageOS, Portworx, Ceph, Longhorn and Robin. We then narrowed them down to the following shortlist for extensive testing:
If you just want to know which one won, you can scroll down to the bottom. But we recommend reading this post in full, even if you don’t have time to pore over the detailed test results in the white paper. That’s because none of the solutions we examined were essentially bad. Some simply performed less well on the specific tests we chose to run.
What we tested for
Our goal was to identify one or more solutions that offer the best of three worlds: performance, resiliency and scalability. Specifically, we identified the following measures of success for each one:
- Ease of implementation
- Whether it can replicate data asynchronously
- Whether it is topology-aware and understands failure domains
- Whether it offers managed scheduling of stateful pods
- Whether it offers strong encryption in transit and at rest
The tests themselves
We deployed each of the shortlisted products into a Kubernetes cluster, then performed the following tests on each one:
- Kafka: we created large numbers of records, ranging from small chunks (to push the IOPS) to large chunks (to push the bandwidth).
- Elasticsearch: we used esRally, an Elasticsearch performance tool, to create real data on which to run our performance tests, measuring time, throughput and latency.
- Postgres: we used pgbench to run a simulated SQL workload so we could measure transactions per second and latency.
- FIO: we used this to measure IO workloads, covering sequential and random reads and writes as well as bandwidth.
- Failover tests: we carried out planned and unplanned failovers while Elasticsearch was running in the cluster, and ran the same esRally tests.
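To make the distinction between small-chunk (IOPS-bound) and large-chunk (bandwidth-bound) workloads more concrete, here is a minimal Python sketch of the kind of measurement a tool like FIO automates. It is not the tooling or configuration used in the white paper: the function name `write_benchmark`, the block sizes and the block counts are all illustrative assumptions, and a real benchmark would also control for caching, queue depth and run time.

```python
import os
import random
import tempfile
import time


def write_benchmark(path, block_size, num_blocks, random_offsets=False, seed=0):
    """Write num_blocks blocks of block_size bytes; report IOPS and bandwidth.

    Hypothetical helper for illustration only, not part of any storage product.
    """
    offsets = [i * block_size for i in range(num_blocks)]
    if random_offsets:
        # Same set of blocks, visited in a shuffled order.
        random.Random(seed).shuffle(offsets)
    payload = os.urandom(block_size)

    with open(path, "wb") as f:
        f.truncate(block_size * num_blocks)  # pre-size the file
        start = time.perf_counter()
        for offset in offsets:
            f.seek(offset)
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push data to the device before stopping the clock
        elapsed = max(time.perf_counter() - start, 1e-9)

    total_bytes = block_size * num_blocks
    return {
        "iops": num_blocks / elapsed,             # operations per second
        "mb_per_s": total_bytes / elapsed / 1e6,  # bandwidth
        "bytes_written": total_bytes,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        # Assumed sizes: 4 KiB blocks stress IOPS, 1 MiB blocks stress bandwidth.
        for label, size, count in [("small", 4096, 2000), ("large", 1024 * 1024, 16)]:
            for rand in (False, True):
                result = write_benchmark(target, size, count, random_offsets=rand)
                mode = "random" if rand else "sequential"
                print(f"{label:5s} {mode:10s} "
                      f"{result['iops']:10.0f} IOPS {result['mb_per_s']:8.1f} MB/s")
```

Pointing `target` at a path on a mounted persistent volume, rather than a temporary directory, would exercise the storage layer under test. The absolute numbers depend heavily on the page cache and the underlying hardware, so treat them as relative indicators only.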
The big question: who won?
There’s a long way and a short way to answer that.
We’ll get to the short answer in a moment. But the longer answer starts with the caveat that we found all the solutions we tested to be credible and capable. All were clearly well designed and well supported. And all of them had their own pros and cons.
Longhorn delivered some inconsistent results (especially with FIO) and we also found it lacking when it came to sophisticated scheduling and encryption. Even that shouldn’t rule it out for the long term: it may be less mature right now, but it is definitely one to watch. Marginally, however, two of the products stood out: StorageOS and Linstor.
We found StorageOS slightly easier to use. Even its documentation was better. Linstor, on the other hand, offered more opportunity for fine tuning. Each product still had its faults. StorageOS needs to improve its scheduling, for example, to ensure pods and volumes are always located on the same nodes – its developers are working on this. Linstor needs to fix replica rescheduling – which currently requires manual work – and a GUI would be a nice addition. Both need to work on the reconciling of Kubernetes nodes when nodes are deleted.
So, finally, the short answer: We awarded StorageOS a narrow win, thanks to its combination of simplicity and high performance. But as we note above, all the options we examined had their merits. If you need to deploy a local storage solution for your Kubernetes cluster, we urge you to read the white paper in its entirety, so you can gain a full understanding of what will work best for your needs.