MLOps on Kubernetes with Portable Profiles

By Chanwit Kaewkasi
December 05, 2019

This post introduces a new feature called Profiles, which allows you to create a specific Kubernetes application platform to meet your business needs. We show how you can enable machine learning operations, or MLOps, with specific Profiles for two different types of Kubernetes instances: EKS, and Kubernetes with Firekube.

Related posts

GitOps With GitHub Actions and EKS

Machine Learning with GitOps and Progressive Delivery

Firekube - Fast and Secure Kubernetes Clusters Using Weave Ignite

If you ask an application developer what they want from Kubernetes, the answer often depends on what they wish to accomplish. For example, “I want to run my machine learning applications using Kubernetes and Tensorflow”, or “I am building a mobile application and I need a mobile backend in the cloud”.

In this post, we introduce a new feature called Profiles. A Profile allows you to create a specific Kubernetes application platform to meet your business needs. The Kubernetes add-ons that make up a Profile for a specific use case are managed and configured in git with GitOps. Since Profiles are portable across all conformant Kubernetes offerings, both in public clouds and in on-prem infrastructure stacks, they ensure a consistent developer experience.

Profiles make it easy to say:

  • Give me a machine learning platform
  • Give me a Kubernetes platform for mobile development

To show you how Profiles work, we’ll review one we’ve developed to get developers quickly started on EKS. We’ll also demonstrate how you can enable machine learning operations or MLOps with Profiles. To establish the portability of Profiles, we’ll install the Profile onto two different types of Kubernetes instances:

  1. Amazon’s managed Kubernetes service
  2. Vanilla Kubernetes with Firekube, running on a laptop

Example #1: EKS Quickstart Profile

Before diving into a machine learning example, let’s consider something much simpler. The EKS Quickstart Profile is a way for developers to quickly provision an EKS cluster. The Quickstart is a feature of EKSctl implemented as a Profile: with it, an application developer can provision Kubernetes with a basic application stack, consisting of commonly used add-ons such as Helm, Prometheus and a Kubernetes dashboard.

Let’s look at what this means:

  1. The Profile is a bundle that allows developers to reliably deploy applications to EKS.
  2. The Profile’s add-ons are managed using GitOps: Kubernetes add-on applications are deployed from git repos, and GitOps provides a way to change the properties of the runtime stack by interacting with the config model kept in git. Keeping config in git and managing it with GitOps provides security and recoverability, with a full audit trail that helps meet compliance regulations.
  3. The Profile is not limited to EKS: Profiles are portable and can run across different Kubernetes stacks. An easy way to enable Profile portability on your own clusters or a laptop is to use Firekube. You can also use WKSctl to launch and manage Quickstarts anywhere: on-premise, bare metal, or in any public cloud.

We’ll say more about how this works later.
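As a sketch of that GitOps loop, the commands below simulate it locally: a bare repository stands in for the hosted config repo, and the add-on file path and commit message are purely illustrative. In a real cluster, the push is what Flux reacts to.

```shell
# A bare repo stands in for the hosted cluster-config repository
git init --bare /tmp/cluster-config.git
git clone /tmp/cluster-config.git /tmp/cluster-config
cd /tmp/cluster-config

# Change an add-on's configuration and record it in git; in a real
# cluster, Flux would detect the new revision and apply the change
mkdir -p prometheus
echo "retention: 30d" > prometheus/values.yaml
git add prometheus/values.yaml
git -c user.name=demo -c user.email=demo@example.com \
    commit -m "Increase Prometheus retention to 30d"
git push origin HEAD
```

The cluster itself is never touched by hand: every change to the runtime stack is a commit like this one, which is what gives you the audit trail and recoverability described above.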

Example #2: Portable MLOps Profiles

Machine Learning Operations, or MLOps, provides the missing link between data scientists and the operations team. By using a Profile, you can automatically provision infrastructure and provide a consistent and reliable configuration on which your applications can run. And since GitOps can manage both application deployments and the cluster lifecycle, including its configuration, it is natural to also use GitOps for managing and deploying machine learning models.

Note the following two important points:

  1. In this demo, the MLOps Profile quickly provisions a machine learning stack and its cluster with GitOps.
  2. That same MLOps Profile can be used to set up additional clusters elsewhere to serve the finished model; either on-premise, in a public cloud or on your laptop. 

The MLOps scenario

In this scenario, data scientists develop and train their machine learning models on EKS to take advantage of the virtually unlimited computing resources available on AWS.

After the ML training experiments have completed, the best model from the training process is served by another cluster. This could be a cluster of containerized VMs spun up by Firekube on your laptop for local development or QA, or the newly trained model may be served by EKS, or any other cluster, for production.


Provisioning an MLOps cluster

The MLOps Profile is built on top of Kubeflow, a machine learning toolkit for Kubernetes. It is an opinionated machine learning Profile that offers everything you want from Kubeflow, the GitOps way.

A Kubeflow-based MLOps cluster can be launched with the following three commands:

#1 Create an EKS cluster:

$ eksctl create cluster --name=eks-kubeflow \
  --node-type=m5.xlarge \
  --nodes=2 \
  --region=us-west-2 \
  --node-volume-size=120

#2 Create an empty GitHub repository and connect it to the cluster with eksctl’s enable repo command (the email value is your own; exact flags may vary between eksctl versions):

$ eksctl enable repo \
  --git-url=git@github.com:<user>/eks-profile-demo \
  --git-email=<user>@example.com \
  --cluster=eks-kubeflow \
  --region=us-west-2

To give Flux access to the repository for deployments and enable GitOps, add the SSH public key printed by this command as a deploy key on the GitHub repository.

#3 Enable the MLOps Profile with eksctl’s enable profile command, passing the Profile’s source repository:

$ eksctl enable profile \
  --git-url=git@github.com:<user>/eks-profile-demo \
  --git-email=<user>@example.com \
  --cluster=eks-kubeflow \
  --region=us-west-2 \
  <mlops-profile>
gp2 is the default storage class for this EKS cluster, but we still need another storage class to act as the shared file system for the machine learning tasks. In production, AWS EFS is a great fit. On Firekube systems, we use CSI-S3 with a MinIO backend as an alternative.
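To make this concrete, a workload would request the shared file system through a PersistentVolumeClaim that names the shared storage class. The manifest below is a sketch: the class name (csi-s3) and namespace are assumptions and will differ per cluster.

```shell
# Write a PVC manifest requesting the shared storage class
# ("csi-s3" is an assumed class name; adjust for your cluster)
cat > /tmp/ml-shared-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-shared
  namespace: kubeflow
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: csi-s3
  resources:
    requests:
      storage: 10Gi
EOF
```

Once the storage class exists, apply it with `kubectl apply -f /tmp/ml-shared-pvc.yaml`; ReadWriteMany access is what lets multiple training pods share the same volume.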

From Jupyter Notebook to Kubeflow Pipeline

The following video demonstrates a new way to automate a machine learning pipeline. Thanks to the Jupyter Notebook plugin from the KALE team, we can develop our machine learning model in a Jupyter Notebook and convert it to run as a Kubeflow pipeline. We extend a KubeCon KALE example by adding another pipeline step that chooses the best model and stores it on an S3-compatible storage system, to be served later by another cluster.
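That extra "store the best model" step boils down to an S3 upload, and the same AWS CLI command works against MinIO by pointing --endpoint-url at it. The bucket, file name, and endpoint below are hypothetical:

```shell
# Upload the chosen model to S3 (e.g. from the EKS training cluster);
# bucket and file names are illustrative
aws s3 cp best-model.joblib s3://models/best-model.joblib

# The same upload against an S3-compatible MinIO endpoint on Firekube
# (endpoint URL is an assumption; use your MinIO service address)
aws s3 cp best-model.joblib s3://models/best-model.joblib \
    --endpoint-url http://minio.example.local:9000
```

Because both clusters speak the S3 API, the serving cluster can fetch the model from the same bucket without any cloud-specific glue.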

How it works

This post has provided a manual, step-by-step guide to using the MLOps Profile. If you would like to know more, or if you want enterprise products that make this repeatable and robust, please contact us to discuss our GitOps Manager functionality in the Weave Kubernetes Platform.

With that said, let’s recap what we showed above.

Profiles are:

  1. A bundle of add-ons delivering a consistent developer experience on Kubernetes.
  2. Deployed and managed using GitOps.
  3. Portable across different Kubernetes options.

Many people have remarked that Kubernetes is, by design, a platform for building platforms. With this in mind, a Profile is an abstraction layer that enables platforms on Kubernetes on your desktop, on-premise, or in the cloud.

Jump Jump!! Platforms enable us to go higher!

The role of GitOps

GitOps gives us a way to manage Profiles and ensures that the whole infrastructure can be provisioned repeatably. This is achieved using tools like EKSctl and WKSctl. Applications and services may then be deployed on top, also using GitOps tools such as Argo, Flux, and Flagger.

GitOps adds value to the ML use case. In practice, a git hosting service like GitHub is used to store our artifacts: all cluster configurations, including the manifests provided by Profiles, live in git. Machine learning artifacts, like Jupyter notebooks, are also managed in git. Datasets and ML models usually live in S3 or an S3-compatible object store, which is managed as part of the GitOps process, similar to container image repositories.

Portable Profiles

As Kubernetes users, we all feel the same pain when moving workloads between different clusters.

Portable Profiles help solve this problem. To run our workloads smoothly on two different clusters, we have to accept the diversity of Kubernetes resources, and we need a practical abstraction layer over that diversity: portable Profiles.


Profiles are now portable across two Kubernetes stacks: EKS with EKSctl, and Firekube with WKSctl. Together with GitOps, we can simply commit our application code and push it to the repo. The GitOps process then picks up our apps and takes them live in production with consistent behaviour, regardless of the Kubernetes cluster they run on.

With Profiles, we can provision a whole new cluster with Firekube or WKP using the same MLOps Profile, giving us exactly the same set of ML cluster features, but this time running on on-premise hardware.

First we clone the Firekube quickstart repository:

$ git clone https://github.com/weaveworks/wks-quickstart-firekube

Next, change each machine size to 12 GB of RAM. Then run the setup script:

$ ./setup.sh

After the cluster is up and running, we apply the same MLOps Profile using the `wksctl profile enable` command, pointing it at the same Profile repository:

$ wksctl profile enable

After waiting for Kubeflow to be fully running on Firekube, we copy our “best model” over to Firekube’s object storage.

Note that there is a bug in the Kubeflow 0.7 webhook that prevents the model server from starting.
Please run the following command as a workaround:

$ kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org

Then we follow the GitOps steps to add and commit our KFServing InferenceService, which serves the model previously trained on the EKS cluster:

$ git add serve-model/serving.yaml
$ git commit -am "add serving to serve the model"
$ git push origin master

We wait for Flux to pull the new git revision and provision the InferenceService; the video below shows these steps in more detail. The testing data (titanic.json) contains 10 instances to be predicted by our model.
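Once the InferenceService is ready, it can be queried through KFServing's V1 prediction API. The model name, ingress address, and host header below are hypothetical and depend on how your cluster exposes Istio:

```shell
# Look up the InferenceService's external host name
# ("best-model" is an assumed service name)
HOST=$(kubectl get inferenceservice best-model \
         -o jsonpath='{.status.url}' | sed 's|^https\?://||')

# Send the 10 test instances to the predictor through the ingress
# (<ingress-address> is a placeholder for your Istio ingress)
curl -H "Host: ${HOST}" \
     http://<ingress-address>/v1/models/best-model:predict \
     -d @titanic.json
```

The request body follows KFServing's `{"instances": [...]}` convention, which is why titanic.json can be posted as-is.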

The final result shows a set of prediction results in JSON, like this:

{"predictions": [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]}

The roles of Argo and Flux in MLOps

The MLOps Profile actually uses both Argo and Flux together. With the EKSctl and WKSctl tools, we provision Flux to run our GitOps process, while the MLOps Profile itself contains the Argo workflow engine, which ships as part of Kubeflow. We’re excited for future developments of the Argo Flux GitOps engine to power future versions of our MLOps Profile.

Summary and looking ahead

Turning a complex set of components into a Profile and making it portable across Kubernetes distributions is a challenge. We now have several Profiles in development, such as Buildpacks, AppMesh, and the machine learning Profile described here. The future of Kubernetes should be a portable, easy, and ready-to-use platform for any type of application we’d like to build.

The Kubeflow-based MLOps Profile works with EKSctl, Firekube, and also the Weave Kubernetes Platform.

Please talk to our sales team if you want commercial support on this.

Find out how to manage a Kubernetes platform. Ask for a demo of the Weave Kubernetes Platform.