Paul Curtis (@pfcurtis_NY), a Weaveworks Solutions Architect based in New York City, recently presented an online talk on how GitOps and progressive delivery can be effectively used to manage machine learning models in Kubernetes. But before diving into how GitOps and progressive delivery help deliver machine learning to your apps, Paul spent some time defining a few of the terms used in his talk, including:
- Service meshes
- Progressive Delivery
Many people already know containers and Kubernetes, but fewer are aware of service meshes. A service mesh is a layer in your Kubernetes cluster that combines an ingress controller with a routing daemon. It lets you route network traffic and control how pods and namespaces are interconnected. Service meshes can do load balancing and traffic switching, and they have a dynamic, programmable control plane. When you want to do more advanced deployments like A/B testing and canaries for machine learning applications, a service mesh is both useful and often necessary.
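To make the traffic-switching idea concrete, here is a minimal sketch of the weighted traffic splitting a service mesh performs between two versions of a service. The version names and the 90/10 split are illustrative, not from the talk:

```python
import random

# Illustrative weights: send 90% of requests to v1, 10% to v2
weights = {"v1": 90, "v2": 10}

def pick_backend(weights, rng=random):
    """Choose a backend version for one request, with probability
    proportional to its configured traffic weight."""
    versions, w = zip(*weights.items())
    return rng.choices(versions, weights=w)[0]
```

A real mesh applies rules like this at the proxy level for every request, which is what makes canary and A/B deployments possible without changing application code.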
Common service meshes include Istio, Linkerd, Envoy and AWS App Mesh (for EKS).
Progressive Delivery and Advanced Deployments
Progressive delivery encompasses a set of advanced deployment techniques built on controlled methods of shifting traffic. They let developers move gradually from an old release to a new one that will eventually end up in production. With a slow, controlled rollout (using a service mesh), you can avoid major problems and easily roll back if one is encountered.
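The shift-and-check loop described above can be sketched in a few lines. This is a simplified stand-in for what a tool like Flagger automates; the step size, error threshold, and version names are assumptions for illustration:

```python
def progressive_rollout(error_rate, step=10, threshold=0.05):
    """Shift traffic from 'stable' to 'canary' in `step`-percent
    increments, rolling everything back if the canary's observed
    error rate ever exceeds `threshold`."""
    weights = {"stable": 100, "canary": 0}
    while weights["canary"] < 100:
        weights["canary"] += step
        weights["stable"] -= step
        if error_rate("canary") > threshold:
            # A problem was encountered: restore all traffic to stable
            return {"stable": 100, "canary": 0}
    return weights  # promotion complete: canary takes all traffic
```

In practice the error rate would come from metrics (e.g. the mesh's request statistics) rather than a callback, but the control flow is the same.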
Described below are a few of the more common deployment methods in use for machine learning. For a more comprehensive discussion on the different types of advanced and progressive deployments refer to the post, Kubernetes Deployment Strategies.
A canary deployment is used when you want to test new functionality, typically on the backend of your application. Traditionally you may have had two almost identical servers: one that serves all users, and another with the new features that is rolled out to a subset of users so the two can be compared. When no errors are reported, the new version can gradually roll out to the rest of the infrastructure.
Rather than launch a new feature for all users, an A/B deployment releases it to a small set of users. The users are typically unaware they are being used as testers for the new feature, so these are sometimes referred to as “dark” deployments.
In a blue/green deployment strategy, the old version of the application (green) and the new version (blue) are deployed at the same time. While both are deployed, users only have access to the green version, whereas the blue version is available to your QA team for test automation on a separate service or via direct port-forwarding.
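The mechanics of a blue/green switch come down to a single pointer that decides which environment serves user traffic. A minimal sketch, with hypothetical environment names matching the description above:

```python
class BlueGreenRouter:
    """Two environments are deployed side by side; flipping one
    pointer is the release, and the old environment stays deployed
    for instant rollback."""

    def __init__(self, live="green", idle="blue"):
        self.live = live   # version all users see
        self.idle = idle   # version reserved for QA / test automation

    def handle(self, request, qa=False):
        # QA traffic reaches the idle environment, e.g. via a
        # separate service name or a direct port-forward
        return self.idle if qa else self.live

    def cut_over(self):
        # Promote the tested environment in one atomic swap
        self.live, self.idle = self.idle, self.live
```

Because the swap is atomic and the previous environment is still running, rolling back is just calling `cut_over` again.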
What adopting GitOps can do for your team
GitOps is a method of automating application deployments as well as the management of Kubernetes itself. At its core, GitOps is two things:
- An operating model for Kubernetes and other cloud native technologies, providing a set of best practices that unify deployment, management and monitoring for containerized clusters and applications.
- A path towards a better developer experience for managing applications, where end-to-end CI/CD pipelines and Git workflows are applied to both operations and development.
Keeping the state of your entire system in Git has a few advantages:
- A fully auditable trail with security guarantees. Changes are triggered, stored, validated and fully auditable in Git.
- Any developer who can use Git can make pull requests for both infrastructure and application changes. The same workflow applies to both development and operations teams.
- Changes in state can be monitored and alerted on when there is a divergence between what’s kept in Git and what’s running in production.
For finance and trading applications that use machine learning, the fully auditable and verified trail you get with GitOps is especially valuable, since you can see exactly which model was deployed, when, and by whom.
Machine learning pipelines and algorithms
Machine learning practitioners are already making extensive use of cloud native technology. These types of predictive, intelligent applications are pushing cloud technology forward and are the principal drivers for companies to make the move to Kubernetes in order to gain a competitive edge.
What are the use cases for machine learning?
Typical use cases for machine learning include shopping recommendation engines, traffic routing apps, and financial applications such as risk assessment and trade recommendation engines. These apps make informed recommendations based on shopping habits, in the case of eCommerce, or intelligent decisions based on current traffic patterns relative to your position. In financial and trading applications, algorithms and machine learning models can make best guesses on trades based on levels of risk, desired returns, hedges and other criteria.
What does a machine learning pipeline look like?
Let’s have a look at how a machine learning pipeline might operate. To produce a model that can be used to make recommendations or other decisions, an algorithm is run against a series of datasets. This is what’s referred to as training the algorithm to create a model.
The simplified diagram below gives an overview of what a machine learning algorithm looks like. On the left-hand side are the dataset inputs and the algorithm. The data is applied to the algorithm to produce a usable model. The datasets keep getting applied, with parameter tweaks, over and over until the algorithm produces a model you can use in your application.
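The apply-tweak-repeat loop above can be sketched with a deliberately tiny example. Here the “algorithm” is least-squares gradient descent fitting a single parameter `w` in the model `y = w * x`; the datasets and learning rate are invented for illustration:

```python
def train(datasets, lr=0.01, epochs=50):
    """Repeatedly apply each dataset to the algorithm, tweaking the
    model parameter each pass until it converges to a usable model."""
    w = 0.0  # the parameter the algorithm keeps adjusting
    for _ in range(epochs):
        for xs, ys in datasets:
            # gradient of mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            w -= lr * grad
    return w  # the trained model: a predictor y = w * x

# Two small datasets drawn from the relationship y = 3x
data = [([1, 2, 3], [3, 6, 9]), ([4, 5], [12, 15])]
model = train(data)
```

Real pipelines train far larger models over many more datasets, but the shape of the loop, and the fact that its output is an artifact (the model) that must then be deployed, is the same.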
At the end of the training period, you end up with:
- Trained model
- Updated dataset
All of these components may need to be released on the fly to return results that you need for your given application.
If you were to train and release models with GitOps, the pipeline would look as follows:
From a GitOps perspective, you have a Data Engineer who comes up with the datasets and a Data Scientist who writes the algorithm. Both of these components need to be checked into Git. With Flux you can deploy both of them to their own pods for testing and model training.
Once the model is produced, Flux is able to pick up that model, and deploy it to your application that is running in Kubernetes. Typically, the model that gets deployed includes an updated dataset that needs to be made available. Flux can ensure that this dataset is always up to date and made available to the model.
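The behavior described above is a reconciliation loop: the desired state declared in Git is compared with what is running in the cluster, and the cluster is converged toward Git. A simplified sketch, with hypothetical resource names (this is not Flux’s actual API):

```python
def reconcile(desired, running):
    """Compare the desired state (manifests in Git) with the running
    state and return the actions needed to converge the cluster."""
    actions = []
    for name, spec in desired.items():
        if name not in running:
            actions.append(("deploy", name))
        elif running[name] != spec:
            actions.append(("update", name))  # e.g. a new model version
    for name in running:
        if name not in desired:
            actions.append(("delete", name))
    return actions

# Git says the app should run model v2 with a fresh dataset;
# the cluster is still running model v1 and has no dataset volume
desired = {"model-server": "model:v2", "dataset": "sales-2023-06"}
running = {"model-server": "model:v1"}
```

Running this loop continuously is also what enables the drift alerting mentioned earlier: any divergence between Git and the cluster produces a non-empty action list.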
Benefits of GitOps for machine learning
Although we were discussing the pipeline for training models in pre-production, the same pipeline applies to running the algorithm in production, and the same benefits are gained.
View the talk in its entirety, including a walk through of how to use Flagger for progressive delivery with machine learning:
Join the Weave Online User Group for more talks and discussions like these.