Continuous Delivery to Kubernetes for Machine Learning with Michelle Casbon (Qordoba)
See how Qordoba doubled their productivity by using Weave Cloud for Continuous Delivery to deploy machine learning models to Kubernetes.
Last November, the Weave Online User Group guest speaker was Michelle Casbon, Director of Machine Learning at Qordoba. Michelle discussed how Qordoba refactored its platform into microservices, ran everything in Kubernetes on the Google Cloud Platform, and then implemented Weave Cloud for Continuous Delivery, which doubled their productivity.
What Qordoba is solving
Qordoba is focused on the field of localization. They are using Machine Learning to check that when a phrase is translated into another language, it means what was originally intended.
When translations go wrong, they can really go wrong. A famous example is the “Got Milk?” campaign, which was successful in the US in the early nineties. Translated literally into Spanish, however, the slogan means “Are you lactating?”, and as a result the campaign was less successful in Spanish-speaking countries.
To help flag these problematic phrases, Qordoba uses an Affect Detector -- a type of emotion AI -- that when trained can detect the emotion in phrases across languages. The Affect Detector is trained in Kubernetes on the Google Cloud Platform.
Qordoba System Requirements & Resource Footprints
The Qordoba platform is a set of microservices that provide standard functionality such as billing, a web interface for the localization process, a message centre and data storage. But unique to their product is the integration of Machine Learning into the platform.
At a very high level, Machine Learning is a two-step process: the first step is training the algorithm and the second is the cross-validation phase. These two phases have very different resource footprints: the cross-validation phase alone takes up 10 times more resources than the training phase.
The diagram below shows a subset of the tools used at Qordoba. All of these pieces also have very different resource footprints. But by putting each microservice into its own container, resources for each service can be requested on demand and made available.
Everything in One Kubernetes Cluster
With Kubernetes, it’s quite simple to wrap a few YAML files around an image, request the resources it needs and then deploy it to the cluster.
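As a rough illustration of "wrapping" an image with a definition and a resource request, the sketch below builds a minimal Deployment manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The service names, image paths and CPU/memory figures are hypothetical and are not Qordoba's actual values; the 10x gap between the two requests simply mirrors the training versus cross-validation footprint described earlier.

```python
import json


def make_deployment(name, image, cpu, memory, replicas=1):
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""

    def labels():
        # A fresh dict each time so the three label blocks are independent.
        return {"app": name}

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels()},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels()},
            "template": {
                "metadata": {"labels": labels()},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Resources the scheduler should reserve for this container.
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ]
                },
            },
        },
    }


# Hypothetical services and figures, for illustration only.
training = make_deployment(
    "affect-trainer", "registry.example.com/affect-trainer:1.0.0", "2", "4Gi"
)
validation = make_deployment(
    "affect-validator", "registry.example.com/affect-validator:1.0.0", "20", "40Gi"
)

print(json.dumps(training, indent=2))
```

Saving the printed output to a file and running `kubectl apply -f` on it would create the Deployment, assuming the image exists and the cluster has capacity for the request.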
When Qordoba started using Kubernetes, they intended it as a single workflow for training Machine Learning models, but now their entire platform (including the content creation tools) runs on Kubernetes.
Having everything together in one cluster not only made cross-team communication easier but it also simplified development for Qordoba. Before Kubernetes, Qordoba ran and managed their own nodes on a self-managed Mesos cluster. One of the main issues they had to deal with was the many hard-coded IP addresses that constantly changed.
But with Kubernetes, they can directly refer to each microservice without any port mapping. This frees them up from having to keep track of which services are running on which ports and on which node. Common services like REST run on standardized ports and now they can talk directly to any of their services.
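To make the contrast with hard-coded IP addresses concrete, here is a small sketch of how a client might build the address of another microservice from its name alone, relying on the cluster's built-in DNS. It assumes the default `cluster.local` cluster domain; the service name `billing`, the namespace and the port are hypothetical.

```python
def service_url(service, namespace="default", port=8080,
                scheme="http", cluster_domain="cluster.local"):
    """Return the in-cluster URL for a Kubernetes Service.

    Kubernetes DNS resolves <service>.<namespace>.svc.<cluster-domain>
    to the Service, so callers never need a node IP or a mapped port.
    """
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Hypothetical service name; the port stands in for a "standardized" REST port.
print(service_url("billing"))
```

Within the same namespace the short name (`http://billing:8080`) resolves as well, which is what makes referring to services directly so convenient.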
But with great improvements also came drawbacks…
Problem 1: Releases are Time Consuming
Although Qordoba found it straightforward to set up the infrastructure, releasing updates was still time consuming. Having to maintain all of those YAML files is labour-intensive and it got in the way of their core business. But with Weave Cloud, Qordoba no longer have to maintain or keep track of YAML files across different clusters.
One of the best things about managed Kubernetes in the Google Cloud Platform is that you can spin up clusters as you need them. Qordoba take advantage of this feature and have several different clusters with specific configurations that don’t always talk to the same backend. Before they started using Weave Cloud, making changes across all of those clusters and managing where the changes went was onerous.
“We create a feature branch, make our changes and merge that into our develop branch and that's it. After a merge, Weave Cloud takes over, notices the change, updates the manifests, versions them in Git and deploys them to our cluster.” --Michelle Casbon
With the change applied to the cluster, the Kubernetes objects called Deployments take over and they handle all of the rollout process.
With Weave Cloud, it’s a hands-off process, says Michelle:
“We absolutely love that in order to release something all we have to do is merge a feature branch. That's been really transformative for us.” -- Michelle Casbon
GitOps: Qordoba Build Pipeline
This is what the Qordoba process looks like:
- Engineer merges a feature. A Qordoba engineer sends a PR. After it's been reviewed it either goes into a develop branch or to the master branch.
- Jenkins tests the code, builds the image and sends it to JFrog Artifactory.
- Weave Cloud notices a new version in the container repository. Since Weave Cloud monitors the container repository, it knows which image belongs to which service, recognizes that the service has changed and knows that it needs to do something with it.
- Weave Cloud then writes the change to GitHub to indicate that the version has been incremented. The new change is then applied to the cluster. Literally, Weave Cloud does a `kubectl apply` of the new definition.
- Kubernetes sees a new version of an image and kicks off the automated rollout process.
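The "notice a new image, update the manifest" step in the list above can be sketched as follows. This is only an illustration of the idea, not Weave Cloud's actual implementation: the function names are hypothetical, the manifest is a plain dict, and the tag parsing is deliberately simplistic (it does not handle image digests).

```python
import copy


def split_image(image):
    """Split 'registry/name:tag' into (repository, tag); tag defaults to 'latest'."""
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, tag = image.rsplit(":", 1)
        return repository, tag
    return image, "latest"


def bump_image(manifest, new_image):
    """Return (updated_manifest, changed) without mutating the input.

    Any container whose image comes from the same repository as
    new_image is pointed at the new tag.
    """
    new_repository, _ = split_image(new_image)
    updated = copy.deepcopy(manifest)
    changed = False
    for container in updated["spec"]["template"]["spec"]["containers"]:
        repository, _ = split_image(container["image"])
        if repository == new_repository and container["image"] != new_image:
            container["image"] = new_image
            changed = True
    return updated, changed


# Hypothetical Deployment fragment for a 'billing' service.
current = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "billing", "image": "registry.example.com/billing:1.4.0"}
    ]}}},
}

updated, changed = bump_image(current, "registry.example.com/billing:1.4.1")
print(changed, updated["spec"]["template"]["spec"]["containers"][0]["image"])
```

In a GitOps flow, the updated manifest would be committed to the configuration repo before being applied, so that Git remains the record of what is running.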
Problem 2: Configuration Files are Scattered
The other problem that Qordoba faced is that their config files were scattered everywhere.
With Git as the single source of truth, file structures are consistent across all services. This allows the Qordoba developers to apply the same set of operations to all of their services.
For example, one repo contains all of the YAML definitions, configurations and config maps, and it reflects the organization of their different clusters. Clusters are recreated by cloning a branch from the configuration repo. Also, by setting up a branch per cluster, Qordoba developers can easily compare their different clusters across branches.
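Comparing two clusters then comes down to diffing the same file on two branches, which in practice is a `git diff` between the branches. The sketch below shows the same idea in Python using the standard library's `difflib`; the branch names, file name and settings are hypothetical.

```python
import difflib


def diff_configs(a_text, b_text, a_name, b_name):
    """Return unified-diff lines between two versions of a config file."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )


# Hypothetical contents of the same config file on two cluster branches.
staging = "BACKEND_HOST=db.staging.example.com\nREPLICAS=2\nLOG_LEVEL=debug"
production = "BACKEND_HOST=db.prod.example.com\nREPLICAS=5\nLOG_LEVEL=debug"

for line in diff_configs(staging, production, "staging/app.env", "production/app.env"):
    print(line)
```

Only the settings that differ between the two clusters appear as `-` and `+` lines; shared settings show up as unchanged context.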
Problem 3: Nodes Sometimes Die
Another common problem is that sometimes nodes stop working and need to be recreated. And in the worst case scenario, an entire cluster can go down in flames. But because Qordoba have a single source of truth living in a persistent state in Git, nodes and clusters are easily recreated by cloning the repo.
Spinning up a new cluster in Google Cloud Platform is easy, and Weave Cloud makes the rest just as quick. Qordoba creates a new instance and then connects the Weave Cloud agents by applying a command, tailored to their cluster, that Weave Cloud provides.
Qordoba then provides a deploy key that allows Weave Cloud to access their repos. Then they run a script that generates all of their config maps and objects and applies all of their YAMLs. Because all of this is automated with Weave Cloud, Qordoba recovers from a lost cluster in much less time.
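A recovery script of this kind might look roughly like the sketch below. Qordoba's actual script was not shown, so the file layout and function name here are assumptions; the sketch only assembles the `kubectl` commands (one `kubectl create configmap` per config file, then a single `kubectl apply -f` over the manifest directory) and prints them rather than running them.

```python
import shlex
from pathlib import PurePosixPath


def recovery_commands(config_files, manifest_dir):
    """Build the kubectl commands needed to rebuild a cluster's objects.

    config_files: paths of files to turn into ConfigMaps (named after the file stem).
    manifest_dir: directory of YAML definitions checked out from Git.
    """
    commands = []
    for path in config_files:
        name = PurePosixPath(path).stem
        commands.append(
            ["kubectl", "create", "configmap", name, f"--from-file={path}"]
        )
    # Apply every definition in the directory in one go.
    commands.append(["kubectl", "apply", "-f", str(manifest_dir)])
    return commands


# Hypothetical layout of a freshly cloned configuration repo.
for command in recovery_commands(
    ["config/billing.properties", "config/web.properties"], "manifests"
):
    print(shlex.join(command))
```

Passing each command to `subprocess.run(command, check=True)` instead of printing it would execute the recovery against whichever cluster `kubectl` currently points at.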
In this talk, Michelle Casbon explained how Qordoba architected their Machine Learning platform for localization. She then described how Qordoba uses containers and Kubernetes, how they implemented the GitOps methodology and how they use Weave Cloud to automate deployments to Kubernetes.
View the talk in its entirety below: