TensorFlow on Kubernetes: Troubleshooting a TensorFlow Predictive Model Microservice with Weave Cloud

By Weaveworks
October 05, 2017

Weave Cloud works alongside machine learning platforms such as Seldon’s. In this tutorial you will deploy a predictive service that recognizes drawn numbers from 0 to 9.


Seldon Core is a machine learning platform that helps your data science team deploy models into production. It provides an open-source data science stack that runs within a Kubernetes cluster. Seldon supports models built with TensorFlow, Keras, Vowpal Wabbit, XGBoost, Gensim or any other model-building tool.

Its API includes two key endpoints:

  • Predict - Build and deploy supervised machine learning models created in any machine learning library or framework at scale using containers and microservices.
  • Recommend - High-performance user activity and content based recommendation engine with various algorithms ready to run out of the box.
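
As a rough illustration of the Predict endpoint's shape, here is a sketch of a prediction call. The route, authentication and payload fields below are placeholders rather than the documented Seldon API, which varies by release; they only show the idea of sending a feature vector and getting class probabilities back:

```
# Illustrative only: the path, token and field names are placeholders, not the real Seldon API.
curl -s -X POST "http://<seldon-server>/predict" \
  -H "Authorization: Bearer <oauth-token>" \
  -H "Content-Type: application/json" \
  -d '{"data": [0.0, 0.12, 0.87]}'
# A response would contain something like a list of {class, probability} pairs.
```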

Weave Cloud and Seldon

Weave Cloud works alongside machine learning platforms such as Seldon’s. It provides the tools to manage, troubleshoot and monitor machine learning models, as well as the infrastructure on which they run.

Weave Cloud consists of:

  • Deploy – plug the output of your CI system into a cluster so that you can ship features faster
  • Explore – visualize and understand what’s happening so that you can fix problems faster
  • Monitor – understand the behavior of running systems using Prometheus so that you can identify problems faster

Deploying the MNIST Microservice to Kubernetes

In this tutorial you will deploy a predictive service that recognizes drawn numbers from 0 to 9. The predictive model was created using TensorFlow. This example describes how to deploy the pre-packaged Docker image that is available in the Seldon server.

We’ll show you how to launch the service on Kubernetes in Google Container Engine, and then how to manage the outputs of the model and visualize the cluster with Weave Cloud.

Let’s get started!

Create the cluster in Google Container Engine (GKE)

  1. Create a GKE cluster with one node and an instance type with at least 4 CPU cores (a sample gcloud command is shown just after this list).
  2. Connect to your cluster by clicking the ‘Connect’ button in the GKE console and copying the command it shows into the Google Cloud Shell (or set up your own command line to work with GKE).
  3. Sign up for Weave Cloud, then install the agents by copying the command shown after selecting Kubernetes → GKE in the setup screens. Run the command in your Cloud Shell.
  4. Once the agents are connected, click Explore to view Kubernetes running in Weave Cloud.

  5. Fork https://github.com/seldonio/seldon-server to your own GitHub account.
  6. In the Google Cloud Shell, clone the forked seldon-server repository at the v1.4.7 release: `git clone https://github.com/[your-github-profile]/seldon-server -b v1.4.7`
  7. Add ~/seldon-server/kubernetes/bin to your path:  
    `export PATH=$PATH:$HOME/seldon-server/kubernetes/bin`
    Next, cd to `seldon-server/kubernetes/conf` and open the `Makefile` (for example with `vi Makefile`). Because you are running in Google Cloud, you will also need to set the Seldon API endpoints to use a LoadBalancer before running `make clean conf`:
    SELDON_SERVICE_TYPE=LoadBalancer

    SELDON_SERVER_SERVICE_TYPE=LoadBalancer
    You will also need to change the spark-ui username and password:
    SPARK_UI_USERNAME=someone

    SPARK_UI_PASSWORD=something
    Note: You may need to run `sudo apt-get install apache2-utils` to install the `htpasswd` utility if it is not already available.
  8. Run: `make clean conf`
  9. Launch the Seldon server with: `seldon-up`
  10. Depending on the speed of your network, it may take some time for all of the Docker images to download and deploy (up to 10-15 minutes in some cases). In the meantime, go to Weave Cloud and click Explore → Pods to watch the Seldon cluster appear:

Seldon Pods Deploying to Kubernetes
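
For reference, the cluster creation from step 1 and the connection and rollout checks from steps 2 and 10 can also be done entirely from the shell. A minimal sketch, assuming a cluster named seldon-demo in zone us-central1-b (adjust the name, zone and machine type to suit your project):

```
# Step 1: create a single-node GKE cluster whose node has 4 vCPUs
gcloud container clusters create seldon-demo \
  --num-nodes=1 \
  --machine-type=n1-standard-4 \
  --zone=us-central1-b

# Step 2: point kubectl at the new cluster (similar to the command the 'Connect' button shows)
gcloud container clusters get-credentials seldon-demo --zone=us-central1-b

# Step 10: watch the Seldon pods come up after running seldon-up
kubectl get pods -w
```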

Deploy the TensorFlow Job and View It in Weave Cloud

  1. The MNIST microservice should already be running once you’ve installed the Seldon service. Next, create the Kubernetes job that loads the model (a shell check for the job is shown after the screenshot below):

        cd kubernetes/conf/examples/tensorflow_deep_mnist
    
        kubectl create -f load-model-tensorflow-deep-mnist.json
        
    Once deployed, the terminal should show:  
    `job "load-model-tensorflow-deep-mnist" created`

  2. View the job in Weave Cloud by searching for MNIST in the Search field:

Search for the model as it loads into the cluster
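
You can also confirm the model-loading job from the shell with standard kubectl commands; a quick sketch (the job name comes from the manifest you applied above):

```
# Check whether the model-loading job has completed
kubectl get jobs load-model-tensorflow-deep-mnist

# Tail its logs; Kubernetes labels the job's pods with job-name=<job>
kubectl logs -l job-name=load-model-tensorflow-deep-mnist
```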

Create the Client & Send the Prediction Model to the Microservice

Create the client and start serving predictions from the microservice by running:

```
start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rest 1.0
```
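
Give the microservice a moment to start, then confirm it is running. A sketch; the exact pod and service names are generated by Seldon, so filter on the model name rather than assuming them:

```
# The prediction microservice runs as its own pod
kubectl get pods | grep tensorflow-deep-mnist

# It is also exposed as a service inside the cluster
kubectl get services | grep tensorflow-deep-mnist
```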

Test the Output

The microservice takes as input a vector of 784 floats that corresponds to the pixels of a 28x28 image and returns a list of probabilities for each number between 0 and 9. To test it, use the Flask webapp created for this purpose.

  1. Get the client key and secret:
    `seldon-cli keys --client-name deep_mnist_client --scope all`
  2. Get the IP address of the Seldon Server:
    `kubectl get services seldon-server`
  3. Start the webapp using the key, secret and the IP obtained above:  

    `kubectl run deep-mnist-webapp --image=seldonio/deep_mnist_webapp:1.2 --port=80 --command -- "/run_webapp.sh" "<seldon-server-ip>" "<key>" "<secret>" `
  4. Expose the external IP:  
    `kubectl expose deployment/deep-mnist-webapp --type="LoadBalancer"`
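
The LoadBalancer usually takes a minute or two to be assigned a public address. A quick way to watch for it, using the service created by the expose command above:

```
# Wait until EXTERNAL-IP changes from <pending> to a real address
kubectl get service deep-mnist-webapp -w

# Or print just the address once it has been assigned
kubectl get service deep-mnist-webapp -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```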

Load the Demo App

Use http://<external-IP>:80 to view the client as shown below:

The MNIST demo webapp

Troubleshoot Predictive Models with Weave Cloud

A useful troubleshooting feature in Weave Cloud is the ability to view real-time logs while you are testing your system. Below, the logs for the Kafka stream predictions are kept open so that you can see the parameters being passed back and forth from the model.

Kafka Streaming Predictions Log Available for Troubleshooting
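
The same logs can be tailed from the command line if you prefer. A sketch, assuming the relevant pod name contains "kafka" (component names vary between Seldon releases, so look the pod up first):

```
# Find the Kafka-related pods
kubectl get pods | grep -i kafka

# Tail the logs of the pod found above (substitute its full name)
kubectl logs -f <kafka-pod-name>
```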

Wrapping Up

For information on how to build your own predictive model, see the TensorFlow Digit Classifier Advanced Tutorial.

Check out the documentation for more information on Weave Cloud.

Join the Weave Online User Group and the TensorFlow London meetup for more talks like these.

