No matter how well engineered your software is, or how ready it is to scale both vertically and horizontally, once your application is out in the wild you need the tools and the means to keep a close eye on it. Monitoring lets you understand how the application behaves under real load, find bottlenecks and query problems, and determine the points of failure when your system goes down.
The following topics are discussed in this tutorial:
- What to Measure?
- How to collect metrics from your infrastructure and applications?
- Before you Begin
- Sign up for Google Cloud
- Sign Up for Weave Cloud & Connect the Agents to Kubernetes
- Monitoring with Weave Cloud
- Deploy ‘The Sock Shop’ to Kubernetes
- Conclusions and Next Steps
Your system has so many moving parts that it is sometimes difficult to decide where to start or what exactly to measure. This problem becomes somewhat easier once you understand the RED Method, but as you will learn, the RED Method on its own doesn’t always give you the complete picture.
To summarize the RED method, for each of your services, you would gather:
- Rate: number of requests per second a.k.a. throughput
- Errors: number of errors per second a.k.a. number of requests per second that returned a 5xx error
- Duration: the amount of time that it takes for each request to be served
The RED Method encourages you to keep a standard overview of your systems that lets you analyze how the application responds to user interaction and helps you debug issues as they arise. This approach also helps you analyze how your application’s behaviour changes over time. But even though these metrics are a good starting point, they are just that: a starting point. They can bring some insight into your stack, but they won’t tell you much about your infrastructure.
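As a rough illustration of the three RED numbers, the sketch below computes them from a batch of request records collected over one time window. The `Request` record shape, the function names and the nearest-rank percentile are assumptions made for this example; they are not part of Weave Cloud or Prometheus.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One served request (hypothetical record shape, for illustration only)."""
    timestamp: float  # seconds since an arbitrary epoch
    status: int       # HTTP status code
    duration: float   # seconds taken to serve the request


def red_summary(requests, window_seconds):
    """Return (rate, error_rate, sorted_durations) for one time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Rate: requests per second over the window.
    rate = len(requests) / window_seconds
    # Errors: requests per second that returned a 5xx status.
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    error_rate = errors / window_seconds
    # Duration: keep the sorted list so percentiles can be read from it.
    durations = sorted(r.duration for r in requests)
    return rate, error_rate, durations


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted list (0 < fraction <= 1)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]
```

Durations are usually reported as percentiles rather than averages, because a handful of very slow requests can hide behind a healthy-looking mean.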
A good complement to the RED metrics is to include some metrics about your cluster:
- Total amount of CPU available & being used
- Total amount of RAM available & being used
- Network traffic/throughput
- Storage (disk I/O, disk usage, etc.)
When you collect these metrics, you can correlate, for example, the amount of RAM that your application is consuming with the number of visitors that you have. This also helps you detect memory leaks, determine if your disks are filling up or assess the utilization of your infrastructure.
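One simple way to check such a relationship is a Pearson correlation between two series sampled at the same moments. The sketch below is a minimal, standard-library version; the function name and the sample figures in the usage lines are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / math.sqrt(var_x * var_y)


# Made-up samples taken at the same instants: visitors and RAM used (MB).
visitors = [100, 200, 300, 400]
ram_mb = [510, 1020, 1490, 2010]
coefficient = pearson(visitors, ram_mb)
```

A coefficient close to 1 suggests memory grows with traffic. Memory that keeps climbing while traffic stays flat is a hint that you may be looking at a leak.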
For SQL and NoSQL databases you can measure:
- Current number of connections
- Query duration
- Number of queries per second
These metrics help you detect slow-running queries and correlate them with slow-rendering pages on your site.
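If your database driver does not expose these numbers directly, you can time queries in the application. The sketch below wraps query execution with a timer, using an in-memory SQLite database purely as a stand-in; the `QueryStats` class is hypothetical and is not part of any monitoring library.

```python
import sqlite3
import time


class QueryStats:
    """Accumulates query count and duration (illustrative helper only)."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0
        self.slowest = (0.0, None)  # (duration in seconds, SQL text)

    def timed_execute(self, connection, sql, params=()):
        """Run a query, record how long it took, and return all rows."""
        start = time.perf_counter()
        try:
            return connection.execute(sql, params).fetchall()
        finally:
            # Record the timing even if the query raised an exception.
            elapsed = time.perf_counter() - start
            self.count += 1
            self.total_seconds += elapsed
            if elapsed >= self.slowest[0]:
                self.slowest = (elapsed, sql)


stats = QueryStats()
connection = sqlite3.connect(":memory:")
stats.timed_execute(connection, "CREATE TABLE socks (id INTEGER, name TEXT)")
stats.timed_execute(connection, "INSERT INTO socks VALUES (?, ?)", (1, "argyle"))
rows = stats.timed_execute(connection, "SELECT name FROM socks WHERE id = ?", (1,))
```

Dividing `count` by the length of the sampling interval gives queries per second, and `slowest` points at the statement to investigate first.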
Other metrics to consider:
- Third-party API rate, errors and duration. This can help you determine whether the bottleneck in an endpoint at a certain point in time is on your side or on the external API’s side
- Job queues. If you have a pool of background workers (for processing video, audio, images, bulk sending emails and tasks of that sort), measure the number of jobs that you currently have in the queue. Bonus points if you can also measure the number of jobs processed since the last reading
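The two job queue numbers mentioned above, the current queue depth and the jobs processed since the last reading, can be tracked with a pair of counters. This is a minimal sketch; the `QueueGauge` class and its method names are made up for illustration.

```python
class QueueGauge:
    """Tracks queue depth and jobs processed between readings."""

    def __init__(self):
        self._pending = 0
        self._processed_total = 0
        self._processed_at_last_reading = 0

    def enqueue(self, count=1):
        """Record that `count` jobs were added to the queue."""
        self._pending += count

    def complete(self, count=1):
        """Record that `count` queued jobs finished processing."""
        if count > self._pending:
            raise ValueError("cannot complete more jobs than are pending")
        self._pending -= count
        self._processed_total += count

    def read(self):
        """Return the current depth and the jobs processed since the last read."""
        delta = self._processed_total - self._processed_at_last_reading
        self._processed_at_last_reading = self._processed_total
        return {"queue_depth": self._pending,
                "processed_since_last_reading": delta}
```

A queue depth that keeps growing while the processed count stays flat usually means the workers are stuck or under-provisioned.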
There are several ways to get the metrics out of your system and many data stores where you can keep them:
- The Push Method
- Metrics can be pushed directly from your application to your data store
- You can have an agent sitting right next to your application pulling the metrics out of it and then pushing them into the data store
- The Pull Method
- You can have an agent that remotely pulls the data out of your application and pushes the metrics into the data store
Special note on short-lived services: Some services are ephemeral, and for these the pull method might not be suitable because the service can exit before an agent has a chance to collect from it. In that case you need to push the metrics from the service itself.
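To make the pull method more concrete, here is a minimal sketch of an application that exposes its counters over HTTP so that an agent can collect them. It uses only the Python standard library and renders the counters in the Prometheus text exposition format. The `/metrics` path is the Prometheus convention; the function names, the port and the counter names are assumptions for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(counters):
    """Render a {name: value} dict in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


def make_handler(counters):
    """Build a request handler class that serves the given counters."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(counters).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(counters, port=8000):
    """Block forever, answering scrape requests (the port is an arbitrary choice)."""
    HTTPServer(("", port), make_handler(counters)).serve_forever()
```

Calling `serve({"http_requests_total": 3})` would let an agent read the counters from port 8000. A short-lived job would instead send the output of `render_metrics` to its data store before exiting.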
To decide on the kind of storage to use for your metrics, it’s important to establish the criteria and nature of the data:
- The data you’re collecting is immutable. Take as an example the metric: duration for each request on the endpoints of your API. You most likely will never update any of these records.
- Since you will collect this data at a constant interval, you can safely assume that most of the dataset will already be sorted (by time!)
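These two properties, immutable records that arrive already sorted by time, are what make an append-only structure a natural fit. The sketch below is a toy in-memory series that only allows appends and answers time-range queries with a binary search. It illustrates the idea and does not describe how Prometheus or Weave Cloud store data.

```python
import bisect


class AppendOnlySeries:
    """Toy time series: samples are appended in time order and never updated."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        """Add a sample; reject samples older than the newest one stored."""
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("samples must arrive in time order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def between(self, start, end):
        """Return the (timestamp, value) pairs with start <= timestamp <= end."""
        # The timestamps are sorted, so both bounds are found by binary search.
        low = bisect.bisect_left(self._timestamps, start)
        high = bisect.bisect_right(self._timestamps, end)
        return list(zip(self._timestamps[low:high], self._values[low:high]))
```

Because nothing is updated in place, writes are cheap appends and range reads avoid scanning the whole dataset.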
Kubernetes comes with a built-in endpoint for metrics which is supported by Prometheus. When you deploy the Weave Cloud probes on your cluster, they automatically gather all the metrics from Kubernetes, and later on you will be able to query and graph them using Weave Monitor.
You will use the hosted Kubernetes version that Google Cloud offers, as it is the easiest way to get a cluster up and running.
1. Log in to your Google Cloud account and find the Container Engine section.
Click Create a container cluster and follow the instructions.
2. When asked about the details of the cluster:
- Pick a good name for your cluster
- Select a Zone that is close to your physical location
- We recommend a cluster with at least 7.5 GB of memory and at least 2 vCPUs for deploying The Sock Shop and the rest of the Weave probes.
- Since this cluster is only for testing purposes, we recommend disabling:
- Automatic upgrades
- Automatic repair
After filling in the details, click the Create button.
3. Wait for your cluster to become available. This operation might take between 5 and 10 minutes.
4. Once your cluster is ready, click on its Connect button. This opens a dialog showing the Google Cloud CLI configuration command.
5. Authenticate with Google Cloud from the cloud terminal:
`gcloud auth login`
The output should be something like this:
```
Go to the following link in your browser:
https://accounts.google.com/o/oauth2/auth?redirect_uri=...
Enter verification code:
```
In your browser, open the link that was provided by the previous command and follow the instructions.
6. Get the credentials required by kubectl to authenticate with your cluster:
```
gcloud container clusters get-credentials <cluster-name> \
    --zone <zone> --project <project-id>
```
Ensure that you replace the cluster-name, zone and project-id placeholders with the values that Google Cloud gives you.
The output should be as follows:
```
Fetching cluster endpoint and auth data.
kubeconfig entry generated for cluster-1.
```
7. Verify that you can talk to the cluster:
```
kubectl cluster-info
```
The output should look something like this:
```
Kubernetes master is running at https://126.96.36.199
GLBCDefaultBackend is running at https://188.8.131.52/api/v1/proxy/namespaces/kube-system/services/default-http-backend
Heapster is running at https://184.108.40.206/api/v1/proxy/namespaces/kube-system/services/heapster
KubeDNS is running at https://220.127.116.11/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://18.104.22.168/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```
Before you can use Weave Monitor to monitor apps, you’ll need to:
1. Sign up for Weave Cloud.
2. From the set up screens that appear, choose Platform Kubernetes –> Environment GKE and then copy the command.
3. Paste this command into your GKE terminal.
After a few moments your cluster should be successfully connected to Weave Cloud.
Note: Weave Cloud requires elevated permissions which your user account will not be able to grant without additional configuration. These permissions are already set up in the command shown to you from the Weave Cloud set up screens.
For more information, see Cluster Role Binding.
At this point, the Weave Cloud agents should have been deployed to your Kubernetes cluster. Weave Monitor automatically collects several metrics from your cluster, including:
- Kubernetes built-in metrics
- Any metrics generated by your services as long as your services expose metrics on the standard
To visualize these metrics:
1. Go to your Weave Cloud instance and click Monitor. You will be presented with a list of preconfigured System Notebooks
2. Check the Node Resources notebook:
The Kubernetes notebook:
And the Weave Net notebook:
Notice that there are no metrics in the Weave Net notebook, since GKE clusters use their own container networking plugin.
Deploy the microservices reference application, The Sock Shop, along with a load test service to generate some metrics that later on you will be able to visualize in Weave Cloud.
To install The Sock Shop, run the following in the Google terminal:
```
git clone https://github.com/microservices-demo/microservices-demo microservices-demo
cd microservices-demo
kubectl create namespace sock-shop
kubectl apply -f deploy/kubernetes/manifests
```
It may take a few minutes before the application is completely ready and generating metrics that can be collected and displayed.
Go to Weave Cloud, click Monitor and create a new notebook for the Sock Shop. (Notebooks are what we refer to as collections of queries for a particular service, application or even incident. They can be shared with your colleagues as well. See Understanding Prometheus Notebooks for more information on how to use them.)
The following screen capture displays a sample query that shows the request rate for each of the services in the Shop, both for
status codes as well as for
HTTP 500 errors.
Up to this point, you have a source of truth to gauge how your system is behaving in production, but this is only part of your job. You probably also need to attend meetings, write some code, do some server maintenance, eat, sleep, etc. For these scenarios you need an alerting system that can look into the data from your metrics, analyse it and react when certain criteria are met.
For example, if in the past 5 minutes your front end application has been returning more
HTTP 500 status codes than
then you probably want this alerting system to notify you about it.
For this we have written the Configuring Alerts, Routes and Receivers with the RED Method tutorial.
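To give a feel for the kind of rule such an alerting system evaluates, here is a minimal sketch that checks whether the share of HTTP 500 responses over the past 5 minutes exceeds a threshold. The function name, the sample format and the 5% threshold are all hypothetical; in practice you would express a rule like this in your alerting tool’s own configuration rather than in application code.

```python
def should_alert(samples, now, window_seconds=300, max_error_ratio=0.05):
    """Decide whether to alert on the recent share of HTTP 500 responses.

    samples: iterable of (timestamp, status_code) pairs.
    max_error_ratio: hypothetical threshold, chosen only for illustration.
    """
    # Keep only the responses that fall inside the look-back window.
    recent = [status for timestamp, status in samples
              if now - window_seconds <= timestamp <= now]
    if not recent:
        # No traffic in the window: nothing to evaluate.
        return False
    errors = sum(1 for status in recent if status == 500)
    return errors / len(recent) > max_error_ratio
```

Alerting on a ratio rather than a raw error count keeps the rule meaningful both at quiet times and at peak load.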
Join the Weave Community
If you have any questions or comments you can reach out to us on our Slack channel. To invite yourself to the Community Slack channel, visit Weave Community Slack invite or contact us through one of these other channels at Help and Support Services.