Understanding the Kubernetes Event Horizon

March 02, 2021

Have you ever listened to what your cluster is telling you? Like, really listened?

I’m talking about Kubernetes Events. Every part of Kubernetes sends out Events, which are small messages telling you what is happening inside that component. Most people who have used Kubernetes will have seen some events, printed at the end of ‘kubectl describe’ output. Here is an example:

$ kubectl describe pod/podinfo-9f56d4b58-2jj8z
Name:         podinfo-9f56d4b58-2jj8z
Namespace:    default
Node:         ...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned default/podinfo-9f56d4b58-2jj8z to node1
  Normal  Pulled     41s   kubelet            Container image "ghcr.io/stefanprodan/podinfo:5.1.2" already present on machine
  Normal  Created    41s   kubelet            Created container podinfod
  Normal  Started    41s   kubelet            Started container podinfod

These events tell a little story: first the pod was assigned to a node, then the kubelet on that node went through the steps to fetch the image, create a container, and start it. ‘kubectl describe’ prints events from oldest to newest, so the most recent ones, which are likely the most interesting, appear at the bottom.

Events are purely for diagnostic purposes: they aren’t used by Kubernetes controllers to trigger any behavior.
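Since events are just a diagnostic stream, one handy trick is to watch them arrive in real time while you make a change. A minimal sketch, assuming your workload runs in the default namespace:

$ kubectl get events --watch -n default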

Here’s another example, where the image reference is changed to “example”, which does not exist:

Type     Reason     Age            From               Message
----     ------     ----           ----               -------
Normal   Scheduled  22s            default-scheduler  Successfully assigned default/podinfo-5487f6dc6c-gvr69 to node1
Normal   BackOff    20s            kubelet            Back-off pulling image "example"
Warning  Failed     20s            kubelet            Error: ImagePullBackOff
Normal   Pulling    8s (x2 over 22s)  kubelet         Pulling image "example"
Warning  Failed     6s (x2 over 20s)  kubelet         Failed to pull image "example": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/example": failed to resolve reference "docker.io/library/example": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning  Failed     6s (x2 over 20s)  kubelet         Error: ErrImagePull

You will also see that the last three lines have “x2”, because Kubernetes bunches up events that happen repeatedly. For example, “6s (x2 over 20s)” in the Age column means that the most recent instance of this event happened six seconds ago, and that it has happened twice over the last 20 seconds.

Coming back to the same pod the next day, I get:

Type     Reason   Age                     From     Message
----     ------   ----                    ----     -------
Normal   BackOff  19m (x4524 over 17h)    kubelet  Back-off pulling image "example"
Warning  Failed   4m18s (x4591 over 17h)  kubelet  Error: ImagePullBackOff

This illustrates both the kubelet’s determination (it is still trying after seventeen hours) and Kubernetes’ concern over resource usage: events are deleted after one hour by default. Sadly, this means we no longer see the details of the error message.
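That one-hour retention isn’t something kubectl controls; it comes from the API server’s --event-ttl flag, which defaults to one hour. If you run your own control plane you could keep events around longer, roughly like this (a sketch only; how you pass flags to kube-apiserver depends on your distribution):

kube-apiserver --event-ttl=4h ...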

Events are stored using the Kubernetes object model, just like Deployments and Pods are. We can look at what’s inside a Kubernetes Event using kubectl:

$ kubectl get event/podinfo-84b5bccbfd-rgb42.166185b1cd1dc668 -o yaml
apiVersion: v1
kind: Event
metadata:
  name: podinfo-84b5bccbfd-rgb42.166185b1cd1dc668
  namespace: default
type: Normal
count: 4636
firstTimestamp: "2021-02-07T16:59:00Z"
lastTimestamp: "2021-02-08T10:39:04Z"
message: Back-off pulling image "example"
reason: BackOff
source:
  component: kubelet
  host: node1
involvedObject:
  apiVersion: v1
  kind: Pod
  name: podinfo-84b5bccbfd-rgb42
  namespace: default

As you see, each Kubernetes Event is an object that lives in a namespace, has a unique name, and fields giving detailed information: 

  • Count (with first and last timestamps): how many times the event has repeated, and over what period.
  • Message: human-readable text saying what happened.
  • Reason: a short-form code that can be used for filtering.
  • Type: either ‘Normal’ or ‘Warning’.
  • Source: which Kubernetes component the event came from.
  • InvolvedObject: a reference to another Kubernetes object, such as a Pod or Deployment.

(I omitted some fields for clarity) 
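Because Type and Reason are ordinary fields on the Event object, you can filter on them with a field selector. For example, to see only warnings, or only back-off events (reason strings vary by component, so treat these as a sketch):

$ kubectl get events --field-selector type=Warning
$ kubectl get events --field-selector reason=BackOff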

Now we can see that ‘kubectl describe’ is listing out events where the involvedObject reference matches the object being described. 
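You can reproduce roughly what ‘kubectl describe’ does by selecting on the involvedObject fields yourself; here the pod name is the one from the earlier example:

$ kubectl get events --field-selector involvedObject.kind=Pod,involvedObject.name=podinfo-84b5bccbfd-rgb42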

If you don’t know where a specific problem lies, you can just ask for all events:

$ kubectl get events -A
NAMESPACE   LAST SEEN   TYPE      REASON              OBJECT                                MESSAGE
...
default     46s         Normal    Scheduled           pod/podinfo-78bbb69b79-wfzrk    Successfully assigned default/podinfo-78bbb69b79-wfzrk to kind-control-plane
default     46s         Normal    Pulled              pod/podinfo-78bbb69b79-wfzrk    Container image "ghcr.io/stefanprodan/podinfo:5.1.1" already present on machine
default     46s         Normal    Created             pod/podinfo-78bbb69b79-wfzrk    Created container podinfod
default     46s         Normal    Started             pod/podinfo-78bbb69b79-wfzrk    Started container podinfod
default     47s         Normal    SuccessfulCreate    replicaset/podinfo-78bbb69b79   Created pod: podinfo-78bbb69b79-wfzrk
default     47s         Normal    ScalingReplicaSet   deployment/podinfo              Scaled up replica set podinfo-78bbb69b79 to 1

Here you can see that the different events of a rollout are reported against the Pod, the ReplicaSet, and the Deployment.

Warning: ‘kubectl get events’ can spew out a lot of information, especially as your cluster gets busier. Sadly it does not list the events in timestamp order, so you either have to have some idea of what you are looking for, or pipe the output to a file and analyze it with the Mk 1 eyeball. 
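One partial workaround is to have kubectl sort the output by timestamp before you start reading (a sketch; .metadata.creationTimestamp also works as the sort key):

$ kubectl get events -A --sort-by=.lastTimestamp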

Types of Kubernetes Events

When the status of a node or workload changes, Kubernetes creates many different types of events: failed events, evicted events, node-specific events, and storage-specific events.

Failed Events

This event type appears when Kubernetes cannot successfully create a container, most commonly because it cannot pull the container image. Less common causes include typos in the image reference, insufficient permissions, and upstream build failures. Being aware of these causes helps you prevent failed events in the future.
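A quick way to surface these across the whole cluster is to filter on warnings with the Failed reason (a sketch; the exact reason string depends on which component reported the failure):

$ kubectl get events -A --field-selector type=Warning,reason=Failed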

Evicted Events

Evicted events occur often, because Kubernetes will evict pods and their containers when a node runs short of resources; the pods it terminates are typically the ones consuming large amounts of memory or other resources. Evicted events can also tell you when pods are being poorly distributed across nodes.
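To list recent evictions everywhere, you can filter on the kubelet’s Evicted reason (a sketch, assuming the standard reason string):

$ kubectl get events -A --field-selector reason=Evicted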

Node-Specific Events

Node-specific events help you identify when a node is behaving erratically. Examples include NodeHasSufficientMemory, NodeHasSufficientPID, NodeReady, NodeNotReady, Rebooted, and HostPortConflict. These can become useful tools when debugging your cluster.
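Node events are attached to the Node object rather than to a Pod, so you can select them by kind, or simply describe the node (a sketch; replace node1 with one of your own nodes):

$ kubectl get events -A --field-selector involvedObject.kind=Node
$ kubectl describe node node1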

Storage-Specific Events

Two common warnings you might run into are FailedMount and FailedAttachVolume. These appear when a pod fails to mount a storage resource, or when a node is not healthy enough to attach a volume.
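These also show up cleanly with a reason filter (a sketch; FailedAttachVolume is typically reported by the attach/detach controller and FailedMount by the kubelet):

$ kubectl get events -A --field-selector reason=FailedMount
$ kubectl get events -A --field-selector reason=FailedAttachVolume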

Knowing these event types will help you better understand the small messages Kubernetes is sending you.

Stay tuned - Weaveworks has some ideas for tools to make this process easier! Meanwhile, check out our other articles to better understand software and development.
