Understanding the Kubernetes Event Horizon
Have you ever listened to what your cluster is telling you? Like, really listened?
I’m talking about Kubernetes Events. Every part of Kubernetes sends out Events, which are small messages telling you what is happening inside that component. Most people who have used Kubernetes will have seen some events, printed at the end of ‘kubectl describe’ output. Here is an example:
$ kubectl describe pod/podinfo-9f56d4b58-2jj8z
Name:         podinfo-9f56d4b58-2jj8z
Namespace:    default
Node:         ...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned default/podinfo-9f56d4b58-2jj8z to node1
  Normal  Pulled     41s   kubelet            Container image "ghcr.io/stefanprodan/podinfo:5.1.2" already present on machine
  Normal  Created    41s   kubelet            Created container podinfod
  Normal  Started    41s   kubelet            Started container podinfod
These events tell a little story: first, the pod was assigned to a node, then the kubelet on that node went through the steps to get the image (already present, so nothing was pulled), create a container, and start it. ‘kubectl describe’ prints events from oldest to newest, so the most recent ones, which are likely to be the most interesting, appear at the bottom of the output.
Events are purely for diagnostic purposes: they aren’t used by Kubernetes controllers to trigger any behavior.
Here’s another example, where the image reference is changed to “example”, which does not exist:
Type     Reason     Age               From               Message
----     ------     ----              ----               -------
Normal   Scheduled  22s               default-scheduler  Successfully assigned default/podinfo-5487f6dc6c-gvr69 to node1
Normal   BackOff    20s               kubelet            Back-off pulling image "example"
Warning  Failed     20s               kubelet            Error: ImagePullBackOff
Normal   Pulling    8s (x2 over 22s)  kubelet            Pulling image "example"
Warning  Failed     6s (x2 over 20s)  kubelet            Failed to pull image "example": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/example": failed to resolve reference "docker.io/library/example": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning  Failed     6s (x2 over 20s)  kubelet            Error: ErrImagePull
You will also see that the last three lines have “x2”, because Kubernetes groups events that happen repeatedly into a single record. For example, an Age of “6s (x2 over 20s)” means that the most recent instance of this event happened six seconds ago, and that it has happened twice since the first instance 20 seconds ago.
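The grouping can be modelled in a few lines of Python. This is a simplified sketch, not the actual client-go event correlator: we assume that occurrences with the same involved object, reason and message are merged into one record holding a count plus the first and last timestamps. `Occurrence`, `Aggregated`, `aggregate` and `age_column` are hypothetical names, and the Age rendering uses whole seconds only (kubectl switches to minutes and hours for longer durations).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Occurrence:
    """One raw occurrence of an event (hypothetical type for illustration)."""
    obj: str       # involved object, e.g. "pod/podinfo-5487f6dc6c-gvr69"
    reason: str    # e.g. "Failed"
    message: str
    time: datetime


@dataclass
class Aggregated:
    """A merged record: how many times it happened, plus first and last time."""
    obj: str
    reason: str
    message: str
    count: int
    first: datetime
    last: datetime


def aggregate(occurrences):
    """Merge occurrences sharing (object, reason, message) into one record each."""
    merged = {}
    for occ in sorted(occurrences, key=lambda o: o.time):
        key = (occ.obj, occ.reason, occ.message)
        if key in merged:
            merged[key].count += 1
            merged[key].last = occ.time
        else:
            merged[key] = Aggregated(occ.obj, occ.reason, occ.message,
                                     1, occ.time, occ.time)
    return list(merged.values())


def age_column(rec, now):
    """Render the Age column in whole seconds, e.g. '6s (x2 over 20s)'."""
    since_last = int((now - rec.last).total_seconds())
    if rec.count == 1:
        return f"{since_last}s"
    since_first = int((now - rec.first).total_seconds())
    return f"{since_last}s (x{rec.count} over {since_first}s)"


# Example: the same pull failure seen 20 s and 6 s before NOW.
NOW = datetime(2021, 2, 7, 17, 0, 20)
EXAMPLE = aggregate([
    Occurrence("pod/podinfo-5487f6dc6c-gvr69", "Failed", "Error: ErrImagePull",
               NOW - timedelta(seconds=20)),
    Occurrence("pod/podinfo-5487f6dc6c-gvr69", "Failed", "Error: ErrImagePull",
               NOW - timedelta(seconds=6)),
])
# age_column(EXAMPLE[0], NOW) returns "6s (x2 over 20s)"
```

Keying on the message as well as the reason keeps the two different “Failed” records in the output above apart, which matches what ‘kubectl describe’ shows.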
Coming back to the same pod the next day, I get:
Type     Reason   Age                     From     Message
----     ------   ----                    ----     -------
Normal   BackOff  19m (x4524 over 17h)    kubelet  Back-off pulling image "example"
Warning  Failed   4m18s (x4591 over 17h)  kubelet  Error: ImagePullBackOff
This illustrates both the kubelet’s determination (it is still retrying after seventeen hours) and Kubernetes’ concern over resource usage (events are deleted after one hour by default). Sadly, this means we can no longer see the details of the error message.
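A rough Python model of that expiry. The one-hour figure is the default of the kube-apiserver `--event-ttl` flag; we assume here that the clock restarts each time an event is updated, and approximate the last update by `lastTimestamp`. `parse_ts` and `is_expired` are hypothetical helpers, and the timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the kube-apiserver default --event-ttl of one hour.
EVENT_TTL = timedelta(hours=1)


def parse_ts(value):
    """Parse a UTC timestamp in the form Kubernetes prints, e.g. '2021-02-08T10:39:04Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_expired(event, now, ttl=EVENT_TTL):
    """Approximate check: True once `ttl` has passed since the event's lastTimestamp."""
    return now - parse_ts(event["lastTimestamp"]) > ttl


# A long-running BackOff event: first seen about 18 h ago, but updated recently.
backoff = {"firstTimestamp": "2021-02-07T16:59:00Z",
           "lastTimestamp": "2021-02-08T10:39:04Z"}
```

With `now` at 11:00 UTC on 2021-02-08, `is_expired(backoff, now)` is False even though `firstTimestamp` is about eighteen hours old: in this model only the time since the last update counts, which is how an event that keeps recurring can show “over 17h”.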
Events are stored using the Kubernetes object model, just like Deployments and Pods are. We can look at what’s inside a Kubernetes Event using kubectl:
$ kubectl get event/podinfo-84b5bccbfd-rgb42.166185b1cd1dc668 -o yaml
apiVersion: v1
kind: Event
metadata:
  name: podinfo-84b5bccbfd-rgb42.166185b1cd1dc668
  namespace: default
type: Normal
count: 4636
firstTimestamp: "2021-02-07T16:59:00Z"
lastTimestamp: "2021-02-08T10:39:04Z"
message: Back-off pulling image "example"
reason: BackOff
source:
  component: kubelet
  host: node1
involvedObject:
  apiVersion: v1
  kind: Pod
  name: podinfo-84b5bccbfd-rgb42
  namespace: default
As you see, each Kubernetes Event is an object that lives in a namespace, has a unique name, and fields giving detailed information:
- Count (with first and last timestamp): how many times the event has happened, and over what period.
- Message: human-readable text saying what happened
- Reason: a short-form code that could be used for filtering.
- Type: either ‘Normal’ or ‘Warning’.
- Source: which Kubernetes component the event comes from.
- InvolvedObject: a reference to another Kubernetes object, such as a Pod or Deployment.
(I omitted some fields for clarity)
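To see how these fields fit together, here is a small Python sketch that loads the Event shown above (as JSON, the form `kubectl get ... -o json` returns) and builds a one-line summary from the fields in the list. `summarize` and `parse_ts` are hypothetical helpers; only the fields discussed above are included.

```python
import json
from datetime import datetime, timezone

# The Event shown above, reduced to the fields discussed in the list.
EVENT_JSON = r"""
{
  "apiVersion": "v1",
  "kind": "Event",
  "metadata": {"name": "podinfo-84b5bccbfd-rgb42.166185b1cd1dc668", "namespace": "default"},
  "type": "Normal",
  "count": 4636,
  "firstTimestamp": "2021-02-07T16:59:00Z",
  "lastTimestamp": "2021-02-08T10:39:04Z",
  "message": "Back-off pulling image \"example\"",
  "reason": "BackOff",
  "source": {"component": "kubelet", "host": "node1"},
  "involvedObject": {"apiVersion": "v1", "kind": "Pod",
                     "name": "podinfo-84b5bccbfd-rgb42", "namespace": "default"}
}
"""


def parse_ts(value):
    """Parse a UTC timestamp such as '2021-02-07T16:59:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def summarize(event):
    """Combine type, reason, source, involvedObject, message and count into one line."""
    obj = event["involvedObject"]
    span = parse_ts(event["lastTimestamp"]) - parse_ts(event["firstTimestamp"])
    hours, rem = divmod(int(span.total_seconds()), 3600)
    minutes = rem // 60
    return (f'{event["type"]} {event["reason"]} from {event["source"]["component"]} '
            f'on {obj["kind"]}/{obj["name"]}: {event["message"]} '
            f'(x{event["count"]} over {hours}h{minutes}m)')
```

For this Event, `summarize(json.loads(EVENT_JSON))` yields `Normal BackOff from kubelet on Pod/podinfo-84b5bccbfd-rgb42: Back-off pulling image "example" (x4636 over 17h40m)`.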
Now we can see that ‘kubectl describe’ is listing out events where the involvedObject reference matches the object being described.
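From the command line, a field selector gives a similar per-object view, for example `kubectl get events --field-selector involvedObject.name=podinfo-84b5bccbfd-rgb42`. The matching itself can be sketched in Python over the `items` of `kubectl get events -o json`. This simplified version, with the hypothetical helper `events_for`, compares kind, name and namespace only; the real ‘kubectl describe’ logic may also compare the object’s UID.

```python
def events_for(events, kind, name, namespace="default"):
    """Return events whose involvedObject matches kind/name/namespace, oldest first."""
    matched = [
        e for e in events
        if e["involvedObject"].get("kind") == kind
        and e["involvedObject"].get("name") == name
        and e["involvedObject"].get("namespace") == namespace
    ]
    # RFC 3339 UTC strings of the same shape sort correctly as plain strings.
    return sorted(matched, key=lambda e: e.get("lastTimestamp") or "")


# Illustrative sample: two Pod events and one ReplicaSet event.
SAMPLE = [
    {"reason": "SuccessfulCreate", "lastTimestamp": "2021-02-08T11:00:00Z",
     "involvedObject": {"kind": "ReplicaSet", "name": "podinfo-78bbb69b79",
                        "namespace": "default"}},
    {"reason": "Started", "lastTimestamp": "2021-02-08T11:00:02Z",
     "involvedObject": {"kind": "Pod", "name": "podinfo-78bbb69b79-wfzrk",
                        "namespace": "default"}},
    {"reason": "Scheduled", "lastTimestamp": "2021-02-08T11:00:01Z",
     "involvedObject": {"kind": "Pod", "name": "podinfo-78bbb69b79-wfzrk",
                        "namespace": "default"}},
]
```

Here `events_for(SAMPLE, "Pod", "podinfo-78bbb69b79-wfzrk")` keeps the two Pod events and drops the ReplicaSet one.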
If you don’t know where there is a specific problem, you might just ask for all events:
$ kubectl get events -A
NAMESPACE   LAST SEEN   TYPE     REASON              OBJECT                           MESSAGE
...
default     46s         Normal   Scheduled           pod/podinfo-78bbb69b79-wfzrk     Successfully assigned default/podinfo-78bbb69b79-wfzrk to kind-control-plane
default     46s         Normal   Pulled              pod/podinfo-78bbb69b79-wfzrk     Container image "ghcr.io/stefanprodan/podinfo:5.1.1" already present on machine
default     46s         Normal   Created             pod/podinfo-78bbb69b79-wfzrk     Created container podinfod
default     46s         Normal   Started             pod/podinfo-78bbb69b79-wfzrk     Started container podinfod
default     47s         Normal   SuccessfulCreate    replicaset/podinfo-78bbb69b79    Created pod: podinfo-78bbb69b79-wfzrk
default     47s         Normal   ScalingReplicaSet   deployment/podinfo               Scaled up replica set podinfo-78bbb69b79 to 1
Here you can see that different events of a rollout are reported against the Pod, the ReplicaSet, and the Deployment.
Warning: ‘kubectl get events’ can spew out a lot of information, especially as your cluster gets busier. Sadly it does not list the events in timestamp order, so you either have to have some idea of what you are looking for, or pipe the output to a file and analyze it with the Mk 1 eyeball.
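One way around the ordering problem: `kubectl get` accepts `--sort-by` with a JSONPath expression, so `kubectl get events -A --sort-by=.lastTimestamp` lists events oldest first. If you would rather post-process a file saved with `kubectl get events -A -o json`, a small Python sketch like this builds a timeline. `timeline` and `timeline_from_file` are hypothetical helpers, and events without a `lastTimestamp` simply sort first.

```python
import json


def timeline(event_list):
    """Sort the items of an EventList by lastTimestamp, one formatted line per event."""
    # RFC 3339 UTC strings of the same shape sort correctly as plain strings;
    # a missing lastTimestamp is treated as "" and therefore sorts first.
    items = sorted(event_list.get("items", []),
                   key=lambda e: e.get("lastTimestamp") or "")
    lines = []
    for e in items:
        obj = e["involvedObject"]
        lines.append(
            f'{e.get("lastTimestamp")} {e["metadata"]["namespace"]} {e["type"]} '
            f'{e["reason"]} {obj["kind"].lower()}/{obj["name"]}: {e["message"]}'
        )
    return lines


def timeline_from_file(path):
    """Load JSON saved with `kubectl get events -A -o json > path` and build the timeline."""
    with open(path) as f:
        return timeline(json.load(f))
```

Printing the lines from `timeline_from_file("events.json")` gives a chronological view across all objects, which makes a rollout like the one above read as a single story.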
Stay tuned - Weaveworks has some ideas for tools to make this process easier!
Types of Kubernetes Events
Kubernetes creates many types of events as the state of nodes and workloads changes. Four common categories are failed events, evicted events, node-specific events, and storage-specific events.
Failed events happen when Kubernetes does not successfully create a container. The most common cause is that the container image cannot be pulled. Other causes include typing mistakes, insufficient permissions, and upstream build failures. Being aware of these causes helps you prevent failed events in the future.
Evicted events occur when Kubernetes evicts containers and pods from a node, typically because the node is running short of resources such as memory; the pods that get terminated are often the ones consuming the most resources. Evicted events can also reveal when pods are poorly distributed across nodes.
Node-specific events help identify when a node is behaving erratically. Examples include NodeHasSufficientMemory, NodeHasSufficientPID, NodeReady, NodeNotReady, Rebooted, and HostPortConflict. These can be useful tools for debugging your cluster.
Storage-specific events include two common errors, FailedMount and FailedAttachVolume. They are reported when a pod fails to mount storage resources, or when a node is not healthy enough to mount a volume.
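To put these categories to work, you could bucket events by their `reason` field. The mapping below is a hypothetical, deliberately incomplete illustration built from the reasons named in this section (plus `Evicted` for evictions and `Failed`, the reason the kubelet used in the image-pull example earlier); it is not an official Kubernetes taxonomy.

```python
from collections import defaultdict

# Hypothetical mapping from event reasons to the four categories described above.
CATEGORY_BY_REASON = {
    "Failed": "failed",
    "Evicted": "evicted",
    "NodeHasSufficientMemory": "node",
    "NodeHasSufficientPID": "node",
    "NodeReady": "node",
    "NodeNotReady": "node",
    "Rebooted": "node",
    "HostPortConflict": "node",
    "FailedMount": "storage",
    "FailedAttachVolume": "storage",
}


def categorize(events):
    """Group events (dicts with a 'reason' key) by category; unknown reasons go to 'other'."""
    groups = defaultdict(list)
    for e in events:
        groups[CATEGORY_BY_REASON.get(e["reason"], "other")].append(e)
    return dict(groups)
```

Feeding it the `items` of `kubectl get events -A -o json` gives a quick count per category; reasons not in the table land in `other`, which is a hint to extend the mapping.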
Knowing these event types will help you better understand the small messages Kubernetes is telling you.