Understanding the Kubernetes Event Horizon
Have you ever listened to what your cluster is telling you? Every part of Kubernetes sends out Events, which are small messages telling you what is happening inside that component. In this post we look at some data that 'kubectl describe' displays, for diagnosing specific problems.
You aren't Doing GitOps without Drift Detection
KubeCon and GitOpsCon EU, 2022 - Git Involved!
ADAPT AND THRIVE: The Importance of Locking Down Your Software Development Pipeline
Have you ever listened to what your cluster is telling you? Like, really listened?
I’m talking about Events. Every part of Kubernetes sends out Events, which are small messages telling you what is happening inside that component. Most people who have used Kubernetes will have seen some events, printed at the end of ‘kubectl describe’ output. Here is an example:
$ kubectl describe pod/podinfo-9f56d4b58-2jj8z Name: podinfo-9f56d4b58-2jj8z Namespace: default Node: ... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 42s default-scheduler Successfully assigned default/podinfo-9f56d4b58-2jj8z to node1 Normal Pulled 41s kubelet Container image "ghcr.io/stefanprodan/podinfo:5.1.2" already present on machine Normal Created 41s kubelet Created container podinfod Normal Started 41s kubelet Started container podinfod
These events tell a little story: first the pod was assigned to a node, then kubelet on that node went through the steps to fetch the image, create a container, and start it. ‘kubectl describe’ prints events from oldest to newest, because it’s likely that the most recent ones are the most interesting ones.
Events are purely for diagnostic purposes: they aren’t used by Kubernetes controllers to trigger any behaviour.
Here’s another example, where changed the image reference to “example”, which does not exist:
Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 22s default-scheduler Successfully assigned default/podinfo-5487f6dc6c-gvr69 to node1 Normal BackOff 20s kubelet Back-off pulling image "example" Warning Failed 20s kubelet Error: ImagePullBackOff Normal Pulling 8s (x2 over 22s) kubelet Pulling image "example" Warning Failed 6s (x2 over 20s) kubelet Failed to pull image "example": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/example": failed to resolve reference "docker.io/library/example§": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed Warning Failed 6s (x2 over 20s) kubelet Error: ErrImagePull
You will also see that the last three lines have “x2”, because Kubernetes bunches up events that happen repeatedly. For example “Age: 6s (x2 over 20s)” means that the most recent instance of this event happened six seconds ago, and it has happened twice in 20 seconds.
Coming back to the same pod the next day, I get:
Type Reason Age From Message ---- ------ ---- ---- ------- Normal BackOff 19m (x4524 over 17h) kubelet Back-off pulling image "example" Warning Failed 4m18s (x4591 over 17h) kubelet Error: ImagePullBackOff
So this illustrates on the one hand kubelet’s determination: it’s still trying after seventeen hours, and on the other hand Kubernetes’ concern over resource usage: events get deleted after one hour. Sadly this means we don’t see the details of the error message any more.
Events are stored using the Kubernetes object model, just like Deployments and Pods are. We can look at what’s inside an Event using kubectl:
$ kubectl get event/podinfo-84b5bccbfd-rgb42.166185b1cd1dc668 -o yaml apiVersion: v1 kind: Event metadata: name: podinfo-84b5bccbfd-rgb42.166185b1cd1dc668 namespace: default type: Normal count: 4636 firstTimestamp: "2021-02-07T16:59:00Z" lastTimestamp: "2021-02-08T10:39:04Z" message: Back-off pulling image "example" reason: BackOff source: component: kubelet host: node1 involvedObject: apiVersion: v1 kind: Pod name: podinfo-84b5bccbfd-rgb42 namespace: default
As you see, each event is an object that lives in a namespace, has a unique name, and fields giving detailed information:
- Count, first and last timestamp: shows how much the event has repeated.
- Message: human-readable text saying what happened
- Reason: a short-form code that could be used for filtering.
- Type: either ‘Normal’ or ‘Warning’.
- Source: which Kubernetes component the event comes from.
- InvolvedObject: a reference to another Kubernetes object, such as a Pod or Deployment.
(I omitted some fields for clarity)
Now we can see that ‘kubectl describe’ is listing out events where the involvedObject reference matches the object being described.
If you don’t know where there is a specific problem, you might just ask for all events:
$ kubectl get events -A NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE ... default 46s Normal Scheduled pod/podinfo-78bbb69b79-wfzrk Successfully assigned default/podinfo-78bbb69b79-wfzrk to kind-control-plane default 46s Normal Pulled pod/podinfo-78bbb69b79-wfzrk Container image "ghcr.io/stefanprodan/podinfo:5.1.1" already present on machine default 46s Normal Created pod/podinfo-78bbb69b79-wfzrk Created container podinfod default 46s Normal Started pod/podinfo-78bbb69b79-wfzrk Started container podinfod default 47s Normal SuccessfulCreate replicaset/podinfo-78bbb69b79 Created pod: podinfo-78bbb69b79-wfzrk default 47s Normal ScalingReplicaSet deployment/podinfo Scaled up replica set podinfo-78bbb69b79 to 1
Here you can see that different events of a rollout are reported against the Pod, the ReplicaSet and Deployment.
Warning: ‘kubectl get events’ can spew out a lot of information, especially as your cluster gets busier. Sadly it does not list the events in timestamp order, so you either have to have some idea what you are looking for, or pipe the output to a file and analyze it with the Mk 1 eyeball.
Stay tuned - Weaveworks has some ideas for tools to make this process easier!