Have you ever listened to what your cluster is telling you? Like, really listened?

I’m talking about Events. Every part of Kubernetes sends out Events, which are small messages telling you what is happening inside that component. Most people who have used Kubernetes will have seen some events, printed at the end of ‘kubectl describe’ output. Here is an example:

$ kubectl describe pod/podinfo-9f56d4b58-2jj8z
Name:         podinfo-9f56d4b58-2jj8z
Namespace:    default
Node:         ...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned default/podinfo-9f56d4b58-2jj8z to node1
  Normal  Pulled     41s   kubelet            Container image "ghcr.io/stefanprodan/podinfo:5.1.2" already present on machine
  Normal  Created    41s   kubelet            Created container podinfod
  Normal  Started    41s   kubelet            Started container podinfod

These events tell a little story: first the pod was assigned to a node, then kubelet on that node went through the steps to fetch the image, create a container, and start it. ‘kubectl describe’ prints events from oldest to newest, because it’s likely that the most recent ones are the most interesting ones. 

Events are purely for diagnostic purposes: they aren’t used by Kubernetes controllers to trigger any behaviour. 

Here’s another example, where changed the image reference to “example”, which does not exist:

Type     Reason     Age            From               Message
----     ------     ----           ----               -------
Normal   Scheduled  22s            default-scheduler  Successfully assigned default/podinfo-5487f6dc6c-gvr69 to node1
Normal   BackOff    20s            kubelet            Back-off pulling image "example"
Warning  Failed     20s            kubelet            Error: ImagePullBackOff
Normal   Pulling    8s (x2 over 22s)  kubelet         Pulling image "example"
Warning  Failed     6s (x2 over 20s)  kubelet         Failed to pull image "example": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/example": failed to resolve reference "docker.io/library/example§": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning  Failed     6s (x2 over 20s)  kubelet         Error: ErrImagePull

You will also see that the last three lines have “x2”, because Kubernetes bunches up events that happen repeatedly. For example “Age: 6s (x2 over 20s)” means that the most recent instance of this event happened six seconds ago, and it has happened twice in 20 seconds.

Coming back to the same pod the next day, I get:

Type     Reason   Age                     From     Message
----     ------   ----                    ----     -------
Normal   BackOff  19m (x4524 over 17h)    kubelet  Back-off pulling image "example"
Warning  Failed   4m18s (x4591 over 17h)  kubelet  Error: ImagePullBackOff

So this illustrates on the one hand kubelet’s determination: it’s still trying after seventeen hours, and on the other hand Kubernetes’ concern over resource usage: events get deleted after one hour. Sadly this means we don’t see the details of the error message any more. 

Events are stored using the Kubernetes object model, just like Deployments and Pods are. We can look at what’s inside an Event using kubectl:

$ kubectl get event/podinfo-84b5bccbfd-rgb42.166185b1cd1dc668 -o yaml
apiVersion: v1
kind: Event
metadata:
  name: podinfo-84b5bccbfd-rgb42.166185b1cd1dc668
  namespace: default
type: Normal
count: 4636
firstTimestamp: "2021-02-07T16:59:00Z"
lastTimestamp: "2021-02-08T10:39:04Z"
message: Back-off pulling image "example"
reason: BackOff
source:
  component: kubelet
  host: node1
involvedObject:
  apiVersion: v1
  kind: Pod
  name: podinfo-84b5bccbfd-rgb42
  namespace: default

As you see, each event is an object that lives in a namespace, has a unique name, and fields giving detailed information: 

  • Count, first and last timestamp: shows how much the event has repeated.
  • Message: human-readable text saying what happened
  • Reason: a short-form code that could be used for filtering.
  • Type: either ‘Normal’ or ‘Warning’.
  • Source: which Kubernetes component the event comes from.
  • InvolvedObject: a reference to another Kubernetes object, such as a Pod or Deployment.

(I omitted some fields for clarity) 

Now we can see that ‘kubectl describe’ is listing out events where the involvedObject reference matches the object being described. 

If you don’t know where there is a specific problem, you might just ask for all events:

$ kubectl get events -A
NAMESPACE   LAST SEEN   TYPE      REASON              OBJECT                                MESSAGE
...
default     46s         Normal    Scheduled           pod/podinfo-78bbb69b79-wfzrk    Successfully assigned default/podinfo-78bbb69b79-wfzrk to kind-control-plane
default     46s         Normal    Pulled              pod/podinfo-78bbb69b79-wfzrk    Container image "ghcr.io/stefanprodan/podinfo:5.1.1" already present on machine
default     46s         Normal    Created             pod/podinfo-78bbb69b79-wfzrk    Created container podinfod
default     46s         Normal    Started             pod/podinfo-78bbb69b79-wfzrk    Started container podinfod
default     47s         Normal    SuccessfulCreate    replicaset/podinfo-78bbb69b79   Created pod: podinfo-78bbb69b79-wfzrk
default     47s         Normal    ScalingReplicaSet   deployment/podinfo              Scaled up replica set podinfo-78bbb69b79 to 1

Here you can see that different events of a rollout are reported against the Pod, the ReplicaSet and Deployment. 

Warning: ‘kubectl get events’ can spew out a lot of information, especially as your cluster gets busier. Sadly it does not list the events in timestamp order, so you either have to have some idea what you are looking for, or pipe the output to a file and analyze it with the Mk 1 eyeball. 

Stay tuned - Weaveworks has some ideas for tools to make this process easier!