Why is Distributed Debugging Difficult?

Finding and fixing problems in software is hard, but in a distributed application running in Kubernetes it is even more challenging, for the following reasons:

  • Communication between services can be unreliable and asynchronous, which makes errors difficult to reproduce.
  • Services often interact with one another intermittently, and they can be written in different languages and run across several different nodes, which makes transactions difficult to step through.
  • Most legacy debugging tools are not designed for distributed systems, and the ones that do support microservices live outside of your day-to-day environment, which means you have to do a lot of context switching.

Essentially, the main problem with debugging and finding the root cause in a distributed system is recreating the state of the system at the moment the error occurred, so that you can get a holistic view.

OpenTracing tools assist with capturing the timing, events, and tags of transactions between microservices. The service mesh Istio can also help to identify service-to-service latency in real time.
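To make "timing, events, and tags" concrete, here is a minimal sketch of what a tracer records for each unit of work. It uses only the Python standard library and is not the OpenTracing API: the `Span` class, the `start_span` helper, the `FINISHED` list, and the service names are all hypothetical, chosen purely for illustration.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work in a trace (hypothetical, not the OpenTracing API)."""
    trace_id: str                      # shared by every span in one transaction
    operation: str
    parent: Optional[str] = None       # span_id of the calling span, if any
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    tags: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    start: float = 0.0
    duration: float = 0.0

    def log(self, message: str) -> None:
        # Record an event with its offset (in seconds) from the span start.
        self.events.append((time.monotonic() - self.start, message))


FINISHED = []  # stand-in for a tracing backend that collects finished spans


@contextmanager
def start_span(operation, parent=None, **tags):
    """Time a block of work and attach it to the parent's trace, if given."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    span = Span(trace_id, operation, parent.span_id if parent else None, tags=dict(tags))
    span.start = time.monotonic()
    try:
        yield span
    except Exception as exc:
        # Mark the span as failed so the error is visible in the trace.
        span.tags["error"] = True
        span.log(f"exception: {exc}")
        raise
    finally:
        span.duration = time.monotonic() - span.start
        FINISHED.append(span)


if __name__ == "__main__":
    with start_span("checkout", service="frontend") as root:
        root.log("cart validated")
        with start_span("charge-card", parent=root, service="payments") as child:
            child.log("gateway called")
    for s in FINISHED:
        print(s.trace_id, s.operation, s.tags, f"{s.duration * 1000:.2f} ms")
```

In a real system the parent's trace ID would travel between services in request headers rather than as a Python object, which is the part that tracing libraries handle for you.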

Although OpenTracing tools are powerful, they can be resource intensive: logging the state of the application at runtime adds performance overhead. Another problem with these tools is that they don’t provide runtime debugging within your IDE, where you can set breakpoints, step through your code and follow variables without having to context switch between environments to track down an issue.

In the past, multi-threaded applications presented the same type of debugging problem that microservices do now. But multi-threaded applications ran on a single machine, where you had debugger support for all of the different threads from the convenience of your IDE.

New tools are being developed that can help with your microservices debugging efforts and that allow you to remain in your local environment to perform runtime debugging. Today we’re going to have a look at two of those tools: Squash and Telepresence.


What is Squash?

Most debugging tools for traditional apps are integrated with IDEs. But as mentioned, these tools cannot perform runtime debugging on distributed applications. This limitation is the main reason Squash was created.

With Squash, you can do things like set breakpoints in your code, evaluate variables on the fly, step through your code while following a transaction across different microservices, and change variables at runtime.

Squash is deployed to the cluster as a server and a DaemonSet, with your IDE acting as the UI for Squash. Once the applications’ pods have been retrieved, use your IDE to attach to one of the running pods and select the service on which to start your debug session.

For multiple microservices, open a window for each in your IDE and attach to the relevant service.

The project is open source and encourages individuals from the community to contribute new IDE extensions. The following IDEs are currently supported: Visual Studio Code and IntelliJ.

Check out the Squash GitHub repo for more information. 

What is Telepresence?

In the same vein as Squash, Telepresence also allows you to use your preferred debugging tools while developing your service in an IDE.  

With Telepresence, a running service in Kubernetes is swapped out and replaced with a proxy to a local process running on your laptop. This gives the local service you’re working on full access to the ConfigMaps and Secrets of the service it replaces, as well as to the other services running on the remote cluster.
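As a rough illustration of what that access looks like from the local process, the sketch below reads its settings from environment variables and locates a neighbouring service the way a pod would. It assumes the proxy exposes the remote pod's environment to the local process; the setting names `LOG_LEVEL` and `DB_PASSWORD` and the `orders` service are hypothetical. The `<NAME>_SERVICE_HOST` / `<NAME>_SERVICE_PORT` variables follow the standard Kubernetes convention for service discovery.

```python
import os
from typing import Mapping, Optional


def service_url(name: str, env: Optional[Mapping[str, str]] = None, scheme: str = "http") -> str:
    """Build a URL for a cluster service from Kubernetes-style env vars.

    Kubernetes injects <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT into pods;
    this sketch assumes the same variables are visible to the local process.
    """
    env = os.environ if env is None else env
    key = name.upper().replace("-", "_")
    host = env.get(f"{key}_SERVICE_HOST", name)   # fall back to the service's DNS name
    port = env.get(f"{key}_SERVICE_PORT", "80")
    return f"{scheme}://{host}:{port}"


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the settings a service would normally receive inside its pod."""
    env = os.environ if env is None else env
    return {
        "log_level": env.get("LOG_LEVEL", "info"),   # hypothetical ConfigMap-backed value
        "db_password_set": "DB_PASSWORD" in env,     # hypothetical Secret-backed value
        "orders_url": service_url("orders", env),    # hypothetical neighbouring service
    }


if __name__ == "__main__":
    # Run locally without the proxy and you get the fallbacks; run behind it
    # and (under the assumption above) you get the cluster's values.
    print(load_settings())
```

Because the code only looks at its environment, the same file can run unchanged inside the pod and on your laptop, which is what makes swapping the deployment for a local process practical.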

In addition to local debugging, Telepresence also offers the following features for developers: 

  1. Fast, real-time local development of a single service, even if that service depends on others in the cluster. Make a change and see that change in action right away.
  2. Use of any local tool for testing/debugging and editing, including your own IDE.
  3. Your local development environment becomes a part of the cluster, so that you can test or run anything against it.

Telepresence is currently a sandbox project at the CNCF.

From: Why Telepresence?

GitOps and Telepresence

You can also create an end-to-end GitOps workflow with Telepresence.

Find out how by watching this demo video on Developing with Kubernetes: Seamless Dev Environments and GitOps.

Differences between Squash and Telepresence

Squash is dependent on an IDE. It currently supports only Visual Studio Code and IntelliJ, but more integrations are pending. Telepresence, on the other hand, works with any local debugger, not only those integrated with your IDE, and so in this sense it is more flexible.

Telepresence also goes beyond debugging and allows you to test and run services from your local environment on the fly, aka live coding, which is pretty slick. 

Alternative debug environments to Telepresence and Squash

Tools like Telepresence or Squash can save you a lot of time as well as maintenance overhead. Without them, you would need to resort to one of these configurations:

  • Running your entire distributed application locally with Docker Compose. This gives you a fast dev/debug cycle, but it doesn’t reflect a real running cluster. If you are using any cloud services (such as RDS), these also might not be easy to access locally.
  • Minikube. You can't do live coding/debugging with Minikube by itself, but you can with Telepresence.
  • Run everything in a remote Kubernetes staging cluster where you can perform live coding/debugging against the remote Kubernetes cluster. You can do this with Telepresence but not Squash.

Final Thoughts

In this post we discussed the challenges of debugging microservices in a cluster. Then we looked at two debugging tools designed for the job: Squash and Telepresence.  Both approach the problem by allowing developers to use their local IDE environments, but Telepresence offers slightly more functionality in that it also provides developers with a way to test local code in real time against a running cluster. 

For a live discussion on developer toolkits, tune in and participate in our online meetup “Developing with Kubernetes” Office Hours with Ilya every Thursday at 10:00 PT / 18:00 BST (British Summer Time).

Need Help?

We can help you accelerate your Kubernetes journey with our subscription service that supports installing production-grade Kubernetes on-premises, in AWS and GCP, from dev to production. For the past 3 years, Kubernetes has been powering Weave Cloud, our operations as a service offering, so we’re taking our knowledge and helping teams embrace the benefits of cloud native tooling. We've focused on creating GitOps workflows: our approach uses developer-centric tooling (e.g. git) and a tested approach to help you install, set up, operate and upgrade Kubernetes. Contact us for more details.