We’ve just added a Time Travel capability to Weave Cloud. With Time Travel you can view the state of your system at any point in time. Want to know what a service was doing 10 minutes ago, yesterday, or last year? Now you can with Time Travel.

Time Travel appears in the Explore area of Weave Cloud. By default, Explore shows the current live status of the services and the connections between them. Time Travel is especially beneficial if you want to troubleshoot an incident or gain additional insight into a newly deployed service for your application.

A typical use case

At Weaveworks we’ve been using Time Travel to verify that newly deployed fixes have improved the functionality or performance of our app in the cluster. We deploy the new fix and can then explore how the service is performing, comparing it against the previous week’s performance.

The problem statement

Memcache latencies in Weave Cloud’s Deploy function were much higher than expected. 

The analysis

The team suspected that either Flux or memcache was on a node that was particularly heavily loaded. 

Time Travel allowed the team to see which nodes the processes were scheduled on and how much CPU and memory they were using. Further, they were able to prove that the Flux process had actually died at one point and been restarted (its PID had changed).

Based on the above observations, the team determined that the problem was likely with Flux: its memory and CPU consumption kept rising over time until the process died and was eventually restarted. Those data points were then correlated with the memcache access latency graph in Weave Cloud’s monitoring tooling, which led the team to discover a goroutine leak in Flux.
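To make the reasoning in this analysis concrete, here is a minimal Go sketch of the two checks the team made by eye: a changed PID between two observations means the process was restarted, and memory that grows at every observation is the pattern a leak tends to produce. This is not Weave Cloud code or an API it exposes. The `Snapshot` type, the `Restarts` and `SteadilyRising` functions, and the sample values are all hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot is a hypothetical record of one process at one point in time,
// similar in spirit to what you would read off the screen while time travelling.
type Snapshot struct {
	At       time.Time
	PID      int
	MemoryMB float64
}

// Restarts returns every snapshot whose PID differs from the snapshot before
// it. A changed PID means the process died and was started again in between.
func Restarts(history []Snapshot) []Snapshot {
	var restarts []Snapshot
	for i := 1; i < len(history); i++ {
		if history[i].PID != history[i-1].PID {
			restarts = append(restarts, history[i])
		}
	}
	return restarts
}

// SteadilyRising reports whether memory grew at every step of the given
// snapshots. It returns false when there are fewer than two snapshots.
func SteadilyRising(history []Snapshot) bool {
	if len(history) < 2 {
		return false
	}
	for i := 1; i < len(history); i++ {
		if history[i].MemoryMB <= history[i-1].MemoryMB {
			return false
		}
	}
	return true
}

func main() {
	start := time.Date(2018, time.January, 1, 12, 0, 0, 0, time.UTC)
	// Made-up sample data: memory climbs, then the PID changes.
	history := []Snapshot{
		{At: start, PID: 100, MemoryMB: 120},
		{At: start.Add(10 * time.Minute), PID: 100, MemoryMB: 340},
		{At: start.Add(20 * time.Minute), PID: 100, MemoryMB: 910},
		{At: start.Add(30 * time.Minute), PID: 200, MemoryMB: 115},
	}

	// Look at the first process lifetime only (the first three snapshots).
	fmt.Println("memory steadily rising before restart:", SteadilyRising(history[:3]))
	for _, r := range Restarts(history) {
		fmt.Printf("restart seen at %s, new PID %d\n", r.At.Format(time.RFC3339), r.PID)
	}
}
```

Neither check proves a leak on its own. As in the analysis above, the signal becomes convincing only when it is correlated with another data source, such as the latency graph.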

How to 

Filip and I created a detailed walkthrough of how to use the new Time Travel feature:  

To get to Time Travel, click on the button in Explore:

There are a couple of different ways to control it. Scroll to zoom in or out and adjust the timespan shown in the controls.

You can also explore specific time periods. For example, if you deployed an image in January of this year, it’s simple to explore the cluster around that time:

Clicking on the label for January automatically visualizes the state of your system at the beginning of the month.

If you have a specific timestamp, entering it into the entry box will take you to that specific point in time. This is really useful if you’ve found something of interest in the logs and want to explore the system at that precise moment.

Time Travel is really useful for teams that need to understand their clusters at a specific point in time, but also how the system has evolved over time. It’s all about being able to understand how services are interacting. We’ve been using it for a while and have found it particularly useful during troubleshooting:

·   Compare resources in the current state with their state at a point in the past

·   View the state of your orchestrator at different points in time

·   Compare container configuration from one deployment to the next

What’s coming next?

Over the coming months we’ll be expanding Time Travel and enriching it with annotated events such as component deployments and monitoring alerts, which will increase the range of situations where you can compare the system.

Time Travel helps you to make the most of all the data Weave Cloud has gathered about your system. It lets you track down specific problems but also understand the nature of changes in the system over time. 

We hope you enjoy it and we’d love to get your feedback through Slack or Chat.