At Weaveworks, we containerise as much as possible, to simplify packaging and deployment. So, most of the time, whatever you ask our software to do will run inside a container.
One day while testing, one step of network set-up got stuck, so I hit Ctrl-C. And nothing happened. I’m naturally inquisitive, so I went looking for the reason why.
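First guess: maybe Docker was blocking it? Linux delivers Ctrl-C to a process as an out-of-band signal named SIGINT, so Docker would have to pass that signal on to the process inside the container. Apparently it does, according to the Docker documentation: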
--sig-proxy=true Proxy received signals to the process (non-TTY mode only).
=true indicates this option would be on by default, and we weren’t using TTY mode.
At this point my colleague Adam got interested too, and wrote some test programs to clarify what was happening. To our surprise, the answer was that the signal was being delivered, but wasn’t doing its job. Every Unix signal has a default action that applies when the process hasn’t installed a handler, and for SIGINT the default action is to terminate the program. So with no handler installed, we expected Ctrl-C to end the process, and it didn’t.
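To make the default action concrete, here is a minimal sketch of the kind of test that shows it working outside a container. It is not one of Adam's actual test programs; the names `CHILD` and `run_and_interrupt` are made up for illustration. The child explicitly restores the default SIGINT action, because Python otherwise installs its own handler that raises KeyboardInterrupt.

```python
import signal
import subprocess
import sys

# Child program: restore the kernel's default SIGINT action (Python normally
# replaces it with a handler that raises KeyboardInterrupt), then just wait.
CHILD = """
import signal, time
signal.signal(signal.SIGINT, signal.SIG_DFL)
print("ready", flush=True)
time.sleep(30)
"""


def run_and_interrupt():
    """Start the child, send it SIGINT, and return its exit status."""
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE, text=True) as proc:
        proc.stdout.readline()            # wait until the child has set SIG_DFL
        proc.send_signal(signal.SIGINT)   # what Ctrl-C would deliver
        return proc.wait(timeout=10)


if __name__ == "__main__":
    # subprocess reports "terminated by signal N" as a return code of -N.
    print("child exit status:", run_and_interrupt())
```

Here the child is an ordinary process, so the default action applies and it is terminated by the signal. The surprise in our case was that the same kind of program, run as the main process of a container, kept running.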
Knowing a bit more about the symptoms, we were able to find other people who had the same problem, for example in the Docker issue tracker.
On the Docker GitHub there is an extensive discussion at Issue #3240 which indicates the root cause of the mystery: inside the container, our process is running as PID 1. Linux has very special handling for PID 1, because it expects PID 1 to be a master process in charge of all other processes, and it expects that you don’t want that process to die. Docker is shifting those rules, and Linux hasn’t shifted its behaviour to match.
I tracked down the specific line in the kernel where this happens – any signal with default handler is ignored for a process flagged with SIGNAL_UNKILLABLE, which is true for PID 1.
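The rule is small enough to model in a few lines. The sketch below is a simplified paraphrase, not the kernel code: it assumes that "flagged with SIGNAL_UNKILLABLE" can be approximated by "the process sees itself as PID 1", and it only considers signals sent from inside the container. The function names are hypothetical.

```python
import os
import signal


def default_action_suppressed(pid, handler):
    """Simplified model: PID 1 with the default handler never sees the signal."""
    return pid == 1 and handler == signal.SIG_DFL


def at_risk_signals(signums=(signal.SIGINT, signal.SIGTERM)):
    """Return the given signals that this process would silently drop."""
    pid = os.getpid()
    return [s for s in signums
            if default_action_suppressed(pid, signal.getsignal(s))]


if __name__ == "__main__":
    for signum in at_risk_signals():
        name = signal.Signals(signum).name
        print(f"{name} would be ignored (running as PID 1 with no handler)")
```

A program could run a check like this at start-up to warn that it is PID 1 and has left a signal on its default action. Note that Python normally installs its own SIGINT handler, so a Python process would typically report only SIGTERM here.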
Our solution was to run our set-up code under a wrapper process that catches SIGINT and SIGTERM and exits as expected. This will be in the 0.10 release of Weave. We are also contributing a note to the Docker documentation to make this issue easier to understand for the next person.
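For anyone who wants to try the same approach, here is a rough sketch of such a wrapper in Python. It is not the wrapper that ships with Weave, just an illustration of the idea; `run_wrapped` and `StopRequested` are names invented for this example, and the exit status of 128 plus the signal number is simply the usual shell convention.

```python
import signal
import subprocess
import sys


class StopRequested(Exception):
    """Raised from the signal handler to unwind out of the blocking wait."""

    def __init__(self, signum):
        super().__init__(signum)
        self.signum = signum


def _raise_stop(signum, frame):
    raise StopRequested(signum)


def run_wrapped(command):
    """Run `command` as a child process and return an exit status."""
    # Installing any handler replaces the default action, so these signals
    # are no longer dropped when the wrapper itself runs as PID 1.
    for signum in (signal.SIGINT, signal.SIGTERM):
        signal.signal(signum, _raise_stop)

    child = None
    try:
        child = subprocess.Popen(command)
        status = child.wait()
    except StopRequested as stop:
        if child is not None:
            child.terminate()      # ask the set-up step to stop
            child.wait()
        return 128 + stop.signum   # shell convention for "ended by signal"
    # A negative status -N means the child itself was killed by signal N.
    return status if status >= 0 else 128 - status


if __name__ == "__main__":
    sys.exit(run_wrapped(sys.argv[1:]))
```

The handler raises an exception rather than cleaning up directly, so that the clean-up runs in normal code after the blocking wait has been abandoned. The wrapped command runs as a child with an ordinary PID, so the default signal actions apply to it as usual.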