Container Security with Dan Walsh
There are several factors to consider when securing containerized applications. Where containers are deployed, how they are isolated, and which capabilities are disabled all determine how secure your Dockerized applications are.
The topic for the Weave Online User Group meeting on June 13 was “Container Security with Daniel Walsh (Red Hat)”.
Before giving an overview of container security, Dan went over some fundamental concepts about running an application. The talk was based on his booklet “The Container Coloring Book. Who’s Afraid of the Big Bad Wolf”, which uses the story of the three little pigs and the homes they live in as metaphors for how containers work and what their security considerations are. Each pig is an ‘application’ and the place where it lives is the ‘platform’ on which it is deployed.
Where an application can be deployed
A house represents applications that are each hosted on their own physical machine, which is the most secure way of deploying them. If one house is broken into, the others are still safe. The drawback is that we must deploy and maintain a large number of physical machines.
A duplex represents traditional server virtualization, where multiple virtual machines run on a single physical host. Maintenance costs are high if we have to support multiple operating systems. Sharing the resources of one physical host, or of a pool of resources, with other applications is also risky if the hypervisor is compromised.
An apartment represents a situation in which each container runs efficiently on the host; because the containers share the same OS kernel, the cost of maintenance is low. The drawback is that once the kernel is compromised, all containers are compromised.
In the hostel metaphor, services run in containers side by side on the same physical machine. There is limited isolation of resources in this scenario, and if one service is compromised, there’s a good chance they all will be. Dan believes, however, that this is somewhat safe if SELinux (Security-Enhanced Linux) is turned on.
The park metaphor is the extreme case: Dan uses it to illustrate applications running directly on a physical machine with SELinux turned off.
Container Deployment Platforms
When a user chooses to deploy applications in a container running on a physical host, there are 3 different deployment models (as illustrated with the apartment metaphor):
- Apartment built of straw - deploy containers on a do-it-yourself platform.
- Apartment built of sticks - deploy containers on a community distro.
- Apartment built of brick - deploy containers on an enterprise-ready platform such as RHEL or OpenShift.
Ensuring container security
Treat containers as regular services
Before getting into how to secure a container, it is important to keep applying the security practices you already use in any non-container environment. A few examples of these best practices are:
It boils down to applying the principle of least privilege and validating the source of your containers.
Keep containers contained
Namespaces are a good tool for container isolation, but not every part of the Linux kernel supports them. For example, kernel file systems such as /sys, /sys/fs or /proc/sys are not namespaced. If everything in Linux supported namespaces, containers would be like another version of KVM.
We must be aware that there are many ways in which a container can exploit vulnerabilities in the Linux kernel, break out, and compromise the host and/or other containers. This is why we need to look at container security in detail.
A primer on Docker container security
1. Read-only mount points
Certain files, such as /proc/sysrq-trigger and /proc/irq, are essential for containers; however, containers only need to read from them. Therefore, one way to secure a container is to mount these essential files read-only.
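To check which mount points a process actually sees as read-only, one option is to parse the mount table that Linux exposes under /proc/self/mounts. The following is a minimal sketch under that assumption; the function name and the sample mount table are hypothetical and only shaped like what a container might report.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in the /proc/self/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Compare whole options so that e.g. 'relatime' is not mistaken for 'ro'.
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample, not captured from a real container.
SAMPLE_MOUNTS = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,size=65536k,mode=755 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE_MOUNTS))
```

On a Linux host or inside a container, the same function can be fed the contents of /proc/self/mounts instead of the sample text.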
2. Capabilities
Around the year 2000, the Linux root privileges were broken down into a series of 32 capabilities. These capabilities are defined in
Capabilities that a container does not need can be dropped. Two important Linux capabilities that are removed from containers are CAP_NET_ADMIN and CAP_SYS_ADMIN.
CAP_NET_ADMIN grants the ability to configure the network, which has huge implications for container security. If it were present, the container could set up the network to connect with other containers, for instance by changing IP routes or firewall rules.
CAP_SYS_ADMIN is a catch-all for lots of privileges, but the important one is the ability to mount and unmount file systems. Even when we mount the essential files read-only, a container with this capability can remount them read-write. By removing CAP_SYS_ADMIN, we take away the container’s ability to remount file systems.
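One way to see which capabilities a process holds is to decode the CapEff bitmask that Linux reports in /proc/<pid>/status. The sketch below is illustrative only: the helper names are hypothetical, the table lists just a few capabilities with their bit positions from the Linux capability numbering, and the sample mask (a80425fb) is a value commonly reported inside a default Docker container, used here as an assumption rather than a guarantee.

```python
# Bit positions of a few capabilities (not a complete table).
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_SYS_ADMIN": 21,
}


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, name):
    """Return True if the bit for the named capability is set in the mask."""
    return bool((mask >> CAP_BITS[name]) & 1)


# Assumed sample: the mask often seen in a default Docker container.
SAMPLE_STATUS = "Name:\tsh\nCapEff:\t00000000a80425fb\n"

if __name__ == "__main__":
    mask = parse_cap_eff(SAMPLE_STATUS)
    for cap in sorted(CAP_BITS):
        print(cap, has_capability(mask, cap))
```

With this sample mask, the bits for CAP_NET_ADMIN and CAP_SYS_ADMIN are clear, while less sensitive capabilities such as CAP_CHOWN remain set.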
3. Namespaces
A namespace gives a container its own isolated instance of a global resource. Currently, containers support the following namespaces:
Dan briefly touched on the PID and network namespaces for container security. A container cannot see the processes of other containers, and it cannot easily connect to containers in different network namespaces. This provides some level of isolation and security.
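On Linux, each process exposes its namespaces as symbolic links under /proc/<pid>/ns, with targets of the form pid:[4026531836]. Two processes share a namespace when those targets match. The sketch below assumes that link format; the function names and the inode numbers in the usage comments are hypothetical.

```python
import os
import re

# Matches link targets such as "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link):
    """Split a namespace link target into (kind, inode number)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def same_namespace(link_a, link_b):
    """Two processes share a namespace if kind and inode both match."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def namespace_of(pid, kind):
    """Read the namespace link for a process (Linux only, needs permission)."""
    return os.readlink(f"/proc/{pid}/ns/{kind}")


# Example use on a Linux host, comparing this process with PID 1:
#   same_namespace(namespace_of("self", "pid"), namespace_of(1, "pid"))
```

A process inside a container would normally report different PID and network namespace inodes than a process on the host, which is what keeps them from seeing each other.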
4. Cgroups
While a namespace provides containers with their own view of global resources, a cgroup (control group) limits how much of those resources a container can use. An important aspect of cgroups is the ability to control which device nodes can be created and used within a container. By default, a container is limited to a small set of device nodes, such as /dev/console, /dev/tty, /dev/urandom and /dev/random; the device cgroup blocks access to other devices.
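The device control described above can be pictured as an allowlist that is checked whenever a container tries to read, write, or create (mknod) a device node. The sketch below models rules in the "type major:minor access" style used by the cgroup v1 devices controller. It is a simplified model, not the kernel's implementation, and the rule list is a hypothetical example rather than any runtime's real default.

```python
def parse_rule(rule):
    """Parse a rule such as 'c 1:3 rwm' into its components."""
    dev_type, numbers, access = rule.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def is_allowed(rules, dev_type, major, minor, access):
    """Default-deny check: allowed only if some rule covers the request.

    dev_type is 'c' (character) or 'b' (block); access is a string made
    of 'r' (read), 'w' (write) and 'm' (mknod).
    """
    for rule in rules:
        r_type, r_major, r_minor, r_access = parse_rule(rule)
        if r_type not in ("a", dev_type):
            continue  # 'a' matches all device types
        if r_major not in ("*", str(major)):
            continue
        if r_minor not in ("*", str(minor)):
            continue
        if set(access) <= r_access:
            return True
    return False


# Hypothetical allowlist: /dev/null (1:3), /dev/zero (1:5), pseudo-terminals.
HYPOTHETICAL_ALLOWLIST = ["c 1:3 rwm", "c 1:5 rwm", "c 136:* rwm"]

if __name__ == "__main__":
    print(is_allowed(HYPOTHETICAL_ALLOWLIST, "c", 1, 3, "rw"))  # /dev/null
    print(is_allowed(HYPOTHETICAL_ALLOWLIST, "b", 8, 0, "r"))   # a block disk
```

With a default-deny list like this, a request for a block device such as a disk is refused simply because no rule mentions it.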
5. SELinux
Dan summarized SELinux nicely with this slide:
Everything is a label and the kernel enforces the rules.
Digging deeper into SELinux, there are:
This slide from Dan shows how Type enforcement relates to containers:
And this slide from Dan illustrates how MCS enforcement relates to containers:
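To make the MCS idea a little more concrete, here is a simplified sketch of how category labels can keep two containers apart even when they run with the same type. It assumes labels of the form user:role:type:sensitivity:categories with a plain comma-separated category list (no ranges), and it models only the category check, not type enforcement or the real SELinux policy engine. The labels and helper names are illustrative assumptions.

```python
def parse_label(label):
    """Split an SELinux label into its type, sensitivity and categories."""
    parts = label.split(":", 4)
    if len(parts) < 4:
        raise ValueError(f"unexpected label: {label!r}")
    categories = frozenset(parts[4].split(",")) if len(parts) == 5 else frozenset()
    return {"type": parts[2], "sensitivity": parts[3], "categories": categories}


def mcs_allows(process_label, file_label):
    """Simplified MCS rule: the process must hold every category on the file."""
    process = parse_label(process_label)
    target = parse_label(file_label)
    return target["categories"] <= process["categories"]


# Illustrative labels: two containers with the same type, different categories.
CONTAINER_A = "system_u:system_r:container_t:s0:c1,c2"
CONTAINER_B = "system_u:system_r:container_t:s0:c3,c4"
FILE_OF_A = "system_u:object_r:container_file_t:s0:c1,c2"

if __name__ == "__main__":
    print(mcs_allows(CONTAINER_A, FILE_OF_A))
    print(mcs_allows(CONTAINER_B, FILE_OF_A))
```

In this model, container A can reach its own files, while container B is refused because its categories do not cover the ones on A's files.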
Check out “The SELinux Coloring Book” from Dan for more detail.
6. SECCOMP
Containers interact with the kernel through system calls (syscalls). Enabling SECCOMP limits which syscalls a container can make and thus shrinks the attack surface. On a 64-bit machine, all the 32-bit syscalls are blocked. SECCOMP also cuts off support for old networking protocols such as AppleTalk or DECnet, which are not being used but present a risk to the kernel.
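The filtering decision can be sketched as a lookup against a profile: syscalls on the list are allowed, and everything else gets the default action. The example below is modelled on the JSON layout Docker uses for seccomp profiles (defaultAction plus a list of syscall rules). The profile itself is hypothetical and far smaller than a real one, and the code only models the lookup, not the kernel's enforcement.

```python
import json


def action_for(profile, syscall):
    """Return the action a profile assigns to a syscall name."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # Anything not explicitly listed falls through to the default.
    return profile["defaultAction"]


# Hypothetical allowlist profile: deny by default, allow a few syscalls.
HYPOTHETICAL_PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")

if __name__ == "__main__":
    for name in ("read", "mount"):
        print(name, action_for(HYPOTHETICAL_PROFILE, name))
```

Because the default action is to return an error, a syscall the application never needs, such as mount in this example, is rejected without having to be listed.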
7. User namespace
The user namespace is a very useful security feature that maps the root user ID in the container to a non-root user ID on the host. The problem with user namespaces is that file systems do not currently support this feature, so it is still experimental in Docker rather than enabled by default.
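The mapping itself is visible in /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID on the host, and the length of the range. The sketch below translates a container UID to a host UID under that format. The function name and the sample range starting at 100000 are hypothetical.

```python
def to_host_uid(uid_map_text, container_uid):
    """Translate a UID inside a user namespace to the host UID.

    Each uid_map line is: <inside start> <outside start> <length>.
    Returns None when the UID is not covered by any range.
    """
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container UIDs 0..65535 map to host UIDs from 100000.
HYPOTHETICAL_UID_MAP = "         0     100000      65536\n"

if __name__ == "__main__":
    print(to_host_uid(HYPOTHETICAL_UID_MAP, 0))
    print(to_host_uid(HYPOTHETICAL_UID_MAP, 1000))
```

Under this mapping, root in the container (UID 0) is an unprivileged user on the host, which is the protection user namespaces are meant to provide.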
For more information about securing your Dockerized Apps, we encourage you to download Dan Walsh’s fun and educational ebooks:
For a full recording of the event, watch the video: