In a recent episode of 'The Art of Modern Ops' (this time in video format!), Weaveworks CTO Cornelia Davis hosted a discussion on how the Department of Defense (DoD) approaches modern cloud-native operations. Davis was joined by Nicolas Chaillan, Chief Software Officer of the U.S. Air Force and co-lead of the DoD Enterprise DevSecOps Initiative, which is bringing DevSecOps across the Department of Defense; Bhaarat Sharma, CTO of Raft; and Paul Otto, Principal Engineer at Raft. Raft is a digital consulting firm that has been contracted by the DoD. Let's look at the key highlights from this discussion.
First, Chaillan gave us the big-picture perspective on what software looks like in the DoD. The team Chaillan leads is responsible for managing the software and technology operations that power many branches of the department. A job of this magnitude takes tens of thousands of developers, working both within the DoD and externally as contractors. Chaillan and his team are tasked with enabling these developers to work seamlessly towards a common purpose.
Raft joins DoD's mission
Raft is engaged with the DoD on this all-important mission of national security, and Sharma and Otto spoke to the success that Chaillan has had in aligning a large and often bureaucratic organization around this shared goal. Both were full of praise for the DoD's embrace of open source technologies and its innovative spirit.
Platform One, Big Bang & Party Bus
Over the past few years, the DoD has transformed itself from a large traditional organization into a modern one built on open source and cloud-native technologies. A by-product of this is Platform One, a Kubernetes-powered software platform that gives developers self-service access to the resources they need, such as cloud providers like AWS and Azure, or internal data stores.
The DoD has battle-tested Platform One and open sourced an instance of it named Big Bang, so that any organization can stand up a replica of Platform One in its own environment and realize the same benefits. Teams within the DoD that would rather not deploy Big Bang on their own can use Party Bus, a SaaS offering of Big Bang that’s managed by Chaillan’s team.
DevSecOps is the evolution of DevOps
But Platform One and Big Bang go beyond developer enablement: they help users implement a security-first strategy, which is the primary goal of DevSecOps. Security is baked into the platform so that developers only need to apply tags to certain resources to have capabilities, like authentication, enabled. Application developers are able to consume these security resources without ever reaching out to the cybersecurity people outside of their team. This bakes in security from the get-go and allows a security-first mindset even when developers are working on an MVP.
Chaillan comments that there are three pillars to DevSecOps: Zero trust architecture, monitoring, and GitOps.
“Everything we do is based on Zero Trust architecture - both for ingress, egress, and east-west traffic. We use the Istio service mesh, and that’s the foundation of everything in terms of security.”
When it comes to monitoring, Chaillan notes that behavioral detection is the key. He explains that this involves
“...continuously monitoring the behavior of containers and seeing any drift of behavior as problematic. Everything is founded on that infrastructure as GitOps mindset where your design stays in Git, and everything is in Git. Your production or staging environment will pull from Git continuously to implement whatever change you want to make, to ensure you have no drift and no change between production and your desired state... So we ensure that thousands of clusters who end up running in DoD don't drift between each other.”
We’ll dig deeper into the third pillar, GitOps, in a bit, right after talking about sidecars.
Sidecars for baked-in security
Davis steered the conversation towards sidecars, which are at the core of how security is implemented in Platform One. Sidecars, which are language agnostic, act as service proxies: all traffic (ingress and egress) flows through them before reaching or leaving a container. This greatly improves security, as bad actors cannot easily hide their actions from the sidecar. Davis chimed in that sidecars are a 'distributed implementation of a platform.'
The Istio service mesh relies on the sidecar to manage network traffic and enforce security measures. The first benefit of the sidecar model is that, because the sidecar is injected alongside a container and not inside it, you can update the sidecar or the container independently. The second benefit is that sidecars are injected automatically, no matter the workload. This means that even if your software team does not know about sidecars, they still get the sidecar and its benefits. This is what baked-in security looks like in practice.
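To make the two benefits concrete, here is a minimal sketch of automatic injection, written as a plain Python function over a dictionary that stands in for a pod spec. It is not Istio's actual injection mechanism; the function name, the container names, and the image references are all hypothetical placeholders chosen for illustration.

```python
import copy

# Hypothetical sidecar definition; the image reference is a placeholder.
SIDECAR = {"name": "proxy-sidecar", "image": "example.invalid/proxy:1.0"}


def inject_sidecar(pod_spec, sidecar=SIDECAR):
    """Return a copy of pod_spec that contains the sidecar exactly once."""
    patched = copy.deepcopy(pod_spec)
    containers = patched.setdefault("containers", [])
    # Drop any earlier sidecar with the same name, so re-running the injection
    # with a newer image updates the sidecar and leaves the app untouched.
    containers[:] = [c for c in containers if c["name"] != sidecar["name"]]
    containers.append(dict(sidecar))
    return patched


# The application team only declares its own container.
app_pod = {"containers": [{"name": "app", "image": "example.invalid/app:2.3"}]}

patched = inject_sidecar(app_pod)
# Rolling out a new sidecar version does not require a new app release.
updated = inject_sidecar(
    patched, {"name": "proxy-sidecar", "image": "example.invalid/proxy:1.1"}
)
print([c["name"] for c in updated["containers"]])
```

The application container's entry is never modified, which is the property that lets a platform team change the sidecar without coordinating with every product team.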
Chaillan notes that they use sidecars as a reverse proxy along with Istio as the service mesh. The reverse proxy is able to inject iptables rules to automatically create, for example, a mutual TLS tunnel. This gives Chaillan peace of mind: even if his team doesn’t think about encryption for container A talking to container B, the automatically injected sidecar will establish a mutual TLS tunnel using strong X.509 certificates.
Chaillan provided an example of how all this played out during the SolarWinds hack. The team was able to add a few additional detection mechanisms without having to coordinate with more than 150 product teams and thousands of developers. They simply injected a new sidecar, with no need to go through a laborious release planning process. That’s foundational to moving fast and reacting immediately to a zero-day attack.
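For readers unfamiliar with the term, "mutual" TLS means that both ends of the connection present and verify a certificate. The sketch below shows that one setting using Python's standard `ssl` module. It only illustrates what the sidecar takes care of on the application's behalf and is not how Istio itself is configured; the function name and its file-path parameters are assumptions made for this example.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that rejects clients without a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the tunnel mutual: the server verifies the
    # client's X.509 certificate, in addition to the client verifying the server's.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This workload's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The certificate authority trusted to sign peer certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


# No files are loaded here, so the context is configured but not yet usable
# for real connections; pass real paths to serve traffic.
ctx = make_mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

With a sidecar in place, application code needs none of this: the proxy terminates and originates the encrypted connections for it.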
Drift detection with GitOps
Next, the conversation pivoted to GitOps, on which Chaillan commented:
“GitOps is game changing for the industry. It is a replicable, automated, immutable construct where your change management, everything happens in Git.”
He credits GitOps with improving key operational tasks such as change management, disaster recovery, networking and other infrastructure changes, and security practices, all of which can be driven through Git interactions. Since changes are pulled (not pushed) from Git, it becomes easy to limit what a user can and cannot do. It prevents ad hoc kubectl commands from being executed and Helm charts from being installed by hand. This further strengthens the security guard rails.
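The pull model can be sketched in a few lines. In this toy version, a list of commits stands in for a Git repository and a dictionary stands in for the cluster; the `PullAgent` class and the `fetch_desired` callable are hypothetical names for this illustration, not the API of any GitOps tool.

```python
class PullAgent:
    """Toy in-cluster agent that pulls desired state instead of accepting pushes."""

    def __init__(self, fetch_desired):
        # fetch_desired is a hypothetical callable returning (revision, manifests).
        self.fetch_desired = fetch_desired
        self.applied_revision = None
        self.live = {}

    def sync(self):
        """Pull once; apply only when the revision in Git has changed."""
        revision, manifests = self.fetch_desired()
        if revision == self.applied_revision:
            return False
        # Overwrite the running state so that it matches Git exactly.
        self.live = dict(manifests)
        self.applied_revision = revision
        return True


# A stand-in for a Git repository: an ordered list of (revision, manifests).
commits = [("abc123", {"web": {"replicas": 2}})]
agent = PullAgent(lambda: commits[-1])

first = agent.sync()   # initial pull applies the first commit
second = agent.sync()  # nothing new in Git, so nothing is applied
commits.append(("def456", {"web": {"replicas": 3}}))  # a reviewed, merged change
third = agent.sync()   # the agent picks up the new commit on its next pull
print(first, second, third, agent.live)
```

Because the agent is the only thing that writes to the running state, users need write access to Git but not to the cluster itself.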
Chaillan emphasized the importance of drift detection when he said:
“The key is to monitor your staging and your production environment, and look at anything that’s not in Git. Anything that’s drifting or not present in Git is effectively either malicious or a drift. And that should never happen.”
GitOps effectively reduces your attack surface by not allowing people to make manual changes to production: changes can only be applied through a reviewed change in Git. In the cluster, in the infrastructure, in your identity service, you do everything in Git. Drift detection doesn’t just make infrastructure management easier and more predictable; it also greatly tightens security controls.
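As a rough illustration of drift detection, the sketch below compares what Git declares with what is actually running and sorts the differences into three buckets. The resource names, image tags, and the `detect_drift` function are hypothetical; a real system would read the live state from the cluster API rather than from a dictionary.

```python
def detect_drift(desired, live):
    """Compare Git-declared resources with what is actually running."""
    return {
        # Running but never declared in Git: a manual change or a possible intrusion.
        "unmanaged": sorted(set(live) - set(desired)),
        # Declared in Git but absent from the environment.
        "missing": sorted(set(desired) - set(live)),
        # Present in both, but the running spec differs from the one in Git.
        "modified": sorted(
            name for name in set(desired) & set(live) if desired[name] != live[name]
        ),
    }


desired = {"web": {"image": "web:1.4"}, "api": {"image": "api:2.0"}}
live = {
    "web": {"image": "web:1.4"},
    "api": {"image": "api:2.1"},          # changed by hand, not through Git
    "debug-shell": {"image": "busybox"},  # never declared in Git
}

report = detect_drift(desired, live)
print(report)
```

Anything that shows up under "unmanaged" or "modified" is exactly the kind of change the quote above says should never happen, so it is a natural trigger for an alert.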
Ending on a high note, Davis and Chaillan spoke about the importance of open source software for any modern organization today. Many enterprises are wary of open source because of a perceived lack of control and security. However, as Chaillan and the DoD have demonstrated, open source is no longer optional; it is a necessity if you want to operate in a modern cloud-native world. And it is the only way to guarantee security and compliance.
Chaillan is proud that the DoD uses many CNCF projects, too many for him to list off the top of his head. Adopting open source projects has also had the ripple effect of attracting and retaining the best talent. He is confident that if the DoD, which operates in a highly regulated space, can make the transformation from legacy to cloud-native, DevSecOps, GitOps, and the platform model, any enterprise organization can do the same.
We've just highlighted the key points from the panel in this article, but to get the full scoop make sure to give the entire episode a listen.