Docker traps (and how to avoid them)

Trap #1

Dockerfile:  FROM ubuntu  (or fedora, centos, ...)   -OR-
docker run -i -t ubuntu  (or fedora, centos, ...)

Dockerfiles allow you to start from a base image (which is a good thing).  However, those base images are often very large:
  • ubuntu: 193 MB
  • centos: 236 MB
  • fedora: 374 MB
  • phusion/baseimage: 437 MB
Docker caches pulled images, so future containers based on the same FROM image reuse the local copy.  Even so, a large base image means a big initial download on unreliable networks (e.g. WiFi, developing countries), a lot of overhead for clusters managing load-balanced containers across servers/regions, more storage space for regular Docker backups, and (most importantly) a much larger attack surface for hackers attempting to break into (or out of) your container.

Solution

If you need a named data volume container, use FROM scratch (which consumes a whopping 0 B of disk space).  Source: Michael Crosby, a member of the Docker Team  Update: Brian Goff shares an alternate opinion on this recommendation.

If you need to run a simple script, use FROM busybox:latest (2 MB disk space)

For everything else, use FROM gliderlabs/alpine:latest (5 MB disk space).  Alpine Linux is a security-oriented OS (often used for routers, firewalls, VPNs, etc.) that can run just about anything you need.  More info can be found on its Docker Hub page.  Update: Alpine Linux has some Java issues so I'm no longer recommending it as a good general use base image.  Update #2: now that Docker includes an official Alpine Linux image, an S6 wrapper is available, and Java 7 is supported, I'm now considering it again.  Update #3: Alpine Linux still has DNS issues that may be problematic for your use case.

Tiny Core Linux (7 MB, with lots of supported apps and the base for boot2docker) looks promising, but it has no MySQL package.

Another promising option is SliTaz (9 MB) but their stable release is almost three years old and their English packages search page returns a 404.  

Ultimately, you'll probably need to compromise and follow Docker's official best practices recommendation to use FROM debian:latest (85 MB).  More info can be found on its Docker Hub page.  

Or, you can try FROM textlab/ubuntu-essential:latest (65 MB) as described here.  

If you want to standardize on Ubuntu and prefer a more "official" image you can use FROM ubuntu-debootstrap:latest (87 MB) which is an "official" Docker-sanctioned image but is not an official Canonical-approved Ubuntu image.  More info can be found on its Docker Hub page.

If you need an RPM-based solution, try FROM yamamon/centos7-zero:latest (137 MB) as described here.

P.S. Dray looks interesting (similar concept to Docker Nano, which is based on Buildroot).


________________________________________

Trap #2

Run everything as root.

This is obviously a bad security practice, but most Docker processes and images today run everything as root.  Not only is this dangerous when an exploit launched against the container (like XSS, CSRF, shellshock, etc.) manages to run arbitrary code, but it can also allow hackers to break out of the container and achieve root on the host system (thereby allowing them to control all containers on the host and launch further network attacks).


Solution

The first step is to run your process as a regular unprivileged user.  See the official Docker best practices guide and the Dockerfile USER documentation for details.  Note: the best practices guide also says "You should avoid installing or using sudo since it has unpredictable TTY and signal-forwarding behavior that can cause more problems than it solves".  Also, for Alpine Linux, the command would be something like addgroup mygroup && adduser -D -s /bin/sh -G mygroup myuser (in BusyBox's adduser, -G sets the group while -g sets the comment field, -D skips the interactive password prompt, and bash is not installed by default).
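
Here's a rough sketch of what the unprivileged-user step could look like in a Dockerfile for Alpine.  The names mygroup and myuser are placeholders, and the alpine:3.2 tag and the id command are assumptions chosen only for illustration, so swap in whatever you actually run.  Note that BusyBox's adduser takes the group with -G (lowercase -g sets the comment field instead).

```dockerfile
# Sketch only: mygroup, myuser, the image tag, and CMD are placeholders.
FROM alpine:3.2

# BusyBox flags: -S creates a system group/user, -D skips the password prompt,
# -G adds the user to an existing group, -s sets the login shell.
RUN addgroup -S mygroup && adduser -S -D -G mygroup -s /bin/sh myuser

# Every RUN, CMD, and ENTRYPOINT after this line runs as the unprivileged user.
USER myuser

# Prints the uid/gid the container process actually runs with.
CMD ["id"]
```

Anything that needs root (installing packages, changing file ownership) has to happen before the USER line.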

Furthermore, you should use a hardened Docker base image (like Alpine Linux).

Finally, you should use SELinux or AppArmor to separate containers from the host and each other.  Note: watch Pull Request #777 for progress of CoreOS SELinux or AppArmor support.  Update: CoreOS SELinux is supported in version 808.0.0!

As an aside, the recent Docker 1.2.0 release implemented --cap-add and --cap-drop, so in theory you could do something like docker run -it --cap-drop=ALL --cap-add=SYS_TIME debian:latest /bin/bash to drop every capability except the ability to set the system clock (no chown, useradd, etc.).  That said, Linux capabilities are murky and complex (for example, which capability allows adding users and groups?) and most of them allow privilege escalation anyway, so I doubt this feature will be all that useful to regular admins.  Also, the --privileged run flag is disabled by default, which is a good thing.

Conclusion: Even though Docker has passed the "1.0" milestone, it's not production-ready (unless you want to run a single instance in a VM without scale capabilities, container patch management, multi-tenant support, persistent data options, or full SDLC support*  ;)

...and then there's that whole PID 1 zombie problem...

* Update: Kubernetes appears promising to solve the aforementioned production limitations and LXD looks promising to solve the multi-tenant security concerns.

* Update #2: Snappy Ubuntu Core also looks promising for a multi-tenant security solution (but the nascent project still lacks a broad selection of "snappy apps").

* Update #3: Flynn has matured quite a bit since I last investigated it.  It supports a Postgres database cluster out of the box, which is a huge win over doing it manually with something like Crunchy Postgresql Manager (note: MySQL-to-Postgres converter can be found here).  Once they add persistent file storage (like GlusterFS, LizardFS, or powerstrip-flocker) and get a better-looking dashboard (like Cockpit or Rancher), it could be an awesome Docker PaaS leader.

________________________________________

Trap #3

Not including a tag when performing a pull or run.

Solution

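Always provide a tag, even when the only option is "latest".  Without one, Docker assumes the latest tag, and what that tag points to can change from one pull to the next.  For example, instead of docker pull ubuntu you should run docker pull ubuntu:14.04.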

P.S. Use version number tags instead of distro code names.  "Trusty Tahr" may sound cute, but at a glance, ubuntu:14.04 is a lot more meaningful than ubuntu:trusty
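
A small shell check like the one below can catch untagged image names in scripts before they reach docker pull or docker run.  This is only a sketch: the function name has_explicit_tag is made up for illustration, and it only inspects the string (it never contacts a registry).

```shell
# Returns 0 when the image reference carries an explicit tag, 1 otherwise.
has_explicit_tag() {
  ref=$1
  # Only the last path component can hold a tag; this avoids mistaking a
  # registry port (localhost:5000/ubuntu) for one.
  last=${ref##*/}
  case "$last" in
    *:?*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example: flag the untagged reference, accept the tagged one.
for image in ubuntu ubuntu:14.04; do
  if has_explicit_tag "$image"; then
    echo "ok: $image"
  else
    echo "missing tag: $image"
  fi
done
```
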


________________________________________

Trap #4

Storing anything you care about exclusively in the container.

Valuable, persistent data (user content, databases, logs, etc.) should not be permanently and exclusively stored in a Docker container or VM host instance.  Containers (and the cloud VMs they're likely running on) are ephemeral by design, meaning everything gets wiped out when a container is removed and recreated or a new instance is provisioned.  It's a paradigm shift from pets to cattle.  A lot of people are trying to find ways to keep data persisted inside and/or across ephemeral Docker containers, but here's the reality:
"if you put anything beyond your base OS on ephemeral storage, you are at great risk. That data could go away at any time. You can’t depend on it, so don’t use it unless you add in an additional form of redundancy at your own engineering expense. Data you care about belongs on block storage: it has built-in redundancy and improved availability, which ensure that the data you care about will be there when you need it."
     - Pete Johnson 

Solution

You have two options:

1)  Use a data volume or data volume container.  Essentially, you're mounting a directory from the host server to use within the Docker container.  When adding or modifying files in the container, you're transparently writing them to the host server instead.  Because of this, the data (within that specific mounted folder) persists when the Docker container stops.

Important: There's a subtle, but critical, difference between mounting using
          -v /mydata     (or Dockerfile VOLUME ["/mydata"])
and
          -v /mydata:/var/mydata

The first method lets Docker control your data volume: Docker picks where it lives on the host, and the volume is deleted when you call docker rm -v against the last container referencing it (without -v, the volume is left behind as an orphan).  The second method cannot be used in a Dockerfile but has a significant advantage: the data lives in a host directory you chose, and Docker will not delete that directory from disk even after every container that mounts it has been removed.
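
To see the two forms side by side, something like the following can help.  It's a sketch, not a recipe: the function name demo_volume_forms and the container names managed-demo and bind-demo are invented for illustration, the ubuntu:14.04 tag is borrowed from Trap #3, and the commands are wrapped in a function so nothing touches your host until you call it.

```shell
# Sketch only: function and container names are hypothetical.
# Usage: demo_volume_forms [host_dir]   (host_dir defaults to /mydata)
demo_volume_forms() {
  host_dir=${1:-/mydata}

  # Form 1: Docker-managed volume mounted at /mydata inside the container.
  # Docker chooses where the data lives on the host.
  docker run --name managed-demo -v /mydata ubuntu:14.04 touch /mydata/hello
  docker inspect managed-demo   # the volume's host path appears in the output
  docker rm -v managed-demo     # -v also removes the volume Docker created

  # Form 2: host directory mounted at /var/mydata inside the container.
  docker run --name bind-demo -v "$host_dir":/var/mydata ubuntu:14.04 touch /var/mydata/hello
  docker rm bind-demo           # the container is gone ...
  ls -l "$host_dir/hello"       # ... but the file is still on the host
}
```

Run demo_volume_forms on a machine with Docker installed, ideally passing a scratch directory as the argument so the second form doesn't create /mydata on your host.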

This "solution" isn't ideal (even if we use data only containers) because ultimately the data still resides on a single ephemeral host server and engineering a solution to support clustering across regions as well as managing regular backups will be inefficient and complex.

Update: distributed file systems are making good strides in this area.  See GlusterFS, LizardFS, or powerstrip-flocker.  In theory, you would use a combination of a distributed file system for everyday live data handling and a traditional snapshot/archive offsite/cloud backup strategy for permanent storage.


2)  Mount a directory within the Docker container to a NAS or fuse it to a dedicated cloud storage solution (using something like s3fs, CloudFuse, or sshfs).  This approach is slightly less flexible because it's vendor specific, but overall the approach is simpler and more reliable.

Important: make sure to use local caching instead of fetching the file dynamically from across the network each time it's requested.  For example, s3fs has a use_cache feature that you must explicitly enable.
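
Enabling the s3fs cache might look like the sketch below.  The function name and its three arguments (bucket, mount point, cache directory) are placeholders; it assumes s3fs is installed, the mount point already exists, and your credentials are already set up in the s3fs password file.

```shell
# Sketch only: mount_bucket_with_cache and its arguments are hypothetical.
# Usage: mount_bucket_with_cache mybucket /mnt/s3 /var/cache/s3fs
mount_bucket_with_cache() {
  bucket=$1
  mountpoint=$2
  cache_dir=$3

  # Make sure the local cache directory exists before mounting.
  mkdir -p "$cache_dir" || return 1

  # use_cache names a local directory where s3fs keeps copies of fetched
  # objects, so repeated reads don't go back across the network.
  s3fs "$bucket" "$mountpoint" -o use_cache="$cache_dir"
}
```

Keep an eye on the cache directory's size, since it lives on the same ephemeral disk you were trying to avoid depending on.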

Conclusion: If cloud storage like S3 is designed for 99.99% availability and 99.999999999% durability and Netflix trusts them to store their content, isn't the answer obvious?  Use Option 2.  Update: due to network latency and potential unreliability, there are still performance (and single-point-of-failure) problems with Option 2 so I'm going to withhold judgement for the time being.

________________________________________

Trap #5

Installing or using SSH to login to your Docker container.

Solution

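This is one of the most alluring and widespread traps for seasoned SysOps and IT staff.  When it comes to Docker, though, you should use VOLUME or nsinit instead.  Jérôme Petazzoni (senior engineer at Docker) has a nice write-up explaining the reasoning and process.  Update: the docker exec command is available as of version 1.3, so you can open a shell in a running container with something like docker exec -it mycontainer /bin/bash (mycontainer being a placeholder name).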

________________________________________

P.S. Here's a nice read regarding Docker networking.
