Containerizing an in-house application can be complex; a great resource for guiding this process is Docker’s list of Dockerfile best practices.
Wrapping an application in a container is the easy part. Extracting your application into multiple components and deploying those components in a way that gives all the benefits of containerization is a little more complicated. With an internal application, it’s fairly easy to adhere to best practices, such as using volumes to mount database storage or limiting a container to a single process. In some cases, this may involve redesigning part of an application, which is doable with an internal codebase.
It’s not as easy to make these kinds of changes when you’re trying to run an opinionated third-party application. In many cases, users have limited control over data store and application configuration, as well as process coordination and execution. Certain applications have strong opinions about execution environments, or they make assumptions about ports or other resources. This can make it difficult to adhere to containerization best practices or to coordinate high availability (HA) and failover across nodes.
This post will provide some basic containerization best practices with a focus on opinionated, third-party applications. For background on this guide, check out my talk from DockerCon EU 2014.
Many applications require configuration to be loaded via a file. However, container configuration is much simpler via environment variables.
With an internal application, it’s generally trivial to switch configuration from a file format like YAML to environment variables, especially using an abstraction layer like Isomer. Third-party applications are not usually this flexible, so in order to configure your container via the environment, it’s necessary to use a configuration intermediary.
Confd takes a provided set of custom templates and writes config files based on environment or keystore contents. It’s a great tool for bridging the gap between otherwise incompatible container-level and application-level configuration.
Using confd involves exposing configuration data via a keystore server (etcd/Consul) or directly as environment variables in the container and then executing confd as part of your container startup before your main application is run. This is the simplest method of coercing a complicated configuration method into something easily automatable within your container infrastructure. For an example of this configuration method, see my GMod Server repo.
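As a rough, dependency-free illustration of the same idea (this is not confd itself, whose startup step would typically be `confd -onetime -backend env` before the application starts), the sketch below renders a config file from environment variables and then hands over to the application with `exec`. The variable names `APP_PORT` and `APP_MAX_PLAYERS`, the `key = value` file format, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a one-time confd run: render a config file from
# environment variables, then replace this shell with the application.
set -euo pipefail

# render_config DEST: write the app's config file from environment values.
# ${VAR:?msg} aborts the script if a required variable is unset or empty;
# ${VAR:-default} supplies a fallback for optional settings.
render_config() {
  local dest="$1"
  mkdir -p "$(dirname "$dest")"
  cat > "$dest" <<EOF
port = ${APP_PORT:?APP_PORT must be set}
max_players = ${APP_MAX_PLAYERS:-16}
EOF
}

# start_app DEST CMD...: render the config, then exec the application so it
# becomes the container's main process.
start_app() {
  local dest="$1"
  shift
  render_config "$dest"
  exec "$@"
}
```

A container started with `-e APP_PORT=27015` could then run `start_app /etc/myapp/app.conf myapp` as its final step (path and binary hypothetical).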
Configuration via mount or add
While you can include static logic for configuring your application from the environment or a keystore server, mounting the configuration from another container has its benefits. By running an ephemeral container that writes any needed config files into volumes it exposes, and mounting those volumes into your application container, you end up with a more complex but much more dynamic deployment.
The benefit of using this method is that your main application container becomes isolated from the logic of pulling and writing configuration information via your infrastructure. It focuses entirely on the execution of your application. Single-purpose containers composed of bare application components are ideal deployments.
This method of configuration is not exclusive to using etcd or Consul. Rather, it simply abstracts the logic of executing whatever configuration integration is being used, be that etcd, Consul, Chef, or a simple file pull from a remote server. A subtle benefit of this method is that you can test your container by mounting a pre-generated configuration from a local folder. In your production environment, you could replace this with a designated container.
To see an example of config mounting, check out any container repo mounting a volume or adding a file for configuration, such as tutum/mysql. This uses a simple set of file ADD commands.
# Add MySQL configuration
ADD my.cnf /etc/mysql/conf.d/my.cnf
ADD mysqld_charset.cnf /etc/mysql/conf.d/mysqld_charset.cnf
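On the application side, the container only needs to wait until the mounted files are present, whether they come from an ephemeral config container or from a local test folder. A minimal sketch, assuming the configuration is a single non-empty file; the `wait_for_file` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint helper for a primary container whose config is
# written into a shared volume by another (ephemeral) container.
set -euo pipefail

# wait_for_file PATH TIMEOUT_SECONDS: poll once per second until PATH exists
# and is non-empty. Returns 0 when ready, 1 if the timeout expires first.
wait_for_file() {
  local path="$1" timeout="$2" waited=0
  until [ -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For local testing, the same image could be started with a pre-generated folder mounted in place, for example `docker run -v "$PWD/test-config:/etc/myapp" foo/myapp`, with the entrypoint running `wait_for_file /etc/myapp/app.conf 30` before starting the application (image name and paths hypothetical).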
Separation of tasks
With a third-party application, it can be hard to separate out different processes as the Dockerfile best practices suggest. In this case, we are heavily limited by the application’s entry points.
The best we can do is separate out the commands we would normally run against the application. Rather than running these inside the primary application container, we can move the logic into a set of long- and short-lived containers linked via volume mounts or by proxying local sockets through container ports.
The main thing we want to avoid is having a single monolithic container running the entire application, with admin commands being run via SSH. In this case, the container is acting as a virtual machine for the application, and we miss out on a lot of the isolation and distribution gains that containers provide.
An easy mistake to make is embedding a process in the main application container to extract data from the running app. I’ve used this before to pull out auto-generated credentials, to monitor state, and to track errors. The problem with this approach is that it adds unnecessary complexity; either the tracking process or the main application process must be daemonized, and the harder job of coordinating those processes falls on logic within the container.
A cleaner solution is to push all tracking, monitoring, or interaction activity to an external container and link it to the main application container via a volume mount or a socket proxy. I’ve done this in the past to parse through application logs to extract generated authentication credentials in an automated manner. Check out my Teamspeak watcher repository for an example.
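A watcher along these lines can be a short script that reads the application's log from the shared volume and stops at the first match. This sketch assumes a hypothetical log line containing `token=<value>`; the real format depends entirely on the application.

```shell
#!/usr/bin/env bash
# Hypothetical watcher for a sidecar container that shares the app's log
# directory (e.g. via --volumes-from). Extracts the first "token=..." value.
set -euo pipefail

# extract_token: read log lines on stdin and print the first token found.
# Assumes the hypothetical format "... token=<non-space characters> ...".
# Returns 1 if input ends without a match.
extract_token() {
  local line
  while IFS= read -r line; do
    if [[ "$line" =~ token=([^[:space:]]+) ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done
  return 1
}
```

In the watcher container this could be fed with `tail -n +1 -F /opt/logs/server.log | extract_token` (hypothetical path); note that `tail` itself only exits when it next tries to write after `extract_token` has returned.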
Since we now have a single-purpose primary container, we can push all coordination up to the container orchestration layer and build a stack of all of our application components around the primary application container. This way, we can let our orchestration layer control any components we define in a standard manner.
Not all applications use standard sockets or volumes for interaction, and in these cases the simplest solution is generally the best.
Let’s say you need to regularly trigger a command on your application console. In a non-containerized environment, this would involve a cron job triggering a console command that can only connect to a local instance of the application. If there is no way to map a local socket to allow a remote console to connect, try to add only supporting infrastructure to your primary container and keep the business logic in your external container.
One way of achieving this would be to run a simple web server in your primary container that proxies remote commands to the console without any knowledge of what commands exist or what may get executed. Your secondary container can then send requests to this web server instead of a local console. This is not ideal, since more complex components are now running in the primary container. However, the opinionated nature of some applications leaves few better options.
Your final deployment may look something like this:
# Start a primary app container exposing volumes for /opt/data and /opt/logs,
# and exposing /opt/tmp/app.socket via a port
docker run -d -v /opt/data:/opt/data -v /opt/logs --name=myapp1 foo/myapp
# Start a monitor container, polling /opt/logs for events
docker run -d --volumes-from myapp1 --name=myapp1-monitor foo/myapp-monitor
# Start a temporary container that uses an exposed port from myapp1 to interact with
# the running application instance, and execute a create user command
docker run -it --rm --link myapp1:myapp --name=myapp1-controller foo/myapp-controller create user test
Check out bfosberry/teamspeak as well as the associated watcher and controller for an example.
Separate all data stores
Another fairly obvious point is that all persistent data stores need to be separated. With an opinionated application, this may be as simple as writing a config file at startup and providing the external database that the application should use, or it could be as complicated as linking a subset of files within a directory to a mounted volume.
Some discovery may be involved in order to determine what components of the application require extraction. However, storage volumes tend to be universally compatible for this purpose. Consider a Postgres container mounting persisted data via a volume mount.
VOLUME ["/var/lib/postgresql"]
VOLUME ["/run/postgresql"]
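For the harder case of linking only a subset of files within an application directory to a mounted volume, one approach is to move those files onto the volume on first start and leave symlinks behind. A minimal sketch; it assumes the application follows symlinks, and the helper name and paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical startup helper: persist selected files from an application
# directory by moving them onto a mounted volume and symlinking them back.
set -euo pipefail

# persist_files APP_DIR DATA_DIR REL_PATH...: for each REL_PATH (relative to
# APP_DIR), seed DATA_DIR with the image's copy on first run, then replace the
# copy in APP_DIR with a symlink pointing into DATA_DIR.
persist_files() {
  local app_dir="$1" data_dir="$2" rel
  shift 2
  for rel in "$@"; do
    mkdir -p "$(dirname "$data_dir/$rel")" "$(dirname "$app_dir/$rel")"
    # First run: the volume has no copy yet, so keep the image's default.
    if [ ! -e "$data_dir/$rel" ] && [ -e "$app_dir/$rel" ] && [ ! -L "$app_dir/$rel" ]; then
      mv "$app_dir/$rel" "$data_dir/$rel"
    fi
    # Replace whatever is left in the app directory with a link to the volume.
    rm -rf "$app_dir/$rel"
    ln -s "$data_dir/$rel" "$app_dir/$rel"
  done
}
```

An entrypoint might call `persist_files /opt/gameserver /data cfg/server.cfg banned_users.txt` with `/data` declared as a volume (paths hypothetical); on later runs the volume's copy wins over the default baked into the image.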
An easy mistake to make is to write a simple Dockerfile like this:
FROM ubuntu
RUN apt-get update && apt-get install -y mysql-server
COPY start.sh /start.sh
CMD /start.sh
where start.sh looks something like this:
#!/bin/sh
/etc/init.d/mysql start && tail -f /var/log/mysql.log
This will start the application as a daemon within the container and then tail the application logs. The problem is that there is no direct process control: any signals sent from the orchestration layer reach the wrapper shell or the tail process, never MySQL itself.
By running the application process in the foreground, we maintain the signal pipeline. Any signals the orchestration layer sends to the container go directly to the main application process.
Here is an example of running Apache in the foreground from tutum/apache-php.
source /etc/apache2/envvars
tail -F /var/log/apache2/* &
exec apache2 -D FOREGROUND
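The effect of `exec` is easy to observe outside a container: the shell replaces itself with the command, so the command keeps the shell's PID (PID 1 when the script is the container's entrypoint) and receives the signals sent to that PID. A minimal demonstration:

```shell
#!/usr/bin/env bash
# Compare the PID of a wrapper shell with the PID of the command it starts.

# Without exec, sh runs as a child of bash and gets a new PID. The trailing
# "true" keeps sh from being the last command, so bash cannot optimize the
# fork away on its own.
without_exec() {
  bash -c 'echo "$$"; sh -c "echo \$\$"; true'
}

# With exec, sh replaces bash and inherits its PID.
with_exec() {
  bash -c 'echo "$$"; exec sh -c "echo \$\$"'
}
```

Applied to the MySQL example above, start.sh would end with something like `exec mysqld --user=mysql` instead of calling the init script and tailing its log (a sketch; the exact options depend on the image).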
Log to Infrastructure
A fairly obvious point is to aggregate and separate logs from your application. In most cases, logs will be written to a folder within the application or system log directory. Be sure to mount these to a logging volume or print them to stdout so they are included in your standard logging infrastructure.
In cases where you can’t configure the logging location in a way to suit your needs, you can always start background redirects to push logs to stdout or to another location. Ideally this would not even be a consideration, since your application should be running in the foreground. However, depending on the application, other components running in the container, and your logging infrastructure, there could well be a need to extract logs via a mount. In order to keep your primary container simple when using a custom log sink, the best approach is to log to stdout and collect logs via a set of containers. For an example, check out the Deis logger.
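Where the application insists on writing to a file, one common workaround that avoids an extra background process is to replace the log file with a symlink to the container's stdout or stderr; the official nginx image does this for its access and error logs. A minimal sketch; it assumes the application simply appends to the file (no seeking or self-rotation) and inherits the container's stdout, which holds when it runs in the foreground. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Redirect file-based logs to the container's output streams so they reach
# whatever logging driver or log-collecting containers the host uses.
set -euo pipefail

# link_log FILE STREAM: replace FILE with a symlink to /dev/stdout or
# /dev/stderr (STREAM must be "stdout" or "stderr").
link_log() {
  local file="$1" stream="$2"
  case "$stream" in
    stdout|stderr) ;;
    *) echo "unknown stream: $stream" >&2; return 1 ;;
  esac
  mkdir -p "$(dirname "$file")"
  ln -sf "/dev/$stream" "$file"
}
```

For example, the entrypoint could run `link_log /opt/app/logs/server.log stdout` (hypothetical path) before exec'ing the application.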
Benefits of Containerization
Follow CMD Best Practices
A fairly simple win when designing your set of containers is to adhere to the best practices around the usage of ENTRYPOINT and CMD.
By using a generic entry point for all of your containers, with a different command depending on the usage context, you can reuse the same container image in various parts of your application deployment. This will reduce your repository and image footprint. In cases where my primary application is large (4 GB or larger), I tend to avoid doing this to reduce the number of containers using larger images. For an example of this, check out mysql/mysql-server.
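A generic entrypoint of this kind might dispatch on the first argument, so one image can act as server, monitor, or controller depending on the command passed to `docker run`. A minimal sketch; the subcommands and the `myapp-*` programs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical ENTRYPOINT script shared by every container built from the image.
# Dockerfile usage (exec form, so the exec'd program receives signals directly):
#   ENTRYPOINT ["/entrypoint.sh"]
#   CMD ["server"]
# The script itself would end with:  entrypoint "$@"
set -euo pipefail

# entrypoint ARGS...: map a short subcommand onto the real program, or run
# any other command verbatim (useful for debugging, e.g. "bash").
entrypoint() {
  case "${1:-server}" in
    server)  shift || true; set -- myapp-server "$@" ;;
    monitor) shift; set -- myapp-monitor "$@" ;;
    control) shift; set -- myapp-controller "$@" ;;
  esac
  exec "$@"
}
```

With that Dockerfile, a plain `docker run foo/myapp` starts the server, while `docker run foo/myapp monitor` reuses the same image for the monitor (image name hypothetical).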
Barriers to Containerizing Opinionated Applications
Most applications were never intended to be run in containers and can only be containerized while kicking and screaming. Don’t let that stop you from trying — after all, theirs is not to reason why, theirs is to containerize. In the next sections, I outline a few issues you may come across and some workarounds.
Self-discovery may not support dynamic ports
Some applications, specifically many game servers, report their bound IP and port to a master server list for discovery. While a game server may be accessible directly via the dynamic port mapping, the port provided to players through a search will be the internal port, which is not externally accessible.
The simplest way around this is to allocate external ports on your infrastructure and use direct static mapping into the application container. In this case, your application will need to be configured to use what is essentially a dynamic internal port via some configuration parameter. You can see an example of this in my GMod Server repo.
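With a static mapping, the host and container use the same port number, so the port the server binds and reports is also reachable from outside. A minimal sketch, assuming the external port is passed in as a hypothetical `GAME_PORT` variable and that the server takes a hypothetical `-port` argument:

```shell
#!/usr/bin/env bash
# Hypothetical launcher helper for a statically mapped game server.
set -euo pipefail

# game_server_args: validate GAME_PORT and print the server's port arguments,
# one per line. Aborts if GAME_PORT is unset; returns 1 if it is not 1-65535.
game_server_args() {
  local port="${GAME_PORT:?GAME_PORT must be set}"
  if ! [[ "$port" =~ ^[0-9]+$ ]] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "invalid GAME_PORT: $port" >&2
    return 1
  fi
  printf '%s\n' -port "$port"
}
```

The host would publish the same number on both sides, e.g. `docker run -p 27016:27016/udp -e GAME_PORT=27016 foo/gameserver`, and the entrypoint could collect the arguments with `mapfile -t args < <(game_server_args)` before exec'ing the server with `"${args[@]}"` (image and server names hypothetical).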
Some ports may not be configurable
It may not be possible to alter some application ports; in other cases, the application may bind to both TCP and UDP on a single port. There is currently no way to tell Docker to bind to both UDP and TCP on a specific port via a single configuration directive to ensure cohesion between host and container.
Requesting a dynamic binding for 12345 and 12345/udp may provide a mapping with two different host port numbers. Statically mapping ports is the simplest solution to this issue.
In cases where a port is not configurable, you can always use iptables to forward traffic from one local port to another. However, this adds complexity to the container and may not be compatible with cases where your application is self-discovering.
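Both workarounds can be sketched together. Publishing TCP and UDP explicitly on the same number keeps host and container aligned (for example `docker run -p 12345:12345/tcp -p 12345:12345/udp ...`), and if the application's own port is fixed, NAT rules inside the container can redirect the published port to it. The function below only prints the iptables commands so they can be reviewed first; applying them needs root inside a container started with `--cap-add=NET_ADMIN`. The port numbers are placeholders.

```shell
#!/usr/bin/env bash
# redirect_rules PUBLISHED APP_PORT: print iptables NAT rules that redirect
# incoming TCP and UDP traffic on PUBLISHED to the application's fixed APP_PORT.
redirect_rules() {
  local published="$1" app_port="$2" proto
  for proto in tcp udp; do
    printf 'iptables -t nat -A PREROUTING -p %s --dport %s -j REDIRECT --to-ports %s\n' \
      "$proto" "$published" "$app_port"
  done
}
```

Once reviewed, the rules could be applied at startup with `redirect_rules 12345 27015 | sh`. As noted above, a self-discovering application will still report its internal port.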
Some applications don’t suit dynamic deployments
With many game servers, the typical discovery process begins with a user searching for a server with free slots matching specific criteria.
Assuming this self-discovery process is functioning correctly, the user will be able to connect and may bookmark or favorite the server. Usually this bookmark consists of a pure IP and port. With a dynamic infrastructure, the port and IP will most likely change should the game server get moved. With most game server traffic being UDP, it can be difficult to consolidate bookmarks and ensure a consistent experience.
One solution is to leave a “patch” in place in the old container’s location to forward traffic from the old host to the new one, mapping ports dynamically. With some of the more advanced SDN solutions available for container deployments, this issue can be solved with routable IPs. This allows us to treat many statically deployed applications as a highly available dynamic resource.
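A forwarding patch of this kind could be as small as a socat process left running at the old address. A minimal sketch, assuming socat is installed and the game traffic is UDP; the `SOCAT` override exists only so the command can be dry-run with `echo`, and the addresses are placeholders.

```shell
#!/usr/bin/env bash
# forward_patch LISTEN_PORT NEW_HOST NEW_PORT: relay UDP traffic arriving on
# LISTEN_PORT (the address players bookmarked) to the server's new location.
# Set SOCAT=echo to print the command instead of running it.
forward_patch() {
  local listen_port="$1" new_host="$2" new_port="$3"
  "${SOCAT:-socat}" "UDP4-LISTEN:${listen_port},fork,reuseaddr" "UDP4:${new_host}:${new_port}"
}
```

For example, `forward_patch 27015 203.0.113.10 27016` on the old host would keep bookmarked clients reaching the moved server until the patch is retired.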
You can find a number of examples of these concepts as part of the GSCS schema project.