Graph Theory in Codeship Pro

Codeship Pro

Reading Time: 6 minutes

This post is part of a series detailing aspects of our release of Codeship Pro’s new networking, depends_on, and healthcheck features. If you are just now joining us, I suggest you go read from the beginning.

Today we will be discussing how the refactored Codeship Pro models the dependency graph inside your codeship-services.yml file. I will assume you have a basic understanding of what steps and services are in Codeship Pro. If you don’t, you can easily read more about them in our introductory guide.

Describing Dependencies

First, let’s talk about how steps relate to services in Codeship Pro. Non-group type steps operate on or within a single service — called the primary — and might depend on any number of other services to perform their work.

Currently, there are four ways to express a dependency between two services:

  1. volumes_from: Use volumes from other services. Dependency containers are created, but not started.
  2. depends_on: Expose other containers as hosts on the same network.
  3. dockercfg_service: Generate a dockercfg file with registry authentication information. Useful for pushing and pulling images from private registries.
  4. links: A legacy feature of Docker and deprecated in Codeship Pro, this exposes a dependency in a container using environment variables.

If any of these directives are used in the description of a service, the system will need to ensure the services referenced are started before the primary service of the step.

The Previous Implementation

As discussed in “Codeship Pro: Extreme Makeover”, Codeship Pro recently underwent a serious overhaul. During that overhaul, we redefined a lot of the areas of responsibility within Codeship Pro. One such area that was reworked was how we determine the boot-order of containers for a given step.

While circular dependency detection was performed at parse time, as soon as the build began and we read your files, the resolution of dependencies was performed as late as possible: at step execution time. Services were started in an ad hoc manner and done so alongside other mission critical tasks inside a series of recursive functions.

This implementation was fine while there were a small set of dependency types. But as the product has matured, the cognitive burden of making changes to these functions is higher than we would like.

The New Implementation

The primary goal of our remodel was to establish clear layers of responsibility in the new system. Graph resolution was considered to be a major responsibility, and it was decided that it should be completely separated into the manifest parsing layer of the system.

The result is much simpler code in other parts of the system, because the graph decisions have already been made.

While parsing the manifests, we check for circular dependencies and calculate the list of dependencies for each service in the file. We make use of two interesting and well-known concepts in graph theory to perform these tasks:

  • strongly-connected components
  • topological sorting

Introduction to Graphs

Before we get into the weeds, it’s worth discussing what it we mean by graph. Graphs are collections of nodes and edges arranged in pairwise relationships. They allow us to model complex relationships between entities.

For graphs in Codeship Pro, the nodes represent the services in a codeship-services.yml file, and the edges represent one of the four types of dependency relationships that can be expressed (eg, depends_on).

For our purposes, we are only interested in a specific type of graph: a directed acyclic graph (DAG). In a DAG, each node is connected to one or more other nodes via edges that point in a single direction. Also, DAGs do not contain cycles. This means if you start a node and follow all possible contiguous sequences of edges, you will never arrive back at that same node.

For example, suppose we have a dependency between two services app and database where app depends on database. The DAG representation of that relationship would look like this:

Strongly connected components

A directed graph can be considered “strongly connected” if there is a path between all pairs of vertices. This property can also be extended to include any sub-graphs or “components” of the graph.

Strongly connected components are circular in nature. So for a directed graph to be acyclic, it must contain no strongly connected components containing more than one member.

The directed graph above contains three strongly connected components. One of these components contains four members, which means that a cycle, or circular dependency, must exist.

In Codeship Pro, after we’ve constructed the graph of the services in your codeship-services.yml file, we calculate the strongly connected component representation of that graph (using Tarjan’s algorithm). If there are any components containing more than one member, we return an error describing the circular dependency we’ve detected.

To illustrate, the following example contains a circular dependency between services app and worker:

# codeship-services.yml
app:
    image: codeship/app
    depends_on:
        - db
        - cache
        - worker
db:
    image: mysql:latest
cache:
    image: redis:latest
worker:
    image: codeship/worker
    depends_on:
        - app

There’s a circular dependency between services app and worker. Codeship Pro will return the following error:

circular dependency detected involving services: app, worker

(Pruned) topological sort

An interesting property of DAGs is that they each have an inherent linear ordering called a topological order. In a topological order, all inputs of a node precede it in the list. In Codeship Pro, we obtain the topological order of a DAG, via topological sorting, to know what order to create/start service containers for steps in the codeship-steps.yml file.

Consider the following example:

# codeship-steps.yml
- name: Tests
  service: app
  command: go test -race ./...

# codeship-services.yml
app:
    image: codeship/app
    depends_on:
        - db
        - cache
db:
    image: mysql:latest
cache:
    image: redis:latest
worker:
    image: codeship/worker
    depends_on:
        - db

Unfortunately, the resulting topological sort contains all services in the graph. As you can see in the codeship-steps.yml file, the starting point of our dependency resolution is the app service. We only need the list of dependencies, and any upstream dependencies they might have, for that primary service. To obtain this, we perform a variant of the sorting algorithm. This works by only performing a depth-first search of node inputs outward from our primary service.

That’s more like it. Now we have a list that contains only the services necessary for app to do its work. The subsequent layer of the system — responsible for running our step — can safely iterate through this list of dependencies and create, and start in most circumstances, a container for each.

Conclusion

By formally modeling the way services in your projects interact with each other, we’ve been able to design a better system as a whole. As of this writing, we’ve already started seeing dividends being paid from this specific aspect of the new Codeship Pro.

The first feature to benefit from our new design is the recent depends_on release. As of the isolated networks release, each service running within the context of a step is a member of the same network and reachable by other services via DNS. The only change that was necessary to make depends_on work was introducing it as a new edge type in our computed graph.

The most recent feature to benefit is our support of Docker’s healthchecks. This eliminates the need for extraneous sleeps and other workarounds to synchronize with dependency container startups.

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles.
Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.



We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.