Incident Report: DNSimple DDoS Attack

Please accept our apologies for the recent outage affecting your service, which began at 19:09 UTC on December 1, 2014.

What happened?

The vendor that provides Codeship’s domain name service (DNS), DNSimple, experienced a major volumetric distributed denial of service (DDoS) attack that impacted their service availability. DNSimple has issued an incident report detailing their outage as a result of the DDoS attack.

At 19:09 UTC on December 1, when DNSimple’s DNS service became unavailable, Codeship customers were unable to resolve codeship.com domain names once the records expired from cache. The TTL on Codeship’s DNS records was set to 3600 seconds, so by 20:09 UTC, most of the world could not connect to Codeship despite all systems operating normally. Our monitoring service alerted us to the failure, and our status page and Twitter accounts were updated with details. Unfortunately, the DNS failure also affected external access to our status page, which prevented us from communicating status updates as well as we would have liked.
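
The timing above follows directly from the TTL. As a minimal sketch (the function and constant names are hypothetical, not part of any Codeship tooling), the latest moment at which a resolver could still hold a cached record is the outage start plus the TTL, assuming the resolver fetched the record just before the outage began:

```python
from datetime import datetime, timedelta, timezone

# Values taken from the report; the names are illustrative only.
OUTAGE_START = datetime(2014, 12, 1, 19, 9, tzinfo=timezone.utc)
TTL_SECONDS = 3600


def latest_cache_expiry(outage_start: datetime, ttl_seconds: int) -> datetime:
    """Return the latest time a record cached just before the outage stays valid."""
    return outage_start + timedelta(seconds=ttl_seconds)


expiry = latest_cache_expiry(OUTAGE_START, TTL_SECONDS)
print(expiry.isoformat())  # 2014-12-01T20:09:00+00:00
```

Resolvers that cached the record earlier than that would have lost it sooner, which is why failures appeared gradually over the first hour.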

How did we respond and recover?

As we monitored DNSimple’s progress in recovering from the DDoS attack, it became clear that their expected time to restore service was unacceptable and that we would need to take more drastic action. We decided to move temporarily to another DNS provider. This process was complicated by the fact that DNSimple’s management interface was unavailable. Fortunately, our continuous deployment process for DNS records meant we had easy access to the information we needed. At 03:14 UTC on December 2, Codeship’s DNS records were successfully being returned by the new provider, and service was restored.
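
The report does not describe how the DNS deployment process is implemented, so the following is only a sketch of the general idea: if records live in version control as plain data, they can be rendered into a standard zone file and imported at another provider even when the primary provider's interface is down. The record values, the `example.com` origin, and the `render_zone` helper are all hypothetical.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str   # owner name relative to the zone origin
    ttl: int    # time to live, in seconds
    rtype: str  # record type, e.g. "A" or "CNAME"
    value: str  # record data


# Hypothetical records using documentation-reserved addresses and names.
EXAMPLE_RECORDS = [
    Record("@", 3600, "A", "192.0.2.10"),
    Record("www", 3600, "A", "192.0.2.10"),
    Record("status", 3600, "CNAME", "status.example.net."),
]


def render_zone(origin: str, records: List[Record]) -> str:
    """Render records as zone-file text: name, TTL, class, type, data."""
    lines = [f"$ORIGIN {origin}."]
    for r in records:
        lines.append(f"{r.name}\t{r.ttl}\tIN\t{r.rtype}\t{r.value}")
    return "\n".join(lines) + "\n"


zone_text = render_zone("example.com", EXAMPLE_RECORDS)
print(zone_text)
```

Keeping the source of truth outside the provider is the design point here: the zone can be rebuilt anywhere without needing an export from the system that is failing.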

We have taken, and are planning, several actions in response to this incident to prevent future occurrences. First, the TTL on Codeship’s DNS records was increased to 24 hours, which allows for much greater resiliency to a failure like this one thanks to record caching. Second, the Codeship status page was moved to a new address on a new domain name registered with a different provider. Additionally, we now have an immediate way to move Codeship’s DNS records from one provider to another in case of failure. Longer term, we are researching a more robust solution with secondary DNS servers on a new provider that would act as secondaries to our primary servers at DNSimple, allowing for automatic failover in case of failure. DNSimple is working on the necessary feature that would allow this.
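
To give a rough sense of what the longer TTL buys, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are not from the report: every resolver already holds the record, refreshes it exactly at expiry, and cache ages are spread uniformly across the TTL. Under that model, the share of resolvers still holding a valid record at the end of an outage is `1 - outage / TTL`, floored at zero. The outage window uses the timestamps given above; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Outage window from the report: 19:09 UTC Dec 1 to 03:14 UTC Dec 2, 2014.
START = datetime(2014, 12, 1, 19, 9, tzinfo=timezone.utc)
END = datetime(2014, 12, 2, 3, 14, tzinfo=timezone.utc)
outage_seconds = (END - START).total_seconds()  # 29100 seconds (8 h 5 min)


def fraction_still_cached(outage: float, ttl: float) -> float:
    """Share of resolvers with a valid cached record when the outage ends.

    Assumes cache ages are uniformly distributed over [0, ttl); this is a
    simplification for illustration, not a measured figure.
    """
    return max(0.0, 1.0 - outage / ttl)


for ttl in (3600, 86400):
    share = fraction_still_cached(outage_seconds, ttl)
    print(f"TTL {ttl:>5} s: {share:.1%} of resolvers still cached")
```

Under these assumptions, a 3600-second TTL leaves no resolver with a valid record by the end of an outage this long, while a 24-hour TTL leaves roughly two thirds of them unaffected. The trade-off is that planned record changes also take up to 24 hours to reach every resolver.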

Join the Discussion

  • Yes, this outage was really bad – thank you for this report. Our website talenthouse.com was unreachable the entire day. We will definitely either switch to a more robust DNS company or leverage some sort of secondary mirror. Please keep us posted how you move forward with the secondary DNS initiative.

    • We used Cloudflare as a fallback during the outage, which worked well. We’ll keep you up to date on future choice.

    • As Flo mentioned, switching over to Cloudflare worked great to get us through the outage. Unfortunately, having primary and secondary name servers hosted by different providers is more difficult than I would have thought it would be. DNSimple seems to be very focused on ensuring this is available for their customers, though. I’m sure they will have more info about this available soon.

      • Yeah, we didn’t have access to our zone file unfortunately. Thinking about copying your CD approach for that now…