Codeship is at DockerCon 2015! This week, we’ll be providing summaries on our blog of some of the talks we attend at this two-day conference in San Francisco.
Bardwaj, a senior director of engineering at Capital One, explained to his audience that the American bank wants to build its technology foundation to ensure leadership in analytics. Leveraging open source tools is a big part of that plan.
Bardwaj said that Capital One wants to enable access to the best data analysis tools for all its associates, specifically to allow for:
- local testing
- on-demand evaluation environments
- workload isolation
- tool governance
To reach that goal, the platform and engineering team chose to use Docker to build an Analytic Garage. They had to engineer an effective architecture to evaluate and integrate a large variety and volume of tools and software packages. They needed something that would allow for continuous testing of tools to better meet the analytics needs of thousands of users.
Bardwaj stated that the Analytic Garage was built to create a separate environment for users to quickly prototype new tools. The Garage went through a few different versions before landing on this stack: Mesos Marathon, Docker, cgroups, RHEL 6.x, and GlusterFS. As a result, the Analytic Garage offered:
- improved stability
- improved resource utilization
- more users
- more tools
- isolated workloads
The Garage also integrated with the rest of Capital One’s Big Data ecosystem to enable agile progression of insights to deployment.
Now, how to get employees to use the Analytic Garage? Bardwaj said that to encourage adoption across the analyst community, the team developed a self-service UI, which offered:
- a web portal to instantiate containers and analytic services
- Kerberos integration with Hadoop and Hive
- integrated monitoring and metrics
- lifecycle management (container expiration)
- highly available cluster with Mesos Marathon
- shared storage using GlusterFS
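To make the portal's job more concrete, here is a minimal sketch of the kind of request such a self-service UI might send to Marathon. It is not Capital One's implementation. The field layout follows Marathon's app definition format for Docker containers, but the app ID scheme, image name, resource sizes, GlusterFS mount path, and the `expires_at` label used for container expiration are all assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def build_marathon_app(user, image, ttl_hours, now):
    """Return a Marathon app definition (a dict) for one user's sandbox.

    The ID scheme, resource sizes, and paths are hypothetical.
    """
    expires_at = now + timedelta(hours=ttl_hours)
    return {
        "id": f"/garage/{user}/sandbox",
        "cpus": 2,
        "mem": 4096,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
            # Assumed: the GlusterFS volume is already mounted on every host.
            "volumes": [
                {
                    "containerPath": f"/home/{user}",
                    "hostPath": f"/mnt/gluster/home/{user}",
                    "mode": "RW",
                }
            ],
        },
        # Marathon labels are free-form strings; using one to carry an
        # expiry time is an assumption, not a built-in Marathon feature.
        "labels": {"owner": user, "expires_at": expires_at.isoformat()},
    }


def is_expired(app, now):
    """True once the app's (hypothetical) expires_at label has passed."""
    return datetime.fromisoformat(app["labels"]["expires_at"]) <= now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    app = build_marathon_app("alice", "example/analytics-sandbox:1.0", 24, now)
    # A portal would POST this JSON to Marathon's /v2/apps endpoint.
    print(json.dumps(app, indent=2))
```

A periodic cleanup job could then list the running apps, call `is_expired` on each, and delete the ones past their expiry, which is one plausible way to get the container expiration behavior described in the talk.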
The team also needed to minimize the complexity of adoption. They created a virtual private server (VPS) by integrating multiple analytic services, apps, and tools into one Docker image. This approach offered a few advantages:
- a familiar, data-centric sandbox image
- maximized portability and performance
- support for hybrid tools
- reproducibility and auditability through a versioned environment
- a volume-mounted tool directory for screening new tools before they are integrated into the sandbox image
- the ability to instantiate containers in seconds, despite the size of the image
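The volume-mounted tool directory is easy to picture as a `docker run` invocation. The sketch below only builds the command line; the image name, container name, and directory paths are hypothetical, and mounting the directory read-only is an assumption about how one might screen an unvetted tool, not something stated in the talk.

```python
import shlex


def docker_run_command(image, container_name, host_tool_dir,
                       container_tool_dir="/opt/candidate-tools"):
    """Build a `docker run` argument list that mounts a tool directory.

    The tool under evaluation lives on the host (or shared storage) and
    is mounted into the container, so the versioned sandbox image itself
    stays unchanged while the tool is being screened.
    """
    return [
        "docker", "run", "-d",
        "--name", container_name,
        # host_path:container_path:ro mounts the directory read-only
        "-v", f"{host_tool_dir}:{container_tool_dir}:ro",
        image,
    ]


if __name__ == "__main__":
    cmd = docker_run_command(
        image="example/analytics-sandbox:1.0",
        container_name="alice-sandbox",
        host_tool_dir="/mnt/gluster/tools/new-tool",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once a tool passes screening, it would be baked into the next tagged version of the sandbox image, which keeps the environment reproducible.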
Of course, the VPS approach had a few challenges in store as well:
- trial-and-error coordination of the initialization order of the services (GlusterFS, Docker, Mesos-master, Mesos-slave, Marathon)
- open source Gluster is not very resilient
- Docker isn’t fully supported or stable on 2.x Linux kernels (cgroups bug, random reboots)
- the device mapper is much too complex to use
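One way to take the trial and error out of service initialization is to write the dependencies down and derive a start order from them. The sketch below does this with a small topological sort. The dependency map is an assumption made for illustration (the talk listed the services but not which depends on which), so treat the resulting order as an example rather than the team's actual sequence.

```python
from collections import deque

# Hypothetical dependencies: each service lists what must start before it.
DEPENDS_ON = {
    "glusterfs": [],
    "docker": ["glusterfs"],          # assumed: containers mount Gluster paths
    "mesos-master": [],
    "mesos-slave": ["docker", "mesos-master"],
    "marathon": ["mesos-master"],
}


def startup_order(depends_on):
    """Return a start order in which every service follows its dependencies.

    Uses Kahn's algorithm; raises ValueError on a dependency cycle.
    """
    remaining = {svc: set(deps) for svc, deps in depends_on.items()}
    ready = deque(sorted(svc for svc, deps in remaining.items() if not deps))
    order = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        # Releasing svc may unblock services that were waiting on it.
        for other in sorted(remaining):
            deps = remaining[other]
            if svc in deps:
                deps.remove(svc)
                if not deps:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    print(" -> ".join(startup_order(DEPENDS_ON)))
```

An init system or a boot script could walk this list and wait for each service's health check before starting the next one.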
Bardwaj stated that the Analytic Garage on Docker has significantly reduced the time it takes his team to evaluate and onboard new tools and solutions. It has also helped accelerate the evolution of his team’s data technology strategy.
Specifically, he said, the Garage enables them to build, test, and iterate complete app prototypes using a “LEGO block” approach; it allows different groups to easily select and use the tools that they prefer.
Performance inside Docker containers is comparable to bare metal, which lets analysts run complex models. Bardwaj said the team had been cautious on this point because VM performance had not been acceptable, so they have been happy to see Docker perform so well. They’re currently testing different approaches to persisting databases on Docker as an enhancement to their analytic ecosystem.