The radical shift toward DevOps and the continuous-everything movement has changed how organizations develop and deploy software. As enterprises consolidate and standardize their continuous integration and continuous delivery (CI/CD) processes and tools, a standardized DevOps model helps them deliver software functionality faster and at large scale. However, newer cyber threats, evolving regulatory requirements, and the need to protect brand reputation are putting tremendous pressure on IT leaders to effectively protect their customer and business-critical data.
Conceptually, a DevOps pipeline approach makes a lot of sense. In practice, however, site reliability engineering (SRE) and ops teams optimize systems for service reliability and robustness at the cost of delivering new features. The need for software reliability inherently decreases continuous delivery (CD) throughput. This conundrum is the biggest challenge for any organization adopting DevOps practices at large scale today. By integrating and extending CI/CD with continuous resilience (CR) to protect against a wide range of software reliability disruptions, DevOps teams can confidently deploy new software without affecting the resiliency of their systems. In other words, continuous resilience is the enabler that gives SREs and cloud operations teams the confidence to increase the speed of DevOps.
The need for app resiliency and compliance with DevOps pipelines
1. Instantly check, reject, or roll back erroneous deployments for resiliency
Software deployments are complex and error-prone. Even when DevOps teams deploy ever smaller, more frequent incremental updates using CodeShip, changes to infrastructure, cloud configurations, and bad containers can disrupt the robustness of your software systems. The problem grows as more and more microservices are added to already complex distributed systems. By integrating continuous resilience (CR) into the CodeShip pipeline, you can instruct a system like Appranix to act as your co-pilot, protecting application environments with a continuous copy of cloud configurations, data, and cloud services state without affecting your production environment. This allows recovery from disruptions to any point in time, giving a level of resiliency that was not possible before.
2. Instant rollback of cloud configurations and services
Advanced CI/CD pipelines also deliver cloud infrastructure or Kubernetes infrastructure changes using infrastructure-as-code. Pipelines push whatever they are given, whether it is good, tested code or bad infrastructure-as-code. SREs can’t always keep track of the infrastructure changes made by hundreds or thousands of deployments. However, they can always recover from bad cloud configurations that disrupt app resilience without a re-deployment. For instance, if security groups are deleted through a deployment, SREs can perform a granular recovery of those security groups from Appranix. If load balancer configurations are broken, Appranix can instantly recover the load balancer from its app environment time machine.
3. Instant test environments with real-world data to avoid resiliency problems in production
A lack of testing with real-world data introduces an enormous risk to the reliability of your systems. What if you could get instant test environments, including all dependent services and real-world data snapshots, for automated testing along your DevOps pipeline? You can achieve this today with Appranix platform services integrated with CodeShip CI/CD. Moreover, you can test your software in another region of the cloud or even on another cloud provider.
4. Fast recovery from cyberattacks
Ever-increasing cyberattacks like ransomware enable rogue groups to take over business-critical systems. Recent notable examples are the Maryland City Systems ransomware attack and Atlanta City’s ransomware attack, in which the ransom amounts were lower than the actual cost of recovering the software systems. What if you had a copy of an entire application environment safely stored in another region of the cloud? What if the entire process were completely and transparently automated without affecting production systems? You could recover from such attacks quickly, without resorting to expensive, time-consuming efforts, and, most importantly, protect your organization’s reputation with quickly restored services. Once you restore the last known good application environment, you can deploy the latest code from CodeShip to get the systems up and running while you work out a permanent fix following the cyberattack.
5. Meet SOC/SOX compliance demands
Most modern SaaS applications follow DevOps processes to keep up with the pace of change required. Service level requirements for multi-tenant SaaS applications are much higher than for traditional applications. If these SaaS applications aim to achieve SOC 2 Type II compliance, organizations need to prove they have reliable recovery capabilities in another region of the cloud. In other words, the need for SOC compliance automatically drives the need for high availability and resilience. If you integrate your CodeShip-deployed software updates with Appranix, you can automate resilience compliance easily.
6. Protect against cloud provider failures
Hyperscale cloud providers like AWS, Google Cloud, and Microsoft Azure continuously work to improve their infrastructure and platform services. It is now possible to create more complex software systems with distributed architectures that allow easier updates and maintenance. However, even hyperscale cloud environments suffer massive disruptions due to configuration changes or capacity issues. Recent cloud outages, such as those reported in "Google Cloud Broke the Internet" and "Azure Cloud Capacity Issues in the UK", highlight the need to build resiliency at the application level, with copies of application environments that are always ready if and when you need them.
7. Be prepared for natural disasters
Natural disasters cause cloud service disruptions. Recent events, such as the one reported in "Lightning strike disrupts Azure", highlight the need for second-region protection. With Appranix, you can be well prepared to recover or redirect your application traffic to another region or another cloud provider.
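The granular recovery described in item 2 (restoring only the security groups or load balancer settings that a deployment broke) can be pictured as a comparison between a known-good configuration snapshot and the current state. The following is a minimal sketch of that idea only; it is not Appranix's implementation or API. The function name `plan_granular_recovery`, the resource names, and the dictionary layout are all hypothetical.

```python
def plan_granular_recovery(known_good, current):
    """Compare a known-good config snapshot with the current state.

    Both arguments are hypothetical dicts mapping a resource name to its
    configuration. Returns a plan listing only the resources that need
    attention, so everything else is left untouched.
    """
    plan = {}
    for name, config in known_good.items():
        if name not in current:
            # The resource was deleted, e.g. a security group removed by a deployment.
            plan[name] = ("recreate", config)
        elif current[name] != config:
            # The resource still exists but its configuration has drifted.
            plan[name] = ("restore", config)
    return plan


# Illustrative data only: a deleted security group and a changed load balancer.
known_good = {
    "sg-web": {"ingress": [443]},
    "lb-main": {"listeners": [80, 443]},
    "sg-db": {"ingress": [5432]},
}
current = {
    "lb-main": {"listeners": [80]},
    "sg-db": {"ingress": [5432]},
}
plan = plan_granular_recovery(known_good, current)
```

In this sketch, resources that exist only in the current state are deliberately ignored, on the assumption that a recovery should not delete anything that was added after the snapshot.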
Create an app environment time machine to recover from disruptions
As shown above, application disruptions are a normal part of the software development and operations process. The increasing complexity of software stacks and of distributed architectures enabled by cloud platforms such as Kubernetes demands better resiliency practices. Application resiliency should be considered part of the software development process and should not be relegated to the end of the operations procedure.
When organizations integrate continuous resilience with CI/CD, overall application resilience increases dramatically. They can introduce an application environment time machine at the end of the CloudBees CodeShip CI/CD pipeline to take a continuous copy of cloud service configurations and application environment metadata, along with data snapshots, to provide multiple levels of resiliency.
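One way to think about an application environment time machine is as a time-ordered series of snapshots, each pairing cloud configuration and metadata with a reference to a data snapshot, from which the most recent copy at or before a chosen moment can be retrieved. The sketch below illustrates that concept under stated assumptions: the class names, fields, and snapshot identifiers are hypothetical and do not describe how Appranix stores or retrieves its copies.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EnvironmentSnapshot:
    taken_at: datetime        # when the copy was taken
    cloud_config: dict        # cloud service configurations and metadata
    data_snapshot_id: str     # reference to the matching data snapshot


@dataclass
class TimeMachine:
    snapshots: list = field(default_factory=list)

    def record(self, snapshot):
        # Keep snapshots ordered by time so point-in-time lookups stay simple.
        self.snapshots.append(snapshot)
        self.snapshots.sort(key=lambda s: s.taken_at)

    def latest_at_or_before(self, moment):
        # Find the most recent snapshot taken at or before the given moment.
        times = [s.taken_at for s in self.snapshots]
        index = bisect_right(times, moment)
        return self.snapshots[index - 1] if index else None


machine = TimeMachine()
machine.record(EnvironmentSnapshot(datetime(2020, 1, 1, 12, 0), {"replicas": 2}, "snap-b"))
machine.record(EnvironmentSnapshot(datetime(2020, 1, 1, 9, 0), {"replicas": 1}, "snap-a"))

# Roll back to the state just before a hypothetical bad deployment at 11:30.
restore_point = machine.latest_at_or_before(datetime(2020, 1, 1, 11, 30))
```

Recording a snapshot at the end of every pipeline run would give one restore point per deployment, which is what makes rollback to "the environment as it was before this release" possible.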
Integrate Appranix with the CloudBees CodeShip pipelines
It is very easy to integrate continuous application resilience into your CloudBees CodeShip pipelines. After a one-time discovery of your production environment in Appranix, you can integrate the CloudBees CodeShip project with a custom script for the deployment pipeline.
a) If you don’t have one already, create a new project in CloudBees CodeShip
b) Log in to your code repository and connect it with your CloudBees CodeShip project
c) Select the appropriate CloudBees CodeShip project type: Pro or Basic
d) Configure deployment branches with a custom script that includes the Appranix code to start managing the App Environment Time Machine
e) If you want to roll back or recover from application disruptions, or create an environment for testing, log in to Appranix, select a timeline from the app environment time machine, and hit a button to recover the application(s)
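The custom script in step d) is not reproduced here, so the following is only a sketch of what such a deployment hook might prepare. It builds a request describing the deployment that just finished, which a real script would then hand to the resilience service. Everything specific is an assumption: the function name `build_protection_request`, the payload fields, the `TARGET_ENVIRONMENT` variable, and the CI variable names `CI_COMMIT_ID` and `CI_BRANCH`, which should be checked against your CodeShip project's documentation. No Appranix endpoint or API is shown because none is given in this article.

```python
import json


def build_protection_request(env):
    """Build a hypothetical payload describing a finished deployment.

    `env` is a mapping of environment variables, for example os.environ
    inside the pipeline. The variable names used here are assumptions.
    """
    required = ("CI_COMMIT_ID", "CI_BRANCH")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail loudly so a misconfigured pipeline does not skip protection silently.
        raise ValueError("missing CI variables: " + ", ".join(missing))
    return {
        "event": "deployment_finished",
        "commit": env["CI_COMMIT_ID"],
        "branch": env["CI_BRANCH"],
        # Hypothetical variable naming the environment to protect.
        "environment": env.get("TARGET_ENVIRONMENT", "production"),
    }


# Example with illustrative values instead of a real pipeline environment.
request = build_protection_request({"CI_COMMIT_ID": "abc123", "CI_BRANCH": "main"})
payload = json.dumps(request, sort_keys=True)
```

Keeping the payload construction separate from any network call makes the hook easy to test locally before it is wired into a deployment branch.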
Achieve multiple levels of resiliency
With an application environment time machine, organizations can achieve three levels of resiliency for instant creation, rollback, and recovery of app environments running on cloud platforms such as AWS, Google Cloud, Azure, or VMware. Protection and recovery across different providers are easier with container-based applications running on Kubernetes. Most organizations will be satisfied with Level 1 and Level 2 resiliency architectures. Level 3 is possible for container-based applications with fewer data transport requirements.
By integrating application resilience as an extension of CI/CD, DevOps teams can drastically reduce risks to system robustness while deploying new software. They can address service reliability issues proactively, as opposed to the legacy reactive operations model. If software reliability is continuously automated, organizations can achieve a level of resilience that could improve their ability to withstand disruptions by an estimated 50-300%.