In a galaxy far, far away, and not too long ago, I joined uShip as a DevOps Engineer. Our team's scope was to manage everything from the CI/CD pipeline of our monolith to all of the application configuration in dev and prod, all while maintaining the infrastructure of the supporting products (CDN, APM, logging, etc.). I inherited a huge monolithic environment with no documentation, and everyone was operating on tribal knowledge. You’ve probably heard this story hundreds of times.
Eventually, the business decided that it was a distraction for the DevOps Team to continue supporting engineering as a single focus. As a result, the DevOps team was siloed off and was given a new name – the Cloud Operations Team. Or as we liked to call it, the Bespin Team, nicknamed after a planet located in a desolate sector of the galaxy in Star Wars.
The scope of our team had drastically changed. We were tasked with implementing new tooling, introducing innovative development paradigms, and modernizing the platform and moving it into the cloud. We became a tiger team: a group of experts assembled to tackle this giant project.
It was bigger than an evolution. It felt like a revolution
But on the other side of the spectrum, the developers became the Wild, Wild West. They decided that production must go on, and pushed out new features like a donut factory. The developers wanted to adopt microservices, build micro front-ends, and push out Lambda services, all while transforming the application into a polyglot by using React, Node.js, .NET Core, and Python. It was a development revolution.
What we began to notice was that every time the Bespin team hit a milestone or built something new, the developers got a new toy. When we built the VPCs using Jenkins and Terraform, they had Lambda services to play with. With Jenkins in place, they learned Groovy and started building their own pipelines. With Artifactory, the developers had a place to store their npm packages. With AWS SSM, they could manage their own configs. And with Okta, we had SSO, which eliminated login management for all of our tools.
The developers wanted more control in building their applications
With this new foundation, my team found that we were no longer in charge of CI/CD, building pipelines for the new projects or microservices, or managing configs. The large monolith that ran on tribal knowledge was finally documented, and we were able to empower the developers to take on these roles and responsibilities. We found out that they wanted more control in building their applications, and even in managing tools like the CDN or APM. Our DevOps team completely dissolved, and the entire engineering team transformed into a highly successful DevOps culture.
Our focus quickly shifted onto the new and vast subset of problems
But what happens when an infrastructure team finishes the job that they were hired for? Well, that completely killed our team’s scope – again! With no infrastructure to manage and a legacy application that would eventually be killed off, we had to swing the pendulum. Our focus quickly shifted onto the new and vast subset of problems that erupted with our transition to the cloud. Our MTTR increased because incident alerting was still based on the monolith, we lacked visibility into our interconnected systems, and our on-call process no longer scaled. With all of these new problems, we defined a new scope – to focus on the scalability and reliability of our service while educating the broader team. We also needed to discover and define monitoring metrics with the teams, identify and prioritize security issues in the code, and implement security testing in the pipeline. We evolved into a Site Reliability Engineering (SRE) team.
We’ve gone from silos to symbiosis
Two years later, we fully migrated to the cloud with only a few hours of downtime on a Friday night fueled by pizza. We’ve gone from silos to symbiosis, working closely with our broader DevOps team on new infrastructure decisions, and with our QA team on on-call and incident resolution. Together, we remain focused on creating automated testing for performance and security. But none of this would be possible without the right tools, such as ELK for centralized logging, Grafana for holistic dashboards, PagerDuty for incident management, and Instana for fully automatic monitoring.
Moving to the cloud can fix many monolithic issues. It can break barriers. It can empower your developers to take on more responsibility and innovate. Yes, it opens a Pandora’s box of complexity and introduces a new set of issues, but if your operations team is nimble enough to pivot based on the needs of stakeholders, DevOps as a culture is highly achievable.
If the Bespin team at uShip did it, so can you.
To hear more about uShip’s transformation and how it all works today, watch my talk “Return of the SRE: The Journey from Silo to Development Function”.
Greg Cervone is the Manager of Cloud Engineering & DevOps at uShip, with 15 years of experience in development, cloud operations, and instilling DevOps as a culture. He is a huge Star Wars fan. You can connect with him on LinkedIn.