It's not the DevOps engineers’ fault that the ecosystem is broken
By understanding the complexities involved in software development, we can shift the focus from blame to collaboration and work toward creating a more efficient and effective ecosystem.
In the world of software development, DevOps engineers play an important and oft-underappreciated role. They bridge the gap between development and operations teams, automate processes and ensure that software releases are error-free and delivered to the end-user quickly and reliably. Altogether, DevOps teams are the unsung heroes who ensure the smooth functioning of the entire system.
Unfortunately, DevOps engineers are often the ones who take the blame when things go wrong, a misapprehension that can be damaging and counterproductive.
It is essential to understand that the DevOps engineers’ role is to facilitate communication, collaboration and automation between teams, not to singlehandedly fix a broken ecosystem or operate in a vacuum.
In this post, I will explore the true factors behind a broken ecosystem and, importantly, why DevOps engineers are often not at fault, drawing upon my own experience as the chief technology officer and co-founder of Bridgecrew*.
Breaking the myth: The version-control system controls the version
Let's consider a common issue within an ecosystem: Say you’ve encountered a bug in version MAJOR.MINOR.PATCH of the production environment that you are unable to reproduce in the development environment from the same semantic version.
In this case, the root cause lies with the source-control management system, which implies that artifacts, internal or external, that are packaged and tagged together will produce the same software the next time they are built. That implication is misleading, as artifacts can be overridden, withdrawn or forged.
When this happens, the responsibility of addressing the issue will likely fall on the DevOps/Platform team – specifically the person in charge of release engineering, reliability and packaging. They may have to set up a call to discuss the problem, with the goal of achieving full reproducibility from the production environment to the development environment.
At this point, you may wonder, "What can we do about this? Can't we use artifact signing, immutable repositories, Docker images or Nix to ensure reproducibility?"
The very need for additional tools to version-control our software means that version-control systems aren’t really controlling the version of our software, and that we have to reach for other tools to close the gaps in the software supply chain.
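To make that concrete, here is a minimal sketch – the lockfile name and format are my own invention, not a standard – of what “actually controlling the version” requires in practice: recording a cryptographic digest for every artifact at tag time and refusing to build if anything was overridden or withdrawn. This is the guarantee that signing, immutable repositories, pinned Docker image digests and Nix give you out of the box.

```python
# Minimal sketch: refuse to build unless every artifact matches the digest
# recorded at tag time. The lockfile name and format are hypothetical; in
# practice this guarantee comes from signing, immutable repositories,
# pinned Docker image digests or Nix.
import hashlib
import json
import pathlib
import sys

LOCKFILE = pathlib.Path("artifacts.lock.json")  # e.g. {"libfoo-1.2.3.tar.gz": "<sha256>"}

def sha256_of(path: pathlib.Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(artifact_dir: pathlib.Path) -> bool:
    expected = json.loads(LOCKFILE.read_text())
    ok = True
    for name, want in expected.items():
        path = artifact_dir / name
        if not path.exists():
            print(f"MISSING  {name}")  # artifact was withdrawn
            ok = False
        elif (got := sha256_of(path)) != want:
            print(f"CHANGED  {name}: {got[:12]} != {want[:12]}")  # overridden or forged
            ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if verify_artifacts(pathlib.Path("artifacts")) else 1)
```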
Breaking the myth: It’s safe to upgrade production once CI/CD is in place
Having a person in the engineering team responsible for the software release often creates the confidence that it is safe to upgrade production once the proper tooling, people and processes are in place. If the latest release breaks the production environment, our first call will be to the DevOps team, because they are the ones who set up the CI/CD process that deploys to production in the first place.
But the reality is more complicated. Different deployment environments often contain different data, and operations like data schema changes and migrations are still hard to test and verify in the development stages; the same applies to API contracts.
DevOps teams need to plan for the worst-case scenario – rollback, backup and recovery – and accept a few sleepless nights now and then because, unlike application quality, data quality is not (yet) a first-class citizen of the software-development toolchain. Data-oriented changes are often glued on manually and are not as tightly coupled to version control as the application code itself.
Communication between DevOps and application teams is crucial to ensure that data-quality-related changes will not break the application. But relying mostly on communication is not a well-engineered process to have in place.
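What a better-engineered process might look like is a deployment step that carries its own backup, verification and rollback, so data-oriented changes get at least some of the safety net application code already enjoys. The sketch below is a hypothetical outline only – it uses SQLite and a made-up users table for illustration, not any particular production setup.

```python
# Hypothetical outline of a schema change that carries its own backup,
# verification and rollback. SQLite and the "users" table are stand-ins
# for illustration only.
import shutil
import sqlite3

DB_PATH = "app.db"          # stand-in for the production database
BACKUP_PATH = "app.db.bak"  # snapshot taken before the migration

def migrate(conn: sqlite3.Connection) -> None:
    # The schema change the new release expects.
    conn.execute("ALTER TABLE users ADD COLUMN last_login TEXT")
    conn.commit()

def verify(conn: sqlite3.Connection, rows_before: int) -> bool:
    # Cheap data-quality checks: the column exists and no rows were lost.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    (rows_after,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return "last_login" in columns and rows_after == rows_before

def run() -> None:
    shutil.copyfile(DB_PATH, BACKUP_PATH)  # backup before touching anything
    conn = sqlite3.connect(DB_PATH)
    (rows_before,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    try:
        migrate(conn)
        if not verify(conn, rows_before):
            raise RuntimeError("post-migration verification failed")
    except Exception:
        conn.close()
        shutil.copyfile(BACKUP_PATH, DB_PATH)  # rollback: restore the snapshot
        raise
    conn.close()

if __name__ == "__main__":
    run()
```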
Breaking the myth: K8s is scalable
Creating a scalable-by-design engineering organization is something that occurs to every VP of R&D at an early-stage startup. But as I realized in founding my own company, and in the years since, implementing “scalable technologies” in the engineering org can sometimes be the wrong decision. In fact, I have seen these implementations backfire by slowing down operations.
For example, earlier in my career as a software engineer, I used “big data” – Hadoop – to scale the operations of our data-engineering team. At the time, we didn’t realize that having a few people maintain the HDFS cluster required more resources than simply compressing existing data or filtering out unneeded records. When HDFS failed to scale, we had to hire more DevOps engineers to maintain the cluster.
A few years later, when I tried Kubernetes, I felt similarly: It was too much, too soon. To be fair, I could have done a lot with it… later on. But for a long time, and for a significant number of users, serverless workloads – Lambda functions and ECS Fargate – have held up just fine within Bridgecrew’s* architecture. And they process a lot of data!
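For context, the building blocks I’m describing are deliberately small and boring. A serverless handler is roughly the sketch below – the event shape and bucket name are assumptions for illustration, not Bridgecrew’s actual code – and the operational point is what’s absent: there is no cluster, node pool or control plane for anyone to keep alive.

```python
# Hypothetical Lambda handler: take records off a queue, do some work and
# write the results to object storage. The event shape and bucket name are
# assumptions, not Bridgecrew's actual code.
import json

import boto3

s3 = boto3.client("s3")
RESULTS_BUCKET = "example-results-bucket"  # hypothetical

def handler(event, context):
    processed = []
    for record in event.get("Records", []):   # e.g. an SQS-triggered invocation
        payload = json.loads(record["body"])
        payload["processed"] = True           # placeholder for the real work
        processed.append(payload)

    s3.put_object(
        Bucket=RESULTS_BUCKET,
        Key=f"results/{context.aws_request_id}.json",
        Body=json.dumps(processed).encode("utf-8"),
    )
    return {"count": len(processed)}
```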
When the DevOps team at Bridgecrew* considered migrating from serverless workloads to K8s, we understood that choosing K8s carried a “sunk cost” of six engineers maintaining the dev, staging and production environments – engineers wiring and gluing things together and keeping servers up and running, in a small engineering org of 1-100 engineers.
Needless to say, that isn’t a scalable approach.
Breaking the myth: You can shift security left
Security has always been a collaborative effort between security practitioners and IT administrators. The shift of responsibilities from security practitioners to development and DevOps teams is not new, but it assumes that the DevOps team is not only better suited to execute the change at the source, but also potentially more productive in doing so.
Say I’ve chosen a version-control system (that doesn’t actually control versions) and a supposedly scalable compute infrastructure (that actually claims half of my engineering team’s firepower).
Now, I am productive enough to listen to PR bots suggesting that I make my code more secure, but I’m also cynical. I’m cynical because 15 years ago, writing code was simpler and more productive: I would produce a DLL, put it under an IIS server and ultimately, I’d get what I expected.
Compilers do a lot of the security work of wiring private, protected and public classes together, but when those classes become “cloud services,” a compiler cannot do the wiring.
In this situation, the right thing to do may be to rewrite some of the operational tools – not to put more people and processes around the tooling.
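To make the “PR bot” concrete: what those bots run is usually policy-as-code over configuration, and that is exactly the kind of operational tooling worth owning and rewriting. The toy check below is a hypothetical illustration – the file layout and the single rule are made up; real scanners such as Checkov ship hundreds of policies – that fails a pull request when a bucket config is left publicly readable.

```python
# Toy shift-left check: fail the pull request if any config file declares a
# publicly readable bucket. The file layout and the rule are hypothetical;
# real policy-as-code scanners cover far more ground.
import json
import pathlib
import sys

def find_public_buckets(config_dir: str) -> list[str]:
    findings = []
    for path in pathlib.Path(config_dir).glob("**/*.json"):
        config = json.loads(path.read_text())
        for name, bucket in config.get("buckets", {}).items():
            if bucket.get("acl") == "public-read":
                findings.append(f"{path}: bucket '{name}' is publicly readable")
    return findings

if __name__ == "__main__":
    problems = find_public_buckets("infra")  # hypothetical config directory
    for finding in problems:
        print(finding)
    sys.exit(1 if problems else 0)  # non-zero exit blocks the merge
```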
In conclusion, DevOps engineers are often blamed for a broken ecosystem, but that blame is undeserved. DevOps engineers cannot single-handedly fix the many issues that contribute to a broken ecosystem – to do their jobs and resolve those issues, they need the support of other teams.
By understanding the complexities involved in software development, we can shift the focus from blame to collaboration and work toward creating a more efficient and effective ecosystem.
We are all in the process of patching our software development toolchain into one that is less broken. I look forward to seeing what building software will look like in a few years. Hopefully, we will have fewer people to blame and fewer bugs!
*Denotes a Battery portfolio company. For a full list of all Battery investments, please click here.
The information contained herein is based solely on the opinions of Barak Schoster and nothing should be construed as investment advice. This material is provided for informational purposes, and it is not, and may not be relied on in any manner as, legal, tax or investment advice or as an offer to sell or a solicitation of an offer to buy an interest in any fund or investment vehicle managed by Battery Ventures or any other Battery entity.
This information covers investment and market activity, industry or sector trends, or other broad-based economic or market conditions and is for educational purposes. The anecdotal examples throughout are intended for an audience of entrepreneurs in their attempt to build their businesses and not recommendations or endorsements of any particular business.
Content obtained from third-party sources, although believed to be reliable, has not been independently verified as to its accuracy or completeness and cannot be guaranteed. Battery Ventures has no obligation to update, modify or amend the content of this post nor notify its readers in the event that any information, opinion, projection, forecast or estimate included, changes or subsequently becomes inaccurate.