Please don’t let SRE go the way of DevOps

by Matt Ouille  Opinions


It always starts with a Tweet heard around the world state, amirite?

I will start this by saying I respect both of these gents immensely. Both help put on DevOps Days Austin and are key figures in the Texas tech scene. Whether you reside in Dallas or San Antonio, you probably know both of these Austin residents’ names.

I’m going to answer these questions, as an SRE, and then I’m going to turn some fire in Ernest’s direction.

“Can someone describe the roles and responsibilities in an ops model versus a DevOps models versus an SRE model?”

This is a loaded question. I’m going to repeat two things and you’d benefit from remembering them:

  • DevOps is a set of philosophies for developing software (and increasingly systems, since they’re similar) in teams.
  • Site Reliability Engineering (SE and SWE) is a role which is succinctly defined as Operations through the eyes of a software engineer.

By understanding that, you should now understand that the only two groups at odds here are:

  • Legacy Operations (Ops from here on out)
  • DevOps/SRE

How am I defining “Ops”? Good question! I tend to define “Ops” by the legacy model: eyes on glass, unmaintainable Bash scripts, and repeated fires. This is not to talk down to the folks who’ve been saddled with this role before; after all, I’m one of them. Really, at the root of ops is the business’s perception of the role: it is designed to patch and in no way be proactive.

This is by and large what I have seen across Texas (or outside Silicon Valley, if you’d like to phrase it that way) and, I believe, it largely has to do with businesses’ inability to view or position themselves as software-first.

SREs, on the other hand, take an active position in the business. Not only are they resident technical SMEs in either Systems Engineering or Software Engineering (spoiler alert, both write software), but they’re also adept communicators who equate outages, losses, and bugs to real dollars and cents. SREs are probably the most data-driven people, by culture and personality, that you can find on this planet. SRE is most commonly differentiated from “Ops” by being proactive.

“What is the biggest value add of moving from ops to DevOps and what about moving from a DevOps to a SRE model?”

I answered most of this in the previous question. One thing I did not mention is that Legacy Operations does not imply “engineering”; SRE does. A good litmus test for this: if you can satisfy your business requirements for operations by functional outsourcing, then you are likely not practicing SRE.

“What are the challenges with each of these roles”

Personal

Ops: Alert fatigue, constant fires, no light at the end of the tunnel; this list is honestly endless.

SRE: SRE implies a culture of continuous learning. This seems to be the hardest thing for most people entering the market. It can be alleviated by companies paying for conferences, training, etc.

Organizationally

Ops: Start calculating the cost of outages in dollars and cents. That’s the challenge here.

SRE: The rub is usually in the implementation details from both technology leaders and their businesses.
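To make the “dollars and cents” point for Ops concrete, here is a minimal sketch of how you might put a rough price on a single outage. The function name, its parameters, and every number in it are hypothetical and only for illustration; a real calculation would also need to account for things like SLA penalties and customer churn.

```python
def outage_cost(minutes_down, revenue_per_minute,
                engineers_responding=0, hourly_rate=0.0):
    """Rough direct cost of one outage, in dollars (illustrative only)."""
    # Revenue the business could not earn while the service was down.
    lost_revenue = minutes_down * revenue_per_minute
    # Cost of the people pulled in to fight the fire.
    response_cost = engineers_responding * hourly_rate * (minutes_down / 60)
    return lost_revenue + response_cost


# Hypothetical numbers: a 45-minute outage, $200/minute in revenue,
# three engineers responding at $80/hour each.
cost = outage_cost(45, 200.0, engineers_responding=3, hourly_rate=80.0)
print(f"Estimated outage cost: ${cost:,.2f}")
```

Even a crude number like this tends to change the conversation with leadership, because it turns “we had a bad night” into a line item.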

“Why do people get mad if you say someone is a “DevOps Engineer?””

I didn’t know there were philosophy engineers, did you? What you really mean is Operations Team Member, Release Engineer, Software Engineer - something that properly describes the role. I get that a lot of people think titles don’t matter, but I’m actively telling you that your opinion doesn’t matter. Call me what I am and what I get paid to do :)

“I thought “DevOps Teams” are bad?”

This quote needs a frame of reference. What it refers to is when an organization is likely understaffed and has a group of roving people who stand up CI/CD pipelines for teams, teach people to use Bazel, how to deploy with Kubernetes, how to even LXC, how to implement monitoring/observability/instrumentation tools, and so on.

Yes, it’s an anti-pattern. Remember that old acronym “CAMS”? If you’re doing the S, which is sharing, then everyone in your organization should be growing together, not just a select few “rockstars”.

“How come there are SRE teams now? What’s the difference?”

SREs are functionally communal, but while they are communal they also get dispatched out to teams to help. This is why SREs know how to code (so they can contribute to the primary codebase as well as tooling) and why SRE teams are split, with a balance, between systems engineers and software engineers.

The SRE Team golden rule, as I call it anyway, is that SRE teams do not grow with an organization. If your SRE org is growing as the organization grows, then something is wrong. SREs, by their very nature, automate solutions and do not require rolling/repeated task assignment.

“How do you manage the rotation of on-call software developers with SRE teams in small businesses?”

This can work a few different ways. All of them usually end in success, but each has its own challenges.

“You build it, you own it” - The role of an SRE here is to enable teams through process and policy. They can jump on calls to help resolve advanced issues outside the scope of a normal software or systems engineer. Ultimately, SREs in this situation can still pull apps from production, especially if they affect the availability of other apps. App developers are the primary on-callers here, with a supplemental SRE on-call when it’s needed.

“SRE as the first line of defense” - This is mostly what Google wrote about: SREs operationally own certain production applications. Not all applications are SRE owned or led; sometimes you don’t want an app to have high availability. SREs would be the primary on-call here and can turn support back over to app developers if their apps become too problematic. SREs would institute Production Readiness Review guidelines to ensure that an app version being turned over to SRE for production submit would not cripple the small team.
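One way to decide when an app has become “too problematic” is to compare its observed downtime against what its availability target allows. This is only a sketch under my own assumptions: the function names, the 30-day window, and the targets below are hypothetical, not a policy anyone has published.

```python
def allowed_downtime_minutes(availability_target, window_days=30):
    """Minutes of downtime a target (e.g. 0.999) permits in the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_target)


def over_allowance(observed_downtime_minutes, availability_target,
                   window_days=30):
    """True if the app has used more downtime than its target permits."""
    allowance = allowed_downtime_minutes(availability_target, window_days)
    return observed_downtime_minutes > allowance


# Hypothetical example: a "three nines" app allows roughly 43 minutes
# of downtime in 30 days, so 60 minutes of downtime is over the line.
print(f"{allowed_downtime_minutes(0.999):.1f} minutes allowed")
print("Hand back to developers?", over_allowance(60, 0.999))
```

An app that does not need high availability simply gets a lower target, and the same check then tolerates far more downtime before anyone talks about handing it back.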

“How does your business integrate Security with DevOps and SRE processes?”

“Security Concerns” are bugs. You prioritize them and resolve them like normal bugs. You don’t just let a bug keep your pipeline red, do you?

“What incidents or growing pains has your business gone through to solve this integration?”

Generally, when I was working in more progressive operations roles (such as SRE), the growing pain was with leadership not understanding the SRE mentality or how to run a software-first business.

Ernest’s Article

My first recommendation is to ignore the title. It’s poorly communicated, in my opinion, and really only appeases the kind of people who say, “We’re just doing the same thing over and over” to get out of growing with changing concepts.

I’ll use the example of Agile vs XP. Yes, everyone knows Agile is very similar to Extreme Programming - Kent Beck had a hand in both. That doesn’t mean we didn’t discover some things along the way of doing Extreme Programming that changed the course of the philosophy and thus deserved a differentiating name. If you know anything about both practices, then you’ll understand why and how they’re different.

What Ernest is rightfully pleading for is for companies not to water down and drown SRE like they did DevOps by mindlessly applying the label and thus creating chaos and confusion around definitions and meaning. Respect the practices and disciplines of these engineers. Apply our organizational and professional titles correctly, and invest yourselves in discovering the pillars of the subject and what it’s really trying to strike at.

I cannot tell you how many times I would walk into a “DevOps Engineer” interview and find out it was a totally bogus release engineering position. Not only is that entirely boring, but the pay is substantially different because the skillset is substantially different.

In conclusion, why do I care so much? I actually got hired into one of these roles and had to work for a year for a business that just didn’t know what “DevOps” meant. It hurt me as a professional and was disruptive to how I viewed the operational support ecosystem. Nobody deserves that, and the same is bound to happen to SRE in Texas unless we call this behavior out.

