Code Got Cheap. Production Didn't

Introduction

I came into work and a co-worker asked me to review some Terraform code that he had written. The Claude-generated module looked clean and correct enough. I started to cycle through my top-of-mind questions. How many possible variant deployment targets are there? Are there officially provided, versioned modules which already encapsulate this grouping of resources? Is documentation provided to whoever will maintain it? Can I deploy this locally and in the cloud using the same toolchain? The project was low-stakes enough that many of these questions were deemed "later problems", but it got me thinking.

In this context, the AI-generated Terraform module was simply a means to an end: a pragmatic and expedient way of bootstrapping a greenfield MVP and delivering code to the client. From my perspective, it may well have been laying the foundation stone.

As system administrators became scripters and coders, and full-stack devs started touching more of the infra via self-service public clouds, practices and technology choices converged in ways that still surprise me, while opinions and priorities diverged sharply elsewhere. As we move rapidly towards AI-assisted development, I'm prompted to reflect on the different perspectives ops brings to the conversation.

The two cost curves

The concerns of devs tend to be shipping on time, feature completeness and coherent DRY abstraction. Skillful and experienced devs pride themselves on their stewardship of a codebase by instilling the discipline of efficient, readable, tested code written in a conventional style with consistent usage patterns. The crucial questions are "does it work as intended?", "is it logically consistent?", "is it easily grasped by a new team member?", "are its dependencies easily managed?", "does it re-invent the wheel?" and "once this works, is the problem solved?".

From the ops point of view, the primary concerns are uniform and repeatable build recipes, consistent entry-points and instrumentation, portability across environments, performance assumptions supported by benchmarking, and scalability. The crucial questions are "how will this behave under load?", "what are its observed and/or probable failure modes?", "who will operate it?" and "how much will this cost over time?". Or more precisely, "are the operational costs commensurate with the expected ROI?".

These diverging concerns also reflect different cost curves:

Authoring curve = the cost of authoring code
Operating curve = the cost of running, observing, securing, and evolving what that code becomes IN PRODUCTION

In a sense it doesn't matter if a single person, a dedicated team, a cross-functional team or an agent executes on one or more concerns so long as the concerns don't become conflated.

Now that agents are both writing and reviewing code, the need for developer stewardship appears to be up for debate, with cost being the decider. If it's cheap for a machine to write, refactor and test code, who cares if it's beautiful or interpretable? Feature completeness, in this context, can be validated by a user, which ultimately trumps clean coding style, or so the argument goes. We can "fix it in the mix", so to speak.

What DevOps actually became

When discussions over technology choices split into camps and turn into debates, I tend to see it less as bikeshedding or user preference, and more as a conflict in priorities between differing sets of concerns.

Sometimes these divergences are about ownership and reflect the shifting "wall" between development and production: who debugs the CI tests when they block the build, and who is immediately tagged when a pod fails on startup because of a bad config? A common but underwhelming compromise is the emergence of ops as a tooling or service department. Here, the "over the wall" problem doesn't disappear; it simply shifts and gets rebranded. Dev teams gained self-service infra tooling and ops became a platform team that unblocks them. The wall didn't come down; it got a portal.

Other times it's about architectural vision, and the structural mechanism is surprisingly mundane: ops is often not in the room when architecture decisions are made. The planning meeting, the RFC, the design doc review: these happen within product or engineering, and the platform team is brought in to provision infrastructure for decisions already locked in. The result is that ops shows up to execute architecture, not shape it. By the time they're in the room, the choices are already made: the data model, the vendor, the deployment assumptions. This is how you end up with multiple datastores with half-finished schema migrations, observability bolted on after the fact, and runbooks that nobody writes because nobody expected to be the one running the system.

These decisions have tangible and often expensive downstream consequences. Multi-region support, data residency, and tenant isolation — increasingly important to compliance-conscious businesses — are architectural properties, not features you add in a later sprint. The operating curve doesn't care that the decision was made in a room where nobody was thinking about it. The cost accumulates regardless.

What AI-assisted coding does — and doesn't — change

The authoring curve is collapsing. Code gets cheaper to write, refactor, and test — what used to take a sprint can take an afternoon. The democratization that follows is real: more people can now shape product design regardless of coding background, and this is a good thing.

But the operating curve doesn't work in the same way. Running, observing, securing, and evolving what that code becomes in production doesn't get cheaper because the code got cheaper to write. If anything, the surface area expands. More code means more failure modes, more configuration surface, more things to monitor. The cost migrates rather than disappears.
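To make the two curves concrete, here is a toy model. It is a sketch only: the class name, the linear operating cost, and every number in it are my own assumptions, chosen to illustrate the shape of the argument rather than to describe any real project.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Toy model of the two curves. All figures are hypothetical, in arbitrary units."""

    authoring_cost: float            # one-off cost to write the code
    operating_cost_per_month: float  # recurring cost to run, observe, secure, and evolve it

    def total(self, months: int) -> float:
        """Cumulative cost after `months` in production (assumes a flat monthly rate)."""
        return self.authoring_cost + self.operating_cost_per_month * months

    def authoring_share(self, months: int) -> float:
        """Fraction of the cumulative cost that went into authoring."""
        return self.authoring_cost / self.total(months)


# Hypothetical comparison: authoring gets 10x cheaper, operating cost stays the same.
hand_written = CostModel(authoring_cost=100.0, operating_cost_per_month=20.0)
ai_assisted = CostModel(authoring_cost=10.0, operating_cost_per_month=20.0)

for months in (1, 12, 36):
    saved = 1 - ai_assisted.total(months) / hand_written.total(months)
    print(f"{months:>2} months: total cost reduced by {saved:.0%}")
```

With these made-up figures, a 90% cut in authoring cost lowers the total by about three quarters after one month, but only by about a tenth after three years. A flat monthly rate is the generous case; if cheaper authoring also means more code in production, the monthly figure would rise rather than stay put.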

In the context of AI, it is helpful to think in terms of personas and workflows. AI can be an assistant, a teammate, a planner, an architect — and it can enter earlier in the process, before choices get locked in. Context matters here, though. I wouldn't hire my personal assistant to fix my plumbing or electrical problems. The same logic applies: AI-as-coder is a different context from AI-as-architect, and conflating them costs you.

The risk isn't role confusion in the agent; it's role confusion in the humans directing it. If you use AI to ship faster but haven't changed where in the process the operating curve gets considered, you've just built the wall faster. The choices still get locked in early. Ops is still not in the room.

What changes the structural problem isn't AI generating code. It's AI brought into architecture earlier — planning, stress-testing assumptions before choices lock in. Whether a person, a team, or an agent owns a concern matters less than whether the concern is represented at all. The tool isn't the fix. The workflow is.
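One way to picture "the workflow is the fix" is a gate that refuses to treat a design as settled until the operating questions have an answer, whoever or whatever supplies it. The sketch below is hypothetical throughout: the question keys are my shorthand for the review questions from the introduction and the ops concerns listed earlier, and `DesignDoc`, `unanswered`, and `ready_for_lock_in` are invented names, not part of any existing tool.

```python
from dataclasses import dataclass, field

# Hypothetical checklist keys, paraphrasing the operating questions raised in this post.
OPERATING_QUESTIONS = (
    "deployment_targets",
    "module_versioning",
    "maintainer_docs",
    "environment_parity",
    "failure_modes",
    "operator",
    "cost_over_time",
)


@dataclass
class DesignDoc:
    """A design doc reduced to its title and its answers to the operating questions."""

    title: str
    answers: dict[str, str] = field(default_factory=dict)


def unanswered(doc: DesignDoc) -> list[str]:
    """Return the operating questions the doc leaves missing or blank, in checklist order."""
    return [q for q in OPERATING_QUESTIONS if not doc.answers.get(q, "").strip()]


def ready_for_lock_in(doc: DesignDoc) -> bool:
    """A design is ready to lock in only when no operating question is unanswered."""
    return not unanswered(doc)


doc = DesignDoc(
    title="Greenfield MVP",
    answers={
        "deployment_targets": "one cloud region plus local dev",
        "operator": "   ",  # whitespace only, so it counts as unanswered
    },
)

print(ready_for_lock_in(doc))
print(unanswered(doc))
```

The check is deliberately indifferent to who fills in the answers. It only asks whether each concern is represented before the choices lock in.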

The new shape of the role

As the authoring curve collapses, a reckoning is happening in the industry about what is actually worth building. No-code and low-code tooling is a less convincing value proposition when code itself is cheap to produce.

I'd like to see DevOps evolve toward an actual product R&D department rather than a service department for dev teams. Crash test programs don't just absorb impact; they generate data that changes design before the car is built.

We need platforms built for collaboration, not for self-service. To use a video-game analogy: games that reward fast individual reflexes or achievement unlocks are less interesting now. The games that reward long-horizon thinking, coalition-building, and knowing when not to move are the ones that map onto what's actually hard. Speed is table stakes. What you're managing is a system with a lot of moving parts, limited visibility, and decisions that compound. Which game your team is playing is determined by what you reward — and most incentive structures still reward reflexes.

The questions I asked during that code review — deployment targets, versioning, maintainability, environment parity — were not code questions. They didn't get answered by the quality of the generated Terraform. They got answered, or didn't, by decisions made before anyone opened a prompt.

This is what remains. Code got cheap. The questions didn't. What's worth asking now isn't whether AI can write the module — it's whether your workflow gets the operating curve into the room before the choices are already made.