The Hidden Work Behind “AI Generated This”

The complexity of AI-generated code doesn't come from bad code. It comes from missing decisions. AI makes it easy to add behavior without deciding who owns it, when it runs, or what it's allowed to change. Early on, everything works. Then the system starts disagreeing with itself. This article is about the work AI skips, why that work still matters, and how real SaaS systems quietly fall apart when no one does it.

Intro

AI made it faster to write code.

It did not make it faster to build systems.

That sounds obvious once you’ve lived it. But early on, it doesn’t feel that way. It feels like you just unlocked a cheat code. Endpoints appear. Repos fill out. Background jobs write themselves. The diff is huge and green. Momentum feels real.

Then a few weeks later, something small breaks.

And suddenly you’re staring at code you didn’t quite write, behavior you didn’t quite design, and a system that won’t answer basic questions anymore.

This article is about that gap.
The work AI skips.
The work that still matters.
And why ignoring it is what actually slows you down.


AI didn’t remove work. It hid it.

Let’s say the quiet part out loud.

AI didn’t eliminate complexity.
It relocated it.

Before, the work was obvious:

  • Writing boilerplate
  • Wiring endpoints
  • Duplicating patterns
  • Arguing about folder names

Now that stuff is cheap.

What’s expensive now is understanding what actually owns what.

AI happily generates code that looks coherent in isolation. It does not know which parts of your system are allowed to change state, which ones should only react, and which ones must never touch certain data at all.

If you don’t decide that, the system will decide for you. Poorly.


Why did this feel easy at first?

Because everything works on day one.

You have:

  • One request path
  • One writer
  • One happy user
  • No retries
  • No historical state
  • No competing behaviors

AI shines here. The happy path is clean. The code reads well. Tests pass. You ship.

This is where people get tricked.

Early success doesn’t mean the system is correct.
It means the system hasn’t been tested by reality yet.

The missing decisions don’t hurt until there’s more than one way to touch the same thing.


When did the system start arguing with itself?

There’s always a moment.

It’s rarely dramatic.

It’s usually something like:

  • “We’ll add a background job to clean this up.”
  • “This repo method already exists, let’s reuse it.”
  • “We’ll just put a feature flag around it.”
  • “This sync path needs to update the record too.”

None of these feel dangerous on their own.

Together, they create a system that no longer agrees with itself about what’s allowed.

Now the same data can be:

  • Written by an API
  • Fixed by a job
  • Touched by a sync process
  • Altered by a feature flag
  • Retried by a worker

Congratulations. You have multiple authors and no editor.


Why debugging feels worse than before AI

This is the part people don’t expect.

The code is readable.
The functions are named nicely.
The abstractions look reasonable.

And yet… nothing makes sense.

You add logs.
You add more logs.
You trace requests.
You stare at metrics.

Everything says “success”.

But users are still seeing weird behavior.

This is where people start blaming AI. That’s a mistake.

The problem isn’t that AI wrote bad code.
The problem is that nobody decided how the system should behave under pressure.

AI is confident. Even when it’s wrong.
It doesn’t hesitate.
It doesn’t push back.
It doesn’t say, “Are you sure this should mutate state here?”

Humans are supposed to do that part.


The work AI doesn’t see

There is a whole category of work that never shows up in a prompt.

Things like:

  • Who is the single owner of this piece of data?
  • Is this operation a command or a reaction?
  • What happens if this runs twice?
  • What happens if this runs late?
  • What happens if this runs out of order?

AI will happily generate code that answers none of these questions.

And to be fair — you didn’t ask it to.

This is the hidden work.
The boring work.
The work that feels like “overthinking” until it’s missing.
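
To make one of those questions concrete, here is a minimal sketch in TypeScript of what answering "what happens if this runs late or out of order?" can look like. Everything in it (the User shape, UserRepo, the version field, StaleWriteError) is hypothetical; the point is that a write carries the version it was computed from, and stale writes fail loudly instead of silently winning.

  // Sketch: optimistic concurrency as an explicit answer to
  // "what happens if this write arrives late or out of order?"
  // Every name here (User, UserRepo, StaleWriteError) is illustrative.

  interface User {
    id: string;
    status: "active" | "suspended";
    version: number; // incremented on every successful write
  }

  class StaleWriteError extends Error {}

  interface UserRepo {
    findById(id: string): Promise<User | undefined>;
    // Succeeds only if the stored version still matches the version
    // the caller read. Otherwise the caller is stale.
    updateStatus(id: string, status: User["status"], expectedVersion: number): Promise<void>;
  }

  async function suspendUser(repo: UserRepo, userId: string): Promise<void> {
    const user = await repo.findById(userId);
    if (!user) return;

    try {
      // This decision was computed against user.version. If anything
      // else wrote in between, the call fails loudly instead of
      // quietly reverting newer state.
      await repo.updateStatus(user.id, "suspended", user.version);
    } catch (err) {
      if (err instanceof StaleWriteError) {
        // Someone else owns the newer truth. Re-read and re-decide,
        // or drop the write. Either way, it is an explicit choice.
        return;
      }
      throw err;
    }
  }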


The background job that “fixed” user state

Let’s make this concrete.

You have a user record.

The API updates it. Simple.

AI generates a repo method that updates status. It’s reused everywhere. Works great.

Later, you notice some users end up in weird states. So you add a background job. It scans users and “fixes” them.

The job reuses the same repo method. Why not? DRY, right?

Then a job retries after a partial failure.

The retry runs after a newer API request already updated the user.

The job doesn’t know that. It just “fixes” the state back to what it thinks is correct.

Users randomly revert.
Support tickets say “this worked yesterday.”
Logs show successful updates everywhere.

Nothing looks wrong.

The real problem is simple: you now have two writers and no ordering guarantees.

The fix isn’t to make the job smarter.

The fix is to make it stop writing.

The API becomes the single writer.
The job emits signals or flags inconsistencies.
It reacts. It does not mutate.

It’s boring.
It works.
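
Here is a minimal sketch of that split, with invented names (UserRepo, the events emitter, the field names). The API path is the only thing allowed to write; the job only detects and reports.

  // Sketch: the API path is the single writer; the job only reacts.
  // UserRepo, the events emitter, and the field names are illustrative.

  import { EventEmitter } from "node:events";

  interface User {
    id: string;
    status: "active" | "needs_review";
  }

  interface UserRepo {
    findInconsistent(): Promise<User[]>;
    updateStatus(id: string, status: User["status"]): Promise<void>;
  }

  const events = new EventEmitter();

  // The only code path allowed to mutate user status.
  async function handleStatusChangeRequest(
    repo: UserRepo,
    userId: string,
    status: User["status"],
  ): Promise<void> {
    await repo.updateStatus(userId, status);
  }

  // The background job. It may run twice, late, or out of order,
  // and none of that matters, because it never writes.
  async function reconcileUsers(repo: UserRepo): Promise<void> {
    const suspicious = await repo.findInconsistent();
    for (const user of suspicious) {
      // Emit a signal for the owner to act on. No mutation here.
      events.emit("user.inconsistent", { userId: user.id, status: user.status });
    }
  }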


“Just one more helper” is how systems rot

Helpers feel harmless.

Cleanup scripts.
Backfills.
Fix-up jobs.
“Temporary” migrations that never leave.

Each one feels like saving time.

Each one quietly becomes another way to change reality.

The danger isn’t the helper itself.
It’s that helpers almost never have clear ownership rules.

If a process writes state, it’s not a helper.
It’s part of your system.

Treat it that way or remove it.
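
If a backfill really must change data, one way to treat it as part of the system is to route it through the owning write path. A sketch, with a hypothetical UserService as that owner:

  // Sketch: a backfill routed through the owning write path instead
  // of raw database access. UserService and the names are hypothetical.

  interface UserService {
    // The single, rule-enforcing entry point for status changes.
    changeStatus(userId: string, status: "active" | "suspended", reason: string): Promise<void>;
  }

  async function backfillLegacyStatuses(service: UserService, userIds: string[]): Promise<void> {
    for (const id of userIds) {
      // Same guards, same events, same audit trail as the API path.
      // If this feels too slow or too strict, that is a sign the
      // "helper" was about to bypass rules the system depends on.
      await service.changeStatus(id, "active", "legacy-status-backfill");
    }
  }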


Feature flags didn’t break your system

Misusing them did.

Feature flags are great for rollouts and experiments.

They are terrible as permanent logic switches.

AI loves feature flags. They’re an easy escape hatch.

Behavior A.
Then behavior B.
Then tenant-specific behavior C.

Suddenly the same request behaves differently depending on flags, tenants, and timing.

A long-running workflow starts under one flag state and finishes under another.

Same input. Different output.

Support says “it depends.”
Engineers say “I can’t reproduce it.”

The fix is simple and uncomfortable.

Resolve behavior once.
At the boundary.
Before the work starts.

If behavior changes mid-flight, you don’t have a feature flag.
You have a time bomb.
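
A sketch of what that looks like, with made-up names (FlagClient, the "new-export-pipeline" flag, ExportJob): the flag is evaluated once when the work is enqueued, and the decision travels with the job.

  // Sketch: evaluate the flag once at the boundary and carry the
  // decision with the work. FlagClient, the flag name, and ExportJob
  // are all made up.

  interface FlagClient {
    isEnabled(flag: string, tenantId: string): Promise<boolean>;
  }

  interface ExportJob {
    tenantId: string;
    useNewPipeline: boolean; // the decision, frozen at enqueue time
  }

  async function enqueueExport(
    flags: FlagClient,
    queue: ExportJob[],
    tenantId: string,
  ): Promise<void> {
    // Resolved exactly once, before any work starts.
    const useNewPipeline = await flags.isEnabled("new-export-pipeline", tenantId);
    queue.push({ tenantId, useNewPipeline });
  }

  async function runExport(job: ExportJob): Promise<void> {
    // No flag lookups in here. The behavior was decided at the boundary,
    // so a flag flip mid-flight cannot change this job's output.
    if (job.useNewPipeline) {
      // ...new pipeline
    } else {
      // ...old pipeline
    }
  }

Flipping the flag changes future jobs, not jobs already in flight.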


Repos aren’t abstractions — they’re promises

AI loves fat repos.

One interface.
One place for logic.
Reusable everywhere.

Feels clean.

Until it isn’t.

That repo method writes data, triggers side effects, assumes request context, and emits notifications.

Then it gets reused by APIs, jobs, sync processes, and admin tools.

Each caller assumes something different.

The repo doesn’t know that.

You get duplicate emails.
Phantom updates.
“We have no idea why this fired.”

The fix isn’t more comments.

It’s drawing lines.

Some paths write.
Some paths react.
Some paths only read.

AI won’t enforce that.
You have to.
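
One way to draw those lines, sketched with invented interfaces rather than anything from a real codebase: split reads from writes, and keep side effects with the owner instead of inside the repo.

  // Sketch: split the fat repo so each caller can only do what its
  // path is allowed to do. All names here are illustrative.

  interface User {
    id: string;
    status: string;
    email: string;
  }

  // Read-only: safe to hand to jobs, sync processes, and admin tools.
  interface UserReader {
    findById(id: string): Promise<User | undefined>;
  }

  // Write path: only the owning service gets this.
  interface UserWriter {
    updateStatus(id: string, status: string): Promise<void>;
  }

  // Side effects live with the owner, not inside the repo, so reusing
  // a query can never accidentally send an email.
  interface Notifier {
    sendStatusChanged(user: User): Promise<void>;
  }

  class UserStatusService {
    constructor(
      private reader: UserReader,
      private writer: UserWriter,
      private notifier: Notifier,
    ) {}

    async changeStatus(userId: string, status: string): Promise<void> {
      const user = await this.reader.findById(userId);
      if (!user || user.status === status) return;

      await this.writer.updateStatus(userId, status);
      await this.notifier.sendStatusChanged({ ...user, status });
    }
  }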


Async work is not free

Async feels like cheating.

Just queue it.
Retry it.
Eventually it’ll work.

Except retries amplify mistakes, delays hide causality, and idempotency is always assumed and rarely proven.

AI will generate retries instantly.

It will not tell you what happens when the same event runs twice or arrives out of order.

If you don’t think about that up front, you’ll be debugging ghosts.
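
Idempotency is provable, not just assumable. A minimal sketch, with a made-up ProcessedEvents store and event shape: the handler records each event ID it has applied and skips duplicates, so a retry becomes a no-op instead of a second write.

  // Sketch: idempotency proven with a processed-event store, not
  // assumed. The event shape and the store are illustrative.

  interface PaymentEvent {
    id: string; // unique per event, reused on retries
    orderId: string;
    amountCents: number;
  }

  interface ProcessedEvents {
    // Returns true only the first time this event id is recorded.
    recordIfNew(eventId: string): Promise<boolean>;
  }

  interface Ledger {
    credit(orderId: string, amountCents: number): Promise<void>;
  }

  async function handlePaymentEvent(
    store: ProcessedEvents,
    ledger: Ledger,
    event: PaymentEvent,
  ): Promise<void> {
    // A retry or duplicate delivery carries the same event.id, so the
    // second attempt becomes a no-op instead of a second credit.
    const firstTime = await store.recordIfNew(event.id);
    if (!firstTime) return;

    await ledger.credit(event.orderId, event.amountCents);
  }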


Why observability feels noisy instead of helpful

This one hurts.

You add logs.
You add traces.
You add metrics.

You still can’t answer: “Who changed this?”

Observability doesn’t fix unclear systems.
It exposes them.

If five components can write the same record, no amount of tracing will make that understandable.

You don’t need more telemetry.
You need fewer authors.


The real fix is boring — and that’s the point

The fix isn’t better prompts.

It’s fewer writers.
Clear ownership.
Explicit reactions.
Predictable flow.

It feels slower at first.

Then everything speeds up.

Because when something breaks, you know exactly where to look.


AI didn’t replace architecture. It made it unavoidable.

AI is amazing at generating code.

It is terrible at deciding what your system is allowed to do.

That part is still on you.

The teams moving fastest aren’t prompting harder.

They’re doing the unsexy work:

  • Defining ownership
  • Reducing writers
  • Forcing clarity

AI makes the gaps visible faster.

If you fill them, you win.

If you don’t, you just fail faster.

And honestly?

That’s still progress.
