Architectural debt in SaaS rarely looks like a bad decision at the time. It usually starts with something reasonable — like “just add a service.” This article is about why that move quietly slows teams down, how it shows up in real systems, and what to do instead before your architecture starts fighting you.
Why “Just Add a Service” Is Architectural Debt
There’s a moment in almost every SaaS codebase where someone says:
“Let’s just make this its own service.”
It sounds responsible. Grown-up. Scalable.
It’s also how a lot of teams quietly wreck their velocity for the next two years.
This article is about that move.
Why it feels right.
Why it usually isn’t.
And what to do instead when things start to hurt.
This isn’t theory.
This is stuff you only learn after shipping, breaking things, and cleaning up your own mess.
“Why does everything feel harder than it should?”
If you’re reading this, your SaaS probably works.
Users are signing up.
Revenue might even be coming in.
You’re not fighting fires every hour.
But…
- Adding features feels slower than it did three months ago.
- A “small change” touches four repos.
- Debugging means hopping between logs, queues, and dashboards.
- Nobody is 100% sure where certain logic lives anymore.
You didn’t do anything obviously wrong.
You just kept saying “sure” to new services.
“When did adding a service become the default move?”
Usually it starts innocently.
You ship a monolith.
It’s fine.
Then something gets a little messy.
Auth feels sensitive.
Background jobs feel noisy.
Sync logic feels risky.
Someone says:
“Let’s isolate this.”
So you extract a service.
It feels cleaner.
Less scary.
You can deploy it independently. Maybe.
Then another thing feels messy.
Extract again.
Before long, “just add a service” stops being a decision.
It becomes muscle memory.
At no point did you sit down and say:
“We are choosing coordination overhead in exchange for clarity.”
It just… happened.
That’s architectural debt.
Not because services are bad — but because the decision wasn’t deliberate.
“What problem were you actually trying to avoid?”
This is the uncomfortable part.
Most services don’t exist because of scaling needs.
They exist to dodge hard questions.
Questions like:
- Who owns this data?
- What’s the system of record?
- What happens if this fails halfway?
- Is this synchronous or eventually consistent — on purpose?
- Can this be retried safely?
Those questions slow you down now.
Services let you delay them.
But you don’t avoid the cost.
You compound it.
You trade one hard decision today for ten harder ones later.
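One of those delayed questions — “can this be retried safely?” — has a concrete answer you can write down the day you build the feature. A minimal sketch, assuming writes carry an idempotency key; the `Ledger` class and its names are illustrative, not from any real library:

```python
# Answering "can this be retried safely?" up front: every write carries an
# idempotency key, so a retried call becomes a no-op instead of a duplicate
# side effect. All names here are hypothetical.

class Ledger:
    def __init__(self):
        self.entries = []
        self._seen_keys = set()  # record of which writes already happened

    def record(self, idempotency_key: str, amount: int) -> bool:
        """Apply the write once. Returns False if it already happened."""
        if idempotency_key in self._seen_keys:
            return False  # safe retry: nothing duplicated
        self._seen_keys.add(idempotency_key)
        self.entries.append(amount)
        return True

ledger = Ledger()
ledger.record("invoice-42", 100)
ledger.record("invoice-42", 100)  # a retry of the same write, ignored
print(sum(ledger.entries))        # 100, not 200
```

Ten lines now, instead of a duplicate-charge incident later.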
“Why things get slower after the third service”
One service is fine.
Two can be okay.
Three is where things start to drag.
Not because of latency.
Because of distance.
Distance between:
- Where data lives
- Where decisions are made
- Where bugs show up
- Where fixes need to happen
You add:
- Network calls
- Serialization
- Versioning
- Partial failures
- Retry logic
- Debugging gaps
And now every feature requires coordination.
Not with users.
With your own system.
That’s the tax.
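The tax is visible in code. A sketch of the same lookup done in-process versus across a service boundary — `flaky_call` simulates a network that times out on every other request, and all names are illustrative:

```python
# The same lookup, local vs. remote. The remote version needs retry logic
# and a failure path the local one never had. flaky_call is a stand-in for
# a network hop; every name here is hypothetical.

users = {"u1": {"plan": "pro"}}

def get_plan_local(user_id: str) -> str:
    return users[user_id]["plan"]       # one lookup, no failure modes

call_count = {"n": 0}

def flaky_call(user_id: str) -> dict:
    call_count["n"] += 1
    if call_count["n"] % 2 == 1:        # simulated partial failure
        raise TimeoutError("user-service timed out")
    return users[user_id]

def get_plan_remote(user_id: str, retries: int = 3) -> str:
    for _ in range(retries):            # retry logic this feature now owns
        try:
            return flaky_call(user_id)["plan"]
        except TimeoutError:
            continue
    raise RuntimeError("user-service unavailable after retries")

print(get_plan_local("u1"))    # pro
print(get_plan_remote("u1"))   # pro, but only on the second attempt
```

Same answer both times. One of them made you write a failure policy to get it.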
“Async doesn’t fix unclear ownership”
When services start stepping on each other, teams often reach for queues.
“Let’s make it async.”
“Let’s add a worker.”
“Let’s just emit an event.”
Now you’ve added time as another axis of confusion.
Async is great when the ownership is clear.
It’s a disaster when it isn’t.
If no one can answer:
- Who is allowed to write this state?
- What’s safe to retry?
- What does ‘done’ actually mean?
Then async just spreads the mess over time.
You don’t remove coupling.
You hide it behind delays.
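Answering those three questions can be a few lines of code. A sketch under assumptions: one handler owns user status, events carry a version number, and a redelivered or out-of-order event can never clobber newer state. The event shape and names are invented for illustration:

```python
# Async with explicit ownership: this handler is the only writer of
# user_status, and a version check makes redelivery and reordering safe.
# Event fields and names are hypothetical.

user_status = {}   # written only here; everything else just reads

def apply_status_event(event: dict) -> bool:
    """Apply an event; returns False if it was stale or a redelivery."""
    current = user_status.get(event["user_id"], {"version": 0})
    if event["version"] <= current["version"]:
        return False   # redelivered or out-of-order: safe to drop
    user_status[event["user_id"]] = {
        "status": event["status"],
        "version": event["version"],
    }
    return True        # "done" means: state written at this version

apply_status_event({"user_id": "u1", "status": "active", "version": 2})
apply_status_event({"user_id": "u1", "status": "disabled", "version": 1})  # stale, dropped
```

With the answers written down, queues are fine. Without them, this same handler would be last-write-wins roulette.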
Example 1: Auth and user data drift
The naive start
You start with a monolith.
Users table.
Auth logic nearby.
Everything is straightforward.
It works.
The breaking point
You add SSO.
Integrations.
Maybe enterprise customers.
Someone says:
“Auth is critical. Let’s make it its own service.”
Seems reasonable.
The symptoms
A few months later:
- Auth says a user is active
- The app says they’re disabled
- Feature flags behave differently per request
- Login bugs require tracing three systems
Nobody trusts user state anymore.
The fix
You pull auth back behind a hard boundary.
One owner.
One source of truth.
Explicit contracts.
Fewer network calls.
Stricter rules.
It’s boring again.
That’s a win.
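“One owner, one source of truth” doesn’t require a service — it can be an in-process boundary with a deliberately small contract. A sketch; the class and method names are illustrative:

```python
# A hard boundary without a network: user state lives behind one small,
# explicit contract, and the rest of the app can ask questions but never
# touch the rows directly. Names here are hypothetical.

class AuthBoundary:
    def __init__(self):
        self._users = {}               # private: the single source of truth

    # --- the entire contract ---
    def create_user(self, user_id: str) -> None:
        self._users[user_id] = {"active": True}

    def disable_user(self, user_id: str) -> None:
        self._users[user_id]["active"] = False

    def is_active(self, user_id: str) -> bool:
        return self._users.get(user_id, {}).get("active", False)

auth = AuthBoundary()
auth.create_user("u1")
auth.disable_user("u1")
print(auth.is_active("u1"))  # False -- and no second copy of user state to drift
```

Because there is exactly one copy of the state, “auth says active, app says disabled” is no longer a possible bug.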
“Your system didn’t get more modular — it got more distant”
This is the trap.
Services feel like modularity.
But modularity is about control, not distance.
A module:
- Has clear inputs
- Has clear outputs
- Has rules you can enforce
A service:
- Has latency
- Has partial failure
- Has coordination cost
If you don’t need those things, you’re paying for them anyway.
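“Rules you can enforce” is the key phrase. In a module, the rules are in-process checks that fail immediately with a stack trace, not partial failures across a network. A sketch with invented names:

```python
# A module boundary: clear inputs, clear outputs, violations fail loudly
# and locally. No serialization, no versioning, no retry policy needed.
# Function and field names are illustrative.

def charge(amount_cents: int, currency: str) -> dict:
    # enforce inputs at the boundary
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    if currency not in {"usd", "eur"}:
        raise ValueError(f"unsupported currency: {currency}")
    # one output shape, enforced by this function alone
    return {"amount_cents": amount_cents, "currency": currency, "status": "charged"}

print(charge(1999, "usd")["status"])  # charged
```

The same contract behind an HTTP endpoint buys you latency and partial failure on top — pay for that only when you actually need the distance.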
Example 2: Background jobs as a dumping ground
The naive start
Everything is synchronous.
Writes happen inline.
Cron jobs clean things up.
Simple.
The breaking point
Performance dips.
Timeouts appear.
“Let’s move this to a worker service.”
The symptoms
Soon:
- Jobs run out of order
- Retries cause duplicate side effects
- Nobody knows which jobs are safe to replay
- Production issues trace back to “some job”
Your system is now haunted.
The fix
You stop treating jobs as generic.
You define workflows.
Explicit steps.
Clear ownership.
Fewer job types.
Better observability.
The queue becomes boring again.
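“Define workflows” can be this small: a named sequence of explicit steps whose progress is recorded, so a replay resumes instead of re-running side effects. A sketch; the workflow, its steps, and all names are invented:

```python
# A workflow instead of anonymous jobs: explicit, ordered steps with
# recorded progress, so replaying the whole thing is safe. Everything
# here is a hypothetical stand-in for real side effects.

def reserve_stock(order): order["reserved"] = True
def charge_card(order):   order["charged"] = True
def send_receipt(order):  order["receipt_sent"] = True

FULFILL_ORDER = [
    ("reserve_stock", reserve_stock),
    ("charge_card", charge_card),
    ("send_receipt", send_receipt),
]

def run_workflow(steps, order, completed: set) -> None:
    for name, step in steps:
        if name in completed:
            continue          # replay-safe: finished steps are skipped
        step(order)
        completed.add(name)   # record progress after every step

order, completed = {}, set()
run_workflow(FULFILL_ORDER, order, completed)
run_workflow(FULFILL_ORDER, order, completed)  # full replay: nothing runs twice
```

Now “some job failed” becomes “`charge_card` failed in `FULFILL_ORDER`”, and the answer to “is this safe to replay?” is yes, by construction.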
“Multi-tenant pain is usually self-inflicted”
Multi-tenancy exposes architectural lies fast.
The naive start
Single database.
Tenant ID passed around manually.
Works fine.
The breaking point
You add sync.
Then a “data service.”
Then a “sync service.”
The symptoms
- Near-miss data leaks
- Bugs only visible days later
- Metrics don’t match user complaints
- “Works locally” means nothing
Everyone’s nervous.
The fix
You centralize tenancy rules.
Make scope explicit at the repo layer.
Collapse services back into a shared runtime.
Make sync observable by default.
Suddenly, things line up again.
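“Make scope explicit at the repo layer” means a forgotten tenant filter becomes an immediate error instead of a near-miss leak. A sketch — the in-memory rows and names stand in for a real database:

```python
# Tenancy enforced at the repository boundary: every query must name its
# tenant, or it refuses to run. The rows and class names are illustrative.

rows = [
    {"tenant_id": "t1", "doc": "alpha"},
    {"tenant_id": "t2", "doc": "beta"},
]

class DocumentRepo:
    def list_docs(self, tenant_id: str) -> list:
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        return [r["doc"] for r in rows if r["tenant_id"] == tenant_id]

repo = DocumentRepo()
print(repo.list_docs("t1"))  # ['alpha']
```

One chokepoint, one rule. Nobody has to remember the tenant filter, because there is no code path without it.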
“What actually scales for small teams”
Here’s the unpopular truth:
Small teams scale by reducing choices, not adding abstractions.
What works:
- Fewer primitives
- Clear ownership
- Explicit lifetimes
- Strong internal boundaries
What doesn’t:
- Premature services
- Accidental async
- “We’ll clean it up later”
You don’t need more boxes.
You need fewer places to put code.
“So when should you add a service?”
Rarely. And intentionally.
Good reasons:
- Different team ownership
- Different deployment cadence that actually exists
- Hard isolation requirements
- Regulatory boundaries
Bad reasons:
- “This feels messy”
- “It might scale”
- “That’s how big companies do it”
- “We can refactor later”
If you can’t explain the boundary in one sentence, don’t add the service.
“The fastest way forward is usually fewer boxes”
If your SaaS feels slow to change, don’t ask:
“What service should we add?”
Ask:
“What decision are we avoiding?”
Then make that decision.
Pull logic closer.
Delete services you don’t need.
Make ownership explicit.
Simple systems are not naive.
They’re deliberate.
And they let you ship.
That’s the whole point.