Architectural debt in SaaS rarely looks like a bad decision at the time. It usually starts with something reasonable — like “just add a service.” This article is about why that move quietly slows teams down, how it shows up in real systems, and what to do instead before your architecture starts fighting you.
Why “Just Add a Service” Is Architectural Debt
There’s a moment in almost every SaaS codebase where someone says:
“Let’s just make this its own service.”
It sounds responsible. Grown-up. Scalable.
It’s also how a lot of teams quietly wreck their velocity for the next two years.
This article is about that move.
Why it feels right.
Why it usually isn’t.
And what to do instead when things start to hurt.
This isn’t theory.
This is stuff you only learn after shipping, breaking things, and cleaning up your own mess.
“Why does everything feel harder than it should?”
If you’re reading this, your SaaS probably works.
Users are signing up.
Revenue might even be coming in.
You’re not fighting fires every hour.
But…
- Adding features feels slower than it did three months ago.
- A “small change” touches four repos.
- Debugging means hopping between logs, queues, and dashboards.
- Nobody is 100% sure where certain logic lives anymore.
You didn’t do anything obviously wrong.
You just kept saying “sure” to new services.
“When did adding a service become the default move?”
Usually it starts innocently.
You ship a monolith.
It’s fine.
Then something gets a little messy.
Auth feels sensitive.
Background jobs feel noisy.
Sync logic feels risky.
Someone says:
“Let’s isolate this.”
So you extract a service.
It feels cleaner.
Less scary.
You can deploy it independently. Maybe.
Then another thing feels messy.
Extract again.
Before long, “just add a service” stops being a decision.
It becomes muscle memory.
At no point did you sit down and say:
“We are choosing coordination overhead in exchange for clarity.”
It just… happened.
That’s architectural debt.
Not because services are bad — but because the decision wasn’t deliberate.
“What problem were you actually trying to avoid?”
This is the uncomfortable part.
Most services don’t exist because of scaling needs.
They exist to dodge hard questions.
Questions like:
- Who owns this data?
- What’s the system of record?
- What happens if this fails halfway?
- Is this synchronous or eventually consistent — on purpose?
- Can this be retried safely?
Those questions slow you down now.
Services let you delay them.
But you don’t avoid the cost.
You compound it.
You trade one hard decision today for ten harder ones later.
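One of those delayed questions — “can this be retried safely?” — has a concrete answer you can write down the day you build the feature. A minimal sketch, assuming writes carry an idempotency key; the `Ledger` class and its names are illustrative, not from any real library:

```python
# Answering "can this be retried safely?" up front: every write carries an
# idempotency key, so a retried call becomes a no-op instead of a duplicate
# side effect. All names here are hypothetical.

class Ledger:
    def __init__(self):
        self.entries = []
        self._seen_keys = set()  # record of which writes already happened

    def record(self, idempotency_key: str, amount: int) -> bool:
        """Apply the write once. Returns False if it already happened."""
        if idempotency_key in self._seen_keys:
            return False  # safe retry: nothing duplicated
        self._seen_keys.add(idempotency_key)
        self.entries.append(amount)
        return True

ledger = Ledger()
ledger.record("invoice-42", 100)
ledger.record("invoice-42", 100)  # a retry of the same write, ignored
print(sum(ledger.entries))        # 100, not 200
```

Ten lines now, instead of a duplicate-charge incident later.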
“Why things get slower after the third service”
One service is fine.
Two can be okay.
Three is where things start to drag.
Not because of latency.
Because of distance.
Distance between:
- Where data lives
- Where decisions are made
- Where bugs show up
- Where fixes need to happen
You add:
- Network calls
- Serialization
- Versioning
- Partial failures
- Retry logic
- Debugging gaps
And now every feature requires coordination.
Not with users.
With your own system.
That’s the tax.
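The tax is visible in code. A sketch of the same lookup done in-process versus across a service boundary — `flaky_call` simulates a network that times out on every other request, and all names are illustrative:

```python
# The same lookup, local vs. remote. The remote version needs retry logic
# and a failure path the local one never had. flaky_call is a stand-in for
# a network hop; every name here is hypothetical.

users = {"u1": {"plan": "pro"}}

def get_plan_local(user_id: str) -> str:
    return users[user_id]["plan"]       # one lookup, no failure modes

call_count = {"n": 0}

def flaky_call(user_id: str) -> dict:
    call_count["n"] += 1
    if call_count["n"] % 2 == 1:        # simulated partial failure
        raise TimeoutError("user-service timed out")
    return users[user_id]

def get_plan_remote(user_id: str, retries: int = 3) -> str:
    for _ in range(retries):            # retry logic this feature now owns
        try:
            return flaky_call(user_id)["plan"]
        except TimeoutError:
            continue
    raise RuntimeError("user-service unavailable after retries")

print(get_plan_local("u1"))    # pro
print(get_plan_remote("u1"))   # pro, but only on the second attempt
```

Same answer both times. One of them made you write a failure policy to get it.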
“Async doesn’t fix unclear ownership”
When services start stepping on each other, teams often reach for queues.
“Let’s make it async.”
“Let’s add a worker.”
“Let’s just emit an event.”
Now you’ve added time as another axis of confusion.
Async is great when the ownership is clear.
It’s a disaster when it isn’t.
If no one can answer:
- Who is allowed to write this state?
- What’s safe to retry?
- What does ‘done’ actually mean?
Then async just spreads the mess over time.
You don’t remove coupling.
You hide it behind delays.
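Answering those three questions can be a few lines of code. A sketch under assumptions: one handler owns user status, events carry a version number, and a redelivered or out-of-order event can never clobber newer state. The event shape and names are invented for illustration:

```python
# Async with explicit ownership: this handler is the only writer of
# user_status, and a version check makes redelivery and reordering safe.
# Event fields and names are hypothetical.

user_status = {}   # written only here; everything else just reads

def apply_status_event(event: dict) -> bool:
    """Apply an event; returns False if it was stale or a redelivery."""
    current = user_status.get(event["user_id"], {"version": 0})
    if event["version"] <= current["version"]:
        return False   # redelivered or out-of-order: safe to drop
    user_status[event["user_id"]] = {
        "status": event["status"],
        "version": event["version"],
    }
    return True        # "done" means: state written at this version

apply_status_event({"user_id": "u1", "status": "active", "version": 2})
apply_status_event({"user_id": "u1", "status": "disabled", "version": 1})  # stale, dropped
```

With the answers written down, queues are fine. Without them, this same handler would be last-write-wins roulette.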
Example 1: Auth and user data drift
The naive start
You start with a monolith.
Users table.
Auth logic nearby.
Everything is straightforward.
It works.
The breaking point
You add SSO.
Integrations.
Maybe enterprise customers.
Someone says:
“Auth is critical. Let’s make it its own service.”
Seems reasonable.
The symptoms
A few months later:
- Auth says a user is active
- The app says they’re disabled
- Feature flags behave differently per request
- Login bugs require tracing three systems
Nobody trusts user state anymore.
The fix
You pull auth back behind a hard boundary.
One owner.
One source of truth.
Explicit contracts.
Fewer network calls.
Stricter rules.
It’s boring again.
That’s a win.
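“One owner, one source of truth” doesn’t require a service — it can be an in-process boundary with a deliberately small contract. A sketch; the class and method names are illustrative:

```python
# A hard boundary without a network: user state lives behind one small,
# explicit contract, and the rest of the app can ask questions but never
# touch the rows directly. Names here are hypothetical.

class AuthBoundary:
    def __init__(self):
        self._users = {}               # private: the single source of truth

    # --- the entire contract ---
    def create_user(self, user_id: str) -> None:
        self._users[user_id] = {"active": True}

    def disable_user(self, user_id: str) -> None:
        self._users[user_id]["active"] = False

    def is_active(self, user_id: str) -> bool:
        return self._users.get(user_id, {}).get("active", False)

auth = AuthBoundary()
auth.create_user("u1")
auth.disable_user("u1")
print(auth.is_active("u1"))  # False -- and no second copy of user state to drift
```

Because there is exactly one copy of the state, “auth says active, app says disabled” is no longer a possible bug.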
“Your system didn’t get more modular — it got more distant”
This is the trap.
Services feel like modularity.
But modularity is about control, not distance.
A module:
- Has clear inputs
- Has clear outputs
- Has rules you can enforce
A service:
- Has latency
- Has partial failure
- Has coordination cost
If you don’t need those things, you’re paying for them anyway.
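“Rules you can enforce” is the key phrase. In a module, the rules are in-process checks that fail immediately with a stack trace, not partial failures across a network. A sketch with invented names:

```python
# A module boundary: clear inputs, clear outputs, violations fail loudly
# and locally. No serialization, no versioning, no retry policy needed.
# Function and field names are illustrative.

def charge(amount_cents: int, currency: str) -> dict:
    # enforce inputs at the boundary
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    if currency not in {"usd", "eur"}:
        raise ValueError(f"unsupported currency: {currency}")
    # one output shape, enforced by this function alone
    return {"amount_cents": amount_cents, "currency": currency, "status": "charged"}

print(charge(1999, "usd")["status"])  # charged
```

The same contract behind an HTTP endpoint buys you latency and partial failure on top — pay for that only when you actually need the distance.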
Example 2: Background jobs as a dumping ground
The naive start
Everything is synchronous.
Writes happen inline.
Cron jobs clean things up.
Simple.
The breaking point
Performance dips.
Timeouts appear.
“Let’s move this to a worker service.”
The symptoms
Soon:
- Jobs run out of order
- Retries cause duplicate side effects
- Nobody knows which jobs are safe to replay
- Production issues trace back to “some job”
Your system is now haunted.
The fix
You stop treating jobs as generic.
You define workflows.
Explicit steps.
Clear ownership.
Fewer job types.
Better observability.
The queue becomes boring again.
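“Define workflows” can be this small: a named sequence of explicit steps whose progress is recorded, so a replay resumes instead of re-running side effects. A sketch; the workflow, its steps, and all names are invented:

```python
# A workflow instead of anonymous jobs: explicit, ordered steps with
# recorded progress, so replaying the whole thing is safe. Everything
# here is a hypothetical stand-in for real side effects.

def reserve_stock(order): order["reserved"] = True
def charge_card(order):   order["charged"] = True
def send_receipt(order):  order["receipt_sent"] = True

FULFILL_ORDER = [
    ("reserve_stock", reserve_stock),
    ("charge_card", charge_card),
    ("send_receipt", send_receipt),
]

def run_workflow(steps, order, completed: set) -> None:
    for name, step in steps:
        if name in completed:
            continue          # replay-safe: finished steps are skipped
        step(order)
        completed.add(name)   # record progress after every step

order, completed = {}, set()
run_workflow(FULFILL_ORDER, order, completed)
run_workflow(FULFILL_ORDER, order, completed)  # full replay: nothing runs twice
```

Now “some job failed” becomes “`charge_card` failed in `FULFILL_ORDER`”, and the answer to “is this safe to replay?” is yes, by construction.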
“Multi-tenant pain is usually self-inflicted”
Multi-tenancy exposes architectural lies fast.
The naive start
Single database.
Tenant ID passed around manually.
Works fine.
The breaking point
You add sync.
Then a “data service.”
Then a “sync service.”
The symptoms
- Near-miss data leaks
- Bugs only visible days later
- Metrics don’t match user complaints
- “Works locally” means nothing
Everyone’s nervous.
The fix
You centralize tenancy rules.
Make scope explicit at the repo layer.
Collapse services back into a shared runtime.
Make sync observable by default.
Suddenly, things line up again.
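“Make scope explicit at the repo layer” means a forgotten tenant filter becomes an immediate error instead of a near-miss leak. A sketch — the in-memory rows and names stand in for a real database:

```python
# Tenancy enforced at the repository boundary: every query must name its
# tenant, or it refuses to run. The rows and class names are illustrative.

rows = [
    {"tenant_id": "t1", "doc": "alpha"},
    {"tenant_id": "t2", "doc": "beta"},
]

class DocumentRepo:
    def list_docs(self, tenant_id: str) -> list:
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        return [r["doc"] for r in rows if r["tenant_id"] == tenant_id]

repo = DocumentRepo()
print(repo.list_docs("t1"))  # ['alpha']
```

One chokepoint, one rule. Nobody has to remember the tenant filter, because there is no code path without it.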
“What actually scales for small teams”
Here’s the unpopular truth:
Small teams scale by reducing choices, not adding abstractions.
What works:
- Fewer primitives
- Clear ownership
- Explicit lifetimes
- Strong internal boundaries
What doesn’t:
- Premature services
- Accidental async
- “We’ll clean it up later”
You don’t need more boxes.
You need fewer places to put code.
“So when should you add a service?”
Rarely. And intentionally.
Good reasons:
- Different team ownership
- Different deployment cadence that actually exists
- Hard isolation requirements
- Regulatory boundaries
Bad reasons:
- “This feels messy”
- “It might scale”
- “That’s how big companies do it”
- “We can refactor later”
If you can’t explain the boundary in one sentence, don’t add the service.
“The fastest way forward is usually fewer boxes”
If your SaaS feels slow to change, don’t ask:
“What service should we add?”
Ask:
“What decision are we avoiding?”
Then make that decision.
Pull logic closer.
Delete services you don’t need.
Make ownership explicit.
Simple systems are not naive.
They’re deliberate.
And they let you ship.
That’s the whole point.