Three services, one product: when the API shouldn’t do everything


Picture someone finishing an inspection on a phone in a yard somewhere. They hit save. Behind that tap is a tangle of rules: seals before inspection headers, inspection before status flips, and somewhere in another building a SQL Server instance that still owns the “official” record for half the business.

If you try to do all of that inside one HTTP request, you get timeouts, angry users, and deploys where a harmless API tweak accidentally breaks overnight replication. Splitting API, queue, and sync into separate services is how I kept that mess manageable.


The question that actually matters

What should fail independently?

  • If the mobile app can’t list today’s work, that’s a full stop.
  • If a downstream write to a legacy system hiccups, you want a retry, not a lost form and a confused inspector.
  • If you’re copying reference data from enterprise SQL into Postgres, that’s batch work—it shouldn’t ride in the same process (or deploy cadence) as user-facing APIs.

Once you frame it that way, “one big service” stops looking simpler and starts looking fragile.


Service one: the API as a thin, fast contract

The main surface is Koa, JWT auth, and PostgreSQL. It’s the thing phones and integrations talk to.

What made it feel good to build and to consume:

  • Two speeds of updates. Some screens only need a handful of flags; others need the full inspection picture. Splitting “patch these six fields” from “replace the whole master record” avoided giant payloads on simple actions and kept validation honest.
  • Transactions where tables split. When one logical save touched both a header table and a detail table, doing it in one DB transaction meant we weren’t debugging half-written states at 11 p.m.
  • Predictable JSON. Boring consistency beats clever nesting when three different client versions are in the field.
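To make the two-speed split concrete, here's a rough sketch of the PATCH-side whitelist in TypeScript. The field names are invented for illustration, not the real schema; the point is that the cheap path can never turn into an accidental full-record replace.

```typescript
// Sketch: keep the fast PATCH surface separate from the full-record update.
// Only whitelisted keys pass; anything else is rejected loudly so a PATCH
// can never smuggle in a master-record overwrite.
const PATCHABLE_FIELDS = ["status", "sealIntact", "notes"] as const;
type PatchableField = (typeof PATCHABLE_FIELDS)[number];

function pickPatchFields(
  body: Record<string, unknown>,
): Partial<Record<PatchableField, unknown>> {
  const out: Partial<Record<PatchableField, unknown>> = {};
  for (const key of Object.keys(body)) {
    if (!(PATCHABLE_FIELDS as readonly string[]).includes(key)) {
      throw new Error(`PATCH does not accept "${key}"; use the full update endpoint`);
    }
    out[key as PatchableField] = body[key];
  }
  return out;
}
```

The full-replace endpoint gets its own, stricter validator; this one stays small so the common action stays fast.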

The API also exposes enough sync visibility (status, manual nudge) that support doesn’t need SSH to answer “did the data land?”
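The one-save, one-transaction rule from the list above sketches out roughly like this, assuming a node-postgres-style client. Table and column names are made up for illustration.

```typescript
// Minimal stand-in for a node-postgres client.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// One logical save, one transaction: the header update and the detail
// inserts either all land or none do.
async function saveInspection(
  db: Queryable,
  header: { id: number; status: string },
  details: string[],
): Promise<void> {
  await db.query("BEGIN");
  try {
    await db.query(
      "UPDATE inspection_header SET status = $1 WHERE id = $2",
      [header.status, header.id],
    );
    for (const note of details) {
      await db.query(
        "INSERT INTO inspection_detail (header_id, note) VALUES ($1, $2)",
        [header.id, note],
      );
    }
    await db.query("COMMIT"); // both tables move together...
  } catch (err) {
    await db.query("ROLLBACK"); // ...or neither does
    throw err;
  }
}
```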


Service two: the queue—where order and retries live

Here’s the move that saved the most grief: anything that must happen in sequence and might fail goes to pg-boss in a dedicated service.

Think: seals first, then inspection payload, then close the loop on status—all talking to a legacy SQL Server system that isn’t going away. That sequence does not belong in the request that returns 200 to the app. The app gets a quick acknowledgment; the job owns the rest.

The queue service also ships with:

  • REST endpoints to publish work (with duplicate protection—same inspection shouldn’t enqueue twice and race itself).
  • Cancel / retry for ops when something external is wrong.
  • A small dashboard behind an API key, because staring at SELECT * FROM job in psql doesn’t scale when you’ve got multiple queues and people in different time zones.
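The duplicate protection boils down to deriving a stable key per inspection, so enqueueing the same one twice collapses into one job. pg-boss accepts this as a singletonKey option; the retry numbers below are illustrative, not a recommendation.

```typescript
// Same inspection id, same key, one job in flight.
function pushJobOptions(inspectionId: number) {
  return {
    singletonKey: `push-inspection-${inspectionId}`,
    retryLimit: 5,
    retryDelay: 60,     // seconds between attempts
    retryBackoff: true, // spread retries out when the legacy side is down
  };
}

// With pg-boss this would be used roughly like:
//   await boss.send("push-inspection", { inspectionId }, pushJobOptions(inspectionId));
```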

The first time a downstream stored procedure threw on a Friday, nobody had to replay the mobile flow by hand. They opened the dashboard, saw the failed job, fixed the data, hit retry. That’s the difference between a system people trust and one they work around.


Service three: sync—the boring pipe that must not break deploys

The third piece pulls from SQL Server into Postgres on a schedule—inspections, terminals, railyards, tags, the usual “enterprise source of truth → app-facing copy” story. Optional HTTP triggers cover “run terminals only” or “full refresh” when support needs it.
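The core of it is a batched copy loop. Here's a sketch with the actual SQL Server fetch and Postgres upsert abstracted behind callbacks (those callbacks, and the default batch size, are placeholders):

```typescript
// Pull rows in pages so one run never holds a giant result set,
// and return the copied-row count for the run metadata.
async function syncTable<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  upsert: (rows: T[]) => Promise<void>,
  batchSize = 500,
): Promise<number> {
  let offset = 0;
  let total = 0;
  for (;;) {
    const rows = await fetchPage(offset, batchSize);
    if (rows.length === 0) break;
    await upsert(rows);
    total += rows.length;
    offset += rows.length;
    if (rows.length < batchSize) break; // short page means we're done
  }
  return total;
}
```

Because the batch size is a plain parameter here, tuning it never touches the API's connection pools or request latency.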

Why isolate it?

  • Deploy the API daily; touch sync weekly if you want. Different failure domains.
  • Tune batch size and interval without touching request latency or connection pools for mobile traffic.
  • When replication lags, logs and metadata (“last run”, counts, errors) answer “is data stale?” in one glance instead of an archaeology session.
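That “is data stale?” check can be as small as this; the shape of the run-metadata record is illustrative.

```typescript
// Illustrative run-metadata shape for one synced table.
interface SyncRun {
  lastSuccess: Date | null;
  rowsCopied: number;
  lastError: string | null;
}

// One-glance answer: fresh, stale (with age and last error), or never ran.
function staleness(run: SyncRun, maxAgeMinutes: number, now = new Date()): string {
  if (!run.lastSuccess) return "never synced";
  const ageMinutes = (now.getTime() - run.lastSuccess.getTime()) / 60_000;
  return ageMinutes > maxAgeMinutes
    ? `stale: last success ${Math.round(ageMinutes)}m ago (${run.lastError ?? "no recorded error"})`
    : "fresh";
}
```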


What I’d hammer in earlier next time

  1. Treat duplicate jobs as a feature, not an edge case. Production day one, not week three.
  2. Give operators a screen. Logs are necessary; a queue dashboard is kind. Same for sync: visible last-success beats tribal knowledge.
  3. Name the failure domains out loud before you merge services. If two concerns can fail for different reasons, they’re candidates to split.


If you only remember one thing

The API’s job is to be reliable and fast. The queue’s job is to be patient and ordered. Sync’s job is to be boring and observable. Stack them that way and replication can hiccup without taking the whole app with it.