The free tier of Inbox Check runs on a single dedicated server. Not a cluster, not an autoscaling group, not Kubernetes. One box. It handles roughly 1,000 inbox placement tests a day, returns results in under two minutes, and costs us less than a Netflix subscription in compute. This is the architecture that makes that possible.
The constraints
Every architecture decision is downstream of three hard constraints we set at the start:
- Free tier must stay free. If we cannot serve a reasonable volume on a single server we cannot keep the free tier. That forces efficiency in every layer.
- Sub-two-minute latency. A test starts when a user clicks Run. Results need to be useful before the user gets bored. Two minutes is the upper bound; one minute is the target.
- Twenty-plus provider coverage. Every test seeds every provider in the pool. We do not sample. A test either reports all providers or it reports none.
High-level shape
user (browser)
|
| POST /api/test (SMTP creds or one-click send address)
v
+---------------------------------------------+
| Next.js API (app router, node runtime) |
+---------------------------------------------+
|
| enqueue job
v
+---------------------------------------------+
| BullMQ (Redis) |
| test-queue --> worker(s) |
| seed-poll --> worker(s) |
| screenshot --> worker(s) |
+---------------------------------------------+
| | |
| | |
v v v
+--------------+ +--------------+ +--------------+
| seed pollers | | browser | | SMTP tx |
| (IMAP/API) | | queue | | (to seeds)|
| | | (Puppeteer) | | |
+--------------+ +--------------+ +--------------+
| | |
+--------+-------+--------------+
|
v
+------------------+
| Postgres |
| (state, logs, |
| results) |
+------------------+
|
| server-sent events
v
user (browser)
live result stream
The browser queue: Puppeteer with a bounded pool
The heaviest part of a placement test is reading seed mailboxes on providers that do not expose a usable API (Mail.ru, Yandex, some French providers, ProtonMail). For those we keep a logged-in Puppeteer session and navigate the web UI.
Puppeteer is memory-heavy. A naive implementation leaks Chrome processes and OOMs the server within hours. We solved this with a bounded browser queue:
- At most N (currently 6) concurrent Chrome instances across the server. A seventh request waits in a FIFO queue.
- Each Chrome instance is recycled every 50 jobs or 30 minutes, whichever comes first. No long-lived sessions.
- Session cookies for each seed are persisted to disk and restored on Chrome start, so we do not log in every recycle.
- A watchdog kills any Chrome that exceeds 800MB RSS or 60s on a single navigation. The job is retried on a fresh browser.
Bounded pool plus aggressive recycling is the whole trick. The server stays stable for weeks between planned restarts.
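The acquire/recycle mechanics can be sketched as a small pool class. This is a simplified model, not our production code: `Browser` stands in for a Puppeteer instance, `launch`/`close` stand in for `puppeteer.launch()` and `browser.close()`, and the RSS/navigation watchdog and cookie restore are omitted.

```typescript
type Browser = { id: number; jobsDone: number };

class BoundedPool {
  private idle: Browser[] = [];
  private waiters: ((b: Browser) => void)[] = [];
  private live = 0;
  private nextId = 0;

  constructor(
    private max: number,            // hard cap on concurrent browsers
    private recycleAfter: number,   // recycle after this many jobs
    private launch: (id: number) => Browser,
    private close: (b: Browser) => void,
  ) {}

  // Acquire a browser, waiting FIFO if the pool is at capacity.
  private acquire(): Promise<Browser> {
    const b = this.idle.pop();
    if (b) return Promise.resolve(b);
    if (this.live < this.max) {
      this.live++;
      return Promise.resolve(this.launch(this.nextId++));
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Return a browser, recycling it once it has done enough jobs.
  private release(b: Browser): void {
    b.jobsDone++;
    if (b.jobsDone >= this.recycleAfter) {
      this.close(b);
      b = this.launch(this.nextId++); // fresh instance, same pool slot
    }
    const waiter = this.waiters.shift();
    if (waiter) waiter(b);
    else this.idle.push(b);
  }

  // Run one job on a pooled browser.
  async run<T>(job: (b: Browser) => Promise<T>): Promise<T> {
    const b = await this.acquire();
    try {
      return await job(b);
    } finally {
      this.release(b);
    }
  }
}
```

Production layers the 30-minute timer and the 800MB/60s watchdog on top of the same acquire/release path; a watchdog kill is just an early `close` plus a retry through `run`.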
Seed-mailbox pollers
Providers that offer a usable API (Gmail, Google Workspace, Microsoft 365, Outlook, Zoho, FastMail) get a much lighter path. Each of those seeds has a poller that watches for new mail and reports folder placement to the main job.
Gmail uses the Gmail API with watch (push) notifications over a Pub/Sub-style endpoint we host ourselves. Microsoft uses Graph with change notifications. IMAP-only providers get IDLE connections that wake on new mail. The result is that for API-friendly providers, a test message's arrival is observed within a few seconds of delivery; for web-UI providers, we poll the UI every 10–15 seconds during a test's active window.
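For the web-UI providers, the per-seed loop amounts to the sketch below. `checkSeed` is a hypothetical stand-in for the Puppeteer navigation that inspects the mailbox; the interval and deadline come from the test's active window.

```typescript
type Placement = { folder: string } | null;

// Poll a seed until the test message is observed or the window expires.
async function pollSeed(
  checkSeed: () => Promise<Placement>, // resolves with a folder once the message lands
  intervalMs: number,                  // 10-15s in production
  windowMs: number,                    // the test's active window
): Promise<Placement> {
  const deadline = Date.now() + windowMs;
  while (Date.now() < deadline) {
    const placement = await checkSeed();
    if (placement) return placement;            // observed: report the folder
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return null; // window expired without observing the message
}
```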
Postgres schema overview
The data model is small. Four tables do most of the work:
tests
id uuid primary key
created_at timestamptz
sender_domain text
message_id text
status enum(queued, sending, observing, done, failed)
summary jsonb
seed_results
id uuid primary key
test_id uuid references tests
provider text -- gmail_consumer, outlook_consumer, ...
folder text -- inbox | spam | promotions | updates | unknown
observed_at timestamptz
screenshot_url text null
seed_mailboxes
id uuid primary key
provider text
account_id text -- opaque identifier, not exposed
state enum(active, soaking, retiring, archived, suspect)
canary_score jsonb
incidents
id uuid primary key
started_at timestamptz
resolved_at timestamptz null
severity enum(sev1, sev2, sev3)
component text
postmortem_url text null
A test's result is the join of tests and every seed_results row that carries its id. We use a partial index on status IN ('queued', 'sending', 'observing') to keep the hot path small.
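The partial index is a one-liner. This is hypothetical DDL (the indexed column is illustrative); the predicate matches the status values above:

```sql
-- Only in-flight tests are indexed, so the index stays tiny no matter
-- how large the tests table grows.
CREATE INDEX tests_active_idx
  ON tests (created_at)
  WHERE status IN ('queued', 'sending', 'observing');
```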
BullMQ for job orchestration
BullMQ on top of Redis is the backbone. Three logical queues: test-queue (orchestrates a run), seed-poll (checks a seed for the expected message), screenshot (captures the folder screenshot once placement is known). Each has its own concurrency cap tuned to the server's CPU and memory budget.
One thing we rely on that is not obvious from BullMQ's documentation: the repeatable-job feature drives the canary campaign. Every hour a single scheduled job fans out into one known-good and one known-bad send per seed. No cron, no systemd timer, no second system to operate.
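The fan-out step is simple enough to sketch. The parent job is registered once with BullMQ's repeat option (`{ repeat: { every: 3_600_000 } }`); when it fires, it expands to one known-good and one known-bad send per seed. Seed names and job shape here are illustrative, not our real identifiers:

```typescript
type CanaryJob = { seed: string; kind: "known-good" | "known-bad" };

// Called by the hourly repeatable job: produce the full canary batch.
function fanOutCanary(seeds: string[]): CanaryJob[] {
  return seeds.flatMap((seed) => [
    { seed, kind: "known-good" as const }, // should land in the inbox
    { seed, kind: "known-bad" as const },  // should land in spam
  ]);
}
```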
SSE for live results
The user's browser holds a server-sent events connection to /api/tests/:id/stream for the duration of the run. As each seed reports placement, a row is written to seed_results and the SSE handler pushes an update. SSE is perfect for this: one-way, survives proxies, trivially resumable if the connection drops mid-run.
We considered WebSockets and rejected them — bidirectional is not needed, and SSE is cheaper on the server.
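A sketch of the frame the stream handler writes per seed result (field names illustrative): the `id` field is what makes resumption work, since the browser replays it as `Last-Event-ID` on reconnect.

```typescript
// Per the SSE wire format, a frame is newline-delimited fields ending in a
// blank line. JSON.stringify keeps `data` to a single line, which the
// format requires.
function sseFrame(id: string, event: string, data: unknown): string {
  return `id: ${id}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

The handler sets `Content-Type: text/event-stream`, disables response buffering, and calls `res.write(sseFrame(...))` as each seed_results row lands.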
Process layout
In production the box runs three Node processes under PM2: the Next.js web app, the BullMQ worker pool, and the Puppeteer browser-queue worker. Postgres and Redis are local, served over the loopback interface. Total resident memory under full load is about 6GB.
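An illustrative PM2 ecosystem file for that layout; the entry-point paths are assumptions, not our real ones:

```javascript
// ecosystem.config.js: one entry per long-lived process.
module.exports = {
  apps: [
    { name: "web", script: "node_modules/.bin/next", args: "start" },
    { name: "workers", script: "dist/workers.js" },       // BullMQ worker pool
    { name: "browser-queue", script: "dist/browser.js" }, // Puppeteer pool
  ],
};
```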
Cost breakdown
- Dedicated server: roughly $90/month for an 8-core / 32GB box.
- Residential proxies for a few providers: $40/month.
- Workspace / paid seed accounts: $150–200/month.
- Domain, DNS, email-sending ESP for transactional: ~$30/month.
- Object storage for screenshots: ~$5/month at current volume.
Total monthly run cost is a few hundred dollars. At 1,000 tests/day (roughly 30,000 a month), that works out to about a cent per test, which is why we can keep the free tier unmetered.
Failure modes we hit (and fixed)
- Chrome memory leak. Solved with the 50-job / 30-minute recycle policy described above.
- Redis eviction on memory pressure. We were running Redis with the default maxmemory-policy, which silently evicted queued jobs. We switched to noeviction and sized maxmemory deliberately.
- Postgres autovacuum freeze on the results table. Heavy insert traffic plus jsonb summaries produced a high bloat ratio. Partitioning by month and running a scheduled VACUUM FULL on partitions older than 60 days sorted it out.
- Provider rate limits. Gmail will throttle the Gmail API if you hit it too hard across a single project. We split traffic across two Google Cloud projects and put a token bucket in front of the client.
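The token bucket in front of the Gmail client can be sketched as follows. Capacity and refill rate here are illustrative; the real numbers are tuned to Google's per-project quotas.

```typescript
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,     // burst size
    private refillPerSec: number, // sustained request rate
    now = Date.now(),
  ) {
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if a request may proceed now; false means back off.
  // `now` is injectable so behavior is deterministic under test.
  tryRemove(now = Date.now()): boolean {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A `false` from `tryRemove` maps to the job being delayed and retried rather than hitting Google and burning quota on a 429.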
What will break at 10,000/day
We are honest about where the current design runs out:
- Browser queue becomes a bottleneck. 6 concurrent Chrome instances cannot service 10,000 tests/day with 20+ providers unless we grow the queue, which means more RAM, which means a second server.
- Postgres writes. At 10,000 tests/day the seed_results table adds ~200,000 rows a day. Partitioning covers it for a year; after that we will want to push older partitions to cold storage.
- Seed pool load. A single seed mailbox receiving 10,000 test messages a day on a consumer provider is a red flag. We would add a second seed per provider and round-robin between them.
None of these is a rewrite. Each is a routine scaling step. The current box comfortably handles 2,000–3,000 tests/day on a busy afternoon, which is plenty of headroom for the free tier.