Localhost Email Testing vs Production Inbox Test: Two Different Things

Every developer knows the "test emails in dev" pattern. Spin up Mailhog in Docker, point your app's SMTP at localhost:1025, render the template, eyeball the result. That's staging-level testing. It's useful, and it misses most of the problems real users hit, because those problems happen at the receiving provider rather than in your template.

This article clarifies what local email catchers do well, what they can't do, and where production inbox-placement testing picks up.

What Mailhog / Mailtrap / MailCatcher do

Local email catchers are SMTP sinks. They accept mail on localhost (Mailtrap does the same on a hosted sandbox endpoint), show it in a web UI, and never deliver it to a real recipient. Their job:

Catch broken templates. Missing variable? Weird escaping? Busted HTML? Visible immediately.
Prevent "oops sent real mail from dev" incidents. Staging mail doesn't escape to real inboxes.
Render headers for inspection. You can see what your app is producing.
Work offline. No external API calls, fine on a plane.
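
The "missing variable" case above can also be caught mechanically instead of by eye. The following is a minimal sketch, assuming Mustache/Handlebars-style `{{name}}` placeholders (the article does not say which template engine is in use). The helper name `findUnrenderedPlaceholders` is hypothetical, and the check is a heuristic rather than a full template validator.

```javascript
// Hypothetical helper: scan a rendered email body for placeholders the
// template engine left behind, plus literal "undefined"/"null"/"NaN" text
// that usually means a variable was missing at render time.
// Assumes {{mustache}}-style syntax; adjust the pattern for other engines.
function findUnrenderedPlaceholders(renderedHtml) {
  const problems = [];

  // Leftover {{ variable }} or {{{ variable }}} tokens.
  for (const match of renderedHtml.matchAll(/\{\{\{?\s*([^{}]+?)\s*\}?\}\}/g)) {
    problems.push({ kind: "placeholder", value: match[1] });
  }

  // Strip tags so markup and attributes don't produce false positives,
  // then look for stringified empty values in the visible text.
  const visibleText = renderedHtml.replace(/<[^>]*>/g, " ");
  for (const match of visibleText.matchAll(/\b(undefined|null|NaN)\b/g)) {
    problems.push({ kind: "empty-value", value: match[1] });
  }

  return problems;
}
```

A test can render each template with fixture data and fail when the returned list is non-empty. Copy that legitimately contains a word like "null" would need an allow-list.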

Set up in thirty seconds:

# docker-compose.dev.yml
services:
  mailhog:
    image: mailhog/mailhog:latest
    ports:
      - "1025:1025"   # SMTP
      - "8025:8025"   # web UI

  app:
    build: .
    environment:
      SMTP_HOST: mailhog
      SMTP_PORT: 1025
      # No auth, no TLS - local only

Run docker compose up, point your app at mailhog:1025 (or localhost:1025 if the app runs on the host rather than in the compose network), then visit http://localhost:8025. Every outbound message lands in the inbox view.
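
Beyond eyeballing the web UI, a test can ask the catcher whether a message arrived. This sketch assumes MailHog's v2 HTTP API on the web UI port, and a response shaped like `{ items: [{ Content: { Headers: { Subject: [...] } } }] }`. Confirm that shape against the MailHog version you run. The function names are hypothetical.

```javascript
// Pure helper: pick messages out of a MailHog /api/v2/messages response
// whose Subject header matches exactly. Header values arrive as arrays.
function messagesWithSubject(apiResponse, subject) {
  const items = apiResponse.items ?? [];
  return items.filter((item) => {
    const subjects = item.Content?.Headers?.Subject ?? [];
    return subjects.includes(subject);
  });
}

// Fetches captured mail from the catcher and throws if nothing with the
// expected subject was captured. Needs Node 18+ for the global fetch.
// The default base URL matches the 8025 port mapping in the compose file.
async function assertCaptured(subject, baseUrl = "http://localhost:8025") {
  const response = await fetch(`${baseUrl}/api/v2/messages`);
  if (!response.ok) {
    throw new Error(`MailHog API returned HTTP ${response.status}`);
  }
  const matches = messagesWithSubject(await response.json(), subject);
  if (matches.length === 0) {
    throw new Error(`No captured message with subject "${subject}"`);
  }
  return matches[0];
}
```

Subjects containing non-ASCII characters may come back MIME-encoded, so an exact string match is only reliable for plain ASCII subjects.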

What local catchers can't do

Mailhog accepts any mail, from any sender, with any headers, no auth check, no content scanning, no provider-specific filters. That's what makes it useful for template dev — and useless for deliverability testing. None of these are visible in Mailhog:

SPF / DKIM / DMARC results. Mailhog doesn't validate. Real Gmail does.
Inbox vs Spam vs Promotions. Mailhog has one folder. Real providers have filters that decide where your mail lands.
Content scoring. SpamAssassin / Rspamd / Microsoft SmartScreen don't run in Mailhog.
Link reputation. Gmail flags links from burned domains; Mailhog doesn't know.
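Provider-specific rendering. Outlook on Windows renders with Word's engine and ignores much modern CSS; Gmail drops styles it doesn't support; Apple Mail handles dark mode differently. Mailhog shows the HTML the way a browser renders it, not the way any of these clients will.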
Image proxy / click wrapping. Gmail proxies images; Outlook rewrites links. Mailhog does neither.

Green in Mailhog != delivered in production

A template that looks perfect in Mailhog can still land in Spam at Gmail because of sending-domain reputation, failed DMARC alignment, a single spammy phrase, or a link to a burned domain. Mailhog never sees any of that.

What production inbox-placement testing does

An inbox-placement test sends your actual mail, from your actual production sender, to a panel of real seed mailboxes distributed across providers. It then reads where each provider put the mail:

Inbox at Gmail / Outlook / Yahoo / Mail.ru / GMX / ProtonMail / etc.
Spam / Junk folder.
Promotions / Updates tab at Gmail.
Rejected at SMTP time (provider returned a 4xx/5xx reply instead of 250 OK).
Silent drop (accepted at SMTP time, then never delivered to any folder; happens at Microsoft when reputation is bad).

It also shows you the Authentication-Results header each provider added, so you know whether SPF, DKIM, DMARC passed from their perspective — not from mail-tester.com's or your DMARC report's perspective.
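
To compare those Authentication-Results headers across providers, it helps to reduce each one to a small record. The following is a simplified sketch of a parser for the header value as defined in RFC 8601. It strips parenthesised comments with a regex, so nested comments and quoted strings containing ";" are not handled. The function names are hypothetical.

```javascript
// Sketch parser for an Authentication-Results header value (RFC 8601).
// Returns the authserv-id plus the first result seen for each method,
// e.g. { authservId: "mx.google.com", results: { spf: "pass", ... } }.
function parseAuthenticationResults(headerValue) {
  // Drop "(...)" comments; simplification, nested comments are not handled.
  const withoutComments = headerValue.replace(/\([^()]*\)/g, " ");
  const [first, ...rest] = withoutComments.split(";").map((part) => part.trim());

  const results = {};
  for (const part of rest) {
    // Each result entry starts with "method=result", e.g. "dkim=pass".
    const match = /^([a-z0-9-]+)\s*=\s*([a-z]+)/i.exec(part);
    if (match) {
      const method = match[1].toLowerCase();
      if (!Object.hasOwn(results, method)) {
        results[method] = match[2].toLowerCase();
      }
    }
  }

  // The authserv-id may be followed by an optional version number.
  return { authservId: first.split(/\s+/)[0], results };
}

// True only when SPF, DKIM and DMARC all report "pass".
function allAuthPassed(parsed) {
  return ["spf", "dkim", "dmarc"].every((m) => parsed.results[m] === "pass");
}
```

Running this per seed mailbox gives one row per provider, which makes a provider that fails DMARC while the others pass easy to spot.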

Local SMTP catcher (Mailhog/Mailtrap) vs production inbox-placement test
Capability	Local catcher	Production placement test
Catches broken templates / variables	Yes	Yes (overkill for this)
Validates SPF / DKIM / DMARC	No	Yes — per provider Authentication-Results
Folder placement (Inbox / Spam / Promotions)	No (one folder)	Yes — per provider
Content scoring (SpamAssassin/Rspamd/SmartScreen)	No	Yes
Provider-specific rendering (Outlook, Gmail, Apple)	No (raw HTML)	Yes — real client render
Image proxy / link rewriting	No	Yes — visible
Works offline	Yes	No — external
Cost / speed	Free, instant	External call, slower
Right tool for	Dev iteration, PR/CI snapshots	Pre-launch + weekly production check

The two-layer workflow

Layer 1 — Dev / staging (local catcher)

Every branch / PR runs against Mailhog or Mailtrap.
CI snapshots the rendered HTML per template.
Visual regression on the snapshot (Playwright, Percy).
Never let the staging environment deliver mail to real recipients.
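
The last rule in Layer 1 is easier to keep when the app enforces it. This is a minimal sketch of a config guard, assuming the `SMTP_HOST` / `SMTP_PORT` variables from the compose file plus a hypothetical `APP_ENV` variable. The allow-list of catcher hostnames is also an assumption and would need your own catcher's hostname (for example a hosted Mailtrap sandbox).

```javascript
// Hypothetical guard: refuse to build an SMTP config that points at a real
// relay unless the app is explicitly running in production.
// APP_ENV and the allow-list below are assumptions made for this sketch.
const LOCAL_CATCHER_HOSTS = new Set(["localhost", "127.0.0.1", "mailhog"]);

function resolveSmtpTarget(env) {
  const host = env.SMTP_HOST ?? "localhost";
  const port = Number(env.SMTP_PORT ?? 1025);
  const isProduction = env.APP_ENV === "production";

  if (!isProduction && !LOCAL_CATCHER_HOSTS.has(host)) {
    throw new Error(
      `Refusing to use SMTP host "${host}" outside production; ` +
        "point SMTP_HOST at the local catcher instead."
    );
  }
  return { host, port };
}
```

Calling `resolveSmtpTarget(process.env)` once at startup makes a misconfigured staging deploy fail fast instead of mailing real inboxes.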

Layer 2 — Pre-production / production (inbox placement)

Before every template launch: send to 20+ seed addresses.
Read placement + auth per provider.
Store results per template + per sending domain; diff over time.
Weekly check on your most-sent templates in production.
Alert on drops (Inbox % below a threshold).
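
The "store, diff, alert" steps above can be sketched as two small functions. The seed-result shape used here (`{ provider, placement }`) is hypothetical, since the article does not show the placement API's actual response, and the function names are illustrative. Promotions is counted as not-inbox in this sketch, which is a policy choice you may want to change.

```javascript
// Hypothetical shape for one seed result; the real API response may differ:
//   { provider: "gmail", placement: "inbox" | "spam" | "promotions" | "rejected" | "missing" }

// Fraction of seeds that landed in the inbox, overall and per provider.
function inboxRates(seedResults) {
  const counts = new Map();
  let inboxTotal = 0;

  for (const { provider, placement } of seedResults) {
    const bucket = counts.get(provider) ?? { inbox: 0, total: 0 };
    bucket.total += 1;
    if (placement === "inbox") {
      bucket.inbox += 1;
      inboxTotal += 1;
    }
    counts.set(provider, bucket);
  }

  const byProvider = {};
  for (const [provider, { inbox, total }] of counts) {
    byProvider[provider] = inbox / total;
  }
  const overall = seedResults.length ? inboxTotal / seedResults.length : 0;
  return { overall, byProvider };
}

// Returns human-readable alerts: one if the overall rate is under the
// threshold, and one per provider whose rate fell versus the previous run.
function placementAlerts(current, previous, minInbox = 0.85) {
  const alerts = [];
  if (current.overall < minInbox) {
    alerts.push(`overall inbox rate ${current.overall.toFixed(2)} is below ${minInbox}`);
  }
  for (const [provider, rate] of Object.entries(current.byProvider)) {
    const before = previous?.byProvider?.[provider];
    if (before !== undefined && rate < before) {
      alerts.push(`${provider} dropped from ${before.toFixed(2)} to ${rate.toFixed(2)}`);
    }
  }
  return alerts;
}
```

Storing the `inboxRates` output per template and per sending domain gives `placementAlerts` something to diff against on the next weekly run.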

CI snippet: both layers

# .github/workflows/email.yml
name: email-check
on:
  pull_request:
  push:
    branches: [main]   # needed so the placement-check job below can ever run

jobs:
  template-dev:
    # Layer 1 - always on every PR
    runs-on: ubuntu-latest
    services:
      mailhog:
        image: mailhog/mailhog
        ports: ['1025:1025', '8025:8025']
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: SMTP_HOST=localhost SMTP_PORT=1025 npm run test:email

  placement-check:
    # Layer 2 - only on main merges, real external call
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
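      - name: Send seeds + fetch placement
        shell: bash   # explicit bash enables pipefail, so a failed curl fails the step
        run: |
          # Assumption: send-to-seeds.mjs prints the placement run ID on stdout.
          RUN_ID="$(node scripts/send-to-seeds.mjs --template welcome)"
          sleep 60   # give providers time to file the seed mail
          curl -fsS "https://check.live-direct-marketing.online/api/placement?id=${RUN_ID}" \
            | tee placement.json
          node scripts/assert-placement.mjs placement.json --min-inbox 0.85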

Common mistakes

Treating a green Mailhog run in staging as proof that the email works. Mailhog only shows what your app produced, not what a real provider will do with it.
Only testing placement in production after a complaint. Catching it when customers complain is too late; that's already measurable revenue lost.
Seed-testing with a staging/dev sending domain. Staging mail from dev.yourdomain.com with a throwaway SPF record is not representative of production. Test with the production sender.
Running placement tests once and calling it done. Sending-domain reputation drifts. A test from last month is stale; run weekly in production.

Free production placement test

Inbox Check gives you 20+ seed addresses across the major providers, no signup. Send from your production sender, read placement + auth headers per provider. Run it before every template launch and weekly on your highest-volume flows.

When to use which

Writing a new template: Mailhog for iteration, then seed-test before ship.
Debugging "variable X is empty": Mailhog.
Debugging "mail goes to Spam at Gmail": Seed test + Gmail Postmaster Tools. Mailhog is useless here.
Visual regression on 40 templates: Mailhog + Playwright snapshot.
Verifying DKIM alignment after an ESP change: Seed test. Mailhog can't validate DKIM at all.
Pre-campaign preflight: Seed test on a warmed sending domain with the final copy.

Frequently asked questions

Can I skip Mailhog entirely if I do production seed tests?

No — seed tests are slow and external. You don't want every PR to fire real mail to real inboxes. Mailhog is free, instant, offline. Seed tests are for pre-prod and production. Different jobs.

Is Mailtrap.io different from Mailhog?

Mailtrap is a hosted Mailhog equivalent: SMTP sink, web UI, free tier, great for team staging. It has the same limitation: it doesn't measure deliverability. Mailtrap does have a separate paid "Email Testing" product that runs SpamAssassin and shows provider rendering, but it still doesn't send to real inboxes, so it is not a substitute for placement tests.

What about MailCrab or smtp4dev?

All the same category — local SMTP sinks with web UIs. Pick whichever deploys easiest in your stack (Mailhog for Docker, smtp4dev for .NET, MailCrab for a nicer UI). Equivalent deliverability blindness.

How often should production seed tests run?

Before every template launch (one-shot). Weekly on your highest-volume templates (scheduled). After any DNS, ESP, or sending-domain change (ad-hoc). Daily during the first 30 days of IP warmup.
