Automate Email Deliverability Checks Before Every Deployment

Nobody would deploy a SQL migration that wasn't linted and run against staging. But a marketing team pushes a new MJML template straight to production and hits Send on a million-recipient campaign. The blast radius is the same; the feedback loop is days instead of seconds. This article is a recipe for fixing that.

The goal

Make it impossible to deploy an email template without an inbox placement test passing first. Suggested minimum inbox rates: transactional 98%, marketing 85%, cold outreach 70%. The number is tunable; the gate is not.

The problem: email templates silently drift

Three things cause most production deliverability incidents:

Template drift. A designer adds a tracking pixel, an image CDN changes URLs, a variable interpolation produces raw HTML entities. None of it fails a test because nobody wrote one.
DNS changes. A CDN migration rotates SPF includes, a new ESP is added without updating DMARC alignment, a developer tightens p=reject without verifying DKIM coverage.
Shared-IP rotation. Your ESP adds a bad neighbour to the pool and your Gmail placement drops 40% overnight. You find out when opens collapse.

A pre-deploy placement test catches the first two immediately and surfaces the third within a day. It won't prevent the neighbour problem, but it tells you when to switch pools or escalate with the ESP.

The ideal pipeline shape

Four stages, each blocking the next:

lint ------> render ------> placement --> deploy
 |            |              |             |
 MJML         Actual HTML    Inbox rate    Production blast
 schema       from vars      on seeds      (or staged ramp)

Lint catches MJML syntax errors and invalid CSS. Render substitutes Handlebars / Liquid variables against a fixture recipient, producing the real HTML a user would receive. Placement runs that rendered HTML through 20+ seed mailboxes and returns an aggregate inbox rate. Deploy happens only if placement meets threshold.
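As a rough sketch of the "each stage blocks the next" shape, the runner below executes stage commands in order and stops at the first failure. The function name run_stages and the lint and deploy script names in the usage comment are hypothetical; only render-email.js and deliverability-gate.sh appear elsewhere in this article.

```shell
#!/usr/bin/env bash
# Sketch: run pipeline stages in order; a failing stage blocks everything after it.
# Each argument is one stage command, passed as a single string.

run_stages() {
  local stage
  for stage in "$@"; do
    echo "==> ${stage}"
    # Run the stage in its own shell; stop at the first non-zero exit code.
    if ! bash -c "$stage"; then
      echo "Stage failed: ${stage}" >&2
      return 1
    fi
  done
  echo "All stages passed."
}

# Usage (lint-email.sh and deploy-email.sh are hypothetical placeholders):
#   run_stages \
#     "./scripts/lint-email.sh templates/weekly.mjml" \
#     "node scripts/render-email.js templates/weekly.mjml > /tmp/weekly.html" \
#     "./scripts/deliverability-gate.sh /tmp/weekly.html 0.85" \
#     "./scripts/deploy-email.sh templates/weekly.mjml"
```

Passing stages as strings keeps the runner generic; in a real CI system the same ordering usually comes from job dependencies rather than a wrapper script.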

Implementing the placement stage

A bash script calling the API. Drop this at scripts/deliverability-gate.sh and call it from your deploy pipeline.

#!/usr/bin/env bash
# scripts/deliverability-gate.sh
# Usage: ./deliverability-gate.sh <rendered.html> <min-inbox-rate>
set -euo pipefail

HTML_FILE="${1:?rendered HTML file required}"
THRESHOLD="${2:?min inbox rate (e.g. 0.85) required}"
BASE="https://check.live-direct-marketing.online"
KEY="${INBOX_CHECK_API_KEY:?set INBOX_CHECK_API_KEY}"

echo "Creating placement test..."
RESP=$(jq -Rs --arg s "deploy-gate $(git rev-parse --short HEAD)" \
  '{senderDomain:"news.example.com",subject:$s,html:.,tags:["pre-deploy"]}' \
  < "$HTML_FILE" \
  | curl -sS --fail -X POST "$BASE/api/tests" \
      -H "Authorization: Bearer $KEY" \
      -H "Content-Type: application/json" -d @-)

TEST_ID=$(echo "$RESP" | jq -r .id)
echo "Test id: $TEST_ID"

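# Poll up to 12 minutes (48 x 15s)
STATE="pending"
for i in $(seq 1 48); do
  STATUS=$(curl -sS "$BASE/api/tests/$TEST_ID" \
            -H "Authorization: Bearer $KEY")
  STATE=$(echo "$STATUS" | jq -r .status)
  if [ "$STATE" = "complete" ]; then break; fi
  if [ "$STATE" = "failed"   ]; then
    echo "Test failed server-side: $(echo "$STATUS" | jq -r .error.message)"
    exit 2
  fi
  sleep 15
done

# A timeout must not fall through: the summary does not exist until the test completes
if [ "$STATE" != "complete" ]; then
  echo "Timed out waiting for placement results (last state: $STATE)"
  exit 3
fi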

RATE=$(echo "$STATUS" | jq -r .summary.inboxRate)
SPAM=$(echo "$STATUS" | jq -r .summary.spamCount)
TOTAL=$(echo "$STATUS" | jq -r .summary.total)

printf "Inbox rate: %.2f (threshold %.2f)  Spam: %d/%d\n" \
  "$RATE" "$THRESHOLD" "$SPAM" "$TOTAL"

# bash floating compare via awk
awk -v r="$RATE" -v t="$THRESHOLD" 'BEGIN { exit (r < t) }' \
  || { echo "GATE FAILED"; exit 1; }

echo "Gate passed. Safe to deploy."

Call it from your deploy script with the threshold appropriate to the template type:

# In your deploy pipeline
node scripts/render-email.js templates/receipt.mjml > /tmp/receipt.html
./scripts/deliverability-gate.sh /tmp/receipt.html 0.98   # transactional

node scripts/render-email.js templates/weekly.mjml > /tmp/weekly.html
./scripts/deliverability-gate.sh /tmp/weekly.html 0.85    # marketing

Thresholds that make sense

Choose thresholds based on consequence of failure, not aspiration:

Transactional (98%). A password reset in Spam is a support ticket and a potential churn event. Tight gate.
Marketing (85%). Losing 15% of a send is acceptable; losing 50% is a wasted campaign. Medium gate.
Cold outreach (70%). Cold is structurally lossy; anything above 70% is a strong day. Loose gate, but still a gate.
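One way to keep these numbers in a single place is a small lookup that the deploy script calls. This is a minimal sketch; the function name threshold_for and the type labels are assumptions, and the values are the three thresholds listed above.

```shell
# Sketch: map a template type to its minimum inbox rate.
# threshold_for and the type names are hypothetical; tune the values to your own tolerance.
threshold_for() {
  case "$1" in
    transactional) echo "0.98" ;;
    marketing)     echo "0.85" ;;
    cold)          echo "0.70" ;;
    *)
      # Unknown types fail loudly instead of silently getting a loose gate.
      echo "unknown template type: $1" >&2
      return 1
      ;;
  esac
}

# Usage:
#   ./scripts/deliverability-gate.sh /tmp/receipt.html "$(threshold_for transactional)"
```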

Handling flakiness

Two factors produce variance between runs on identical content:

Seed mailbox rotation. We rotate a handful of seeds every week to avoid reputation contamination. A replaced seed can move one or two seed results between Inbox and Spam.
ISP-side noise. Gmail's real-time scoring has a small stochastic component. Identical content run twice back-to-back can differ by up to 5 percentage points.

Two defences:

Retry once. If the first run is within 5 percentage points of the threshold, run a second and average the two. Cheap insurance.
Use a buffer. If your real tolerance is 85%, gate at 82%. You lose nothing in signal; you gain resistance to noise.
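The retry-and-average defence can be sketched as a wrapper around whatever produces the inbox rate. Everything named here is an assumption for illustration: gate_with_retry and its helpers are hypothetical, and the measure command is assumed to print a single rate such as 0.84 each time it runs (the gate script above prints more than that, so it would need a small adaptation).

```shell
# Sketch: retry once when the first sample lands close to the threshold, then average.

# Succeeds when RATE is within MARGIN (default 0.05) of THRESHOLD on either side.
near_threshold() {
  local rate="$1" threshold="$2" margin="${3:-0.05}"
  awk -v r="$rate" -v t="$threshold" -v m="$margin" \
    'BEGIN { d = r - t; if (d < 0) d = -d; exit !(d <= m + 1e-9) }'
}

# Prints the mean of two rates.
average_rate() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", (a + b) / 2 }'
}

# gate_with_retry MEASURE_CMD THRESHOLD
# MEASURE_CMD is a single command or function name (hypothetical) that prints one inbox rate.
gate_with_retry() {
  local measure="$1" threshold="$2" first second final
  first=$("$measure") || return 2
  final="$first"
  if near_threshold "$first" "$threshold"; then
    # Borderline result: take a second sample and gate on the average.
    second=$("$measure") || return 2
    final=$(average_rate "$first" "$second")
  fi
  echo "Final inbox rate: ${final} (threshold ${threshold})"
  awk -v r="$final" -v t="$threshold" 'BEGIN { exit !(r >= t) }'
}
```

Results far from the threshold in either direction are decided on one sample, so the second test is only paid for when it can change the outcome.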

Don't gate on a single sample

If you're about to block a production deploy on a 1-point drop from 85% to 84%, re-run first. Single samples of a stochastic process are noisy by definition. Averaging two samples is a cheap way to cut false positives.

Alerts when deploys get blocked

A blocked deploy is a failed pipeline run; your CI already alerts on those. What you want on top of that is context — which template, which threshold, which providers flagged as Spam. Post the full summary to Slack in the same step that fails the job:

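# RATE only exists inside the gate script, so capture the script's output instead
if ! GATE_OUT=$(./scripts/deliverability-gate.sh /tmp/weekly.html 0.85 2>&1); then
  echo "$GATE_OUT"
  curl -sS -X POST "$SLACK_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg t "weekly.mjml" --arg o "$GATE_OUT" \
      '{text:("Deploy blocked: "+$t+"\n"+$o)}')"
  exit 1
fi
echo "$GATE_OUT"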

Gate only on template changes

Don't run the gate on every commit. Use path filters or content hashes so unchanged templates are skipped — saving credits and shaving minutes off normal deploys:

HASH=$(sha256sum templates/*.mjml | sha256sum | cut -d' ' -f1)
if grep -q "$HASH" .last-delivery-hash 2>/dev/null; then
  echo "Templates unchanged, skipping placement test"
  exit 0
fi
# ...run gate...
echo "$HASH" > .last-delivery-hash

Combine with staged rollouts

For marketing sends, a placement gate at deploy time catches template bugs. A staged rollout catches everything else:

Deploy template. Placement gate passes.
Send to 1% of the list. Wait 30 minutes.
Measure opens + bounces. If either is outside p95 of historical, halt the rollout and page the owner.
Otherwise proceed to 100%.
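The halt decision in the third step could be sketched as below. This assumes one reading of "outside p95 of historical": halt when the bounce rate is above the historical 95th percentile or the open rate is below the historical 5th percentile. The function names and the one-number-per-line history files are hypothetical.

```shell
# Sketch: decide whether to halt a staged rollout based on historical percentiles.

# percentile FILE P: nearest-rank P-th percentile of the numbers in FILE (one per line).
percentile() {
  LC_ALL=C sort -n "$1" | awk -v p="$2" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * p + 99) / 100)   # integer ceiling of NR * p / 100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# should_halt OPEN_RATE BOUNCE_RATE OPENS_HISTORY_FILE BOUNCES_HISTORY_FILE
# Succeeds (exit 0) when the rollout should be halted.
should_halt() {
  local opens="$1" bounces="$2" open_floor bounce_ceiling
  # No usable history: fail closed and halt rather than guess.
  open_floor=$(percentile "$3" 5) || return 0
  bounce_ceiling=$(percentile "$4" 95) || return 0
  awk -v o="$opens" -v lo="$open_floor" -v b="$bounces" -v hi="$bounce_ceiling" \
    'BEGIN { exit !(o < lo || b > hi) }'
}

# Usage (file names are placeholders):
#   if should_halt "$OPEN_RATE" "$BOUNCE_RATE" opens-history.txt bounces-history.txt; then
#     echo "Halting rollout"; exit 1
#   fi
```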

Staged rollout plus a deploy gate gives you two independent failure detectors. You'll still get surprised occasionally — but not by the same thing twice.

Frequently asked questions

Where does this fit relative to a traditional email QA checklist?

Replaces the half of the checklist that machines can check. Humans still need to review copy, tone, imagery. Machines check auth, rendering, spam score, and per-provider placement — which humans are terrible at anyway.

Do I need a staging sender domain, or can I test with production?

Staging domain strongly preferred. Every CI run to production slightly stresses reputation and pollutes your engagement data. A subdomain like ci.yourbrand.com with its own SPF/DKIM/DMARC is 30 minutes to set up and worth it.

What about transactional emails triggered from production code paths?

Render the template with a fixture payload (real-looking but non-PII), gate on that. The point is to catch template drift; the actual variable values matter less than the structural integrity of the HTML and headers.

How is this different from SpamAssassin in CI?

SpamAssassin scores a message in isolation. A placement test delivers the message to 20+ real mailboxes across Gmail, Outlook, Yahoo, Mail.ru and more, then reports actual folder outcomes. SpamAssassin catches ~60% of what placement tests catch, and nothing about authentication or provider-specific filtering.
