Nobody would deploy a SQL migration that wasn't linted and run against staging. But a marketing team pushes a new MJML template straight to production and hits Send on a million-recipient campaign. The blast radius is the same; the feedback loop is days instead of seconds. This article is a recipe for fixing that.
Make it impossible to deploy an email template without an inbox placement test passing first. Transactional: 98%. Marketing: 85%. Cold: 70%. The number is tunable; the gate is not.
The problem: email templates silently drift
Three things cause most production deliverability incidents:
- Template drift. A designer adds a tracking pixel, an image CDN changes URLs, a variable interpolation produces raw HTML entities. None of it fails a test because nobody wrote one.
- DNS changes. A CDN migration rotates SPF includes, a new ESP is added without updating DMARC alignment, a developer tightens p=reject without verifying DKIM coverage.
- Shared-IP rotation. Your ESP adds a bad neighbour to the pool and your Gmail placement drops 40% overnight. You find out when opens collapse.
A pre-deploy placement test catches the first two immediately and surfaces the third within a day. It won't prevent the neighbour problem, but it tells you to switch pool or escalate with the ESP.
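The DNS cause in particular lends itself to a cheap sanity check alongside the placement test. The following is a minimal sketch, not part of the original pipeline: `dmarc_policy` and `check_dmarc` are hypothetical helper names, and the sketch assumes `dig` is installed and that you know which DMARC policy your domain is supposed to publish.

```shell
# Extract the p= policy tag from a DMARC TXT record string.
# Tags such as sp= or pct= are skipped because the pattern requires "p=" exactly.
dmarc_policy() {
  printf '%s\n' "$1" | tr -d '"' | tr ';' '\n' \
    | sed -n 's/^[[:space:]]*p=[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: look up the live record and fail when the published
# policy differs from the one the pipeline expects.
check_dmarc() {
  domain="$1"
  expected="$2"
  record=$(dig +short TXT "_dmarc.$domain")
  actual=$(dmarc_policy "$record")
  if [ "$actual" != "$expected" ]; then
    echo "DMARC policy for $domain is '${actual:-missing}', expected '$expected'" >&2
    return 1
  fi
}
```

A call such as `check_dmarc news.example.com quarantine` before the placement stage would flag an unannounced policy change. It does not verify SPF or DKIM alignment, which the placement test still has to cover.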
The ideal pipeline shape
Four stages, each blocking the next:
lint      render        placement     deploy
|---------|-------------|-------------|
MJML      Actual HTML   Inbox rate    Production blast
schema    from vars     on seeds      (or staged ramp)

Lint catches MJML syntax errors and invalid CSS. Render substitutes Handlebars / Liquid variables against a fixture recipient, producing the real HTML a user would receive. Placement runs that rendered HTML through 20+ seed mailboxes and returns an aggregate inbox rate. Deploy happens only if placement meets threshold.
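The "each blocking the next" property can be sketched as a small driver that stops at the first failing stage. This is an illustration under assumptions: `run_stages` and the four stage functions are hypothetical names, and the stage bodies are placeholders for whatever lint, render, placement and deploy commands your project actually uses.

```shell
# Run each stage in order; stop at the first one that fails.
run_stages() {
  for stage in "$@"; do
    echo "== $stage"
    if ! "$stage"; then
      echo "Pipeline blocked at stage: $stage" >&2
      return 1
    fi
  done
  echo "All stages passed"
}

# Hypothetical stage functions; replace the echo lines with real commands.
lint_stage()      { echo "lint: would validate the MJML here"; }
render_stage()    { echo "render: would build HTML from fixture variables here"; }
placement_stage() { echo "placement: would call the deliverability gate here"; }
deploy_stage()    { echo "deploy: would publish the template here"; }
```

Calling `run_stages lint_stage render_stage placement_stage deploy_stage` runs all four in order. Because the loop returns on the first non-zero exit status, deploy is never reached when placement fails.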
Implementing the placement stage
A bash script calling the API. Drop this at scripts/deliverability-gate.sh and call it from your deploy pipeline.
#!/usr/bin/env bash
# scripts/deliverability-gate.sh
# Usage: ./deliverability-gate.sh <rendered.html> <min-inbox-rate>
set -euo pipefail
HTML_FILE="${1:?rendered HTML file required}"
THRESHOLD="${2:?min inbox rate (e.g. 0.85) required}"
BASE="https://check.live-direct-marketing.online"
KEY="${INBOX_CHECK_API_KEY:?set INBOX_CHECK_API_KEY}"
echo "Creating placement test..."
RESP=$(jq -Rs --arg s "deploy-gate $(git rev-parse --short HEAD)" \
    '{senderDomain:"news.example.com",subject:$s,html:.,tags:["pre-deploy"]}' \
    < "$HTML_FILE" \
  | curl -sS --fail -X POST "$BASE/api/tests" \
    -H "Authorization: Bearer $KEY" \
    -H "Content-Type: application/json" -d @-)
TEST_ID=$(echo "$RESP" | jq -r .id)
echo "Test id: $TEST_ID"
# Poll up to 12 minutes (48 x 15s)
STATE="pending"
for i in $(seq 1 48); do
  STATUS=$(curl -sS --fail "$BASE/api/tests/$TEST_ID" \
    -H "Authorization: Bearer $KEY")
  STATE=$(echo "$STATUS" | jq -r .status)
  if [ "$STATE" = "complete" ]; then break; fi
  if [ "$STATE" = "failed" ]; then
    echo "Test failed server-side: $(echo "$STATUS" | jq -r .error.message)"
    exit 2
  fi
  sleep 15
done
# Without this check, a test that never completes would fall through
# and be scored on an incomplete summary.
if [ "$STATE" != "complete" ]; then
  echo "Timed out after 12 minutes (last status: $STATE)"
  exit 2
fi
RATE=$(echo "$STATUS" | jq -r .summary.inboxRate)
SPAM=$(echo "$STATUS" | jq -r .summary.spamCount)
TOTAL=$(echo "$STATUS" | jq -r .summary.total)
printf "Inbox rate: %.2f (threshold %.2f) Spam: %d/%d\n" \
  "$RATE" "$THRESHOLD" "$SPAM" "$TOTAL"
# bash floating compare via awk
awk -v r="$RATE" -v t="$THRESHOLD" 'BEGIN { exit (r < t) }' \
  || { echo "GATE FAILED"; exit 1; }
echo "Gate passed. Safe to deploy."

Call it from your deploy script with the threshold appropriate to the template type:
# In your deploy pipeline
node scripts/render-email.js templates/receipt.mjml > /tmp/receipt.html
./scripts/deliverability-gate.sh /tmp/receipt.html 0.98   # transactional
node scripts/render-email.js templates/weekly.mjml > /tmp/weekly.html
./scripts/deliverability-gate.sh /tmp/weekly.html 0.85    # marketing

Thresholds that make sense
Choose thresholds based on consequence of failure, not aspiration:
- Transactional (98%). A password reset in Spam is a support ticket and a potential churn event. Tight gate.
- Marketing (85%). Losing 15% of a send is acceptable; losing 50% is a wasted campaign. Medium gate.
- Cold outreach (70%). Cold is structurally lossy; anything above 70% is a strong day. Loose gate, but still a gate.
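Keeping the three thresholds in one place avoids them drifting apart across deploy scripts. A minimal sketch, assuming templates are labelled with one of three type names; `threshold_for` is a hypothetical helper and the labels are illustrative:

```shell
# Map a template type to its minimum inbox rate.
# Unknown types fail loudly rather than defaulting to a loose gate.
threshold_for() {
  case "$1" in
    transactional) echo "0.98" ;;
    marketing)     echo "0.85" ;;
    cold)          echo "0.70" ;;
    *)
      echo "unknown template type: $1" >&2
      return 1
      ;;
  esac
}
```

The gate call then becomes `./scripts/deliverability-gate.sh /tmp/receipt.html "$(threshold_for transactional)"`, and tuning a threshold is a one-line change.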
Handling flakiness
Two factors produce variance between runs on identical content:
- Seed mailbox rotation. We rotate a handful of seeds every week to avoid reputation contamination. A replaced seed can shift results by 1–2 slots.
- ISP-side noise. Gmail's real-time scoring has a small stochastic component. Identical content run twice back-to-back can differ by up to 5%.
Two defences:
- Retry once. If the first run is within 5 percentage points of the threshold, run a second and average the two rates. Cheap insurance.
- Use a buffer. If your real tolerance is 85%, gate at 82%. You lose nothing in signal; you gain resistance to noise.
If you're about to block a production deploy on a 1-point drop from 85% to 84%, re-run first. A single sample of a stochastic process is noisy by definition; averaging two samples is a cheap way to cut false positives.
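The retry-and-average rule can be expressed with three small helpers. This is a sketch under assumptions: the function names are hypothetical, rates are fractions such as 0.84, and the 0.05 margin mirrors the noise figure quoted above rather than a measured value.

```shell
# Succeeds when the rate lies within the margin of the threshold, on either side.
near_threshold() {
  awk -v r="$1" -v t="$2" -v m="$3" \
    'BEGIN { d = r - t; if (d < 0) d = -d; exit !(d <= m) }'
}

# Print the mean of two rates to four decimal places.
average() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", (a + b) / 2 }'
}

# Succeeds when the rate meets or exceeds the threshold.
passes() {
  awk -v r="$1" -v t="$2" 'BEGIN { exit !(r + 0 >= t + 0) }'
}

# Intended flow, where FIRST and SECOND are inbox rates from two runs:
#   if near_threshold "$FIRST" 0.82 0.05; then
#     FINAL=$(average "$FIRST" "$SECOND")
#   else
#     FINAL="$FIRST"
#   fi
#   passes "$FINAL" 0.82 || exit 1
```

With a real tolerance of 85% and a gate at 82%, a first run of 0.84 triggers a second run; if that returns 0.88, the average of 0.86 passes.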
Alerts when deploys get blocked
A blocked deploy is a failed pipeline run; your CI already alerts on those. What you want on top of that is context — which template, which threshold, which providers flagged as Spam. Post the full summary to Slack in the same step that fails the job:
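if ! OUT=$(./scripts/deliverability-gate.sh /tmp/weekly.html 0.85 2>&1); then
  echo "$OUT"
  # The gate runs as a child process, so its variables (such as RATE) are not
  # visible here; post its captured output instead.
  curl -sS -X POST "$SLACK_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg t "weekly.mjml" --arg o "$OUT" \
      '{text:("Deploy blocked: "+$t+"\n"+$o)}')"
  exit 1
fi
echo "$OUT"

Deploy only on template changes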
Don't run the gate on every commit. Use path filters or content hashes so unchanged templates are skipped — saving credits and shaving minutes off normal deploys:
HASH=$(sha256sum templates/*.mjml | sha256sum | cut -d' ' -f1)
if grep -q "$HASH" .last-delivery-hash 2>/dev/null; then
  echo "Templates unchanged, skipping placement test"
  exit 0
fi
# ...run gate...
echo "$HASH" > .last-delivery-hash

Combine with staged rollouts
For marketing sends, a placement gate at deploy time catches template bugs. A staged rollout catches most of what the gate cannot see:
- Deploy template. Placement gate passes.
- Send to 1% of the list. Wait 30 minutes.
- Measure opens + bounces. If either is outside p95 of historical, halt the rollout and page the owner.
- Otherwise proceed to 100%.
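The halt decision in the third step can be sketched for the bounce side as follows. The assumptions are mine: "outside p95" is read as "above the 95th percentile of past values", the percentile uses the nearest-rank method, and history is a plain file with one numeric value per line. `p95` and `should_halt` are hypothetical names. Opens would need the mirror-image check against a low percentile, which is not shown.

```shell
# Nearest-rank 95th percentile of numbers read from stdin, one per line.
# Integer arithmetic gives ceil(0.95 * N) without floating-point rounding.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      print v[int((NR * 95 + 99) / 100)]
    }'
}

# Returns 0 (halt) when the current value exceeds the historical p95,
# 1 (continue) otherwise, and 2 when there is no usable history.
should_halt() {
  limit=$(p95 < "$2") || return 2
  awk -v c="$1" -v l="$limit" 'BEGIN { exit !(c + 0 > l + 0) }'
}
```

After the 30-minute wait, something like `should_halt "$BOUNCE_RATE" bounce-history.txt && halt_rollout` would stop the ramp; both `BOUNCE_RATE` and `halt_rollout` stand in for whatever your sending platform provides.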
Staged rollout plus a deploy gate gives you two independent failure detectors. You'll still get surprised occasionally — but not by the same thing twice.