Uptime monitoring is a solved problem. Pingdom in 2006, StatusCake and UptimeRobot in 2012, BetterStack and Cronitor now. You pay a few dollars a month, point it at your site, and page on-call when the green dot goes red.
Email has no equivalent. Your ESP dashboard shows "delivered" when the destination MTA accepts the message. "Delivered" does not mean inbox. "Delivered" includes spam folder. Your status page stays green while half your customers never see the password reset mail.
Website uptime is about reachability. Email "uptime" is about placement. The two require completely different instrumentation. A 200 OK on /health tells you nothing about whether Gmail trusts your domain today.
What does "email up" even mean?
Four things have to hold for a recipient to see your mail in their inbox:
1. Your SMTP outbound works. Your application server (or your ESP) can connect to the internet and deliver to destination MTAs.
2. Authentication passes. SPF, DKIM, and DMARC all evaluate to pass at the recipient's MTA.
3. The MTA accepts the message. Not rejected for bad reputation, bad content, volume anomaly, or trigger-word filters.
4. The mail lands in the inbox, not the spam/junk/promotions folder.
Standard monitoring catches (1) and sometimes (3). It never catches (2) as a real-time signal, and it never, ever catches (4). Your ESP dashboard also only catches (3).
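Of the four, the authentication condition is the cheapest to spot-check from the outside. A minimal sketch, assuming `dig` is installed and using a placeholder domain; it only verifies that the SPF and DMARC records exist (checking DKIM also needs the selector name), not that they evaluate to pass at a real MTA:

```shell
#!/usr/bin/env bash
# Placeholder domain; substitute your real sending domain.
DOMAIN="${1:-mail.acme.io}"

# fetch_txt NAME -> concatenated TXT payload, empty if the lookup fails
fetch_txt() { dig +short +time=2 +tries=1 TXT "$1" 2>/dev/null | tr -d '"' || true; }

# has_tag TEXT TAG -> success if TAG appears anywhere in TEXT
has_tag() { case "$1" in *"$2"*) return 0 ;; *) return 1 ;; esac; }

spf=$(fetch_txt "$DOMAIN")
dmarc=$(fetch_txt "_dmarc.$DOMAIN")

has_tag "$spf" "v=spf1"     && echo "SPF: present"   || echo "SPF: MISSING"
has_tag "$dmarc" "v=DMARC1" && echo "DMARC: present" || echo "DMARC: MISSING"
```

This catches the most common authentication failure (a record deleted or mangled during a DNS change) within one check interval, long before reputation damage shows up in placement numbers.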
The minimum viable email-up system
Two components: a periodic synthetic sender, and a placement probe.
- Synthetic sender. A cron or scheduled worker that sends a real message through your real production sending path every 15 minutes (or hourly for low-volume senders). Not a health check, a real message.
- Placement probe. A seed mailbox panel at Gmail, Outlook, Yahoo, and so on. Something reads those mailboxes and reports inbox/spam/missing per provider.
The probe is what you would otherwise build yourself. It is a lot of infrastructure — 20+ mailboxes, provider auth, scraping without breaking terms, rotation — which is why most teams either buy or skip it. Skipping it means flying blind.
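For teams who do build it, the core of the probe is small: search each seed mailbox for the synthetic message and classify the result per provider. A sketch; the classification logic is the real part, while the commented fetch (via curl's IMAP support) uses placeholder hosts, credentials, and folder names, and providers like Gmail require an app password:

```shell
#!/usr/bin/env bash
# classify INBOX_HITS SPAM_HITS -> where the synthetic message landed
classify() {
  if   [ "$1" -gt 0 ]; then echo "inbox"
  elif [ "$2" -gt 0 ]; then echo "spam"
  else                      echo "missing"
  fi
}

# Hypothetical probe of one seed mailbox, matching the synthetic sender's
# subject line (host, creds, and folder names are placeholders):
#   hits() { curl -s --url "imaps://imap.example.com/$1" -u "$SEED_USER:$SEED_PASS" \
#              -X "SEARCH SUBJECT \"Uptime check\"" | grep -c '[0-9]'; }
#   classify "$(hits INBOX)" "$(hits Junk)"
```

The expensive part is everything around this loop: 20+ mailboxes, per-provider auth quirks, and rotation, which is the infrastructure the hosted probes sell.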
A $0 version you can run today
Use the Inbox Check free API as the placement probe. One cron, one shell script, 15 minutes of setup.
```bash
#!/usr/bin/env bash
# /usr/local/bin/email_uptime_check.sh
set -euo pipefail

API="https://check.live-direct-marketing.online/api"
KEY="${INBOX_CHECK_API_KEY}"
DOMAIN="mail.acme.io"

# Create a synthetic test via our real sending infrastructure
RESP=$(curl -s -X POST "$API/check" \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"senderDomain\": \"$DOMAIN\",
    \"subject\": \"Uptime check\",
    \"html\": \"<p>ok</p>\"
  }")
TEST_ID=$(echo "$RESP" | jq -r '.id')

# Poll for the result (tests take ~2-5 minutes); give up after 5 minutes
STATUS=""
for _ in {1..30}; do
  STATUS=$(curl -s "$API/check/$TEST_ID" -H "Authorization: Bearer $KEY" | jq -r '.status')
  [[ "$STATUS" == "complete" ]] && break
  sleep 10
done
[[ "$STATUS" == "complete" ]] || { echo "check $TEST_ID never completed" >&2; exit 1; }

RATE=$(curl -s "$API/check/$TEST_ID" -H "Authorization: Bearer $KEY" | jq -r '.summary.inboxRate')

# Write to a status file your status page can read
echo "{\"domain\":\"$DOMAIN\",\"rate\":$RATE,\"ts\":\"$(date -Iseconds)\"}" \
  > /var/www/status/email-latest.json
```

```
# /etc/cron.d/email_uptime
*/15 * * * * monitor /usr/local/bin/email_uptime_check.sh
```

Surfacing it on your status page
Status-page tools (Statuspage.io, Instatus, Cachet) accept external components via webhooks or API. Post the rate as a metric, and configure a component status rule:
- Operational — inbox rate 90%+
- Degraded performance — inbox rate 75–89%
- Partial outage — inbox rate 50–74%
- Major outage — inbox rate below 50%
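Those thresholds translate directly into a component-update step. A sketch assuming Atlassian Statuspage's component API (other tools differ; check your provider's docs), with the page ID, component ID, token, and rate-file path all as placeholders:

```shell
#!/usr/bin/env bash
# Map an inbox rate (0-100, integer) to a Statuspage component status,
# mirroring the thresholds above.
component_status() {
  local rate=$1
  if   [ "$rate" -ge 90 ]; then echo "operational"
  elif [ "$rate" -ge 75 ]; then echo "degraded_performance"
  elif [ "$rate" -ge 50 ]; then echo "partial_outage"
  else                          echo "major_outage"
  fi
}

# Push the status only when credentials are configured (env vars and the
# rate file written by the check script are placeholders).
if [ -n "${STATUSPAGE_TOKEN:-}" ]; then
  RATE=$(jq -r '.rate' /var/www/status/email-latest.json)
  curl -s -X PATCH \
    "https://api.statuspage.io/v1/pages/${STATUSPAGE_PAGE_ID}/components/${STATUSPAGE_COMPONENT_ID}" \
    -H "Authorization: OAuth ${STATUSPAGE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"component\": {\"status\": \"$(component_status "$RATE")\"}}"
fi
```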
Give the component a distinct name — "Transactional email placement," not just "Email" — so users understand that the failure mode is "delivered, but to spam," not mail failing outright.
Inbox placement is a lagging indicator. A domain can look fine at 10:00 and be degraded at 12:00 without any step-change; the reputation at Gmail and Outlook moves on the order of hours, not seconds. Set incident thresholds conservatively (30+ minutes of sustained low rate), or you will alert-fatigue yourself out of taking real signals seriously.
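One way to encode that conservatism is a rolling sample file (one integer inbox rate per line, newest appended last, which the cron job can maintain) plus an alert gate that requires the whole window to be low. A sketch:

```shell
#!/usr/bin/env bash
# sustained_low FILE THRESHOLD WINDOW
# Succeeds only if the last WINDOW samples all sit below THRESHOLD,
# i.e. the degradation has persisted rather than blipped.
sustained_low() {
  local file=$1 threshold=$2 window=$3
  local recent
  recent=$(tail -n "$window" "$file")
  [ -z "$recent" ] && return 1
  # Refuse to alert before a full window of samples exists
  [ "$(printf '%s\n' "$recent" | wc -l)" -lt "$window" ] && return 1
  # Any single healthy sample resets the incident clock
  while read -r rate; do
    [ "$rate" -ge "$threshold" ] && return 1
  done <<< "$recent"
  return 0
}
```

With 15-minute checks, a window of 3 means roughly 30-45 minutes of sustained degradation before anything fires.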
What an incident actually looks like
When email placement drops hard, here is what you will typically find in order of decreasing frequency:
- DNS record regression. SPF, DKIM, or DMARC was changed recently by somebody who did not know what they were doing.
- Content regression. A new template has trigger words, bad HTML, or a suspicious redirect.
- IP reputation hit. Shared IP pool took on a noisy neighbour; dedicated IP got listed somewhere.
- Sending volume anomaly. Batch job pumped 10x normal volume overnight; the MTAs throttled.
- Recipient feedback loop. A wave of spam complaints from a recent campaign tanked reputation for downstream mail.
A runbook with these five items, in order, answers 90% of incidents.
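The volume-anomaly item is cheap to pre-script too. A sketch; the send-log format in the comment is hypothetical, and 3x is an arbitrary alarm ratio to tune for your traffic shape:

```shell
#!/usr/bin/env bash
# volume_anomaly LAST_HOUR TRAILING_HOURLY_AVG
# Succeeds when the last hour sent at least 3x the trailing hourly average.
volume_anomaly() {
  local last=$1 avg=$2
  [ "$avg" -gt 0 ] || return 1
  [ "$last" -ge $((avg * 3)) ]
}

# Hypothetical usage against a send log with one ISO-8601-prefixed line
# per message:
#   last_hour=$(grep -c "^$(date -u +%Y-%m-%dT%H)" /var/log/app/sends.log)
#   day_total=$(grep -c "^$(date -u +%Y-%m-%d)" /var/log/app/sends.log)
#   volume_anomaly "$last_hour" $((day_total / 24)) && echo "volume spike"
```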