DNS Records Your Email Depends On (and How to Monitor Them)

The most common root cause of a "why is our mail in spam" incident is not spam filters getting smarter. It is a DNS record that someone changed on Tuesday and that broke silently. A new ESP rotated the DKIM selector; the SPF record got too long; the MX record was edited mid-migration and half the resolvers are still returning a stale answer or NXDOMAIN until their caches expire.

Monitoring DNS for email is cheap and catches these incidents hours or days before they show up as placement drops. Here is what to monitor and how.

The records that matter

MX, SPF (TXT at apex), DKIM (TXT at each active selector), DMARC (TXT at _dmarc), reverse DNS (PTR), and optionally BIMI (TXT at default._bimi) and MTA-STS. In that order of importance.

MX records

If MX is broken, inbound stops. For an outbound-only domain this does not matter; for anything that receives mail (including bounces and DMARC reports), MX missing is an incident.

dig +short MX acme.io
10 inbound-smtp.eu-west-1.amazonaws.com.
20 inbound-smtp.us-east-1.amazonaws.com.

Monitor for: no MX records returned, an unintended null MX (0 .), or a record whose target does not resolve or does not answer on port 25.
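
As a rough sketch of the first part of that check, the function below classifies the output of dig +short MX. The name mx_status and its three result labels are made up for this example, and it assumes "broken" covers both an empty answer and a null MX (a single "0 ." record, which tells senders the domain accepts no mail). It does not test whether the target host is reachable.

```shell
# Classify `dig +short MX <domain>` output read from stdin.
# Prints one of: missing | null-mx | ok
# (mx_status and its labels are illustrative names, not a standard tool.)
mx_status() {
  # Drop dig's ';;' diagnostic lines and blank lines.
  records=$(sed -e '/^;/d' -e '/^[[:space:]]*$/d')
  if [ -z "$records" ]; then
    echo "missing"
  elif printf '%s\n' "$records" | grep -qE '^0[[:space:]]+\.$'; then
    echo "null-mx"
  else
    echo "ok"
  fi
}

# Example: dig +short MX acme.io | mx_status
```

Keeping the classification separate from the query makes it easy to feed the same function from different resolvers.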

SPF records

SPF is a TXT record at the domain apex that starts with v=spf1. Two common failure modes:

Too many DNS lookups. SPF has a hard limit of 10 DNS lookups per evaluation. Every include:, a, mx, ptr, exists:, and redirect= counts, including the ones nested inside the records you include, so a single ESP can cost three or four lookups. Once you add your fourth ESP (SendGrid, Mailgun, Zendesk, HubSpot), you are likely to blow through the limit, and SPF evaluates to PermError. Recipients may accept or reject depending on their policy, but your deliverability takes a hit regardless.
Accidental double record. Exactly one TXT record starting with v=spf1 is permitted per domain. Two records = PermError.

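# Check SPF + count lookups
dig +short TXT acme.io | grep 'v=spf1'

# Count top-level DNS-lookup terms (include, a, mx, ptr, exists, redirect).
# Nested includes are not followed, so treat the result as a lower bound.
dig +short TXT acme.io | grep 'v=spf1' | sed 's/" "//g; s/"//g' | tr ' ' '\n' \
  | grep -ciE '^[+?~-]?(include:|exists:|(a|mx|ptr)([:/]|$))|^redirect='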
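
The 10-lookup limit applies to the whole evaluation, so lookups inside included records count too. The sketch below follows include: and redirect= recursively to get a closer estimate. The function names spf_record and spf_lookups are invented for this example, and it is still an approximation: it ignores the separate limits on void lookups and on the address lookups triggered by mx.

```shell
# Print a domain's SPF record as one lowercase, unquoted line (empty if none).
spf_record() {
  dig +short TXT "$1" | sed -e 's/" "//g' -e 's/"//g' \
    | tr 'A-Z' 'a-z' | grep -E '^v=spf1( |$)' | head -n 1
}

# Count DNS-lookup terms for a domain, following include: and redirect=.
# The body runs in a subshell ( ... ) so variables stay local to each call.
spf_lookups() (
  set -f                          # no glob expansion while splitting terms
  depth=${2:-0}
  if [ "$depth" -gt 10 ]; then    # crude guard against include loops
    echo 0
    exit 0
  fi
  count=0
  for term in $(spf_record "$1"); do
    term=${term#[-+?~]}           # drop an optional qualifier
    case "$term" in
      include:*)
        sub=$(spf_lookups "${term#include:}" $((depth + 1)))
        count=$((count + 1 + sub)) ;;
      redirect=*)
        sub=$(spf_lookups "${term#redirect=}" $((depth + 1)))
        count=$((count + 1 + sub)) ;;
      a|a:*|a/*|mx|mx:*|mx/*|ptr|ptr:*|exists:*)
        count=$((count + 1)) ;;
    esac
  done
  echo "$count"
)

# Example: spf_lookups acme.io
```

Alerting at 8 or 9 rather than 10 leaves room for an ESP adding an include on its side without telling you.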

DKIM records

DKIM is a TXT record at <selector>._domainkey.<domain>. The public key lives there; the private key signs outbound mail at your sender.

DKIM failures are insidious because the sender does not know. Your ESP happily signs with a key; recipients cannot validate the signature against the DNS record; the mail may still deliver if SPF passes and aligns. But when DKIM is your only aligned identifier, which is common when an ESP uses its own bounce domain, DMARC fails and placement degrades.

# Verify the DKIM public key for your selector
dig +short TXT google._domainkey.acme.io
dig +short TXT s1._domainkey.acme.io

# Quick sanity: is it a valid v=DKIM1 record?
dig +short TXT s1._domainkey.acme.io | grep -oE 'v=DKIM1[^"]*' | head -1

Track every active selector. When you add a new selector (rotation, a new ESP), add it to monitoring the same day.
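
A selector can resolve and still be unusable: a record whose p= tag is empty means the key has been revoked. As a small sketch, the function below tells those cases apart from the text of the TXT answer. The name dkim_status and its labels are made up for this example, and it only checks that some key material follows p=, not that the key is well-formed.

```shell
# Classify a DKIM TXT answer (dig +short TXT <selector>._domainkey.<domain>)
# read from stdin. Prints one of: missing | revoked | ok
# (dkim_status and its labels are illustrative names.)
dkim_status() {
  # Join the quoted chunks dig prints for long keys, then drop whitespace.
  record=$(sed -e '/^;/d' -e 's/" "//g' -e 's/"//g' | tr -d ' \t\n')
  case "$record" in
    *"p="[A-Za-z0-9+/]*) echo "ok" ;;
    *"p="*)              echo "revoked" ;;   # empty p= means the key was withdrawn
    *)                   echo "missing" ;;
  esac
}

# Example: loop over every active selector
# for sel in google s1; do
#   printf '%s: ' "$sel"; dig +short TXT "$sel._domainkey.acme.io" | dkim_status
# done
```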

DMARC records

DMARC is a TXT record at _dmarc.domain.tld. It declares your policy (p=none|quarantine|reject) and reporting destinations (rua, ruf).

dig +short TXT _dmarc.acme.io
"v=DMARC1; p=reject; rua=mailto:dmarc@acme.io; adkim=s; aspf=s"

What to monitor:

Presence and validity (v=DMARC1, valid p value).
The rua address is still reachable and accepts DMARC reports.
Policy regression. If someone lowers you from p=reject to p=none to debug something and forgets to restore it, that is a silent security regression.
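
One way to catch a policy regression is to compare the published p= value against a baseline you record once. The sketch below assumes that approach. The function names dmarc_policy, policy_rank, and dmarc_at_least are invented for this example, and the baseline is whatever policy you expect the domain to keep.

```shell
# Extract the p= tag from a DMARC record given as $1 (prints nothing if absent).
# Anchoring on the start of each tag keeps sp= and pct= from matching.
dmarc_policy() {
  printf '%s\n' "$1" | tr -d '"' | tr 'A-Z' 'a-z' | tr ';' '\n' \
    | sed -n 's/^[[:space:]]*p[[:space:]]*=[[:space:]]*\([a-z]*\).*/\1/p' \
    | head -n 1
}

# Map a policy to a strictness rank; unknown or missing values rank lowest.
policy_rank() {
  case "$1" in
    reject)     echo 2 ;;
    quarantine) echo 1 ;;
    none)       echo 0 ;;
    *)          echo -1 ;;
  esac
}

# Succeeds only if the record's policy is at least as strict as the baseline.
dmarc_at_least() {   # usage: dmarc_at_least "<record>" <baseline>
  [ "$(policy_rank "$(dmarc_policy "$1")")" -ge "$(policy_rank "$2")" ]
}

# Example:
# dmarc_at_least "$(dig +short TXT _dmarc.acme.io)" reject || echo "DMARC policy regressed"
```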

A single check script for all of the above

One Bash script, one line per domain, Nagios-compatible exit codes. Save as /usr/local/bin/check_email_dns.

#!/usr/bin/env bash
# check_email_dns - verify MX, SPF, DKIM, DMARC
# Usage: check_email_dns <domain> <dkim_selector1,dkim_selector2>
set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "UNKNOWN - usage: check_email_dns <domain> <dkim_selector1,dkim_selector2>"
  exit 3
fi

DOMAIN="$1"
SELECTORS="$2"
ERRORS=()

# Shared dig options. Drops dig's ';;' diagnostics and never aborts under set -e.
q() { dig +short +time=5 +tries=2 "$@" 2>/dev/null | grep -v '^;' || true; }

# MX
MX=$(q MX "$DOMAIN")
[[ -z "$MX" ]] && ERRORS+=("no MX")

# SPF (exactly one v=spf1 record is allowed)
TXT=$(q TXT "$DOMAIN")
SPF=$(grep -ci '^"v=spf1' <<< "$TXT" || true)
[[ "$SPF" -eq 0 ]] && ERRORS+=("no SPF")
[[ "$SPF" -gt 1 ]] && ERRORS+=("multiple SPF records")

# Count top-level SPF DNS lookups (rough: nested includes are not followed)
LOOKUPS=$(grep -i '^"v=spf1' <<< "$TXT" | sed 's/" "//g; s/"//g' | tr ' ' '\n' \
  | grep -ciE '^[+?~-]?(include:|exists:|(a|mx|ptr)([:/]|$))|^redirect=' || true)
[[ "$LOOKUPS" -gt 10 ]] && ERRORS+=("SPF lookups=$LOOKUPS (>10)")

# DKIM (every listed selector must resolve, so one broken selector is not masked)
IFS=',' read -r -a SELECTOR_LIST <<< "$SELECTORS"
for sel in "${SELECTOR_LIST[@]}"; do
  KEY=$(q TXT "${sel}._domainkey.${DOMAIN}")
  [[ "$KEY" == *"v=DKIM1"* ]] || ERRORS+=("no valid DKIM at selector $sel")
done

# DMARC
DMARC=$(q TXT "_dmarc.$DOMAIN" | grep 'v=DMARC1' || true)
[[ -z "$DMARC" ]] && ERRORS+=("no DMARC")

if [[ ${#ERRORS[@]} -eq 0 ]]; then
  echo "OK - $DOMAIN: MX+SPF+DKIM+DMARC valid"
  exit 0
else
  echo "CRITICAL - $DOMAIN: ${ERRORS[*]}"
  exit 2
fi

Wire it to cron, Nagios, Zabbix, or Prometheus the same way you would any other check:

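# Check every 10 minutes. Define SLACK_WEBHOOK in the crontab itself; cron does not load your shell profile.
*/10 * * * * /usr/local/bin/check_email_dns acme.io google,s1 || curl -fsS -X POST -H 'Content-Type: application/json' -d '{"text":"DNS email check failed for acme.io"}' "$SLACK_WEBHOOK"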
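
For Prometheus, one common pattern is to write the result as a metric file for node_exporter's textfile collector. The sketch below assumes that setup. The function name check_to_metric, the metric name email_dns_check_status, and the directory in the usage comment are placeholders; adjust them to your own conventions.

```shell
# Turn a check's Nagios-style exit code into one Prometheus metric line.
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN, mirroring the script's exit codes.
# (The metric name is a placeholder, not a standard.)
check_to_metric() {   # usage: check_to_metric <domain> <exit_code>
  printf 'email_dns_check_status{domain="%s"} %s\n' "$1" "$2"
}

# Example, run from cron (the textfile directory is an assumption):
# rc=0; /usr/local/bin/check_email_dns acme.io google,s1 >/dev/null || rc=$?
# check_to_metric acme.io "$rc" > /var/lib/node_exporter/textfile/email_dns.prom.tmp \
#   && mv /var/lib/node_exporter/textfile/email_dns.prom.tmp \
#         /var/lib/node_exporter/textfile/email_dns.prom
```

Writing to a temporary file and renaming it avoids the collector reading a half-written file.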

Query your own recursive resolver, not 8.8.8.8

If you run your own resolver (unbound, dnsmasq, enterprise DNS), monitor via that resolver. A broken cached record in your resolver will bite your application long before Google's 8.8.8.8 reflects the same problem. To catch external regressions, run the check against both your resolver and a public one.
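
Checking both resolvers can be as simple as asking each one the same question and comparing the sorted answers. In this sketch the function names answers_from and resolvers_agree are invented, and the resolver addresses in the usage line are examples only.

```shell
# Print the sorted answer set for one query from one resolver.
answers_from() {   # usage: answers_from <resolver> <type> <name>
  dig +short +time=5 "@$1" "$2" "$3" | sed '/^;/d' | sort
}

# Succeeds if both resolvers return the same non-empty answer set.
resolvers_agree() {   # usage: resolvers_agree <type> <name> <resolver_a> <resolver_b>
  a=$(answers_from "$3" "$1" "$2")
  b=$(answers_from "$4" "$1" "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (10.0.0.53 stands in for your internal resolver):
# resolvers_agree MX acme.io 10.0.0.53 8.8.8.8 || echo "resolvers disagree on MX"
```

A disagreement is not always an error: it also shows up briefly after any legitimate change, until the cached copy expires.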

Less obvious records worth monitoring

PTR (reverse DNS)

For dedicated IPs, the PTR record should point back to a hostname that itself has an A record pointing back to the IP (forward-confirmed reverse DNS). Outlook is particularly strict here.

# Check PTR for a sending IP
dig +short -x 192.0.2.42
mail.acme.io.

# Verify the hostname resolves back
dig +short A mail.acme.io.
192.0.2.42
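
The two lookups above can be chained into a single pass/fail check. This is a minimal sketch for IPv4 only; the function name fcrdns_ok is made up for this example, and it uses the first PTR name if several are returned.

```shell
# Succeeds if the IP's PTR hostname resolves back to the same IPv4 address.
fcrdns_ok() {   # usage: fcrdns_ok <ipv4>
  host=$(dig +short -x "$1" | sed '/^;/d' | head -n 1)
  [ -n "$host" ] || return 1                 # no PTR at all
  dig +short A "$host" | grep -qxF "$1"      # forward lookup must list the IP
}

# Example:
# fcrdns_ok 192.0.2.42 || echo "PTR for 192.0.2.42 is not forward-confirmed"
```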

MTA-STS

If you publish MTA-STS, monitor that the policy file at https://mta-sts.<domain>/.well-known/mta-sts.txt returns the expected version and that the HTTPS cert is not expiring.
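
As a sketch of the policy-file half of that check, the function below validates the fields of a policy body read from stdin. The names sts_field and mta_sts_check are invented for this example. It checks only the basics (version, mode, max_age, and at least one mx line unless mode is none), not whether the mx patterns match your real MX hosts.

```shell
# Print the first value of one key from an MTA-STS policy body.
sts_field() {   # usage: sts_field <key> "<policy_text>"
  printf '%s\n' "$2" | tr -d '\r' \
    | sed -n "s/^$1:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p" | head -n 1
}

# Validate a policy read from stdin.
# Prints "ok mode=<mode> max_age=<seconds>" or "invalid: <reason>".
mta_sts_check() {
  policy=$(cat)
  version=$(sts_field version "$policy")
  mode=$(sts_field mode "$policy")
  max_age=$(sts_field max_age "$policy")
  mx=$(sts_field mx "$policy")
  if [ "$version" != "STSv1" ]; then echo "invalid: version"; return 1; fi
  case "$mode" in
    enforce|testing|none) ;;
    *) echo "invalid: mode"; return 1 ;;
  esac
  case "$max_age" in
    ''|*[!0-9]*) echo "invalid: max_age"; return 1 ;;
  esac
  if [ "$mode" != "none" ] && [ -z "$mx" ]; then
    echo "invalid: no mx"; return 1
  fi
  echo "ok mode=$mode max_age=$max_age"
}

# Example:
# curl -fsS https://mta-sts.acme.io/.well-known/mta-sts.txt | mta_sts_check
# Certificate expiry (fails if it expires within 30 days):
# echo | openssl s_client -servername mta-sts.acme.io -connect mta-sts.acme.io:443 2>/dev/null \
#   | openssl x509 -noout -checkend $((30 * 86400))
```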

BIMI

BIMI is a TXT at default._bimi pointing at an SVG logo (and usually a VMC cert). Monitor that the SVG URL returns 200 and the VMC has more than 30 days until expiry.
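
A sketch of how those two checks might start: pull the logo URL (the l= tag) and the certificate URL (the a= tag) out of the record, then probe each. The function name bimi_tag is made up for this example, and the usage lines assume the a= URL serves a PEM file whose first certificate is the VMC itself.

```shell
# Print the value of one tag (l or a) from a BIMI record.
bimi_tag() {   # usage: bimi_tag <tag> "<record>"
  printf '%s\n' "$2" | sed -e 's/" "//g' -e 's/"//g' | tr ';' '\n' \
    | sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" | head -n 1
}

# Example:
# rec=$(dig +short TXT default._bimi.acme.io)
# curl -fsS -o /dev/null "$(bimi_tag l "$rec")" || echo "BIMI logo unreachable"
# curl -fsS "$(bimi_tag a "$rec")" \
#   | openssl x509 -noout -checkend $((30 * 86400)) || echo "VMC expires within 30 days"
```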

FAQ

How often should I check DNS?

Every 10 minutes is plenty. DNS TTLs for email records typically sit between 5 minutes and 1 hour, so a 10-minute check picks up a regression within minutes of it reaching your resolver, long before it shows up as a placement drop.

What about domains we do not actively send from?

You should still monitor SPF and DMARC on non-sending domains (parked domains, legacy brands). Without a strict SPF and DMARC p=reject, attackers will spoof you. A non-sending domain's SPF should be v=spf1 -all.
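
For parked domains the expected records are fixed, so the check can be an exact comparison. A minimal sketch, with parked_domain_ok as an invented name and the baseline taken from the answer above (v=spf1 -all plus p=reject):

```shell
# Succeeds if the given records match the parked-domain baseline.
parked_domain_ok() {   # usage: parked_domain_ok "<spf_record>" "<dmarc_record>"
  spf=$(printf '%s' "$1" | tr -d '"')
  [ "$spf" = "v=spf1 -all" ] || return 1
  # Require a p=reject tag; sp=reject alone does not count.
  printf '%s\n' "$2" | tr -d '"' | tr ';' '\n' \
    | grep -qE '^[[:space:]]*p=reject[[:space:]]*$'
}

# Example (old-brand.example is a placeholder domain):
# parked_domain_ok "$(dig +short TXT old-brand.example | grep 'v=spf1')" \
#                  "$(dig +short TXT _dmarc.old-brand.example)" \
#   || echo "parked domain old-brand.example is spoofable"
```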

Can I use a hosted tool instead of scripting?

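Yes. Hosted services such as Valimail and EasyDMARC monitor these records for you, and a DNS-as-code tool such as DNSControl helps prevent accidental changes in the first place. The DIY path described here is free and integrates with whatever monitoring you already run. Pick based on how many domains you manage.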

What is the single most common DNS regression?

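Adding a new ESP and forgetting to add its include: to SPF, or adding it and blowing past the 10-lookup limit. The first makes that ESP's mail fail SPF; the second makes SPF return PermError for all of your mail. Either way DMARC sees no SPF pass, and unless an aligned DKIM signature passes, the message fails DMARC, which recipients treat as a deliverability signal.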
