The most common root cause of a "why is our mail in spam" incident is not spam filters getting smarter. It is a DNS record that someone changed on Tuesday and broke silently. A new ESP flag rotated the DKIM selector; the SPF record got too long; the MX TTL was dropped to 60 and now half the resolvers are returning NXDOMAIN during propagation.
Monitoring DNS for email is cheap and catches these incidents hours or days before they show up as placement drops. Here is what to monitor and how.
Monitor these records, in order of importance: MX, SPF (TXT at apex), DKIM (TXT at each active selector), DMARC (TXT at _dmarc), reverse DNS (PTR), and optionally BIMI (TXT at default._bimi) and MTA-STS.
MX records
If MX is broken, inbound stops. For an outbound-only domain this does not matter; for anything that receives mail (including bounces and DMARC reports), a missing MX is an incident.
dig +short MX acme.io
10 inbound-smtp.eu-west-1.amazonaws.com.
20 inbound-smtp.us-east-1.amazonaws.com.
Monitor for: no MX records returned, or records with priority zero pointing at an unreachable host.
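The "no MX returned" case, and the choice of which host to probe first, can both be read off dig's answer. A minimal sketch, assuming the `dig +short MX` output format shown above; the function name `mx_primary` is hypothetical, not part of any tool.

```shell
# mx_primary: read `dig +short MX <domain>` output on stdin
# ("<preference> <host>" per line) and print the host with the lowest
# preference number, i.e. the one senders try first.
# Exits non-zero when no MX line was seen at all.
mx_primary() {
    awk 'NF == 2 && $1 ~ /^[0-9]+$/ {
             if (!seen || $1 + 0 < best) { best = $1 + 0; host = $2; seen = 1 }
         }
         END { if (!seen) exit 1; print host }'
}

# Example (needs network access):
#   dig +short +time=5 MX acme.io | mx_primary || echo "CRITICAL - no MX"
```

Whether to go on and probe the returned host on port 25 depends on where the monitor runs; many networks block outbound port 25, so that step is left out here.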
SPF records
SPF is a TXT record at the domain apex that starts with v=spf1. Two common failure modes:
- Too many DNS lookups. SPF has a hard limit of 10 DNS lookups per evaluation. Each include: counts, as do any lookups nested inside the included record. Once you add your fourth ESP (SendGrid, Mailgun, Zendesk, HubSpot), you blow through the limit and SPF evaluates to PermError. Recipients may accept or reject depending on their policy, but your deliverability takes a hit regardless.
- Accidental double record. Exactly one TXT record starting with v=spf1 is permitted per domain. Two records = PermError.
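Both failure modes can be classified from the raw TXT answer. The sketch below is one way to do it, assuming `dig +short TXT` output on stdin; `spf_status` is a hypothetical helper name. It counts only top-level lookup terms and does not follow nested includes, so treat its number as a lower bound.

```shell
# spf_status: read `dig +short TXT <domain>` output on stdin and print
#   "missing"             no record starts with v=spf1
#   "multiple"            more than one v=spf1 record (PermError)
#   "ok lookups=N"        N top-level lookup terms, N <= 10
#   "too_many lookups=N"  N > 10 (PermError)
# Assumption: only top-level terms are counted. Lookups nested inside
# include: targets also count toward the limit and are NOT followed here.
spf_status() (
    # Join split TXT strings ("a" "b" -> ab), strip quotes, keep SPF records.
    records=$(sed 's/" "//g; s/"//g' | grep -E '^v=spf1( |$)' || true)
    count=$(printf '%s\n' "$records" | grep -c '^v=spf1' || true)
    # One term per line, then count the terms that trigger a DNS lookup.
    n=$(printf '%s\n' "$records" | tr ' ' '\n' |
        grep -ciE '^[+?~-]?(include:|exists:|redirect=|(a|mx|ptr)([:/]|$))' || true)
    if [ "$count" -eq 0 ]; then
        echo "missing"
    elif [ "$count" -gt 1 ]; then
        echo "multiple"
    elif [ "$n" -gt 10 ]; then
        echo "too_many lookups=$n"
    else
        echo "ok lookups=$n"
    fi
)

# Example (needs network access):
#   dig +short +time=5 TXT acme.io | spf_status
```

Matching whole terms rather than substrings means bare `mx` and `a` mechanisms are counted, while `-all` is not mistaken for an `a` mechanism.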
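# Check SPF + count lookups
dig +short TXT acme.io | grep 'v=spf1'
# Count top-level DNS-lookup terms (include, a, mx, ptr, exists, redirect).
# Lookups nested inside each include: target also count toward the limit and are not followed here.
dig +short TXT acme.io | grep 'v=spf1' | sed 's/" "//g; s/"//g' | tr ' ' '\n' | grep -ciE '^[+?~-]?(include:|exists:|redirect=|(a|mx|ptr)([:/]|$))'
DKIM records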
DKIM is a TXT record at <selector>._domainkey.<domain>. The public key lives there; the private key signs outbound mail at your sender.
DKIM failures are insidious because the sender does not know. Your ESP happily signs with a key; recipients cannot validate the signature against the DNS record; the mail may still deliver, because an aligned SPF pass is enough for DMARC; but you have lost DKIM alignment, so DMARC now hangs on SPF alone and placement degrades.
# Verify the DKIM public key for your selector
dig +short TXT google._domainkey.acme.io
dig +short TXT s1._domainkey.acme.io
# Quick sanity: is it a valid v=DKIM1 record?
dig +short TXT s1._domainkey.acme.io | grep -oE 'v=DKIM1[^"]*' | head -1
Track every active selector. When you add a new selector (rotation, a new ESP), add it to monitoring the same day.
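Tracking every selector means each one has to pass, not just one of them. A sketch of that stricter check follows; `dkim_record_status` and `check_all_selectors` are hypothetical names. It treats an empty p= tag as a revoked key, which is what the DKIM specification defines, and skips CNAME lines that `dig +short` prints when the selector is delegated to an ESP.

```shell
# dkim_record_status: read `dig +short TXT <selector>._domainkey.<domain>`
# output on stdin and print one of:
#   "missing"  no TXT answer
#   "invalid"  a TXT answer without a p= tag
#   "revoked"  p= present but empty
#   "ok"       p= present and non-empty
dkim_record_status() (
    # Keep TXT lines only (they start with a quote), join split strings.
    record=$(grep '^"' | sed 's/" "//g; s/"//g' | head -n 1 || true)
    if [ -z "$record" ]; then
        echo "missing"
        exit 0
    fi
    p=$(printf '%s\n' "$record" | tr ';' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep '^p=' | head -n 1 || true)
    case "$p" in
        "")   echo "invalid" ;;
        "p=") echo "revoked" ;;
        *)    echo "ok" ;;
    esac
)

# check_all_selectors <domain> <sel1,sel2,...>
# Prints one line per failing selector; exits 1 if any selector is not "ok".
check_all_selectors() (
    domain=$1
    rc=0
    for sel in $(printf '%s' "$2" | tr ',' ' '); do
        status=$(dig +short +time=5 TXT "${sel}._domainkey.${domain}" | dkim_record_status)
        [ "$status" = ok ] || { echo "DKIM $sel: $status"; rc=1; }
    done
    exit "$rc"
)

# Example (needs network access):
#   check_all_selectors acme.io google,s1
```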
DMARC records
DMARC is a TXT record at _dmarc.domain.tld. It declares your policy (p=none|quarantine|reject) and reporting destinations (rua, ruf).
dig +short TXT _dmarc.acme.io
"v=DMARC1; p=reject; rua=mailto:dmarc@acme.io; adkim=s; aspf=s"
What to monitor:
- Presence and validity (v=DMARC1, valid p value).
- The rua address is still reachable and accepts DMARC reports.
- Policy regression. If someone lowers you from p=reject to p=none to debug something and forgets to restore, that is a silent security regression.
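Presence, validity, and policy regression can all be checked by comparing the published p value against the policy you expect. A minimal sketch, assuming the record arrives on stdin as `dig +short TXT` prints it; the helper names `dmarc_tag`, `dmarc_policy_rank`, and `check_dmarc` are hypothetical. Verifying that the rua mailbox accepts reports is not covered.

```shell
# dmarc_tag <tag>: read a DMARC record on stdin, print that tag's value.
dmarc_tag() (
    tag=$1
    tr -d '"' | tr ';' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        sed -n "s/^${tag}[[:space:]]*=[[:space:]]*//p" | head -n 1
)

# dmarc_policy_rank <policy>: map a policy to a number so that
# none < quarantine < reject; anything else is -1 (invalid).
dmarc_policy_rank() {
    case "$1" in
        none)       printf '%s\n' 0 ;;
        quarantine) printf '%s\n' 1 ;;
        reject)     printf '%s\n' 2 ;;
        *)          printf '%s\n' -1 ;;
    esac
}

# check_dmarc <expected_policy>: read `dig +short TXT _dmarc.<domain>` on stdin.
# Exits 2 (Nagios CRITICAL) if the record is missing, invalid, or weaker
# than the expected policy.
check_dmarc() (
    expected=$1
    record=$(grep 'v=DMARC1' | head -n 1 || true)
    [ -n "$record" ] || { echo "CRITICAL - no DMARC record"; exit 2; }
    policy=$(printf '%s\n' "$record" | dmarc_tag p)
    have=$(dmarc_policy_rank "$policy")
    want=$(dmarc_policy_rank "$expected")
    if [ "$have" -lt 0 ]; then
        echo "CRITICAL - invalid p=$policy"
        exit 2
    fi
    if [ "$have" -lt "$want" ]; then
        echo "CRITICAL - policy regressed: p=$policy (expected $expected)"
        exit 2
    fi
    echo "OK - p=$policy"
)

# Example (needs network access):
#   dig +short +time=5 TXT _dmarc.acme.io | check_dmarc reject
```

Pinning the expected policy in the monitor's configuration is the design choice here: a temporary drop to p=none keeps alerting until someone either restores the policy or changes the expectation on purpose.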
A single check script for all of the above
One Bash script, one line per domain, Nagios-compatible exit codes. Save as /usr/local/bin/check_email_dns.
#!/usr/bin/env bash
# check_email_dns - verify MX, SPF, DKIM, DMARC
# Usage: check_email_dns <domain> <dkim_selector1,dkim_selector2>
set -euo pipefail
DOMAIN="$1"
SELECTORS="${2:-default,google,s1}"
ERRORS=()
# MX
MX=$(dig +short +time=5 MX "$DOMAIN")
[[ -z "$MX" ]] && ERRORS+=("no MX")
# SPF
SPF=$(dig +short +time=5 TXT "$DOMAIN" | grep -c 'v=spf1' || true)
[[ "$SPF" -eq 0 ]] && ERRORS+=("no SPF")
[[ "$SPF" -gt 1 ]] && ERRORS+=("multiple SPF records")
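# Count top-level SPF DNS-lookup terms (rough: lookups nested inside include: targets are not followed)
LOOKUPS=$(dig +short +time=5 TXT "$DOMAIN" | grep 'v=spf1' | sed 's/" "//g; s/"//g' | tr ' ' '\n' | grep -ciE '^[+?~-]?(include:|exists:|redirect=|(a|mx|ptr)([:/]|$))' || true)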
[[ "$LOOKUPS" -gt 10 ]] && ERRORS+=("SPF lookups=$LOOKUPS (>10)")
# DKIM (any one of the listed selectors must resolve)
IFS=',' read -r -a SELECTOR_LIST <<< "$SELECTORS"
DKIM_OK=0
for sel in "${SELECTOR_LIST[@]}"; do
    if dig +short TXT "${sel}._domainkey.${DOMAIN}" | grep -q 'v=DKIM1'; then
        DKIM_OK=1
        break
    fi
done
[[ "$DKIM_OK" -eq 0 ]] && ERRORS+=("no valid DKIM on [$SELECTORS]")
# DMARC
DMARC=$(dig +short TXT "_dmarc.$DOMAIN" | grep 'v=DMARC1' || true)
[[ -z "$DMARC" ]] && ERRORS+=("no DMARC")
if [[ ${#ERRORS[@]} -eq 0 ]]; then
    echo "OK - $DOMAIN: MX+SPF+DKIM+DMARC valid"
    exit 0
else
    echo "CRITICAL - $DOMAIN: ${ERRORS[*]}"
    exit 2
fi
Wire it to cron, Nagios, Zabbix, or Prometheus the same way you would any other check:
# Check every 10 minutes
*/10 * * * * /usr/local/bin/check_email_dns acme.io google,s1 || curl -X POST "$SLACK_WEBHOOK" -d "{\"text\":\"DNS email check failed for acme.io\"}"
If you run your own resolver (unbound, dnsmasq, enterprise DNS), monitor via that resolver. A broken cached record in your resolver will bite your application long before Google's 8.8.8.8 reflects the same problem. To catch external regressions, run the check against both your resolver and a public one.
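Checking against two resolvers comes down to asking both the same question and comparing the sorted answers. A sketch, assuming dig's `@server` syntax; the function name `compare_resolvers` and the resolver addresses in the example are placeholders for your own.

```shell
# compare_resolvers <type> <name> <resolver_a> <resolver_b>
# Query both resolvers and compare the answers after sorting, so that
# record order alone never counts as a difference.
# Exits 1 (Nagios WARNING) when the answers differ.
compare_resolvers() (
    rtype=$1
    name=$2
    a=$3
    b=$4
    ans_a=$(dig +short +time=5 "@$a" "$rtype" "$name" | sort)
    ans_b=$(dig +short +time=5 "@$b" "$rtype" "$name" | sort)
    if [ "$ans_a" = "$ans_b" ]; then
        echo "OK - $rtype $name identical on $a and $b"
    else
        echo "WARNING - $rtype $name differs between $a and $b"
        exit 1
    fi
)

# Example (needs network access; 127.0.0.1 stands in for your own resolver):
#   compare_resolvers TXT _dmarc.acme.io 127.0.0.1 8.8.8.8
```

A difference is a prompt to look, not proof of breakage: a record changed within its TTL will legitimately differ between caches for a while.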
Less obvious records worth monitoring
PTR (reverse DNS)
For dedicated IPs, the PTR record should point back to a hostname that itself has an A record pointing back to the IP (forward-confirmed reverse DNS). Outlook is particularly strict here.
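The forward-confirmed check can be scripted as a PTR lookup followed by an A lookup on each returned name. A minimal sketch for IPv4, assuming dig is available; `fcrdns_ok` is a hypothetical name.

```shell
# fcrdns_ok <ipv4>: succeed if at least one PTR name for the IP has an
# A record that equals the IP again (forward-confirmed reverse DNS).
# Exits 2 (Nagios CRITICAL) otherwise.
fcrdns_ok() (
    ip=$1
    for host in $(dig +short +time=5 -x "$ip"); do
        # -x matches the whole line, -F treats the IP as a fixed string.
        if dig +short +time=5 A "$host" | grep -qxF "$ip"; then
            echo "OK - $ip -> $host -> $ip"
            exit 0
        fi
    done
    echo "CRITICAL - no forward-confirmed PTR for $ip"
    exit 2
)

# Example (needs network access):
#   fcrdns_ok 192.0.2.42
```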
# Check PTR for a sending IP
dig +short -x 192.0.2.42
mail.acme.io.
# Verify the hostname resolves back
dig +short A mail.acme.io.
192.0.2.42
MTA-STS
If you publish MTA-STS, monitor that the policy file at https://mta-sts.<domain>/.well-known/mta-sts.txt returns the expected version and that the HTTPS cert is not expiring.
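A sketch of that policy check, assuming curl is available and the policy uses the standard "key: value" line format; `mta_sts_field` and `check_mta_sts` are hypothetical names, and the 30-day certificate threshold in the trailing comment is an assumption, not a figure from MTA-STS itself.

```shell
# mta_sts_field <key>: read an MTA-STS policy body on stdin and print the
# first value of <key> (policy lines look like "mode: enforce").
mta_sts_field() (
    key=$1
    tr -d '\r' | sed -n "s/^${key}:[[:space:]]*//p" |
        sed 's/[[:space:]]*$//' | head -n 1
)

# check_mta_sts <domain> <expected_mode>
# Fetch the policy over HTTPS and verify its version and mode.
# Exits 2 (Nagios CRITICAL) on any mismatch or fetch failure.
check_mta_sts() (
    domain=$1
    expected_mode=$2
    body=$(curl -fsS --max-time 10 "https://mta-sts.${domain}/.well-known/mta-sts.txt") ||
        { echo "CRITICAL - MTA-STS policy fetch failed"; exit 2; }
    version=$(printf '%s\n' "$body" | mta_sts_field version)
    mode=$(printf '%s\n' "$body" | mta_sts_field mode)
    [ "$version" = "STSv1" ] ||
        { echo "CRITICAL - unexpected version '$version'"; exit 2; }
    [ "$mode" = "$expected_mode" ] ||
        { echo "CRITICAL - mode '$mode' (expected $expected_mode)"; exit 2; }
    echo "OK - MTA-STS $version mode=$mode"
)

# Example (needs network access):
#   check_mta_sts acme.io enforce
# Certificate expiry, assuming a 30-day threshold (2592000 seconds):
#   openssl s_client -connect mta-sts.acme.io:443 -servername mta-sts.acme.io \
#       </dev/null 2>/dev/null | openssl x509 -noout -checkend 2592000
```

Because curl verifies the certificate by default, an already expired or mismatched certificate surfaces as a fetch failure.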
BIMI
BIMI is a TXT at default._bimi pointing at an SVG logo (and usually a VMC cert). Monitor that the SVG URL returns 200 and the VMC has more than 30 days until expiry.
FAQ
How often should I check DNS?
What about domains we do not actively send from?
Without a DMARC policy of p=reject, attackers will spoof you. A non-sending domain's SPF should be v=spf1 -all.