If you rely on an external API for pre-deploy checks or overnight regressions, you need to know when it breaks. You need to know before your cron does. You need the explanation afterwards. Most deliverability vendors publish glossy SLA numbers without a matching incident history. We publish both — the status page link, the raw incidents, the postmortems, the fixes.
Public status page: status.live-direct-marketing.online. RSS + JSON feeds available. Webhook subscriptions for incident.created and incident.resolved.
What the status page monitors
- API endpoints — POST /api/tests, GET /api/tests/:id, the SSE stream, and POST /api/webhooks. 60-second synthetic checks.
- Seed mailbox reachability — per-provider health (Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX, Orange, ProtonMail, iCloud, Zoho, Fastmail, HEY, and more). A provider with degraded status means we can still run tests, but placement verdicts for that provider are temporarily unreliable.
- SpamAssassin + Rspamd workers — score endpoints.
- Share link + dashboard — the public-facing web UI.
- MCP server — npx ldm-inbox-check-mcp runtime health.
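The per-component checks above roll up into the state shown on the status page. A minimal sketch of that roll-up, with illustrative thresholds (the real cutoffs and check windows are internal, not published):

```python
def rollup_status(recent_checks):
    """Collapse the most recent synthetic check results for one
    component (True = pass) into a status-page state.

    Thresholds here are assumptions for illustration, not the
    production values.
    """
    if not recent_checks:
        return "unknown"
    ratio = sum(recent_checks) / len(recent_checks)
    if ratio == 1.0:
        return "operational"
    if ratio >= 0.5:
        return "degraded"   # e.g. a seed provider timing out intermittently
    return "down"
```

A "degraded" seed provider in this model matches the behavior described above: tests still run, but that provider's verdicts are flagged unreliable rather than reported as fact.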
How we classify incidents
- Investigating — metrics anomaly seen, cause unknown. Posted within 5 minutes of alerting.
- Identified — root cause located, fix in progress.
- Monitoring — fix deployed, watching for regression.
- Resolved — 30 minutes of clean metrics; incident closed.
- Postmortem posted — for any incident lasting over 15 minutes, published within 72 hours of resolution.
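The states above form a small lifecycle. One plausible reading of the allowed transitions, sketched as a guard function (the back-edges — e.g. Monitoring returning to Identified when a fix regresses — are our assumption, not a documented contract):

```python
# Incident lifecycle states and the transitions we assume between
# them. "Postmortem posted" is a follow-up artifact attached to a
# closed incident, not a live state, so it is omitted here.
TRANSITIONS = {
    "investigating": {"identified", "resolved"},
    "identified": {"monitoring", "investigating"},
    "monitoring": {"resolved", "identified"},
    "resolved": set(),  # terminal: closed after 30 min of clean metrics
}

def advance(current, nxt):
    """Move an incident to its next state, rejecting illegal jumps."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```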
Incidents we have had
A representative selection, not the complete list. The full history lives on the status page with per-minute timelines.
April 2026 — Mail.ru IMAP outage (external)
Mail.ru's IMAP endpoint returned intermittent timeouts for ~40 minutes. Our seed mailbox pool could not fetch placement results for that provider, so Mail.ru verdicts in tests during the window were marked inconclusive rather than guessed. API availability was unaffected; only Mail.ru rows in result sets were gapped.
March 2026 — DNS provider regional outage
Our primary authoritative DNS provider had a 12-minute regional outage. Because we run a secondary on a different provider with shared NS records, external resolvers failed over within a few seconds of the primary failure. Test throughput was flat; the API error rate briefly spiked to 0.3%. The postmortem triggered an audit of every hard-coded DNS dependency in our stack.
February 2026 — Rspamd worker memory leak
A Rspamd process in our worker pool started leaking memory after a rule-update push. The worker's supervisor restarted it on OOM, but the restart loop slowed Rspamd scoring by 20–30 seconds for about 25 minutes. Placement verdicts were still delivered; SpamAssassin scores were unaffected. Fix: pin rule bundles and add a canary worker ahead of fleet rollout.
January 2026 — SSE connection drops for long-running tests
After a load-balancer upgrade, SSE connections open longer than 4 minutes dropped intermittently. Because our SDK transparently falls back to polling on SSE disconnect, the user-facing effect was minimal — but alerts fired and we noticed. Fix: idle timeout raised, reconnect logic documented in the SDK README.
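The fallback behavior that absorbed this incident can be sketched generically: retry the stream a bounded number of times, then degrade to polling the snapshot. The function names and retry budget here are stand-ins, not the real SDK internals:

```python
def stream_with_fallback(open_sse, poll, max_sse_retries=2):
    """Consume test events over SSE; on repeated disconnects,
    degrade to a single snapshot poll.

    `open_sse` is a caller-supplied iterable of events that may
    raise ConnectionError mid-stream; `poll` returns the latest
    result snapshot. Both are illustrative stand-ins for real
    SDK calls.
    """
    retries = 0
    while retries <= max_sse_retries:
        try:
            for event in open_sse():
                yield event
            return  # stream ended cleanly
        except ConnectionError:
            retries += 1  # transparent reconnect attempt
    # SSE is unreliable right now: fall back to polling the snapshot
    yield poll()
```

This is why the January incident was near-invisible to users: a dropped connection costs a reconnect (and possibly a duplicate event), not a lost result.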
How we write postmortems
- Blameless. No names. The subject is the system, not the person who pushed the config.
- Timeline to the minute. Detection, comms, fix, resolved. Plus a "when we should have noticed" note when detection lagged.
- Root cause before action items. Five-whys is the baseline. Action items must address the contributing factor, not just the immediate trigger.
- Public. Every postmortem is a markdown page on the docs site. Linked from the status page incident entry.
Most SaaS vendors would rather bury an incident than explain it. We think the opposite. The vendors we trust are the ones who told us what broke and how they fixed it. We try to be that vendor — if you are evaluating us against GlockApps or Mail-Tester, this page is part of the answer.
How to subscribe
- RSS — point any feed reader at /status.rss.
- Email — one email per incident; a resolved notice when we close.
- Webhook — register your endpoint; we POST the incident payload on incident.created, incident.updated, and incident.resolved.
- Slack — Slackbot integration via the standard status-page incoming webhook recipe.
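A receiving endpoint should verify the payload before acting on it. A minimal sketch — the HMAC-SHA256 signature scheme, header handling, and field names here are assumptions for illustration, so check the real webhook docs for the actual contract:

```python
import hashlib
import hmac
import json

ALLOWED_EVENTS = {"incident.created", "incident.updated", "incident.resolved"}

def handle_status_webhook(body, signature, secret):
    """Verify and parse one incident webhook delivery.

    `body` is the raw request bytes, `signature` the hex digest the
    sender attached (scheme assumed here), `secret` the shared key.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")  # drop spoofed deliveries
    event = json.loads(body)
    if event.get("type") not in ALLOWED_EVENTS:
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    return event
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` on the signature leaks timing information an attacker can exploit.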