
Status page, incident history, and honest postmortems

A deliverability tool that goes down silently is worse than no tool. Here is our public status page, the incidents we have had, and the postmortems behind each.

If you rely on an external API for pre-deploy checks or overnight regressions, you need to know when it breaks. You need to know before your cron does. You need the explanation afterwards. Most deliverability vendors publish glossy SLA numbers without a matching incident history. We publish both — the status page link, the raw incidents, the postmortems, the fixes.

Where to find it

Public status page: status.live-direct-marketing.online. RSS + JSON feeds available. Webhook subscriptions for incident.created and incident.resolved.
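If you want to gate a job on that feed, the check reduces to fetching the JSON and filtering for unresolved incidents. A minimal sketch, assuming a feed path of `/status.json` and a payload containing an `incidents` list with a `status` field; neither is a documented schema, so verify against the actual feed:

```python
import json
import urllib.request

# Assumed feed URL and payload shape -- illustration only.
STATUS_FEED = "https://status.live-direct-marketing.online/status.json"

def open_incidents(feed: dict) -> list[dict]:
    """Return incidents that have not reached the 'resolved' state."""
    return [i for i in feed.get("incidents", []) if i.get("status") != "resolved"]

def fetch_open_incidents(url: str = STATUS_FEED) -> list[dict]:
    """Fetch the JSON feed and filter it, e.g. from a pre-deploy script."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return open_incidents(json.load(resp))
```

Keeping the parsing in a separate function makes the filter testable without a network call.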

What the status page monitors

  • API endpoints — POST /api/tests, GET /api/tests/:id, the SSE stream, POST /api/webhooks. 60-second synthetic checks.
  • Seed mailbox reachability — per-provider health (Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX, Orange, ProtonMail, iCloud, Zoho, Fastmail, HEY, and more). A provider with degraded status means we can still run tests but placement verdicts for that provider are temporarily unreliable.
  • SpamAssassin + Rspamd workers — score endpoints.
  • Share link + dashboard — the public-facing web UI.
  • MCP server — npx ldm-inbox-check-mcp runtime health.
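The monitored components above map naturally onto a pre-deploy gate: before a cron kicks off an overnight regression, confirm that everything the run depends on is healthy. A sketch under stated assumptions — the component names and the `"operational"` status string are hypothetical, not the page's documented vocabulary:

```python
# Hypothetical component names for illustration.
REQUIRED = ("api", "seed-mailboxes", "spam-workers")

def deploy_gate(components: dict[str, str],
                required: tuple[str, ...] = REQUIRED) -> bool:
    """True only when every required component reports 'operational'.

    A 'degraded' seed-mailbox provider fails the gate, which is deliberately
    conservative: placement verdicts for that provider may be unreliable.
    """
    return all(components.get(name) == "operational" for name in required)
```

Failing closed on a degraded provider costs you a delayed run; failing open costs you a regression report you cannot trust.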

How we classify incidents

  • Investigating — metrics anomaly seen, cause unknown. Posted within 5 minutes of alerting.
  • Identified — root cause located, fix in progress.
  • Monitoring — fix deployed, watching for regression.
  • Resolved — 30 minutes of clean metrics; incident closed.
  • Postmortem posted — for any incident over 15 minutes, within 72 hours.
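The classification above is effectively a small state machine, which is worth making explicit if you script against incident updates. A sketch in which the transition set is inferred from the list, not taken from a published spec:

```python
# Allowed transitions, inferred from the incident states described above.
TRANSITIONS = {
    "investigating": {"identified", "resolved"},
    "identified": {"monitoring"},
    "monitoring": {"resolved", "identified"},  # a regression reopens the fix
    "resolved": set(),
}

def can_advance(current: str, nxt: str) -> bool:
    """True if an incident may move from `current` to `nxt`."""
    return nxt in TRANSITIONS.get(current, set())

def needs_postmortem(duration_minutes: int) -> bool:
    """Postmortem required for any incident over 15 minutes."""
    return duration_minutes > 15
```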

Incidents we have had

Representative selection (not all incidents). The full list lives on the status page with per-minute timelines.

April 2026 — Mail.ru IMAP outage (external)

Mail.ru's IMAP endpoint returned intermittent timeouts for ~40 minutes. Our seed mailbox pool could not fetch placement results for that provider, so Mail.ru verdicts in tests during the window were marked inconclusive rather than guessed. API availability was unaffected; only Mail.ru rows in result sets were gapped.

March 2026 — DNS provider regional outage

Our primary authoritative DNS provider had a 12-minute regional outage. Because we run a secondary on a different provider with shared NS records, external lookups resolved within a few seconds of the primary failure. Test throughput was flat; API error rate briefly spiked to 0.3%. Postmortem triggered an audit of every hard-coded DNS dependency in our stack.

February 2026 — Rspamd worker memory leak

A Rspamd process in our worker pool started leaking memory after a rule-update push. The worker's supervisor restarted it on OOM, but the loop slowed Rspamd scoring by 20–30 seconds for about 25 minutes. Placement verdicts were delivered; SpamAssassin scores were not delayed. Fix: pin rule bundles, add a canary worker ahead of fleet rollout.

January 2026 — SSE connection drops for long-running tests

After a load-balancer upgrade, SSE connections over 4 minutes dropped intermittently. Because our SDK transparently falls back to polling on SSE disconnect, user-facing effect was minimal — but alerts fired and we noticed. Fix: idle timeout bumped, reconnect logic documented in SDK README.
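The polling fallback described above can be sketched as a generator that prefers the SSE stream and degrades to polling GET /api/tests/:id when the stream drops. The `sse_connect` and `poll_once` callables are placeholders for illustration, not the SDK's real API:

```python
import time
from typing import Callable, Iterator

def stream_results(test_id: str,
                   sse_connect: Callable[[str], Iterator[dict]],
                   poll_once: Callable[[str], dict],
                   poll_interval: float = 5.0) -> Iterator[dict]:
    """Yield result events over SSE; fall back to polling if the stream drops."""
    try:
        # May raise if the connection dies, e.g. past a load-balancer idle timeout.
        yield from sse_connect(test_id)
    except ConnectionError:
        while True:
            result = poll_once(test_id)   # GET /api/tests/:id
            yield result
            if result.get("status") == "complete":
                return
            time.sleep(poll_interval)
```

The caller sees one uninterrupted event stream either way, which is why the user-facing effect of the incident was minimal.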

How we write postmortems

  • Blameless. No names. The artifact is the system, not the person who pushed the config.
  • Timeline to the minute. Detection, comms, fix, resolved. Plus a "when we should have noticed" note when detection lagged.
  • Root cause before action items. Five-whys analysis is the baseline. Action items must solve the contributing factor, not just the immediate trigger.
  • Public. Every postmortem is a markdown page on the docs site. Linked from the status page incident entry.

Why publish postmortems at all

Most SaaS vendors would rather bury an incident than explain it. We think the opposite. The vendors we trust are the ones who told us what broke and how they fixed it. We try to be that vendor — if you are evaluating us against GlockApps or Mail-Tester, this page is part of the answer.

How to subscribe

  • RSS — point any feed reader at /status.rss.
  • Email — one email per incident; resolved notice when we close.
  • Webhook — register your endpoint; we POST the incident payload on incident.created, incident.updated, and incident.resolved.
  • Slack — Slackbot integration via the standard status-page incoming webhook recipe.
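On the receiving side, a webhook subscription reduces to parsing the POST body and routing on event type. A minimal dispatcher sketch; the `{"event": ..., "incident": ...}` payload shape is an assumption for illustration, so check it against an actual delivery before relying on it:

```python
import json

HANDLED_EVENTS = {"incident.created", "incident.updated", "incident.resolved"}

def handle_status_webhook(body: bytes) -> str:
    """Route one webhook delivery; return a tag your app can act on."""
    payload = json.loads(body)
    event = payload.get("event")
    if event not in HANDLED_EVENTS:
        return "ignored"   # unknown event types are dropped, not errors
    incident_id = payload.get("incident", {}).get("id", "unknown")
    return f"{event}:{incident_id}"
```

Ignoring unknown event types keeps the receiver forward-compatible if new events are added later.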

FAQ

What SLA does the free tier get?

99.9% on the public free endpoints, which is what the status page tracks. No financial SLA on the free tier — we simply publish the number honestly. Paid API customers get a contractual SLA.

How fast do you acknowledge an incident?

Target is under 5 minutes from the first alert. The status page shows ACK time on every past incident so the number is auditable.

Do you count external provider outages against your SLA?

No — we mark those as per-provider degradation. Our SLA covers API availability; placement accuracy depends on third-party providers whose uptime we cannot control.

Where is the postmortem archive?

Linked from every incident on the status page, and a single index under /status/postmortems.

Related reading

Check your deliverability across 20+ providers

Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX, ProtonMail and more. Real inbox screenshots, SPF/DKIM/DMARC, spam engine verdicts. Free, no signup.

Run Free Test →

Unlimited tests · 20+ seed mailboxes · Live results · No account required