Most software teams can tell you exactly what their HTTP 5xx rate is, how long their p99 database query takes, and what their uptime was last quarter. Ask the same team where their transactional email landed last week and the answer is almost always the same: "our ESP says it was delivered." Which is not an answer.
This is not about marketing mail. This is the dozens of small, invisible emails your product generates every day: the verification link a new user needs to finish signup, the password reset, the 2FA code, the invoice from billing, the weekly digest, the invitation to a team, the password-change notification to the compromised user. Every one of these messages has a customer on the other end waiting for it, and the only monitoring in place is an ESP dashboard that conflates "accepted by the receiving SMTP server" with "delivered to the inbox".
Enumerate what your app actually sends
The first step is a template census. Most teams discover, when they do this exercise, that their app sends three to five times more types of email than they thought. A quick checklist that covers the majority of SaaS and consumer apps:
- Signup flow. Verification email, welcome email, onboarding tips day 1/3/7.
- Authentication. Password reset, 2FA code, new-device alert, suspicious login.
- Billing. Trial started, trial ending, invoice, receipt, payment failed, renewal reminder, cancellation confirmation.
- Collaboration. Invite to workspace, invite accepted, mention, comment, assignment.
- Lifecycle. Re-engagement after inactivity, feature launch announcements, incident postmortems.
- Support. Ticket created, ticket updated, ticket resolved, satisfaction survey.
A team of five to ten engineers shipping a typical B2B SaaS ends up with 30–60 distinct templates once the census is done. At even a modest user base, this is easily 50 emails per day leaving the building.
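If your templates live as files, the census can start as a one-liner over the template directory. A minimal sketch, assuming file-based templates; the directory path and `.html` extension are placeholders for whatever your codebase actually uses:

```python
from pathlib import Path

def census(template_dir: str, ext: str = ".html") -> list[str]:
    """Return the sorted names of every template file under template_dir."""
    return sorted(p.stem for p in Path(template_dir).rglob(f"*{ext}"))
```

Diff the output against the list the team wrote from memory; the delta is usually the surprise.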
What your ESP dashboard hides
Every ESP reports three numbers: delivered, bounced, complained. None of these distinguish between the four possible fates of a transactional email:
accepted by receiving SMTP
├── lands in Inbox ← what you want
├── lands in Promotions ← acceptable for marketing, bad for transactional
├── lands in Spam ← user never sees it
└── silently dropped / greylisted ← rare but worst

From the ESP's perspective, all four are "delivered" because the receiving server returned a 2xx status. The differentiation happens after the handoff, inside the Gmail / Outlook / Yahoo / iCloud filter pipeline, and that decision is invisible to the sender.
"Delivered" in an ESP dashboard is the equivalent of a client-side 200 OK that only verifies the load balancer accepted the request. You still have no idea whether the request made it to the application, succeeded there, or was silently routed to a failure bucket. You would not run production on that signal alone for HTTP. Don't do it for email either.
A minimal transactional-email observability setup
1. A canonical set of seed addresses
Own one real mailbox on each provider your users use. For a global SaaS that means Gmail, Outlook.com, Yahoo, iCloud, Google Workspace, Office 365, Fastmail, Zoho. For CIS add Mail.ru, Yandex, Rambler. These mailboxes exist to receive your production traffic as if they were real users.
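A seed registry can be as simple as a dict keyed by provider. Every address below is a placeholder, not a real mailbox; substitute accounts you actually own and control:

```python
# Placeholder seed-address registry: swap in mailboxes you own.
SEED_ADDRESSES = {
    "gmail":   "acme.seed.gmail@gmail.com",    # placeholder
    "outlook": "acme.seed@outlook.com",        # placeholder
    "yahoo":   "acme_seed@yahoo.com",          # placeholder
    "icloud":  "acme.seed@icloud.com",         # placeholder
}

def seed_address_for_provider(provider: str) -> str:
    """Fail loudly if a provider has no seed mailbox registered yet."""
    try:
        return SEED_ADDRESSES[provider]
    except KeyError:
        raise ValueError(f"no seed mailbox registered for {provider!r}")
```

Failing loudly on an unregistered provider matters: a silently skipped provider looks identical to "100% inbox" in your dashboards.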
2. A test harness that replays real templates
Write a small job that picks one of your templates each run, renders it with a fixture user whose email is a seed address, and fires it through the same sending path production uses. Rotate through all templates over the course of a day.
# pseudocode for a per-template placement probe
for template in enumerate_templates():
    fixture = make_fixture_user(email=seed_address_for_provider(template))
    send(template, fixture)  # uses the real production path
    record_provider_placement(template, fixture.email)

3. Read the headers to determine placement
Each provider writes the folder (Inbox / Spam / Promotions / other) into the received-message headers with a known pattern. An IMAP connection plus a header parser is enough to tell you where each seeded copy of your template landed. This is the step most teams skip because building it is tedious, which is why services like this exist.
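A minimal sketch of the header-parsing half, assuming the raw message has already been fetched over IMAP. It checks only two widely seen spam signals: the SpamAssassin-style `X-Spam-Flag` header and the SCL value Microsoft stamps into `X-Forefront-Antispam-Report` (SCL 5 and above is filtered to junk). Real providers each need their own rules on top of this:

```python
from email import message_from_string

def placement_from_headers(raw_message: str) -> str:
    """Crude inbox/spam classifier from two common header signals."""
    msg = message_from_string(raw_message)
    # SpamAssassin-style verdict, seen on many self-hosted and ISP setups.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "spam"
    # Microsoft's spam confidence level: SCL >= 5 is routed to Junk.
    report = msg.get("X-Forefront-Antispam-Report", "")
    for part in report.split(";"):
        part = part.strip()
        if part.startswith("SCL:"):
            return "spam" if int(part[4:]) >= 5 else "inbox"
    return "inbox"  # no spam signal found; assume inbox
```

Note that some providers (Gmail in particular) reveal placement mostly through which IMAP folder holds the message rather than through a header, which is part of why the per-provider heuristics get tedious.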
4. Alert on regressions, not thresholds
Don't alert on "spam rate > 5%". Alert on "spam rate this hour is 3σ above baseline for this template." Placement for a given template on a given provider is very stable week-to-week; changes are the signal.
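The regression check itself is a few lines. A sketch, where `history` is the hourly spam-rate series for one template on one provider:

```python
from statistics import mean, stdev

def is_regression(history: list[float], current: float, sigmas: float = 3.0) -> bool:
    """True if the current spam rate sits more than `sigmas` standard
    deviations above the historical mean for this template+provider."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return current > mu  # perfectly stable baseline: any increase is a change
    return current > mu + sigmas * sd
```

Because placement is stable per template and provider, the baseline window can be short: a week of hourly samples is usually plenty.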
Skip building the seed network, IMAP parsers, and provider heuristics. Use our API to run placement tests on your real templates from cron, CI, or on every deploy. Free tier available, no signup needed.
What to monitor, ranked by business impact
- Password reset. Users who can't reset quietly leave. See the password-reset analysis.
- 2FA / login code. Users get locked out completely if the code doesn't arrive. Zero-tolerance flow.
- Payment and billing. Missing renewal reminders cause surprise churn labeled as "cancelled" that is really "never saw the warning".
- Order / action confirmation. Duplicate orders, support tickets, trust erosion. See the order-confirmation breakdown.
- Invite / collaboration. Missed invites block team expansion; users assume the product "doesn't work" and go elsewhere.
- Security alerts. New-device notifications and password-change emails in spam defeat the entire point of sending them.
- Onboarding sequence. Less urgent than the flows above, but a user who never sees the welcome email starts cold and churns faster.
- Notifications and digests. Lower priority individually but the biggest volume; worth monitoring the aggregate placement.
Common pitfalls once you start monitoring
- Testing with clean boilerplate. Don't test with "Hello, this is a test" — filters evaluate the real template. Fire the production template with a fixture user, always.
- Testing too rarely. A monthly test misses a three-day Gmail penalty entirely. An hourly test per critical flow catches regressions while the blast radius is still small.
- Averaging across providers. "93% inbox" overall might hide "0% inbox at Yahoo". Always monitor by provider.
- Not tying tests to deploys. If you can, run a placement check in your CD pipeline whenever templates, DNS, or ESP config change. Catch regressions before they ship, not after users complain (most affected users never complain; they just leave).
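The deploy gate reduces to a small pass/fail check once placement results exist. A hypothetical sketch: `results` stands in for the output of your placement run, and the flow names are illustrative, not prescribed:

```python
# Hypothetical deploy gate: fail the pipeline if any critical flow's
# seeded copy landed anywhere but the inbox.
CRITICAL_FLOWS = {"password_reset", "two_factor_code", "invoice"}

def deploy_gate(results: dict[str, str]) -> bool:
    """results maps template name -> placement ('inbox', 'spam', ...).
    Missing results count as failures: an untested flow is an unknown."""
    return all(results.get(t, "unknown") == "inbox" for t in CRITICAL_FLOWS)
```

Wire its boolean result to the pipeline's exit code and a placement regression blocks the deploy the same way a failing unit test does.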