Most software teams can tell you exactly what their HTTP 5xx rate is, how long their p99 database query takes, and what their uptime was last quarter. Ask the same team where their transactional email landed last week and the answer is almost always the same: "our ESP says it was delivered." Which is not an answer.
This is not about marketing mail. This is the dozens of small, invisible emails your product generates every day: the verification link a new user needs to finish signup, the password reset, the 2FA code, the invoice from billing, the weekly digest, the invitation to a team, the password-change notification to the compromised user. Every one of these messages has a customer on the other end waiting for it, and the only monitoring in place is an ESP dashboard that conflates "accepted by the receiving SMTP server" with "delivered to the inbox".
Enumerate what your app actually sends
The first step is a template census. Most teams discover, when they do this exercise, that their app sends three to five times more types of email than they thought. A quick checklist that covers the majority of SaaS and consumer apps:
- Signup flow. Verification email, welcome email, onboarding tips day 1/3/7.
- Authentication. Password reset, 2FA code, new-device alert, suspicious login.
- Billing. Trial started, trial ending, invoice, receipt, payment failed, renewal reminder, cancellation confirmation.
- Collaboration. Invite to workspace, invite accepted, mention, comment, assignment.
- Lifecycle. Re-engagement after inactivity, feature launch announcements, incident postmortems.
- Support. Ticket created, ticket updated, ticket resolved, satisfaction survey.
A team of five to ten engineers shipping a typical B2B SaaS ends up with 30–60 distinct templates once the census is done. At even a modest user base, this is easily 50 emails per day leaving the building.
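If your templates live as files, the census can start as a one-liner over the template directory. A minimal sketch, assuming file-based templates; the directory path and `.html` extension are placeholders for whatever your codebase actually uses:

```python
from pathlib import Path

def census(template_dir: str, ext: str = ".html") -> list[str]:
    """Return the sorted names of every template file under template_dir."""
    return sorted(p.stem for p in Path(template_dir).rglob(f"*{ext}"))
```

Diff the output against the list the team wrote from memory; the delta is usually the surprise.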
What your ESP dashboard hides
Every ESP reports three numbers: delivered, bounced, complained. None of these distinguish between the four possible fates of a transactional email:
accepted by receiving SMTP
├── lands in Inbox ← what you want
├── lands in Promotions ← acceptable for marketing, bad for transactional
├── lands in Spam ← user never sees it
└── silently dropped / greylisted ← rare but worst

From the ESP's perspective, all four are "delivered" because the receiving server returned a 2xx status. The differentiation happens after the handoff, inside the Gmail / Outlook / Yahoo / iCloud filter pipeline, and that decision is invisible to the sender.
"Delivered" in an ESP dashboard is the equivalent of a client-side 200 OK that only verifies the load balancer accepted the request. You still have no idea whether the request made it to the application, succeeded there, or was silently routed to a failure bucket. You would not run production on that signal alone for HTTP. Don't do it for email either.
A minimal transactional-email observability setup
1. A canonical set of seed addresses
Own one real mailbox on each provider your users use. For a global SaaS that means Gmail, Outlook.com, Yahoo, iCloud, Google Workspace, Office 365, Fastmail, Zoho. For CIS add Mail.ru, Yandex, Rambler. These mailboxes exist to receive your production traffic as if they were real users.
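A seed registry can be as simple as a dict keyed by provider. Every address below is a placeholder, not a real mailbox; substitute accounts you actually own and control:

```python
# Placeholder seed-address registry: swap in mailboxes you own.
SEED_ADDRESSES = {
    "gmail":   "acme.seed.gmail@gmail.com",    # placeholder
    "outlook": "acme.seed@outlook.com",        # placeholder
    "yahoo":   "acme_seed@yahoo.com",          # placeholder
    "icloud":  "acme.seed@icloud.com",         # placeholder
}

def seed_address_for_provider(provider: str) -> str:
    """Fail loudly if a provider has no seed mailbox registered yet."""
    try:
        return SEED_ADDRESSES[provider]
    except KeyError:
        raise ValueError(f"no seed mailbox registered for {provider!r}")
```

Failing loudly on an unregistered provider matters: a silently skipped provider looks identical to "100% inbox" in your dashboards.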
2. A test harness that replays real templates
Write a small job that picks one of your templates each run, renders it with a fixture user whose email is a seed address, and fires it through the same sending path production uses. Rotate through all templates over the course of a day.
# pseudocode for a per-template placement probe
for template in enumerate_templates():
    fixture = make_fixture_user(email=seed_address_for_provider(template))
    send(template, fixture)  # uses the real production path
    record_provider_placement(template, fixture.email)

3. Read the headers to determine placement
Each provider writes the folder (Inbox / Spam / Promotions / other) into the received-message headers with a known pattern. An IMAP connection plus a header parser is enough to tell you where each seeded copy of your template landed. This is the step most teams skip because building it is tedious, which is why services like this exist.
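A minimal sketch of the header-parsing half, assuming the raw message has already been fetched over IMAP. It checks only two widely seen spam signals: the SpamAssassin-style `X-Spam-Flag` header and the SCL value Microsoft stamps into `X-Forefront-Antispam-Report` (SCL 5 and above is filtered to junk). Real providers each need their own rules on top of this:

```python
from email import message_from_string

def placement_from_headers(raw_message: str) -> str:
    """Crude inbox/spam classifier from two common header signals."""
    msg = message_from_string(raw_message)
    # SpamAssassin-style verdict, seen on many self-hosted and ISP setups.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "spam"
    # Microsoft's spam confidence level: SCL >= 5 is routed to Junk.
    report = msg.get("X-Forefront-Antispam-Report", "")
    for part in report.split(";"):
        part = part.strip()
        if part.startswith("SCL:"):
            return "spam" if int(part[4:]) >= 5 else "inbox"
    return "inbox"  # no spam signal found; assume inbox
```

Note that some providers (Gmail in particular) reveal placement mostly through which IMAP folder holds the message rather than through a header, which is part of why the per-provider heuristics get tedious.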
4. Alert on regressions, not thresholds
Don't alert on "spam rate > 5%". Alert on "spam rate this hour is 3σ above baseline for this template." Placement for a given template on a given provider is very stable week-to-week; changes are the signal.
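The regression check itself is a few lines. A sketch, where `history` is the hourly spam-rate series for one template on one provider:

```python
from statistics import mean, stdev

def is_regression(history: list[float], current: float, sigmas: float = 3.0) -> bool:
    """True if the current spam rate sits more than `sigmas` standard
    deviations above the historical mean for this template+provider."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return current > mu  # perfectly stable baseline: any increase is a change
    return current > mu + sigmas * sd
```

Because placement is stable per template and provider, the baseline window can be short: a week of hourly samples is usually plenty.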
Skip building the seed network, IMAP parsers, and provider heuristics. Use our API to run placement tests on your real templates from cron, CI, or on every deploy. Free tier available, no signup needed.
What to monitor, ranked by business impact
- Password reset. Users who can't reset quietly leave. See the password-reset analysis.
- 2FA / login code. Users get locked out completely if the code doesn't arrive. Zero-tolerance flow.
- Payment and billing. Missing renewal reminders cause surprise churn labeled as "cancelled" that is really "never saw the warning".
- Order / action confirmation. Duplicate orders, support tickets, trust erosion. See the order-confirmation breakdown.
- Invite / collaboration. Missed invites block team expansion; users assume the product "doesn't work" and go elsewhere.
- Security alerts. New-device notifications and password-change emails in spam defeat the entire point of sending them.
- Onboarding sequence. Less urgent than the flows above, but a user who never sees the welcome email starts cold and churns faster.
- Notifications and digests. Lower priority individually but the biggest volume; worth monitoring the aggregate placement.
Common pitfalls once you start monitoring
- Testing with clean boilerplate. Don't test with "Hello, this is a test" — filters evaluate the real template. Fire the production template with a fixture user, always.
- Testing too rarely. A monthly test misses a three-day Gmail penalty entirely. An hourly test per critical flow catches regressions while the blast radius is still small.
- Averaging across providers. "93% inbox" overall might hide "0% inbox at Yahoo". Always monitor by provider.
- Not tying tests to deploys. If you can, run a placement check in your CD pipeline whenever templates, DNS, or ESP config change. Catch regressions before they ship, not after users complain (most affected users never complain; they just leave).
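The deploy gate reduces to a small pass/fail check once placement results exist. A hypothetical sketch: `results` stands in for the output of your placement run, and the flow names are illustrative, not prescribed:

```python
# Hypothetical deploy gate: fail the pipeline if any critical flow's
# seeded copy landed anywhere but the inbox.
CRITICAL_FLOWS = {"password_reset", "two_factor_code", "invoice"}

def deploy_gate(results: dict[str, str]) -> bool:
    """results maps template name -> placement ('inbox', 'spam', ...).
    Missing results count as failures: an untested flow is an unknown."""
    return all(results.get(t, "unknown") == "inbox" for t in CRITICAL_FLOWS)
```

Wire its boolean result to the pipeline's exit code and a placement regression blocks the deploy the same way a failing unit test does.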