How to Test If Your Warmup Actually Worked (Seed Test After Warmup)

The dirty secret of every automated warmup tool is that the success metric they show you is measured inside their own pool. Mailwarm tells you 96% of messages landed in inbox — but those messages were sent to other Mailwarm subscribers' mailboxes, which by design accept and engage with everything that comes in. Lemwarm shows similar numbers from the Lemwarm pool. Warmy from the Warmy pool. The numbers aren't fake; they're just not measuring the thing you actually care about.

TL;DR

Self-reported warmup metrics measure performance inside a private network, not against real Gmail and Outlook filtering. Run three independent seed tests at staggered times across 20+ real provider mailboxes. Aim for 80%+ inbox at Gmail and Outlook before going live. Anything below 60% means more warmup, not more campaigns.

Why self-reported warmup metrics lie

A warmup tool's pool is constructed to make every message look engaged. Pool mailboxes auto-reply, auto-mark important, auto-move out of spam. Gmail and Outlook quickly learn to recognise these behaviours when they happen at scale across identifiable IP ranges, and they discount them. So the engagement numbers the tool reports are partly real and partly invisible to the actual filters.

Worse, the tool reports inbox placement based on whether its own pool mailboxes received the message in inbox. Those mailboxes are configured permissively. They aren't a stress test of Gmail's real-world spam filtering decisions on a 1,000-recipient cold campaign.

The result: a domain showing a 96% inbox rate inside the warmup tool's dashboard can see half of its messages land in spam on the first real cold campaign. We see this regularly.

What a real seed test looks like

A real seed test sends one message from your warmed domain to a defined set of real, unrelated mailboxes across major providers, and measures where each message lands: inbox, promotions, spam, or missing.
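As a minimal sketch of the measurement step, the snippet below tallies where each seed message landed and turns the counts into per-provider rates. The function name, the provider labels, and the sample data are hypothetical, and it assumes you have already recorded one folder result per seed mailbox by hand or through your testing tool.

```python
from collections import Counter

# The four outcomes a seed message can have, as listed in the text.
FOLDERS = ("inbox", "promotions", "spam", "missing")

def placement_rates(results):
    """results: list of (provider, folder) tuples, one per seed mailbox."""
    by_provider = {}
    for provider, folder in results:
        if folder not in FOLDERS:
            raise ValueError(f"unknown folder: {folder!r}")
        by_provider.setdefault(provider, Counter())[folder] += 1

    rates = {}
    for provider, counts in by_provider.items():
        total = sum(counts.values())
        # Counter returns 0 for folders that never occurred.
        rates[provider] = {f: counts[f] / total for f in FOLDERS}
    return rates

# Hypothetical results from one test run.
sample = [
    ("gmail", "inbox"), ("gmail", "inbox"),
    ("gmail", "promotions"), ("gmail", "spam"),
    ("outlook", "inbox"), ("outlook", "spam"),
]
rates = placement_rates(sample)
print(rates["gmail"])
```

Promotions is kept as its own bucket here rather than folded into inbox, so you can decide later whether to count it as a pass.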

Minimum coverage to be useful:

3+ Gmail addresses (different accounts, different labels).
2+ Outlook.com / Hotmail / Live mailboxes.
2+ Yahoo / AOL mailboxes.
1+ Apple iCloud mailbox.
1+ ProtonMail or Tutanota (privacy filter behaviour).
1+ Mail.ru and 1+ Yandex if relevant to your audience.
1+ Google Workspace business address.
1+ Office 365 business mailbox.

Fewer than 10 seeds will give you a noisy result. Real Gmail placement is not binary; it varies per recipient based on that recipient's individual signals. You need enough seeds to see the distribution, not a single data point.
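One way to check a seed list against the minimum coverage above is a small lookup table. This is a sketch only: the provider keys and function names are made up for illustration, and the optional Mail.ru and Yandex seeds are left out of the required set.

```python
# Minimum seed counts per provider group, taken from the coverage list.
# Keys are hypothetical labels, not names from any particular tool.
MIN_SEEDS = {
    "gmail": 3,
    "outlook": 2,           # Outlook.com / Hotmail / Live
    "yahoo_aol": 2,
    "icloud": 1,
    "privacy": 1,           # ProtonMail or Tutanota
    "google_workspace": 1,
    "office365": 1,
}

def coverage_gaps(seed_counts, minimums=MIN_SEEDS):
    """Return {provider: how many more seeds are needed}."""
    return {
        provider: need - seed_counts.get(provider, 0)
        for provider, need in minimums.items()
        if seed_counts.get(provider, 0) < need
    }

def is_usable(seed_counts, minimums=MIN_SEEDS, floor=10):
    """True when no provider is short and the total reaches the 10-seed floor."""
    enough_total = sum(seed_counts.values()) >= floor
    return enough_total and not coverage_gaps(seed_counts, minimums)

seeds = {"gmail": 3, "outlook": 1, "yahoo_aol": 2, "icloud": 1,
         "privacy": 1, "google_workspace": 1, "office365": 1}
print(coverage_gaps(seeds))   # one more Outlook seed is needed
print(is_usable(seeds))
```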

Run three tests, staggered

A single seed test on Tuesday morning measures Tuesday-morning Gmail behaviour for that test message's subject line and body. Tuesday afternoon Gmail can decide differently. Friday morning's filter can be tuned differently again.

Test 1. Tuesday 10:00, with a campaign-style subject and body that mirrors what you actually plan to send.
Test 2. Wednesday 14:00, with a different subject line but same content style.
Test 3. Friday 09:00, with a third subject variant.

If all three test runs hit 80%+ inbox at Gmail and Outlook across the seed set, you're ready to go live. If any single test drops below 60%, you have a problem you need to diagnose before launching: it could be content (subject lines, links, formatting), it could be auth degradation, it could be that the warmup didn't take.
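The go/no-go rule can be written down as a short function. The sketch below assumes each test run has been reduced to an inbox rate for Gmail and for Outlook; the function name and the "hold" verdict for results between 60% and 80% are assumptions added for illustration, since the rule above only names the two ends.

```python
def launch_verdict(tests, ready=0.80, problem=0.60):
    """tests: list of dicts such as {"gmail": 0.85, "outlook": 0.82},
    one dict per seed test run."""
    providers = ("gmail", "outlook")
    # Any single run under the problem line means diagnose first.
    if any(t[p] < problem for t in tests for p in providers):
        return "diagnose"
    # All three runs at or above the ready line means go live.
    if len(tests) >= 3 and all(t[p] >= ready for t in tests for p in providers):
        return "launch"
    # In-between results: assumed here to mean keep warming and retest.
    return "hold"

runs = [
    {"gmail": 0.85, "outlook": 0.80},   # Tuesday 10:00
    {"gmail": 0.90, "outlook": 0.82},   # Wednesday 14:00
    {"gmail": 0.80, "outlook": 0.85},   # Friday 09:00
]
print(launch_verdict(runs))
```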

What thresholds to demand

Different recipient types have different realistic ceilings:

Gmail consumer. 80%+ inbox is the bar for cold outreach. Below 70% means significant lost reach. Below 50% means do not launch.
Outlook consumer. 60%+ is realistic; 80%+ is excellent. Outlook is structurally harsher on new domains.
Google Workspace. 85%+ should be easy with a clean warmup. Workspace defaults are more permissive than consumer Gmail for B2B-shaped content.
Office 365 business. 70%+ is the practical bar. Microsoft is famously variable here.
Yahoo / AOL. 75%+ inbox; below 60% indicates a reputation issue at the Yahoo/AOL platform.
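These bars fit naturally into a lookup table. The sketch below is illustrative: the provider keys and function names are hypothetical, and the "below bar" label for Gmail results between 70% and 80% is an assumption, since the text only names the bands under 70% and under 50%.

```python
# Inbox-rate bars per recipient type, copied from the list above.
INBOX_BAR = {
    "gmail": 0.80,
    "outlook": 0.60,
    "google_workspace": 0.85,
    "office365": 0.70,
    "yahoo_aol": 0.75,
}

def below_bar(inbox_rates):
    """Return the providers whose measured inbox rate misses its bar."""
    return sorted(
        p for p, rate in inbox_rates.items()
        if p in INBOX_BAR and rate < INBOX_BAR[p]
    )

def gmail_grade(rate):
    """Grade a consumer Gmail inbox rate using the bands in the text."""
    if rate >= 0.80:
        return "meets bar"
    if rate >= 0.70:
        return "below bar"              # assumed label for the 70-80% band
    if rate >= 0.50:
        return "significant lost reach"
    return "do not launch"

measured = {"gmail": 0.75, "outlook": 0.65, "office365": 0.60}
print(below_bar(measured))
print(gmail_grade(measured["gmail"]))
```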

Run the test before you spend a cent on copy

A 10-minute seed test against 20+ real provider mailboxes tells you in objective terms whether the four weeks of warmup worked. Run a free placement test before you queue your first campaign.

Postmaster Tools cross-reference

Seed tests give you the per-recipient verdict. Google Postmaster Tools gives you the aggregate Gmail view. Cross-reference both:

Domain reputation should be Medium or High for a cold-outreach launch. Low is technically launchable, but you'll underperform.
Spam rate in Postmaster should be under 0.1%. Anything above 0.3% means stop, not launch.
IP reputation, if you're on a dedicated IP, should be Medium or High.
Authentication pass rate at 99%+ for SPF, DKIM, DMARC.
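The Postmaster checks can be combined into one pre-launch gate. This is a sketch under stated assumptions: the values are read off the dashboard and typed in by hand (no Postmaster API call is made), the function name is hypothetical, a Bad domain reputation is treated as a stop, and a spam rate between 0.1% and 0.3% is treated as a caution. Dedicated-IP reputation is left out for brevity.

```python
# Reputation labels assumed here: Bad, Low, Medium, High.
REPUTATION_LEVELS = ("Bad", "Low", "Medium", "High")

def postmaster_check(domain_reputation, spam_rate, auth_pass_rate):
    """Rates are fractions: 0.001 means 0.1%. Returns (verdict, issues)."""
    if domain_reputation not in REPUTATION_LEVELS:
        raise ValueError(f"unknown reputation: {domain_reputation!r}")

    stops, cautions = [], []
    if domain_reputation == "Bad":
        stops.append("domain reputation is Bad")          # assumption
    elif domain_reputation == "Low":
        cautions.append("domain reputation is Low: expect to underperform")

    if spam_rate > 0.003:
        stops.append("spam rate above 0.3%")
    elif spam_rate >= 0.001:
        cautions.append("spam rate not under 0.1%")       # assumed caution band

    if auth_pass_rate < 0.99:
        cautions.append("authentication pass rate under 99%")

    verdict = "stop" if stops else "caution" if cautions else "go"
    return verdict, stops + cautions

print(postmaster_check("Medium", 0.0005, 0.995))
print(postmaster_check("Low", 0.002, 0.995))
```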

What failure looks like

Common failure patterns and what they mean:

50% inbox at Gmail, 90% at Outlook. Gmail-specific reputation issue. Likely insufficient engagement during warmup, or a complaint cluster at Gmail.
80% inbox everywhere except Outlook 30%. Outlook-specific throttling, which is fairly common with new domains and often improves with another 30 days. Sign up for Microsoft's SNDS (Smart Network Data Services) to see what Microsoft is measuring.
Most messages in Promotions at Gmail. Not a deliverability failure per se, since Promotions is technically inbox. But for cold outreach you want Primary. Promotions placement usually comes from content cues: heavy HTML, multiple links, image-heavy formatting.
Missing messages at any provider. Worst case. The message was silently dropped. Investigate auth (DKIM/SPF alignment), then blacklist status, then ESP routing.
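The patterns above can be matched mechanically once you have the rates. The sketch below is a rough illustration: the function name, the finding codes, and the 60%/80% cut-offs used to decide what counts as "low" and "high" are assumptions, not thresholds stated in the patterns themselves.

```python
def diagnose(inbox, gmail_promotions=0.0, missing=0, low=0.60, high=0.80):
    """inbox: dict of provider -> inbox rate for one test run.
    gmail_promotions: share of Gmail seeds that landed in Promotions.
    missing: number of seed messages that never arrived anywhere."""
    findings = []
    if missing > 0:
        # Worst case: check auth, then blacklists, then ESP routing.
        findings.append("missing_messages")

    gmail = inbox.get("gmail")
    outlook = inbox.get("outlook")
    others = [rate for p, rate in inbox.items() if p != "outlook"]

    if gmail is not None and outlook is not None:
        if gmail < low and outlook >= high:
            findings.append("gmail_reputation")
    if outlook is not None and outlook < low:
        if others and all(rate >= high for rate in others):
            findings.append("outlook_throttling")
    if gmail_promotions > 0.5:
        findings.append("promotions_content")
    return findings

print(diagnose({"gmail": 0.50, "outlook": 0.90}))
print(diagnose({"gmail": 0.80, "yahoo_aol": 0.80, "outlook": 0.30}))
```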

When to redo warmup

If two out of three seed tests show under 60% inbox at Gmail or Outlook, the warmup didn't take. Don't launch. The options are:

Continue warmup for 2 more weeks with stricter focus on engagement signals (replies, folder moves) rather than volume.
Investigate root cause — auth misconfiguration, content issues, blacklist hits — and fix before resuming.
Restart on a fresh domain if you discover something fundamentally broken (e.g., the domain has reputation history you didn't know about).
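The two-out-of-three rule is easy to get wrong by eye when results are volatile, so a small helper may be worth having. The function name is hypothetical, and each run is assumed to be a dict of Gmail and Outlook inbox rates.

```python
def warmup_failed(tests, floor=0.60):
    """True when at least two runs show under `floor` inbox
    at Gmail or at Outlook."""
    failing = sum(
        1 for t in tests
        if t["gmail"] < floor or t["outlook"] < floor
    )
    return failing >= 2

runs = [
    {"gmail": 0.55, "outlook": 0.80},
    {"gmail": 0.82, "outlook": 0.45},
    {"gmail": 0.85, "outlook": 0.81},
]
print(warmup_failed(runs))
```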

Frequently asked questions

Can I trust my warmup tool's reported inbox rate?

As a directional indicator of warmup activity, yes. As a measure of real-world Gmail and Outlook placement, no. Pool-internal numbers are routinely 30+ percentage points higher than real seed test numbers.

How many seed tests should I run before launch?

At least three, staggered across days and times. One test is a snapshot; three is enough to see whether placement is stable or volatile. Volatile placement is itself a warning sign.

Do seed tests damage warmup?

No. A handful of additional sends to seed mailboxes are negligible against the warmup volume curve, and seed mailboxes don't complain or bounce.

What if Postmaster shows Medium but seed tests show 40% inbox?

Postmaster lags real-time placement decisions by hours to days. The seed test is more current. Trust the seed test, hold the launch, and watch Postmaster for confirmation in 2–3 days.
