Playbook · 8 min read

A/B testing content beats warmup volume every time

Structured content tests move placement more than a month of warmup pool traffic. What to test and how to measure it.

Gmail classifies the content of each message, then classifies the sender partly on how recipients respond to that content. Better content is therefore a direct upstream lever on both classifications. It's the single most underused placement variable in cold outbound, and it isn't something you can buy with a subscription.

TL;DR

Test subject lines, first lines, length, CTA type, and tracking on/off. Each dimension typically moves placement 5–20 points on Gmail. Warmup pool volume, if it moves anything at all, moves 2–4 points. The content variable dominates the pool variable.

Five content variables that move placement

  1. Subject line length and form. Short + curious vs long + specific. Move: 5–10 points on Gmail Promotions vs Primary classification.
  2. First line personalisation. Referencing something specific and recent vs generic greeting. Move: 10–20 points via downstream reply rate → engagement signal.
  3. Length. Under 80 words vs 80–150 vs 150+. Primary-placement probability peaks in the 60–100 word band.
  4. Link count. Zero vs one vs two+. One link is the sweet spot for cold; two+ is a strong spam signal on Gmail for unestablished senders.
  5. Tracking pixel on/off. Pixel-off consistently outperforms pixel-on by 3–8 points on Gmail. Pixel opens are unreliable post-MPP anyway.
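The five dimensions above can be sequenced as a simple one-variable-at-a-time test plan. A minimal sketch in Python; the dimension names and variant labels are illustrative, not a real API:

```python
# Hypothetical test matrix: each dimension gets its own two-arm test,
# run sequentially so only one variable changes at a time.
TEST_MATRIX = {
    "subject": {"control": "short_curious", "variant": "long_specific"},
    "first_line": {"control": "generic_greeting", "variant": "specific_recent_reference"},
    "length": {"control": "60-100_words", "variant": "150_plus_words"},
    "links": {"control": 1, "variant": 2},
    "tracking_pixel": {"control": False, "variant": True},
}

def next_test(completed):
    """Return the first dimension not yet tested, or None when the plan is done."""
    for dim in TEST_MATRIX:
        if dim not in completed:
            return dim
    return None
```

Working through the matrix in order keeps interactions out of the early results; revisit combinations only after each dimension has a clear winner.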

Test protocol

  1. Hold everything except the variable under test constant.
  2. Run variants against a fixed seed network and against a matched subset of real prospects.
  3. Record placement from both sources. The seed result is controlled; the prospect result confirms real-world lift.
  4. Minimum 200 addresses per arm for meaningful provider-level breakdown.
  5. Test one variable at a time. Interactions are real but small at this stage — sequence them.
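Steps 1, 4, and 5 of the protocol come down to a clean random split of the prospect list with the 200-per-arm floor enforced. A minimal sketch; the function name and seed value are ours, not part of any product:

```python
import random

MIN_PER_ARM = 200  # protocol floor for a meaningful provider-level breakdown

def assign_arms(prospects, n_arms=2, seed=42):
    """Shuffle prospects with a fixed seed and split them evenly into arms."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    pool = list(prospects)
    rng.shuffle(pool)
    arms = [pool[i::n_arms] for i in range(n_arms)]  # round-robin split
    for arm in arms:
        if len(arm) < MIN_PER_ARM:
            raise ValueError(f"arm has {len(arm)} addresses; need {MIN_PER_ARM}+")
    return arms
```

The fixed seed matters: rerunning an arm against the same matched subset is what lets you separate intraday variance from the variable under test.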

Measurement without a warmup dashboard

An external seed network gives you same-template, same-authentication placement breakdown per provider and per folder. That's all you need for content tests.

  • Per-provider Inbox/Spam/Promotions breakdown.
  • Repeat runs 2–3 times per arm to control for intraday variance.
  • Hold sending cadence constant during the test.
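Turning raw seed hits into the per-provider breakdown is a small counting job. A sketch, assuming the seed network reports a (provider, folder) pair for each delivered test message, pooled across the 2–3 repeat runs of one arm:

```python
from collections import Counter, defaultdict

def placement_breakdown(seed_results):
    """Aggregate (provider, folder) hits into per-provider folder percentages.

    seed_results: iterable of (provider, folder) tuples,
    e.g. ("gmail", "inbox"), pooled across repeat runs of one arm.
    """
    counts = defaultdict(Counter)
    for provider, folder in seed_results:
        counts[provider][folder] += 1
    return {
        provider: {folder: round(100 * n / sum(folders.values()), 1)
                   for folder, n in folders.items()}
        for provider, folders in counts.items()
    }
```

Compare the control and variant dictionaries provider by provider; a lift that shows up only on one provider is still a result, just a narrower one.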

What a good content A/B looks like

Example from one of our case studies: same domain, same SPF/DKIM/DMARC, same ramp. Control subject “Quick question for {company}”, variant subject “{first_name}, your {tool} stack”. Gmail Primary placement: 37% → 54%. Outlook Focused: 61% → 69%. This, from one subject variable.

The same team had spent $600/month on Lemwarm for 8 weeks with no measurable independent-placement lift. The $0 content test produced 17 points in a week.
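To confirm a lift like 37% → 54% isn't noise, a standard two-proportion z-test on the two arms is enough. A minimal sketch (the formula is the textbook pooled-variance version; |z| > 1.96 corresponds to the usual 5% significance threshold):

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-proportion z-test for an inbox-placement lift.

    p1, p2: placement rates of control and variant; n1, n2: arm sizes.
    Returns the z statistic; |z| > 1.96 is significant at the 5% level.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled placement rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se
```

At the protocol's 200-addresses-per-arm floor, 0.37 → 0.54 gives z ≈ 3.4, comfortably past the threshold; smaller lifts in the 5–10 point range need larger arms to confirm.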

Run content A/B with real seeds

Inbox Check is free and outside every warmup pool. Variant vs control numbers you can take to your boss.

Why content compounds and pools don't

Better content generates real replies. Real replies are weighted far above pool replies. Real replies improve sender score. Improved sender score lifts the next send's placement, which improves reply rate further. Content work compounds; pool work plateaus at whatever the pool's detection state allows.
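The compounding-vs-plateau contrast can be illustrated with a toy model. Purely illustrative: the starting placement, per-round lift, and plateau cap are invented numbers, not measurements from any real campaign:

```python
def simulate(rounds, reply_lift, plateau):
    """Toy model: content lift feeds back into the next round's placement;
    pool lift creeps up linearly but is capped at `plateau`."""
    content, pool = 40.0, 40.0  # assumed starting inbox-placement %
    for _ in range(rounds):
        # content gain scales with current placement (the feedback loop)
        content = min(95.0, content + reply_lift * (content / 100))
        pool = min(plateau, pool + 1.0)  # pool gain is flat and capped
    return content, pool
```

Under any parameters with a positive feedback term, the content curve eventually crosses and stays above the capped pool curve; that crossing is the whole argument of this section in one picture.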

FAQ

Is this just ESP A/B testing?

No — ESPs test open/click, which are noisy post-MPP and don't reflect placement. This protocol tests placement directly via seed network, which maps to real-world outcomes.

How many arms should I run?

Two for a first cut, then champion-vs-challenger ongoing. Go multi-armed once you have the statistical budget to power each arm.

Can I content-test while my warmup runs?

Yes, but run the kill test first (pause warmup for 48h) so you know what the pool is actually contributing. You may find there's no need to keep paying.

Check your deliverability across 20+ providers

Gmail, Outlook, Yahoo, Mail.ru, Yandex, GMX, ProtonMail and more. Real inbox screenshots, SPF/DKIM/DMARC, spam engine verdicts. Free, no signup.

Run Free Test →

Unlimited tests · 20+ seed mailboxes · Live results · No account required