Bot clicks from security scanners: a field guide to spotting them

Every enterprise buys a secure email gateway. Every secure email gateway opens your mail, renders the HTML, and clicks your links. For any B2B sender, between 10% and 30% of "clicks" in a typical campaign come from bots, not buyers. This article is the rough taxonomy we use internally when we audit ESP data.

The main scanner families

Not every gateway leaves the same fingerprint, but there are enough recurring patterns that a quick signature match filters most noise.

Microsoft SafeLinks / Defender: rewrites URLs through safelinks.protection.outlook.com. Usually leaks MSOffice or an empty User-Agent. Most common.
Barracuda Link Protection: rewrites to linkprotect.cudasvc.com. Fetches with Barracuda in the User-Agent.
Proofpoint URL Defense: rewrites to urldefense.proofpoint.com, sometimes urldefense.com. UA usually contains proofpoint.
Mimecast URL Protect: rewrites to protect-*.mimecast.com. Scanner IPs are in AS-34440.
Cisco Secure Email / IronPort: rewrites to secure-web.cisco.com. UA CiscoSecureEmail or similar.
Trend Micro Email Security: various rewrites, UA TMES.
FortiMail / Sophos / F-Secure: less common for SMB, but Fortinet and Sophos leak identifiable UAs.

A simple click-log audit

Export one hour of raw click events from your ESP. You want at minimum: timestamp, recipient, link_id, url, ip, user_agent, referrer. Load it into a notebook and ask three questions:

What fraction of clicks happen within 5 seconds of delivery? Humans are not that fast. If it is above 5%, you have scanner noise.
How many recipients clicked every single link in the email? Real humans click one, maybe two. Bots click all.
What is the distribution of clicks per recipient in the first 30 seconds? A long tail above 3 is bots.

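-- Recipients who clicked 3+ distinct links within 30 seconds of delivery.
-- Assumes the clicks table carries both clicked_at and delivered_at timestamps.
SELECT
  recipient,
  COUNT(*) AS clicks_in_first_30s,
  COUNT(DISTINCT link_id) AS unique_links_clicked
FROM clicks
WHERE clicked_at - delivered_at < INTERVAL '30 seconds'
GROUP BY recipient
HAVING COUNT(DISTINCT link_id) >= 3
ORDER BY unique_links_clicked DESC;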

Every row in that output is almost certainly a bot. Mark and exclude.
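
The SQL above covers the third question. For the first two, a minimal notebook sketch could look like the following. It assumes each event is a dict with recipient, link_id, clicked_at and delivered_at keys; the function name audit_clicks and its parameters are illustrative, not part of any ESP API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def audit_clicks(events, links_in_email, fast_seconds=5):
    """Return (fraction of fast clicks, recipients who clicked every link).

    events: iterable of dicts with recipient, link_id, clicked_at,
    delivered_at (datetimes). links_in_email: all link ids in the email.
    """
    events = list(events)
    window = timedelta(seconds=fast_seconds)

    # Question 1: what share of clicks landed within the fast window?
    fast = sum(1 for e in events if e["clicked_at"] - e["delivered_at"] <= window)
    fast_fraction = fast / len(events) if events else 0.0

    # Question 2: which recipients clicked every single link?
    links_by_recipient = defaultdict(set)
    for e in events:
        links_by_recipient[e["recipient"]].add(e["link_id"])
    all_links = set(links_in_email)
    clicked_all = sorted(
        r for r, links in links_by_recipient.items() if links >= all_links
    )
    return fast_fraction, clicked_all
```

A fast_fraction above 0.05 corresponds to the 5% threshold mentioned earlier; the recipients in clicked_all are candidates for the bot list.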

User-Agent signatures worth blocking

The following substrings, matched case-insensitively against the User-Agent, are unambiguous and safe to filter:

safelinks
barracuda
proofpoint
mimecast
cisco
ironport
sophos
fortimail
trend
symantec
microsoft office
msoffice
ms-office
google-safety
google-inspectiontool
googleimageproxy
slack-imgproxy
outlookwebapp
checkpoint
tmes
f-secure

A few more are context-dependent. For example, HeadlessChrome and PhantomJS are sometimes used by sandboxing engines, and while neither belongs in marketing click logs, you may see them from headless browsers on your own side if you prefetch links in your dashboard. Check before blocklisting.
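
As a rough sketch of how the substring list might be applied: the function below lowercases the User-Agent and returns one of four labels. The name classify_user_agent and the label strings are made up for illustration; the substrings are copied from the list above.

```python
# Substrings from the list above; all matching is case-insensitive.
SCANNER_UA_SUBSTRINGS = (
    "safelinks", "barracuda", "proofpoint", "mimecast", "cisco", "ironport",
    "sophos", "fortimail", "trend", "symantec", "microsoft office", "msoffice",
    "ms-office", "google-safety", "google-inspectiontool", "googleimageproxy",
    "slack-imgproxy", "outlookwebapp", "checkpoint", "tmes", "f-secure",
)

# Context-dependent signatures: flag for manual review, do not auto-filter.
REVIEW_UA_SUBSTRINGS = ("headlesschrome", "phantomjs")


def classify_user_agent(user_agent):
    """Return 'empty', 'scanner', 'review', or 'unknown' for a raw UA string."""
    ua = (user_agent or "").strip().lower()
    if ua in ("", "-"):
        return "empty"      # no UA at all; needs other signals
    if any(s in ua for s in SCANNER_UA_SUBSTRINGS):
        return "scanner"
    if any(s in ua for s in REVIEW_UA_SUBSTRINGS):
        return "review"
    return "unknown"        # not proven human, just not matched
```

Note that "unknown" only means no signature matched; it is not evidence of a human click.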

The empty-UA problem

Many scanners send no User-Agent or set it to the literal string "-". About a third of all bot hits in our sample had no UA. Relying on UA alone will miss them. The fallback signals:

IP in AS-owned ranges of a gateway vendor. Use whois or a passive DNS service.
No Accept or Accept-Language header.
First click on the link is within 2 seconds of delivery.
No cookie set, ever. Real browsers set cookies; scanners drop them.
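
The four fallback signals can be combined into a simple check. This is a sketch under stated assumptions: the hit dict layout (ip, headers, seconds_since_delivery, cookie_seen), the two-signal threshold, and the network ranges are all hypothetical. The ranges shown are RFC 5737 documentation blocks used as placeholders; real vendor ranges would have to come from your own whois lookups.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), not real vendor ranges.
GATEWAY_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def fallback_signals(hit):
    """Evaluate the four non-UA signals for one click hit."""
    headers = {k.lower(): v for k, v in hit.get("headers", {}).items()}
    ip = ipaddress.ip_address(hit["ip"])
    return {
        # IP belongs to a known gateway vendor range.
        "gateway_ip": any(ip in net for net in GATEWAY_NETWORKS),
        # Assumption: a missing Accept OR Accept-Language header counts.
        "missing_accept": not headers.get("accept")
        or not headers.get("accept-language"),
        # First click within 2 seconds of delivery.
        "too_fast": hit["seconds_since_delivery"] <= 2,
        # No cookie was ever observed for this client.
        "no_cookie": not hit.get("cookie_seen", False),
    }


def is_probable_scanner(hit, min_signals=2):
    """Hypothetical rule: treat a hit as a scanner if enough signals fire."""
    return sum(fallback_signals(hit).values()) >= min_signals
```

Requiring two or more signals is a judgment call; a single signal such as a missing cookie also shows up for privacy-conscious humans.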

Build a scanner table, not a static filter

Scanner User-Agents change every six months. Static blocklists go stale fast. We recommend storing scanner signatures in a database you update weekly, not in a hard-coded file. Review it during every quarterly deliverability audit.
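
One possible shape for such a table, sketched with Python's built-in sqlite3 module. The table name scanner_signatures, its columns, and the helper functions are assumptions for illustration; any database you already run would do the same job.

```python
import sqlite3
from datetime import date


def open_signature_db(path=":memory:"):
    """Open (or create) the signature store. Schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scanner_signatures (
               pattern    TEXT PRIMARY KEY,  -- lowercase UA substring
               vendor     TEXT NOT NULL,
               updated_on TEXT NOT NULL      -- ISO date of last review
           )"""
    )
    return conn


def upsert_signature(conn, pattern, vendor, updated_on=None):
    """Insert a signature or refresh it if the pattern already exists."""
    updated_on = updated_on or date.today().isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO scanner_signatures (pattern, vendor, updated_on) "
        "VALUES (?, ?, ?)",
        (pattern.lower(), vendor, updated_on),
    )
    conn.commit()


def match_vendor(conn, user_agent):
    """Return the vendor of the first matching signature, or None."""
    ua = (user_agent or "").lower()
    query = "SELECT pattern, vendor FROM scanner_signatures ORDER BY pattern"
    for pattern, vendor in conn.execute(query):
        if pattern in ua:
            return vendor
    return None
```

The updated_on column makes stale entries easy to list during the quarterly audit.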

Why you still want to see the bot clicks

Strip them from CTR, but keep them in raw logs. Bot activity is a reliable signal of delivery. If a scanner never hits your link, it often means the message went to spam and the scanner never rendered it. The ratio of "scanners hit the link" to "humans hit the link" is itself a deliverability KPI.
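
A small sketch of that ratio, assuming each click has already been tagged with an is_bot flag in your warehouse. The function name and the output keys are hypothetical.

```python
from collections import Counter


def scanner_human_ratio(clicks):
    """Summarise scanner vs human clicks for one campaign.

    clicks: iterable of dicts with a boolean 'is_bot' key.
    """
    counts = Counter("bot" if c["is_bot"] else "human" for c in clicks)
    bots, humans = counts["bot"], counts["human"]
    return {
        "scanner_clicks": bots,
        "human_clicks": humans,
        # Ratio is undefined when there are no human clicks.
        "ratio": bots / humans if humans else None,
        # Zero scanner hits may point at spam-folder placement.
        "no_scanner_hits": bots == 0,
    }
```

Tracking this per campaign and per recipient domain makes a sudden drop in scanner hits easier to spot.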

Two practical workflows

Workflow A: post-hoc clean-up

Ship your CTR as-is to the dashboard, but add a ctr_human metric alongside. Finance and exec teams keep seeing what they expect; analysts and growth teams use the clean number.
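
A minimal sketch of computing both numbers side by side. It assumes CTR is defined as unique clicking recipients divided by delivered messages (your ESP may define it differently) and that each click carries an is_bot flag; the function name ctr_metrics is illustrative.

```python
def ctr_metrics(delivered, clicks):
    """Return raw and human-only CTR for one campaign.

    delivered: number of delivered messages.
    clicks: iterable of dicts with 'recipient' and boolean 'is_bot'.
    Assumption: CTR = unique clicking recipients / delivered.
    """
    clicks = list(clicks)
    all_clickers = {c["recipient"] for c in clicks}
    # A recipient counts as human if at least one of their clicks is not a bot.
    human_clickers = {c["recipient"] for c in clicks if not c["is_bot"]}
    if delivered <= 0:
        return {"ctr": 0.0, "ctr_human": 0.0}
    return {
        "ctr": len(all_clickers) / delivered,
        "ctr_human": len(human_clickers) / delivered,
    }
```

Keeping both values in the same row lets the dashboard show the familiar figure while analysts read ctr_human.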

Workflow B: pre-blast seed test

Before the real send, blast to a seed list that has no scanner between sender and inbox. Every click on a seed list is a human-ish click (or your own test) and you get a noise-free baseline. Inbox Check seeds are exactly this — real mailboxes, no gateway in front.

FAQ

Do scanners always click every link, or only some?

Most scanners render the full HTML, which means they follow every link, whether it sits in an image map or an href. Some enterprise policies sample 1 in N messages instead of 100%. Assume full coverage unless you know the policy.

What's the easiest way to tag bot clicks in Mailchimp, HubSpot, or Customer.io?

None of them expose raw User-Agent in the UI. Use the API to pull click logs, tag bots in your warehouse, and reconcile. HubSpot's new 'bot activity' filter helps but is partial.

If a scanner clicks, does the recipient ever see the email?

Almost always, yes: the scanner click is part of the delivery pipeline. The scanner is checking for malware; if the URL passes, the message continues to the inbox.

Can I adversarially unclick the bot hits by firing a 'de-click' event?

No, ESPs don't support event deletion via public API. Your only tool is post-processing in your warehouse.