
Multi-agent email QA pipeline over A2A

A reviewer agent reads the template. A deliverability agent runs placement tests. A DNS agent audits records. They hand tasks between each other over A2A. Here is the architecture.

The instinct when you first see agent frameworks is to build one agent that does everything. That pattern breaks the moment the operation has real complexity. Email QA is a good example. Between drafting and sending, you want multiple specialised checks — tone, deliverability, DNS health — and each of those benefits from its own agent with its own tools and context. A2A is what lets those agents delegate work to each other without tying them together in a single codebase.

The pattern

The shape of the pipeline is simple: four agents, each doing one thing, connected by A2A task exchanges. A coordinator agent decides the order and the exit conditions. Each specialist exposes an Agent Card, advertises one or two capabilities, and handles its domain in isolation.

The benefit is not just separation of concerns. It is that each agent can live on different infrastructure, be built by different teams, and be swapped for an alternative without touching the others. The reviewer agent could be built in LangGraph, the deliverability agent hosted at our service, the DNS agent a tiny thing someone built over a weekend. They cooperate because they agree on the A2A envelope.

Architecture

The architecture is described in prose because the most useful view of it is the call graph. A send request enters from a human (a marketing operator) or an upstream system (a CRM triggering a sequence). The request hits the coordinator agent, which reads the target, pulls the draft, and walks through a decision tree:

  • Delegate to reviewer agent for tone, compliance, and personalisation quality.
  • If the reviewer approves, delegate to the DNS agent to verify the sending domain's authentication records are still correctly configured (records can drift).
  • If DNS is healthy, delegate to deliverability agent (us) for a placement verdict.
  • If placement is above threshold, hand to sender agent to actually dispatch the mail.

Each step either approves, fails, or emits a revision request. Failures propagate back to the coordinator, which decides whether to retry, escalate to human, or abort.
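The decision tree above can be sketched as a coordinator loop. This is a minimal sketch, not the real implementation: the four agent calls are injected as plain functions, whereas in the pipeline each is an A2A task POSTed to the specialist's endpoint. All function names and the `rewrite` field are illustrative assumptions.

```python
# Sketch of the coordinator's decision tree. Agent calls are stubbed as
# injected callables; in the real pipeline each is an A2A task over HTTP.
PLACEMENT_THRESHOLD = 0.9
MAX_REVISION_LOOPS = 3

def run_pipeline(draft, review, check_dns, placement_test, send):
    """Walk the review -> DNS -> placement -> send chain.

    review / check_dns / placement_test / send stand in for the four
    specialist agents. Returns a terminal status for the coordinator.
    """
    for _attempt in range(MAX_REVISION_LOOPS):
        verdict = review(draft)
        if verdict["verdict"] != "approved":
            draft = verdict.get("rewrite", draft)   # revision round
            continue
        if check_dns(draft["domain"])["verdict"] != "healthy":
            return {"status": "blocked", "reason": "dns-unhealthy"}
        placement = placement_test(draft)
        if placement["inboxRate"] < PLACEMENT_THRESHOLD:
            continue                                # another revision round
        return {"status": "sent", "receipt": send(draft)}
    return {"status": "escalated", "reason": "loop-cap-reached"}
```

The loop cap and threshold are the coordinator's policy knobs, not part of A2A itself.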

Why not chain them sequentially in code

You could. People do. The downside is that each check ends up coupled to the caller's runtime, and swapping one for another becomes a code change. A2A pushes the contract to the network edge, so each agent is independently deployable.

Agent roles

Reviewer agent

Reads the draft HTML and subject. Checks for tone, compliance (unsubscribe links, accurate sender identity, GDPR statements), and personalisation token correctness. Capability IDs: review-template, rewrite-template.

Deliverability agent (us)

Runs the placement test, returns per-provider inbox/spam/missing counts. Capabilities: inbox-placement-test, dns-audit, blacklist-check.

DNS agent

A narrow, fast agent focused on live DNS state. Used to cross-check before committing to a send — records can drift between the time a template was written and the time it goes out. Capabilities: check-spf, check-dkim, check-dmarc, check-bimi.

Sender agent

The final hop: dispatches approved mail. Holds provider credentials and throttling state. Capability: send-email.

Task-exchange schema

Every task between these agents uses the same A2A envelope. That uniformity is what makes the pipeline observable — the coordinator can log every envelope with the same structure, and a human debugging later can trace a send end-to-end by task ID.

// A2A task envelope, used at every hop
{
  "taskId":      "task_<id>",
  "capability":  "<capability-id>",
  "requestedBy": { "agentName": "...", "agentUrl": "https://.../.well-known/agent.json" },
  "input":       { /* capability-specific */ },
  "callbackUrl": "https://coordinator.example/a2a/callbacks/task_<id>"  // for async
}
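Building the envelope through one helper at every hop is what keeps the logs uniform. A sketch, with field names taken from the schema above; the helper name and the `taskId` generation scheme are assumptions:

```python
import uuid

def make_envelope(capability, input_payload, agent_name, agent_url,
                  callback_url=None):
    """Build the A2A task envelope used at every hop.

    Field names mirror the schema above; taskId generation is an
    illustrative assumption, not mandated by the protocol.
    """
    env = {
        "taskId": f"task_{uuid.uuid4().hex[:8]}",
        "capability": capability,
        "requestedBy": {"agentName": agent_name, "agentUrl": agent_url},
        "input": input_payload,
    }
    if callback_url is not None:
        env["callbackUrl"] = callback_url   # async hops only
    return env
```

Because every hop flows through the same builder, logging `env` at dispatch time gives a human one consistent structure to grep by task ID.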

Complete walkthrough

A real send traced through the pipeline. The draft: a cold email from outreach@news.acme.com to a list of 500 B2B prospects. Entry point: a nightly cron in the CRM fires the coordinator agent.

Hop 1: coordinator → reviewer

POST https://reviewer.acme.ai/a2a/tasks
Authorization: Bearer rev_live_...
Content-Type: application/json

{
  "taskId":      "task_01",
  "capability":  "review-template",
  "requestedBy": { "agentName": "Coordinator", "agentUrl": "https://coord.acme.ai/.well-known/agent.json" },
  "input": {
    "subject": "Quick question about your payroll stack",
    "html":    "<html>...body...</html>",
    "audience": "b2b-operations-500"
  }
}

Reviewer returns inline (sync mode, about 8 seconds for an LLM pass):

{
  "taskId": "task_01",
  "status": "completed",
  "result": {
    "verdict": "approved",
    "notes":   "Tone OK. Unsubscribe present. Personalisation tokens render correctly.",
    "score":   0.87
  }
}
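How the coordinator turns this response into a next step can be sketched as a small gate. The `min_score` knob is an assumed coordinator config value, not something the reviewer's response mandates:

```python
def reviewer_gate(response, min_score=0.8):
    """Decide the next step from a reviewer response (shape as above).

    Returns 'proceed', 'revise', or 'escalate'. min_score is an assumed
    coordinator-side threshold.
    """
    if response["status"] != "completed":
        return "escalate"              # hard failure at this hop
    result = response["result"]
    if result["verdict"] == "approved" and result.get("score", 0.0) >= min_score:
        return "proceed"
    return "revise"
```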

Hop 2: coordinator → DNS agent

POST https://dns-agent.acme.ai/a2a/tasks
{
  "taskId":     "task_02",
  "capability": "check-dmarc",
  "input":      { "domain": "news.acme.com" }
}
The DNS agent responds inline:

{
  "taskId": "task_02",
  "status": "completed",
  "result": {
    "policy":    "quarantine",
    "alignment": "strict",
    "rua":       "dmarc@acme.com",
    "verdict":   "healthy"
  }
}

Hop 3: coordinator → deliverability agent (us)

POST https://check.live-direct-marketing.online/a2a/tasks
Authorization: Bearer ic_live_...
Content-Type: application/json

{
  "taskId":      "task_03",
  "capability":  "inbox-placement-test",
  "requestedBy": { "agentName": "Coordinator", "agentUrl": "https://coord.acme.ai/.well-known/agent.json" },
  "input": {
    "from":      "outreach@news.acme.com",
    "subject":   "Quick question about your payroll stack",
    "html":      "<html>...body...</html>",
    "providers": ["gmail", "outlook", "yahoo", "mailru", "yandex", "gmx"]
  },
  "callbackUrl": "https://coord.acme.ai/a2a/callbacks/task_03"
}

We respond 202 Accepted with an ETA. Two minutes later the coordinator's callback receives:

POST https://coord.acme.ai/a2a/callbacks/task_03
{
  "taskId": "task_03",
  "status": "completed",
  "result": {
    "inboxRate":   0.94,
    "spamRate":    0.02,
    "missingRate": 0.04,
    "perProvider": {
      "gmail":   { "folder": "inbox",      "auth": "pass" },
      "outlook": { "folder": "inbox",      "auth": "pass" },
      "yahoo":   { "folder": "inbox",      "auth": "pass" },
      "mailru":  { "folder": "promotions", "auth": "pass" },
      "yandex":  { "folder": "inbox",      "auth": "pass" },
      "gmx":     { "folder": "inbox",      "auth": "pass" }
    },
    "spamAssassinScore": 1.2
  }
}
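On the coordinator side, the callback handler just has to match the task ID against pending work and apply the threshold. A sketch of that handler, assuming a `pending` map of task ID to campaign context (the map and action names are illustrative):

```python
def handle_placement_callback(payload, pending, threshold=0.9):
    """Process an async placement callback (payload shape as above).

    pending maps taskId -> campaign context. Returns the coordinator's
    next action; action names are illustrative.
    """
    task_id = payload["taskId"]
    if task_id not in pending:
        return {"action": "ignore", "reason": "unknown-task"}
    if payload["status"] != "completed":
        return {"action": "escalate", "taskId": task_id}
    result = payload["result"]
    if result["inboxRate"] >= threshold:
        return {"action": "delegate-send", "taskId": task_id}
    return {"action": "revision-round", "taskId": task_id,
            "spamRate": result["spamRate"]}
```

Rejecting unknown task IDs is cheap insurance: a callback endpoint is reachable by anyone who can guess the URL, so the coordinator should only act on tasks it actually issued.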

Hop 4: coordinator → sender

Inbox rate is above the 0.9 threshold the coordinator was configured with. It delegates to the sender agent:

POST https://sender.acme.ai/a2a/tasks
{
  "taskId":     "task_04",
  "capability": "send-email",
  "input": {
    "from":      "outreach@news.acme.com",
    "audienceId": "b2b-operations-500",
    "templateId": "tpl_payroll_q3",
    "approvalChain": ["task_01", "task_02", "task_03"]
  }
}

The approvalChain is a custom field the sender uses for audit — each task ID lets a human trace the full approval path later.

Failure handling

Failures in this pipeline come in two flavours. Soft: a reviewer rejection or a deliverability verdict below threshold. Hard: an agent is down, auth failed, or a task timed out. The coordinator treats them differently.

  • Soft failures — loop back. Reviewer rejects → ask reviewer to rewrite. Placement below threshold → revision round. Max 3 loops, then escalate.
  • Hard failures — no retry inside the pipeline. The coordinator records the error and escalates to human. Retrying an agent that is returning 500s just burns time.
  • Partial results — DNS agent times out but deliverability succeeds. The coordinator can decide to proceed with a logged warning or to block. This is policy, not protocol.
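The three flavours map to a small policy table. A sketch, using the defaults from the bullets above (the function name and the `'policy-decision'` placeholder for partial results are assumptions):

```python
def next_action(kind, loop_count, max_loops=3):
    """Map a failure kind to the coordinator's next action.

    kind: 'soft' (rejection / below-threshold verdict), 'hard' (agent down,
    auth failure, timeout), or 'partial'. Defaults match the post's policy.
    """
    if kind == "soft":
        return "revise" if loop_count < max_loops else "escalate"
    if kind == "hard":
        return "escalate"           # never retry a failing agent in-pipeline
    if kind == "partial":
        return "policy-decision"    # proceed-with-warning or block
    raise ValueError(f"unknown failure kind: {kind}")
```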

Human-in-the-loop checkpoints

Two places benefit from an explicit human checkpoint. First, between reviewer approval and the send — for a first-time campaign, you want a human glance. Second, after any escalation — an automated loop should never silently drop a stuck send.

A2A does not model humans; it does not have to. The coordinator just does not fire the next task until a human-facing system (Slack button, dashboard click) marks the previous one approved. From A2A's point of view this is just a delay.
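The "just a delay" framing can be sketched as a blocking gate. The `poll` callable is an assumed hook into whatever human-facing system records the decision (a Slack button writing to a store, a dashboard click); nothing here is A2A-specific:

```python
import time

def await_human_approval(poll, timeout_s=3600, interval_s=5.0):
    """Block until a human-facing system records a decision.

    poll() returns 'approved', 'rejected', or None while still pending.
    From A2A's point of view this is just a delay before the next task.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        decision = poll()
        if decision is not None:
            return decision
        time.sleep(interval_s)
    return "timeout"    # treat like an escalation, never a silent drop
```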

When this overdelivers vs underdelivers

Overdelivers: high-stakes outbound where a bad send is expensive — cold outreach to named accounts, investor updates, regulated newsletters. The latency cost of three or four agent hops (five to ten minutes end-to-end) is trivial compared to a misfired send.

Underdelivers: transactional mail (password resets, order confirmations) where latency is the product. Do not gate those on a placement test. Do gate a quarterly product-update newsletter.

Rule of thumb

If a send is expensive to get wrong and rare enough that a five-minute audit is cheap, route it through a multi-agent QA pipeline. If it is cheap to retry and fires millions of times per day, keep it on a direct path with out-of-band monitoring.

Frequently asked questions

Do all four agents need to be A2A-native?

No. Any agent that is not yet A2A-capable can sit behind a thin shim that speaks A2A on the outside and whatever the agent uses internally. The coordinator does not care.

How do you authenticate between agents in different orgs?

Each Agent Card advertises its auth scheme. Bearer tokens for simple cases, OAuth2 client credentials for multi-tenant. The coordinator holds the credentials for every peer it calls.

What stops this pipeline from running forever in a revision loop?

The coordinator enforces a loop cap (typically 3) and an overall time budget. Beyond that it escalates. The agents themselves do not know they are in a loop — only the coordinator does.

Can the same approach scale to more agents — translation, legal review, image validation?

Yes. The protocol is the same; the coordinator just has more hops to orchestrate. The risk at scale is latency and failure-mode complexity — test the pipeline on real traffic before you add a seventh agent.