
Error handling — retries, timeouts, idempotency

Getting error handling right is what separates the integration that silently loses tests from the one you can rely on for CI gates. Here is the contract.

Every API integration is one retry loop away from subtle bugs. Retry too eagerly and you double-charge yourself. Retry too timidly and you lose requests to transient 502s. Retry a POST without idempotency and you book two tests instead of one. This page is the contract: what to retry, how long to wait, and when to give up.

The rule of thumb

4xx — your fault, do not retry unless you fix the request. 429 — read Retry-After, then retry. 5xx — exponential backoff with jitter, max 5 attempts. POST without an idempotency key — never retry.

Taxonomy of errors

Two axes. The first is who is at fault: client (your request) or server (our infrastructure). The second is whether retrying is safe: retriable or terminal. That gives a 2×2 matrix:

  • Client, terminal: 400, 401, 403, 404, 409, 422. Fix the request and resubmit manually; retry logic must not retry these.
  • Client, retriable: 429 (rate limit). Wait per Retry-After, then retry the same payload.
  • Server, retriable: 500, 502, 503, 504. Exponential backoff (or Retry-After when the response carries one, as 503 does), max 5 attempts, then give up and alert.
  • Server, terminal: rare. 501 Not Implemented, returned when a client calls a removed endpoint. Fix the client.

Error envelope shape

Every error response has the same shape:

HTTP/1.1 422 Unprocessable Entity
X-Request-Id: req_01H9XBADREQUEST

{
  "data": null,
  "error": {
    "code":    "sender_domain_blocked",
    "message": "Sender domain 'example.tld' is on the internal block list.",
    "field":   "from",
    "docs":    "https://check.live-direct-marketing.online/docs/errors#sender_domain_blocked"
  },
  "meta": { "requestId": "req_01H9XBADREQUEST", "version": "2026-07-01" }
}

The error.code is a stable string enum you can switch on. The error.message is human-readable and may change wording over time — do not parse it. The requestId is what you cite when you file a support ticket.
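As a rough illustration of consuming this envelope, the sketch below parses an error body into a small record so that calling code branches on `code` and logs `request_id`, never on `message`. The names `ApiErrorInfo` and `parse_error`, and the `"unknown"` fallback code, are assumptions made for the example and are not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiErrorInfo:
    status: int
    code: str                  # stable enum, safe to branch on
    message: str               # human-readable, wording may change
    field: Optional[str]       # offending request field, if any
    request_id: Optional[str]  # cite this in support tickets


def parse_error(status: int, body: str) -> ApiErrorInfo:
    """Turn an error response body into an ApiErrorInfo."""
    doc = json.loads(body)
    err = doc.get("error") or {}
    meta = doc.get("meta") or {}
    return ApiErrorInfo(
        status=status,
        code=err.get("code", "unknown"),  # fallback value is an assumption
        message=err.get("message", ""),
        field=err.get("field"),
        request_id=meta.get("requestId"),
    )
```

A caller would then write something like `if info.code == "sender_domain_blocked": ...` and include `info.request_id` in its logs.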

Status code map with retry guidance

Status  Retry?  Wait            Notes
------  ------  --------------  -----------------------------------
400     No      -               Malformed JSON. Fix and resubmit.
401     No      -               Invalid/missing Bearer. Re-auth.
403     No      -               Forbidden. Check scopes.
404     No      -               Gone or never existed.
409     No      -               Idempotency conflict. Change key.
422     No      -               Business rule violation.
429     YES     Retry-After s   Rate limit. Exact wait in header.
500     YES     Backoff         Transient. Up to 5 attempts.
502     YES     Backoff         Upstream. Up to 5 attempts.
503     YES     Retry-After s   Maintenance. Header gives wait.
504     YES     Backoff         Upstream timeout. Up to 5 attempts.
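One way to keep client code in step with this table is to transcribe it into a lookup. The sketch below is only that: the constant and function names are hypothetical, and treating every status not listed as terminal is an assumption of the example.

```python
from typing import Optional

RETRY_AFTER = "retry-after"  # sleep for the Retry-After header value
BACKOFF = "backoff"          # exponential backoff with jitter

# Retriable statuses and where the wait comes from, copied from the table.
RETRY_POLICY = {
    429: RETRY_AFTER,
    500: BACKOFF,
    502: BACKOFF,
    503: RETRY_AFTER,
    504: BACKOFF,
}


def wait_source(status: int) -> Optional[str]:
    """Return how to wait before retrying, or None if the status is terminal."""
    return RETRY_POLICY.get(status)
```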

Recommended exponential backoff

For 5xx responses without a Retry-After header, use exponential backoff with jitter. The formula we recommend:

wait_ms = min(30000, base * 2^attempt) + random(0, 1000)

attempt  base=500ms  wait before next attempt (incl. 0-1s jitter)
1        500         1.0-2.0s
2        500         2.0-3.0s
3        500         4.0-5.0s
4        500         8.0-9.0s
5        -           give up, no further wait

The jitter term is essential. Without it, a fleet of clients retrying a flaky endpoint synchronise their retries and produce a thundering herd exactly when the upstream is most vulnerable. One second of random jitter breaks the synchronisation without changing perceived latency.
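The formula can be written as a small function. This is a minimal sketch: the function name and the injectable `rand` parameter (which makes the jitter testable) are assumptions of the example, while the defaults mirror the numbers given in the formula.

```python
import random
from typing import Callable


def backoff_wait_ms(
    attempt: int,
    base_ms: int = 500,
    cap_ms: int = 30_000,
    jitter_ms: int = 1_000,
    rand: Callable[[], float] = random.random,
) -> float:
    """wait_ms = min(cap, base * 2^attempt) + random(0, jitter)."""
    return min(cap_ms, base_ms * 2 ** attempt) + rand() * jitter_ms
```

With the defaults, `backoff_wait_ms(1)` falls between 1000 and 2000 ms, and the exponential term stops growing once it reaches the 30-second cap.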

Idempotency keys

Every write endpoint (POST, DELETE) accepts an Idempotency-Key header. If you pass the same key twice within 24 hours, we return the cached response of the first request without running the operation again.

POST /api/tests
Authorization: Bearer ic_live_xxx
Idempotency-Key: tenant-42:campaign-99:2026-07-04T11:00
Content-Type: application/json

{ "from": "...", "subject": "...", "html": "..." }

Rules for the key:

  • Unique per logical operation. A good key includes the tenant ID, the operation, and a timestamp or UUID that changes only when you genuinely want a new operation.
  • At most 255 bytes. ASCII only.
  • If you retry with the same key but a different request body, you get 409 Conflict. Change one or the other.
  • Keys expire after 24 hours. After that the same key is fresh again.

The classic bug

Retry loops that don't pass an idempotency key will double-bill you on every 5xx. The failure mode is silent: the first POST succeeds server-side but returns 502 to you (a transient edge failure). Your retry creates a second test. Now you have two tests with different IDs, both charged, and your client has lost track of the first. Always, always pass an idempotency key on POST.
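To make the key rules concrete, here is a hedged sketch of a key builder and validator. The helper names and the `tenant-…:operation:uuid` layout are illustrative assumptions; only the ASCII and 255-byte limits come from the rules above. The important part is calling the builder once, before the first attempt, and reusing the result on every retry.

```python
import uuid

MAX_KEY_BYTES = 255


def validate_idempotency_key(key: str) -> None:
    """Enforce the documented limits: ASCII only, at most 255 bytes."""
    if not key:
        raise ValueError("idempotency key must not be empty")  # assumption
    if not key.isascii():
        raise ValueError("idempotency key must be ASCII")
    if len(key.encode("ascii")) > MAX_KEY_BYTES:
        raise ValueError("idempotency key must be at most 255 bytes")


def make_idempotency_key(tenant_id: str, operation: str) -> str:
    """Build a key that is unique per logical operation."""
    key = f"tenant-{tenant_id}:{operation}:{uuid.uuid4()}"
    validate_idempotency_key(key)
    return key
```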

Timeouts — request vs total

Two timeouts matter, and they are different.

  • Request timeout — how long any single HTTP call may take. Set to 30 seconds. Our endpoints either return in well under a second or genuinely time out.
  • Total timeout — your entire retry loop's budget. Set to 5 minutes. Past that, abandon and surface the error to a human — the test result will be queryable later by ID if it went through.
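The two budgets can be tracked with a small deadline object. This is a sketch under stated assumptions: the `Deadline` class and its method names are hypothetical, and the injectable `clock` exists only so the behaviour can be tested without sleeping. The 30-second and 5-minute values are the ones recommended above.

```python
import time
from typing import Callable

REQUEST_TIMEOUT_S = 30.0  # budget for any single HTTP call
TOTAL_TIMEOUT_S = 300.0   # budget for the whole retry loop


class Deadline:
    """Tracks the total budget for one logical operation."""

    def __init__(
        self,
        budget_s: float = TOTAL_TIMEOUT_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._expires = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the total budget, never negative."""
        return max(0.0, self._expires - self._clock())

    def request_timeout(self) -> float:
        """Per-call timeout, shortened when the budget is nearly spent."""
        return min(REQUEST_TIMEOUT_S, self.remaining())

    def can_wait(self, wait_s: float) -> bool:
        """True if sleeping wait_s still leaves time for another attempt."""
        return wait_s < self.remaining()
```

Inside a retry loop, `deadline.request_timeout()` would be passed as the HTTP client's timeout, and the loop would stop and surface the error as soon as `deadline.can_wait(wait)` is false.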

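The SSE stream is an exception. Keep-alive pings arrive every 15s, so a healthy connection can stay open for up to 10 minutes. Do not apply the 30-second request timeout to the stream as a whole; use an idle read timeout instead (e.g. 30s, which is two missed pings) and be prepared to re-subscribe if the connection drops.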

429 handling with Retry-After

When we rate-limit you, the response looks like:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1751454060

{
  "data": null,
  "error": {
    "code":    "rate_limited",
    "message": "Rate limit exceeded. Retry in 12 seconds."
  },
  "meta": { "requestId": "req_..." }
}

Retry-After is in seconds (an integer). Sleep for exactly that long — not longer, not shorter — and retry the same request with the same idempotency key. Do not apply exponential backoff on top; the header has the authoritative answer.
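Reading the header defensively might look like the sketch below. The function name is an assumption, and returning `None` for a missing or non-integer value (so the caller can fall back to its own policy) is a design choice of the example rather than documented behaviour.

```python
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Retry-After value in seconds, or None if absent or malformed."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            value = value.strip()
            if value.isascii() and value.isdigit():
                return int(value)
            return None
    return None
```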

5xx without Retry-After

Apart from 503, which carries Retry-After during maintenance, 5xx responses do not include the header. Apply your exponential backoff, with jitter, up to 5 attempts. If you are still failing after the fifth attempt, something is genuinely broken and more retries will not help: alert a human.

Webhook retries (ours, not yours)

If your webhook endpoint returns non-2xx or times out, we retry automatically. The schedule:

Attempt 1:  immediate
Attempt 2:  30s later
Attempt 3:  5 min later
Attempt 4:  30 min later
Attempt 5:  3 hours later
Attempt 6:  12 hours later
Attempt 7:  24 hours later
Give up:    after 7 failures

Your webhook must be idempotent. We will deliver the same event ID more than once in normal operation — not just on retry. Dedupe on event.id in your database.
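Deduplicating on the event ID can be as simple as a table with a primary key. The sketch below uses SQLite from the standard library; the table name `seen_events` and both function names are hypothetical, and a production handler would use whatever database it already has.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store of already-processed event IDs."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen_events (event_id TEXT PRIMARY KEY)")
    return db


def first_delivery(db: sqlite3.Connection, event_id: str) -> bool:
    """Record event_id; return True only the first time it is seen."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT OR IGNORE INTO seen_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cur.rowcount == 1
```

A handler would process the event only when `first_delivery` returns `True`, and respond with a 2xx in both cases so the duplicate is not retried.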

Retry snippets — Node and Python

Node / TypeScript

import { randomUUID } from 'crypto';

async function withRetry<T>(fn: () => Promise<Response>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let res: Response | undefined;
    try {
      res = await fn();
    } catch (err) {
      // Network error or request timeout: no status received, so treat it like a 5xx
      if (attempt === maxAttempts) throw err;
    }

    if (res) {
      if (res.ok) return (await res.json()) as T;

      if (res.status >= 400 && res.status < 500 && res.status !== 429) {
        // Terminal 4xx: do not retry
        throw Object.assign(new Error(`HTTP ${res.status}`), { status: res.status });
      }
      if (attempt === maxAttempts) throw new Error(`Giving up after ${maxAttempts} attempts`);
    }

    // Honour Retry-After (integer seconds) when present, else back off with jitter
    const retryAfter = Number(res?.headers.get('retry-after'));
    const waitMs = retryAfter > 0
      ? retryAfter * 1000
      : Math.min(30000, 500 * 2 ** attempt) + Math.random() * 1000;
    await new Promise((r) => setTimeout(r, waitMs));
  }
  throw new Error('unreachable');
}

// Usage: POST with idempotency key
const idemKey = `tenant-${tenantId}:${randomUUID()}`;
const data = await withRetry(() =>
  fetch('https://check.live-direct-marketing.online/api/tests', {
    method: 'POST',
    headers: {
      'Authorization':    `Bearer ${process.env.IC_KEY}`,
      'Content-Type':     'application/json',
      'Idempotency-Key':  idemKey,
    },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(30000),
  }),
);

Python

import os, time, random, uuid, httpx

BASE = "https://check.live-direct-marketing.online"

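def with_retry(send, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        resp = None
        try:
            resp = send()
        except httpx.TransportError:
            # Network error or timeout: no status received, so treat it like a 5xx
            if attempt == max_attempts:
                raise

        if resp is not None:
            if resp.status_code < 400:
                return resp.json()
            terminal = 400 <= resp.status_code < 500 and resp.status_code != 429
            if terminal or attempt == max_attempts:
                resp.raise_for_status()  # raises httpx.HTTPStatusError

        # Honour Retry-After (integer seconds) when present, else back off with jitter
        retry_after = resp.headers.get("retry-after") if resp is not None else None
        if retry_after:
            wait = int(retry_after)
        else:
            wait = min(30, 0.5 * (2 ** attempt)) + random.random()
        time.sleep(wait)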

idem = f"tenant-{tenant_id}:{uuid.uuid4()}"
with httpx.Client(timeout=30.0) as c:
    data = with_retry(lambda: c.post(
        f"{BASE}/api/tests",
        headers={
            "Authorization":    f"Bearer {os.environ['IC_KEY']}",
            "Content-Type":     "application/json",
            "Idempotency-Key":  idem,
        },
        json=payload,
    ))

Integration test patterns

Two tests every integration should have, and most don't:

  1. The 429 storm. Mock the transport to return 429 with Retry-After: 2 on the first two attempts, 200 on the third. Assert your client waits roughly 4 seconds and ultimately succeeds.
  2. The 502 double-POST. Mock a POST to return 502 on the first attempt, 200 on the second. Assert that the second call carries the same Idempotency-Key as the first.

Both take about 20 lines of test code and will catch the two classes of bug that consume the most support time in practice.
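Both tests can be sketched without a network or real sleeping by injecting the transport and the sleep function. Everything named here (`FakeResponse`, `retry`, `create_test`) is a stand-in invented for the example; in a real suite, `retry` and `create_test` would be your own client code, and the stand-in omits jitter so the waits are deterministic.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def retry(send: Callable[[], FakeResponse], sleep: Callable[[float], None],
          max_attempts: int = 5) -> FakeResponse:
    """Stand-in for the retry loop under test (no jitter, for determinism)."""
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status < 400 or attempt == max_attempts:
            return resp
        if 400 <= resp.status < 500 and resp.status != 429:
            return resp  # terminal 4xx
        retry_after = resp.headers.get("retry-after")
        sleep(float(retry_after) if retry_after else min(30.0, 0.5 * 2 ** attempt))
    raise AssertionError("unreachable")


def create_test(transport: Callable[[Dict[str, str]], FakeResponse],
                sleep: Callable[[float], None]) -> FakeResponse:
    """Stand-in for the client call: the key is built once, outside the loop."""
    key = f"tenant-42:{uuid.uuid4()}"
    return retry(lambda: transport({"Idempotency-Key": key}), sleep)


def test_429_storm() -> None:
    canned = [FakeResponse(429, {"retry-after": "2"}),
              FakeResponse(429, {"retry-after": "2"}),
              FakeResponse(200)]
    slept: List[float] = []
    resp = retry(lambda: canned.pop(0), slept.append)
    assert resp.status == 200
    assert sum(slept) == 4.0  # two waits of Retry-After: 2


def test_502_double_post() -> None:
    canned = [FakeResponse(502), FakeResponse(200)]
    seen: List[str] = []

    def transport(headers: Dict[str, str]) -> FakeResponse:
        seen.append(headers["Idempotency-Key"])
        return canned.pop(0)

    resp = create_test(transport, lambda _seconds: None)
    assert resp.status == 200
    assert len(seen) == 2 and seen[0] == seen[1]  # same key on the retry
```

If the key were generated inside the lambda instead of before it, the second test would fail, which is exactly the regression it is meant to catch.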

Frequently asked questions

Should I retry network errors (ECONNRESET, timeouts) the same way as 5xx?

Yes. Any error that happened before you received an HTTP status code is effectively a transient 5xx — the server may or may not have processed the request. Retry with the same idempotency key and the server will either run it for the first time or return the cached result of the original.

How do I know whether a POST that timed out was actually processed?

Pass an Idempotency-Key and retry. If the server already processed it, your retry returns the cached response with the original test ID. If not, a new test is created. Either way, no double-booking.

What if my idempotency key is accidentally reused for different payloads?

You get a 409 Conflict with a pointer to the original response. Your code should treat this as a bug (rotate the key) rather than retry. In practice it happens when the key is generated from too little entropy — always include a UUID or a millisecond timestamp.

Do DNS checks (SPF/DKIM/DMARC) also need idempotency keys?

No. They are idempotent by nature — running them twice is harmless and costs you nothing. Idempotency keys matter only for endpoints that create test records and consume credits.