Every API integration is one retry loop away from subtle bugs. Retry too eagerly and you double-charge yourself. Retry too timidly and you lose requests to transient 502s. Retry a POST without idempotency and you book two tests instead of one. This page is the contract: what to retry, how long to wait, and when to give up.
4xx — your fault, do not retry unless you fix the request. 429 — read Retry-After, then retry. 5xx — exponential backoff with jitter, max 5 attempts. POST without an idempotency key — never retry.
Taxonomy of errors
Two axes. The first is who is at fault: client (your request) or server (our infrastructure). The second is whether retrying is safe: retriable or terminal. That gives a 2×2 matrix:
- Client, terminal: 400, 401, 403, 404, 422. Fix the request and resubmit manually; retry logic must not retry these.
- Client, retriable: 429 (rate limit). Wait per Retry-After, then retry the same payload.
- Server, retriable: 500, 502, 503, 504. Exponential backoff, max 5 attempts, then give up and alert.
- Server, terminal: rare. For example, 501 Not Implemented returned to a client using a removed endpoint. Fix the client.
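The matrix above can be sketched as a small lookup. The function name `classify` and the `(fault, retriable)` tuple are illustrative assumptions, not part of the API. Codes not listed in the matrix fall back to a guess that mirrors the snippets later on this page: unlisted 4xx are treated as terminal, unlisted 5xx as retriable.

```python
from typing import Tuple

# Status codes exactly as listed in the taxonomy on this page.
CLIENT_TERMINAL = {400, 401, 403, 404, 422}
CLIENT_RETRIABLE = {429}
SERVER_RETRIABLE = {500, 502, 503, 504}
SERVER_TERMINAL = {501}


def classify(status: int) -> Tuple[str, bool]:
    """Return (fault, retriable) for an HTTP error status."""
    if status in CLIENT_RETRIABLE:
        return ("client", True)
    if status in CLIENT_TERMINAL:
        return ("client", False)
    if status in SERVER_RETRIABLE:
        return ("server", True)
    if status in SERVER_TERMINAL:
        return ("server", False)
    # Assumption: fallback for codes the matrix does not mention.
    if 400 <= status < 500:
        return ("client", False)
    if 500 <= status < 600:
        return ("server", True)
    raise ValueError(f"{status} is not an error status")
```

A retry loop would consult `classify(status)[1]` before deciding whether to sleep and try again.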
Error envelope shape
Every error response has the same shape:
HTTP/1.1 422 Unprocessable Entity
X-Request-Id: req_01H9XBADREQUEST

{
  "data": null,
  "error": {
    "code": "sender_domain_blocked",
    "message": "Sender domain 'example.tld' is on the internal block list.",
    "field": "from",
    "docs": "https://check.live-direct-marketing.online/docs/errors#sender_domain_blocked"
  },
  "meta": { "requestId": "req_01H9XBADREQUEST", "version": "2026-07-01" }
}

The error.code is a stable string enum you can switch on. The error.message is human-readable and may change wording over time, so do not parse it. The requestId is what you cite when you file a support ticket.
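As a minimal sketch of consuming this envelope, the code below parses a response body and branches on error.code rather than on the message text. The `ApiError` class and `parse_error` function are hypothetical names introduced for illustration; only the JSON field names come from the example above.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Hypothetical exception type carrying the fields of the error envelope."""

    def __init__(self, status: int, code: str, message: str,
                 field: Optional[str], request_id: Optional[str]):
        super().__init__(f"{status} {code} (request {request_id})")
        self.status = status
        self.code = code
        self.message = message
        self.field = field
        self.request_id = request_id


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from a body shaped like the envelope above."""
    envelope = json.loads(body)
    error = envelope.get("error") or {}
    meta = envelope.get("meta") or {}
    return ApiError(
        status=status,
        code=error.get("code", "unknown"),
        message=error.get("message", ""),
        field=error.get("field"),          # not every error names a field
        request_id=meta.get("requestId"),  # cite this in support tickets
    )


SAMPLE_BODY = """{
  "data": null,
  "error": {
    "code": "sender_domain_blocked",
    "message": "Sender domain 'example.tld' is on the internal block list.",
    "field": "from"
  },
  "meta": { "requestId": "req_01H9XBADREQUEST", "version": "2026-07-01" }
}"""

err = parse_error(422, SAMPLE_BODY)
if err.code == "sender_domain_blocked":
    # Switch on the stable code, never on err.message.
    print(f"Blocked sender in field {err.field!r}, request {err.request_id}")
```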
Status code map with retry guidance
Status  Retry?  Wait            Notes
------  ------  --------------  -----------------------------------
400     No      -               Malformed JSON. Fix and resubmit.
401     No      -               Invalid/missing Bearer. Re-auth.
403     No      -               Forbidden. Check scopes.
404     No      -               Gone or never existed.
409     No      -               Idempotency conflict. Change key.
422     No      -               Business rule violation.
429     YES     Retry-After s   Rate limit. Exact wait in header.
500     YES     Backoff         Transient. Up to 5 attempts.
502     YES     Backoff         Upstream. Up to 5 attempts.
503     YES     Retry-After s   Maintenance. Header gives wait.
504     YES     Backoff         Upstream timeout. Up to 5 attempts.

Recommended exponential backoff
For 5xx responses without a Retry-After header, use exponential backoff with jitter. The formula we recommend:
wait_ms = min(30000, base * 2^attempt) + random(0, 1000)

attempt  base=500ms  backoff  wait with jitter
1        500         1.0s     1.0-2.0s
2        500         2.0s     2.0-3.0s
3        500         4.0s     4.0-5.0s
4        500         8.0s     8.0-9.0s
5        500         16.0s    16.0-17.0s
give up

The jitter term is essential. Without it, a fleet of clients retrying a flaky endpoint synchronise their retries and produce a thundering herd exactly when the upstream is most vulnerable. Up to one second of random jitter breaks the synchronisation while adding at most one second to any single wait.
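The formula can be written out directly. This is a sketch under the stated defaults (base 500 ms, cap 30000 ms, up to 1000 ms of jitter); the function name `backoff_ms` and the injectable `rng` parameter are assumptions made for illustration and testing.

```python
import random
from typing import Optional


def backoff_ms(attempt: int, base_ms: int = 500, cap_ms: int = 30_000,
               jitter_ms: int = 1_000,
               rng: Optional[random.Random] = None) -> float:
    """Wait before the next retry: min(cap, base * 2^attempt) + random(0, jitter)."""
    if rng is None:
        rng = random.Random()
    exponential = min(cap_ms, base_ms * 2 ** attempt)
    return exponential + rng.uniform(0, jitter_ms)


# Print the deterministic part of the schedule for attempts 1 through 5.
for attempt in range(1, 6):
    floor = min(30_000, 500 * 2 ** attempt)
    print(f"attempt {attempt}: {floor} ms plus up to 1000 ms of jitter")
```

Passing a seeded `random.Random` makes the schedule reproducible in tests while production code keeps the default.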
Idempotency keys
Every write endpoint (POST, DELETE) accepts an Idempotency-Key header. If you pass the same key twice within 24 hours, we return the cached response of the first request without running the operation again.
POST /api/tests
Authorization: Bearer ic_live_xxx
Idempotency-Key: tenant-42:campaign-99:2026-07-04T11:00
Content-Type: application/json

{ "from": "...", "subject": "...", "html": "..." }

Rules for the key:
- Unique per logical operation. A good key includes the tenant ID, the operation, and a timestamp or UUID that changes only when you genuinely want a new operation.
- At most 255 bytes. ASCII only.
- If you retry with the same key but a different request body, you get 409 Conflict. Change one or the other.
- Keys expire after 24 hours. After that the same key is fresh again.
Retry loops that don't pass an idempotency key can double-bill you whenever a 5xx hides a write that actually succeeded. The failure mode is silent: the first POST succeeds server-side but returns 502 to you (a transient edge failure). Your retry creates a second test. Now you have two tests with different IDs, both charged, and your client has lost track of the first. Always pass an idempotency key on POST.
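The rules for the key can be enforced on the client before a request is ever sent. The helper below is a sketch; `make_idempotency_key` and its parameters are hypothetical names, and the `tenant-<id>:<operation>:<unique>` layout simply follows the example header shown earlier in this section.

```python
import uuid

MAX_KEY_BYTES = 255  # limit stated in the rules for the key


def make_idempotency_key(tenant_id: str, operation: str, unique: str = "") -> str:
    """Build a key like 'tenant-42:campaign-99:<uuid>' and validate it."""
    # A fresh UUID is used when the caller has no natural unique part.
    suffix = unique or str(uuid.uuid4())
    key = f"tenant-{tenant_id}:{operation}:{suffix}"
    if not key.isascii():
        raise ValueError("idempotency key must be ASCII only")
    if len(key.encode("ascii")) > MAX_KEY_BYTES:
        raise ValueError(f"idempotency key must be at most {MAX_KEY_BYTES} bytes")
    return key


# Build the key once per logical operation, outside the retry loop,
# so that every attempt sends the same value.
key = make_idempotency_key("42", "campaign-99", "2026-07-04T11:00")
print(key)
```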
Timeouts — request vs total
Two timeouts matter, and they are different.
- Request timeout — how long any single HTTP call may take. Set to 30 seconds. Our endpoints either return in well under a second or genuinely time out.
- Total timeout — your entire retry loop's budget. Set to 5 minutes. Past that, abandon and surface the error to a human — the test result will be queryable later by ID if it went through.
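The SSE stream is an exception. Keep-alive pings arrive every 15s, so you can sit on the socket for up to 10 minutes. Do not apply the 30-second request timeout to the stream as a whole; use an idle read timeout instead (e.g. 30s without any data, which is two missed pings) and be prepared to re-subscribe if the connection drops.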
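For ordinary requests, the split between the per-request timeout and the total budget can be sketched as a deadline check around the loop. Everything named here is hypothetical: `attempt_once` stands for one HTTP call that carries its own 30-second request timeout and returns None when a retry is warranted, and `next_wait` maps the attempt number to a wait in seconds. The sketch shows only the budget; an attempt cap would sit alongside it.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the retry loop runs out of its total time budget."""


def retry_within_budget(attempt_once, next_wait, total_budget_s=300.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call attempt_once() until it returns a result or the budget is spent."""
    deadline = clock() + total_budget_s  # 5 minutes by default
    attempt = 0
    while True:
        attempt += 1
        result = attempt_once()
        if result is not None:
            return result
        wait = next_wait(attempt)
        # Do not start a sleep that would end after the deadline.
        if clock() + wait > deadline:
            raise BudgetExceeded(f"gave up after {attempt} attempts")
        sleep(wait)
```

The clock and sleep functions are injectable so the budget logic can be tested without real waiting.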
429 handling with Retry-After
When we rate-limit you, the response looks like:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1751454060

{
  "data": null,
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry in 12 seconds."
  },
  "meta": { "requestId": "req_..." }
}

Retry-After is in seconds (an integer). Sleep for exactly that long (not longer, not shorter) and retry the same request with the same idempotency key. Do not apply exponential backoff on top; the header has the authoritative answer.
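Reading the header defensively is cheap. The sketch below assumes headers arrive as a plain mapping of strings; the function name `retry_after_seconds` is illustrative. It returns None when the header is missing or is not an integer, so the caller can fall back to its own backoff.

```python
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    """Return Retry-After in whole seconds, or None if absent or unparseable."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "retry-after":
            try:
                seconds = int(value.strip())
            except ValueError:
                return None
            return max(0, seconds)  # never sleep for a negative duration
    return None


wait = retry_after_seconds({"Retry-After": "12", "X-RateLimit-Remaining": "0"})
if wait is not None:
    print(f"rate limited, sleeping {wait}s before retrying with the same key")
```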
5xx without Retry-After
Most 5xx responses do not carry Retry-After (503 during maintenance is the exception; honour the header when it is present). For the rest, apply your exponential backoff, with jitter, up to 5 attempts. If you are still failing after 5 attempts over ~30 seconds, something is genuinely broken and more retries will not help, so alert a human.
Webhook retries (ours, not yours)
If your webhook endpoint returns non-2xx or times out, we retry automatically. The schedule:
Attempt 1: immediate
Attempt 2: 30s later
Attempt 3: 5 min later
Attempt 4: 30 min later
Attempt 5: 3 hours later
Attempt 6: 12 hours later
Attempt 7: 24 hours later
Give up: after 7 failures

Your webhook must be idempotent. We will deliver the same event ID more than once in normal operation, not just on retry. Dedupe on event.id in your database.
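One way to dedupe on event.id is a table with the event ID as primary key, written in the same transaction as the side effect. The sketch below uses SQLite from the Python standard library purely for illustration; the table name `seen_events`, the helper names, and the assumption that the parsed event is a dict with an "id" key are all hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store that remembers processed event IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events (event_id TEXT PRIMARY KEY)"
    )
    return conn


def handle_once(conn: sqlite3.Connection, event: dict, process) -> bool:
    """Run process(event) only the first time event['id'] is seen.

    Returns True if the event was processed, False if it was a duplicate.
    """
    with conn:  # commits on success, rolls back if process() raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_events (event_id) VALUES (?)",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: acknowledge it and do nothing
        process(event)
        return True
```

Because the insert is rolled back when processing fails, a later redelivery of the same event is processed rather than skipped.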
Retry snippets — Node and Python
Node / TypeScript
import { randomUUID } from 'crypto';

async function withRetry<T>(fn: () => Promise<Response>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fn();
    if (res.ok) return res.json() as Promise<T>;
    if (res.status >= 400 && res.status < 500 && res.status !== 429) {
      // Terminal 4xx: do not retry
      throw Object.assign(new Error(`HTTP ${res.status}`), { status: res.status });
    }
    // Out of attempts: give up instead of sleeping again
    if (attempt === maxAttempts) throw new Error(`Giving up after ${maxAttempts} attempts`);
    // Prefer the server's Retry-After (seconds); otherwise back off with jitter
    const retryAfter = res.headers.get('retry-after');
    const waitMs = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.min(30000, 500 * 2 ** attempt) + Math.random() * 1000;
    await new Promise((r) => setTimeout(r, waitMs));
  }
  throw new Error('unreachable');
}

// Usage: POST with idempotency key (tenantId and payload come from your application).
// The key is generated once, outside the retry loop, so every attempt reuses it.
const idemKey = `tenant-${tenantId}:${randomUUID()}`;
const data = await withRetry(() =>
  fetch('https://check.live-direct-marketing.online/api/tests', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.IC_KEY}`,
      'Content-Type': 'application/json',
      'Idempotency-Key': idemKey,
    },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(30000), // 30s request timeout, fresh for each attempt
  }),
);

Python
import os
import random
import time
import uuid

import httpx

BASE = "https://check.live-direct-marketing.online"

def with_retry(send, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code < 400:
            return resp.json()
        # Terminal 4xx: do not retry
        terminal = 400 <= resp.status_code < 500 and resp.status_code != 429
        if terminal:
            resp.raise_for_status()
        # Out of attempts: surface the last error instead of sleeping again
        if attempt == max_attempts:
            resp.raise_for_status()
        # Prefer the server's Retry-After (seconds); otherwise back off with jitter
        retry_after = resp.headers.get("retry-after")
        if retry_after:
            wait = int(retry_after)
        else:
            wait = min(30, 0.5 * (2 ** attempt)) + random.random()
        time.sleep(wait)

# Usage: tenant_id and payload come from your application.
# The key is generated once, outside the retry loop, so every attempt reuses it.
idem = f"tenant-{tenant_id}:{uuid.uuid4()}"
with httpx.Client(timeout=30.0) as c:
    data = with_retry(lambda: c.post(
        f"{BASE}/api/tests",
        headers={
            "Authorization": f"Bearer {os.environ['IC_KEY']}",
            "Content-Type": "application/json",
            "Idempotency-Key": idem,
        },
        json=payload,
    ))

Integration test patterns
Two tests every integration should have, and most don't:
- The 429 storm. Mock the transport to return 429 with Retry-After: 2 on the first two attempts, 200 on the third. Assert your client waits roughly 4 seconds and ultimately succeeds.
- The 502 double-POST. Mock a POST to return 502 on the first attempt, 200 on the second. Assert that the second call carries the same Idempotency-Key as the first.
Both take about 20 lines of test code and will catch the two classes of bug that consume the most support time in practice.
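A sketch of both tests follows, written without any HTTP library so it runs on its own. `FakeResponse`, the compact `with_retry` stand-in, and the injectable `sleep` parameter are assumptions made for illustration; in a real suite you would exercise your own client and mock its transport instead.

```python
import random
import time


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


def with_retry(send, max_attempts=5, sleep=time.sleep):
    """Compact stand-in for the client under test; sleep is injectable."""
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code < 400:
            return resp
        if 400 <= resp.status_code < 500 and resp.status_code != 429:
            raise RuntimeError(f"terminal HTTP {resp.status_code}")
        if attempt == max_attempts:
            raise RuntimeError(f"giving up after {max_attempts} attempts")
        retry_after = resp.headers.get("retry-after")
        if retry_after:
            wait = int(retry_after)
        else:
            wait = min(30, 0.5 * 2 ** attempt) + random.random()
        sleep(wait)


def test_429_storm():
    responses = iter([
        FakeResponse(429, {"retry-after": "2"}),
        FakeResponse(429, {"retry-after": "2"}),
        FakeResponse(200),
    ])
    sleeps = []  # record requested waits instead of really sleeping
    resp = with_retry(lambda: next(responses), sleep=sleeps.append)
    assert resp.status_code == 200
    assert sum(sleeps) == 4


def test_502_double_post():
    responses = iter([FakeResponse(502), FakeResponse(200)])
    sent_headers = []

    def fake_post(headers):
        sent_headers.append(dict(headers))  # capture what each attempt sent
        return next(responses)

    headers = {"Idempotency-Key": "tenant-42:example-key"}
    resp = with_retry(lambda: fake_post(headers), sleep=lambda s: None)
    assert resp.status_code == 200
    assert len(sent_headers) == 2
    assert sent_headers[0]["Idempotency-Key"] == sent_headers[1]["Idempotency-Key"]


test_429_storm()
test_502_double_post()
```

Injecting `sleep` keeps the suite fast: the 429 test asserts on the requested waits rather than on wall-clock time.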