
Webhook Failure Recovery 2026: Patterns for B2B Digital-Goods APIs

Battle-tested webhook recovery patterns: backoff math, DLQ, idempotency, and SLA alerts.


Webhook delivery in B2B digital-goods integrations is a critical path: a missed order.delivered event means the gift-card code never reaches the customer and support is flooded with tickets. This article covers the production patterns we see at top FoxReload partners running 99.95%+ uptime.

1. Exponential backoff with jitter: the math

Retrying at a fixed 30-second interval can overwhelm your receiver when a backlog drains after an incident on our side (the thundering-herd problem). Exponential backoff with jitter spreads the retries out:

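function nextRetryDelay(attempt: number): number {
  // attempt 0 yields the 30s base; each later attempt doubles it
  const base = 30_000; // 30s
  const cap = 6 * 60 * 60 * 1000; // 6h
  const exp = Math.min(base * Math.pow(2, attempt), cap);
  // symmetric jitter: a random offset in [-30%, +30%) of the capped delay
  const jitter = (Math.random() * 2 - 1) * exp * 0.3;
  return exp + jitter;
}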

FoxReload uses exactly this algorithm: attempts 1–8 spread across a 30s-to-24h window. The 6-hour cap prevents a single webhook from monopolising a worker, and ±30% jitter smooths retry spikes across the fleet.
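To make the backoff arithmetic concrete, here is a minimal sketch that tabulates the un-jittered delays produced by the constants quoted above (30s base, doubling, 6h cap). The helper names `baseDelay`, `jitteredDelay`, and `schedule` are hypothetical, and the injectable `rand` parameter is an assumption added so the jitter can be reproduced in tests. It is not FoxReload's implementation.

```typescript
// Hypothetical helpers built from the constants quoted in the article.
const BASE_MS = 30_000;              // 30s
const CAP_MS = 6 * 60 * 60 * 1000;   // 6h

// Un-jittered delay for a 0-based attempt index: 0 -> 30s, 1 -> 60s, ...
function baseDelay(attempt: number): number {
  return Math.min(BASE_MS * Math.pow(2, attempt), CAP_MS);
}

// Symmetric jitter: rand() in [0, 1) maps to a factor in [-0.3, +0.3).
// rand is injectable (assumption) so tests can pin the random source.
function jitteredDelay(attempt: number, rand: () => number = Math.random): number {
  const exp = baseDelay(attempt);
  return exp + (rand() * 2 - 1) * exp * 0.3;
}

// Un-jittered delays for the first `attempts` retries.
function schedule(attempts: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseDelay(i));
}

const delays = schedule(8);
console.log(delays.map((ms) => `${ms / 1000}s`).join(', '));
// 30s, 60s, 120s, 240s, 480s, 960s, 1920s, 3840s
```

With these particular constants, pure doubling reaches 64 minutes by the eighth attempt, and the 6-hour cap first binds at attempt index 10. Compare the output against the schedule your provider documents before relying on it.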

2. Dead-letter queue (DLQ)

After 8 failed attempts, the event must go to a DLQ, not be lost. The production pattern is a dedicated queue with manual replay:

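// Express + BullMQ (ioredis client)
// Assumes express.json({ verify }) has stored the raw request bytes on req.rawBody
app.post('/webhook/foxreload', async (req, res) => {
  const sig = req.header('X-Foxreload-Signature');
  if (!sig || !verifyHmac(req.rawBody, sig, process.env.WEBHOOK_SECRET)) {
    return res.sendStatus(401);
  }
  const eventId = req.header('X-Foxreload-Event-Id');
  if (!eventId) return res.sendStatus(400);
  // SET ... NX returns null when the key already exists (duplicate delivery)
  const fresh = await redis.set(`evt:${eventId}`, '1', 'EX', 86400, 'NX');
  if (!fresh) return res.sendStatus(200); // already accepted
  try {
    await queue.add('process-event', req.body, {
      attempts: 8,
      backoff: { type: 'exponential', delay: 30000 },
      removeOnFail: false, // keep failed jobs for DLQ inspection
    });
  } catch (err) {
    // enqueue failed: release the key so the sender's retry is not dropped
    await redis.del(`evt:${eventId}`);
    return res.sendStatus(503);
  }
  res.sendStatus(200);
});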

DLQ events are reviewed by an on-call engineer: either replayed via POST /v1/webhooks/{id}/replay or closed out manually in admin.
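The handler above calls `verifyHmac` without defining it. A minimal sketch of such a helper follows, assuming the `X-Foxreload-Signature` header carries a hex-encoded HMAC-SHA256 of the raw request body. The article does not specify the encoding or hash, so check the provider's documentation before using it. The sketch uses only Node's built-in `crypto` module.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: signature = hex(HMAC-SHA256(secret, rawBody)).
// The real header format may differ (prefix, base64, timestamp, ...).
function verifyHmac(
  rawBody: Buffer | string,
  signature: string | undefined,
  secret: string | undefined,
): boolean {
  if (!signature || !secret) return false;
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Usage with made-up values
const demoSecret = 'whsec_demo';
const demoBody = '{"type":"order.delivered"}';
const demoSig = createHmac('sha256', demoSecret).update(demoBody).digest('hex');
console.log(verifyHmac(demoBody, demoSig, demoSecret)); // true
```

Comparing with `timingSafeEqual` rather than `===` avoids leaking how many leading bytes of the signature matched.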

3. Idempotency keys: non-negotiable

Webhook delivery is at-least-once, never exactly-once. Without idempotency, a single order.delivered event could decrement inventory twice. Use X-Foxreload-Event-Id as a natural idempotency key:

| Storage | Latency | TTL | Cost / 1M events |
|---|---|---|---|
| Redis SETNX | <2ms | 24h | $0.40 |
| Postgres UNIQUE index | 5–8ms | forever | $0.10 |
| DynamoDB ConditionExpression | 8–12ms | 24h | $1.25 |

For most partners Redis SETNX is optimal: cheap, fast, and the TTL covers the FoxReload retry window (24h).
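The claim-once-with-TTL semantics that all three stores provide can be modelled in a few lines. The sketch below is an in-memory stand-in for illustration and unit tests only, not a substitute for Redis. The class name `TtlClaimSet`, the function `onOrderDelivered`, and the inventory counter are hypothetical. A Redis-backed version would be asynchronous and would evict expired keys on its own.

```typescript
// In-memory model of "SET key 1 NX EX ttl": claim() succeeds once per TTL window.
class TtlClaimSet {
  private readonly expiresAt = new Map<string, number>();
  private readonly now: () => number;

  // The clock is injectable (assumption) so expiry can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  claim(eventId: string, ttlMs: number): boolean {
    const t = this.now();
    const expiry = this.expiresAt.get(eventId);
    if (expiry !== undefined && expiry > t) return false; // duplicate within TTL
    this.expiresAt.set(eventId, t + ttlMs);
    return true;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
const seen = new TtlClaimSet();
let inventory = 10;

function onOrderDelivered(eventId: string): void {
  if (!seen.claim(eventId, DAY_MS)) return; // redelivery: no side effects
  inventory -= 1;
}

onOrderDelivered('evt-123');
onOrderDelivered('evt-123'); // same event delivered again
console.log(inventory); // 9
```

The second delivery is absorbed by the claim check, so the inventory is decremented only once.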

4. Alerting on >1% loss

The metric you must monitor is a rolling 5-minute webhook delivery success rate. If it drops below 99%, that's an incident. Prometheus rule:

- alert: WebhookDeliveryDegraded
  expr: |
    (sum(rate(webhook_received_total[5m]))
     - sum(rate(webhook_failed_total[5m])))
    / sum(rate(webhook_received_total[5m])) < 0.99
  for: 2m
  labels: { severity: page }

The alert routes to PagerDuty/Opsgenie, and on-call inspects the DLQ and receiver logs. In 80% of cases the root cause is a recent receiver deploy with a regression, and a rollback resolves it in 5 minutes.
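The arithmetic behind the rule's expression can be checked by hand. The sketch below mirrors it for a single 5-minute window. The function names `successRate` and `isDegraded` and the sample counts are hypothetical. The explicit zero-traffic branch corresponds to PromQL dropping the sample when the ratio is NaN, so the rule stays silent with no traffic.

```typescript
// (received - failed) / received for one window; null when there is no traffic.
function successRate(received: number, failed: number): number | null {
  if (received <= 0) return null; // ratio undefined, mirrors 0/0 = NaN in PromQL
  return (received - failed) / received;
}

// True when the window's success rate is strictly below the threshold.
function isDegraded(received: number, failed: number, threshold = 0.99): boolean {
  const rate = successRate(received, failed);
  return rate !== null && rate < threshold;
}

console.log(successRate(2000, 25)); // 0.9875
console.log(isDegraded(2000, 25));  // true
console.log(isDegraded(2000, 10));  // false
console.log(isDegraded(0, 0));      // false
```

If silence during a total outage is a concern, a separate rule on the absence of received events would be needed. That is a design choice beyond what the rule above covers.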


Full FoxReload webhook documentation, replay endpoints, and Prometheus metrics are available after onboarding: request API access.

Frequently asked questions

How many times does FoxReload retry a webhook?
Up to 8 attempts over 24 hours: 30s, 1m, 5m, 30m, 2h, 6h, 12h, 24h. After attempt 8, the event lands in your DLQ endpoint (if configured) and is marked delivery_failed=true in the admin log.

How do I deduplicate webhook deliveries?
FoxReload sends a unique X-Foxreload-Event-Id (UUIDv4) per request. Store it in Redis with a 24h TTL and check SETNX event_id 1 before processing. If the key exists, it's a retry: return 200 with no side effects.

What timeout should my receiver endpoint use?
Accept the webhook in under 2 seconds (FoxReload waits 10s, but we recommend reply-fast/work-async). Persist the payload to a queue (SQS, Redis Streams, BullMQ) and return 200 immediately. Run heavy logic in workers.

How do I monitor webhook delivery health?
Track a rolling 5-minute success rate. Page if it drops below 99%. FoxReload exposes GET /v1/webhooks/stats with p50/p95/p99 latency and 24h failure rate; scrape it with Prometheus via blackbox-exporter.