Webhook Architecture for Reliable Integrations
Webhooks are the backbone of real-time integrations. When a customer pays on Stripe, a webhook notifies your CRM. When a support ticket closes in Zendesk, a webhook updates your customer health score. When a pull request merges on GitHub, a webhook triggers a deployment.
Simple concept. Deceptively difficult to get right.
The problem with webhooks is that they fail silently. A dropped webhook doesn't throw an error in your application — it just doesn't arrive. Data goes missing. Records fall out of sync. And nobody notices until a customer complains or a quarterly reconciliation reveals discrepancies that have been accumulating for months.
Here's how to build webhook infrastructure that doesn't lose data.
Why Webhooks Fail Silently
Webhooks fail at four points, and each failure mode is invisible by default.
1. Network Failures
The sender fires a POST request to your endpoint. If your server is down, slow, or returns a non-2xx status code, the webhook fails. Most providers retry — but retry behavior varies wildly:
| Provider | Retry Attempts | Retry Window | Backoff Strategy |
|---|---|---|---|
| Stripe | 16 | 3 days | Exponential |
| GitHub | 3 | ~1 hour | Fixed intervals |
| HubSpot | 10 | 24 hours | Exponential |
| Slack | 3 | ~30 minutes | Exponential |
| Shopify | 19 | 48 hours | Exponential |
If your server is down for longer than GitHub's roughly one-hour retry window, you'll lose those events permanently once the 3 retries are exhausted. Stripe's 3 days of retries are far more forgiving. Your architecture needs to account for the least forgiving provider in your stack.
2. Timeout Failures
Most webhook providers expect a response within 5-30 seconds. If your endpoint does heavy processing before responding — querying a database, calling another API, running business logic — it will timeout. The provider interprets the timeout as a failure and retries, potentially causing duplicate processing.
3. Payload Validation Failures
Webhook payloads change. A provider adds a new field, changes a field type from string to integer, or nests data differently in a new API version. If your parsing code is strict, it breaks on the new payload shape and silently drops events.
4. Ordering and Duplication
Webhooks arrive out of order. A payment.succeeded event might arrive before payment.created. Retry storms can deliver the same event multiple times. Without idempotency handling, you process events twice — creating duplicate records, sending duplicate emails, or charging customers double.
The Reliable Webhook Architecture
The core principle: acknowledge immediately, process asynchronously. Your webhook endpoint should do exactly two things: verify the signature and enqueue the payload. Everything else happens in a background worker.
```
                  ┌────────────────┐
Provider ──POST──→│ Endpoint       │
                  │ 1. Verify sig  │
                  │ 2. Enqueue     │
                  │ 3. Return 200  │
                  └───────┬────────┘
                          │
                  ┌───────▼────────┐
                  │ Queue (SQS /   │
                  │ Redis+BullMQ)  │
                  └───────┬────────┘
                          │
                  ┌───────▼────────┐
                  │ Worker         │
                  │ 1. Dedup       │
                  │ 2. Process     │
                  │ 3. Log         │
                  └────────────────┘
```
Signature Verification
Every reputable webhook provider signs payloads with an HMAC or asymmetric signature. Verify it before processing. This prevents forged webhook attacks — an attacker who knows your endpoint URL could send fake events.
Stripe, GitHub, and HubSpot all sign with HMAC-SHA256 (HubSpot keys the hash with your client secret). The verification details differ per provider, but the principle is the same: compute the expected signature from the raw request body and your secret key, then compare it with the signature header.
Never skip this step, even in development. A habit of ignoring signatures leads to production endpoints that don't verify.
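The core of the check can be sketched in a few lines. This sketch assumes the provider sends a plain hex HMAC-SHA256 digest of the raw body; real providers wrap it in their own header format (GitHub prefixes the digest with `sha256=` in the `X-Hub-Signature-256` header, and Stripe embeds a timestamp alongside the digest), so adapt the parsing accordingly:

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Compare the provider-sent signature against one computed locally.

    Assumes the header carries a bare hex HMAC-SHA256 of the raw body.
    Always hash the raw bytes as received, not a re-serialized payload.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which defeats timing attacks
    return hmac.compare_digest(expected, signature_header)
```

Note the constant-time comparison: a plain `==` leaks how many leading characters matched, which an attacker can exploit to forge signatures byte by byte.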
Queue-First Processing
The endpoint's only job is to put the raw payload into a message queue and return 200. This ensures:
- Fast response times. The endpoint responds in <100ms, well within any provider's timeout window.
- No data loss. If the worker is down, events accumulate in the queue and process when the worker recovers.
- Retry capability. Failed worker jobs can be retried from the queue without asking the provider to resend.
- Rate limiting. The worker can process at whatever pace your downstream systems handle, regardless of incoming webhook volume.
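The whole endpoint can stay this small. The sketch below uses Python's in-process `queue.Queue` as a stand-in for a real broker (SQS, Redis+BullMQ, etc.), and `signature_is_valid` is a hypothetical placeholder for your provider's HMAC check:

```python
import queue

# Stand-in for a real message broker; illustration only
event_queue: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(raw_body: bytes, signature: str) -> int:
    """The endpoint does exactly two things: verify, then enqueue.

    Returns the HTTP status code to send back to the provider.
    """
    if not signature_is_valid(raw_body, signature):
        return 401  # reject forged or misconfigured requests
    event_queue.put({"raw": raw_body.decode()})
    return 200  # acknowledge immediately; the worker processes later

def signature_is_valid(raw_body: bytes, signature: str) -> bool:
    # Placeholder: substitute your provider's HMAC verification here
    return bool(signature)
```

Everything slow (database writes, downstream API calls, business logic) lives in the worker that consumes `event_queue`, so the endpoint's latency stays flat no matter how heavy processing gets.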
Queue options:
| Queue | Best For | Complexity | Cost |
|---|---|---|---|
| Redis + BullMQ | Small-medium volume, self-hosted | Low | $0 (self-hosted) |
| AWS SQS | High volume, managed | Medium | ~$0.40/million messages |
| RabbitMQ | Complex routing, self-hosted | Medium | $0 (self-hosted) |
| Google Cloud Pub/Sub | High volume, GCP stack | Medium | ~$0.40/million messages |
For most B2B applications handling under 100,000 webhooks/day, Redis + BullMQ is the simplest choice. It runs on the same server as your application, requires no additional infrastructure, and provides retry logic, dead-letter queues, and job monitoring out of the box.
Idempotency
Every webhook payload should be processed exactly once, regardless of how many times it's delivered. The implementation:
- Extract the event ID from the payload (most providers include one — Stripe's `evt_xxx`, GitHub's `X-GitHub-Delivery` header).
- Before processing, check if this event ID exists in your processed events store (a database table or Redis set).
- If it exists, skip processing and return success.
- If it doesn't exist, process the event and record the event ID.
Use a database unique constraint or Redis SET NX for atomic check-and-insert. Don't use a check-then-insert pattern — it's vulnerable to race conditions when duplicate webhooks arrive simultaneously.
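With a unique constraint, the check and the insert collapse into a single atomic operation. A minimal sketch using SQLite's `INSERT OR IGNORE` against a primary-keyed table (table and function names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def claim_event(event_id: str) -> bool:
    """Atomically claim an event ID before processing.

    True means this is the first delivery: process it. False means a
    duplicate: skip. The PRIMARY KEY constraint makes check-and-insert
    one atomic operation, so two concurrent duplicates cannot both win.
    """
    cur = db.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,),
    )
    db.commit()
    # rowcount is 1 when the row was inserted, 0 when it already existed
    return cur.rowcount == 1
```

The Redis equivalent is `SET event:<id> 1 NX EX 86400`: `NX` makes the write succeed only if the key is absent, and the expiry bounds the size of the dedup store.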
Dead-Letter Queue
When a webhook event fails processing after exhausting retries (typically 3-5 attempts), it moves to a dead-letter queue (DLQ). The DLQ stores failed events for manual inspection and reprocessing.
Critical: alert on DLQ depth. A growing DLQ means events are failing systematically, not transiently. Common causes: schema changes in the provider's payload, expired API credentials in your processing logic, or a bug introduced in a recent deployment.
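The retry-then-park loop is simple to sketch. Here the DLQ is a plain list and the retry budget is an illustrative constant; in practice your queue library (BullMQ, SQS redrive policies) provides both:

```python
MAX_ATTEMPTS = 4  # illustrative retry budget

dead_letter: list[dict] = []  # stand-in for a real dead-letter queue

def process_with_retries(event: dict, handler) -> bool:
    """Run handler up to MAX_ATTEMPTS times; park the event in the DLQ
    after the final failure instead of dropping it."""
    last_error = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(event)
            return True
        except Exception as exc:  # a sketch: log and retry on any error
            last_error = str(exc)
    dead_letter.append(
        {"event": event, "error": last_error, "attempts": MAX_ATTEMPTS}
    )
    return False
```

Storing the final error alongside the event is what makes the DLQ useful for diagnosis: when the alert fires, the queue already tells you why each event failed.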
Common Provider Patterns
Each webhook provider has quirks that your integration code must handle.
Stripe
Stripe's webhooks are the gold standard. Event objects are versioned, payload structure is consistent, and the dashboard includes a webhook event log with replay capability. Always pin your Stripe API version and listen for api_version in webhook events to detect mismatches.
Key pattern: Stripe sends notification events, not complete data. A customer.subscription.updated event tells you something changed — it doesn't include the full diff. Your worker should fetch the current state from the Stripe API rather than relying solely on the webhook payload.
GitHub
GitHub webhooks include the event type in the X-GitHub-Event header and a unique delivery ID in X-GitHub-Delivery. The payload structure varies significantly between event types. Plan to handle at least push, pull_request, issues, and workflow_run if you're building CI/CD integrations.
HubSpot
HubSpot batches webhook events: a single webhook request can contain multiple events, so your endpoint must iterate over the batch rather than treating the payload as one event. HubSpot also caps webhook subscriptions at 1,000 per app.
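Fanning the batch out into individual queue entries keeps the worker and the idempotency logic seeing one event at a time. A sketch (the `eventId` field name matches HubSpot's payloads, but treat the exact shape as an assumption to verify against their docs):

```python
import json

def handle_batched_webhook(raw_body: bytes, enqueue) -> int:
    """Split a batched payload into individual events before queueing.

    HubSpot sends a JSON array per request; enqueue is any callable
    that pushes one event onto your queue.
    """
    events = json.loads(raw_body)
    if not isinstance(events, list):
        events = [events]  # defensive: treat a lone object as a batch of one
    for event in events:
        enqueue(event)
    return 200
```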
Slack
Slack webhooks require URL verification — when you register a webhook endpoint, Slack sends a url_verification challenge that your endpoint must echo back. Slack also enforces a 3-second response timeout, making queue-first processing mandatory.
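The verification handshake is a one-time special case in the endpoint: echo the challenge, enqueue everything else. A minimal sketch, again with an in-process queue standing in for a real broker:

```python
import json
import queue

slack_queue: "queue.Queue[dict]" = queue.Queue()

def handle_slack_event(raw_body: bytes) -> tuple[int, str]:
    """Return (status, response body) for a Slack event request.

    The url_verification challenge must be echoed back verbatim;
    every other event goes to the queue for the worker.
    """
    payload = json.loads(raw_body)
    if payload.get("type") == "url_verification":
        return 200, payload["challenge"]
    slack_queue.put(payload)
    return 200, ""
```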
Monitoring and Debugging
Logging Strategy
Log every webhook event at three points:
- Receipt. Log the raw payload (redact sensitive fields like payment info), event type, provider, and timestamp. This is your audit trail.
- Processing. Log what the worker did — records created, updated, or skipped. Include the event ID for correlation.
- Failure. Log the full error, stack trace, and the event that triggered it. Include enough context to reproduce the failure.
Store webhook logs for at least 90 days. When a data discrepancy surfaces, you'll need to replay the timeline of events to diagnose the root cause.
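A receipt-stage log line might look like the sketch below. The redaction list and field names are illustrative; the point is that correlation fields (provider, event type, timestamp) travel with a payload that has sensitive values stripped before it hits storage:

```python
import json
from datetime import datetime, timezone

# Illustrative redaction list; extend for your own sensitive fields
SENSITIVE_KEYS = {"card_number", "cvc", "ssn"}

def receipt_log(provider: str, event_type: str, payload: dict) -> str:
    """Build the receipt-stage log line: redacted payload plus the
    correlation fields needed to replay the timeline later."""
    redacted = {
        k: ("[REDACTED]" if k in SENSITIVE_KEYS else v)
        for k, v in payload.items()
    }
    return json.dumps({
        "stage": "receipt",
        "provider": provider,
        "event_type": event_type,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload": redacted,
    })
```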
Replay Capability
Build the ability to replay any webhook event from your logs. This means:
- Storing the raw, unmodified payload
- A replay endpoint or script that feeds stored payloads through your worker
- Idempotency handling that allows safe replay without duplicate side effects
This capability is invaluable during incident response. When a bug in your worker corrupts data for 48 hours, you can fix the bug, clear the affected records, and replay every webhook from the past 48 hours.
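A replay driver can be as small as the sketch below. It leans on the idempotency gate from earlier: events that processed correctly the first time are skipped, and only the cleared ones run again (all names here are illustrative):

```python
def replay(stored_payloads, worker, claim_event):
    """Re-feed stored payloads through the worker.

    claim_event is the idempotency gate (True on first claim), so
    events that were already processed correctly are skipped and
    replay never produces duplicate side effects.
    """
    replayed = skipped = 0
    for payload in stored_payloads:
        if claim_event(payload["event_id"]):
            worker(payload)
            replayed += 1
        else:
            skipped += 1
    return replayed, skipped
```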
Alerting Rules
| Alert | Condition | Severity |
|---|---|---|
| Endpoint down | No webhooks received in 15 minutes (during business hours) | Warning |
| Processing lag | Queue depth > 1,000 events | Warning |
| Failure spike | Error rate > 5% in a 5-minute window | Critical |
| DLQ growth | Dead-letter queue depth > 0 | Critical (investigate immediately) |
| Signature failure | Any verification failure | Critical (possible attack or config drift) |
FAQ
When should we use webhooks vs polling? Webhooks for time-sensitive events where you need near-real-time notification (payments, user actions, CI/CD triggers). Polling for data that changes infrequently or where you need complete state snapshots (CRM records, inventory, analytics aggregates). If the provider offers both, prefer webhooks for freshness and polling for completeness — run a nightly reconciliation poll to catch anything webhooks missed.
How do we test webhook integrations locally? Use ngrok or Cloudflare Tunnel to expose your local development server to the internet. Configure the provider to send webhooks to your tunnel URL. For automated testing, mock the webhook payloads in your test suite and send them directly to your endpoint handler.
What's the maximum payload size we should expect? Most providers limit payloads to 1-5 MB. Stripe payloads are typically under 50 KB. GitHub payloads can reach several MB for large push events. Configure your web server and queue to handle the maximum expected size. Reject payloads over 10 MB as a safety measure.
How do we handle webhook migrations when switching providers? Run both webhook endpoints simultaneously during the transition. Process events from both with deduplication to prevent doubles. Once the new provider is confirmed working (monitor for 1-2 weeks), deactivate the old webhook endpoint. Never cut over in a single step.
Webhook architecture done right is invisible — events flow, data stays in sync, and your team never has to think about it. Done wrong, it's a source of silent data corruption that compounds for months. The investment in queue-first processing, idempotency, and monitoring pays for itself the first time it prevents a data incident. Empirium builds reliable integration infrastructure for B2B operators — let's discuss your integration needs.