
Deliverability & metrics

This article covers what happens to a newsletter issue after you hit send: how Nukipa talks to Resend, how opens/clicks/bounces/complaints flow back in, how hard signals (bounces and complaints) suppress an address, how sender-domain verification and reputation work, where live metrics come from, and why an issue can sit in scheduled longer than you'd expect.

Email sending is handled by the newsletters service. It never calls Resend directly — every send and every webhook routes through the platform gateway. Resend is the email provider underneath.

The lifecycle of one issue

An issue moves through these statuses:

| Status | What it means | Deletable? |
| --- | --- | --- |
| draft | Editable. Carries body_markdown (your editor source) and body_html (the rendered cache the worker emails). A brand-new draft can have neither populated until first save. body_html is required to send or test-send, and the routes return 400 if it's empty. | Yes |
| scheduled | A send-at time is set (scheduled_for). Not yet enqueued; the in-process scheduler picks it up. | Yes |
| sending | A send job is running (or about to). | No (409) |
| sent | The send finished. Kept permanently for the audit trail (deliveries reference it). | No (409) |
| failed | The send was blocked or errored (e.g. unverified sender domain). | Yes |

The delete guard only blocks sent and sending. Drafts, scheduled, and failed issues can all be deleted.
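
As a minimal sketch of that guard in Python (the name `can_delete` is hypothetical and not taken from the service's code):

```python
# Statuses the delete guard rejects with 409, per the table above.
BLOCKED_FROM_DELETE = {"sending", "sent"}

def can_delete(status: str) -> bool:
    """Return True when an issue in this status may be deleted."""
    return status not in BLOCKED_FROM_DELETE

for status in ("draft", "scheduled", "sending", "sent", "failed"):
    print(status, can_delete(status))
```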

There are two ways to initiate a send (a third write to sending comes from the scheduler itself — see below):

  • Send now: POST /issues/:id/send flips the issue to sending and enqueues the newsletters.send-issue job immediately. Requires both subject and body_html.
  • Schedule: POST /issues/:id/schedule with {scheduled_for} (ISO) just sets status='scheduled'. It does not enqueue anything. An in-process scheduler does that later (see below).

Why a send can sit in scheduled

Scheduling does not put a job on the queue. A timer inside the worker process scans for due issues and dispatches them. So there's an inherent lag, and a few ways a send can stall:

  1. Scheduler tick interval. The scheduler runs setInterval every 30 seconds, scanning for issues where status='scheduled' AND scheduled_for <= now(). A schedule set for "right now" waits up to 30s (the worker also runs one scan ~2s after boot, so a fresh restart picks things up quickly). The scheduler's own UPDATE is what flips a due scheduled row to sending — that's the third path into sending, distinct from the send-now route.
  2. Per-tick cap. Each tick claims at most 25 issues (MAX_PER_TICK). If you scheduled 100 issues for the same minute, they go out over several ticks, not all at once. This is deliberate — it bounds the blast radius. Issues over the cap aren't dropped; they're deferred to the next tick.
  3. Jobs service not configured. The scheduler enqueues through the jobs service. If SERVICE_JOBS_URL is unset (common in local dev), getJobsClient() returns null and the scheduler simply no-ops — your scheduled issue sits in scheduled until jobs is configured. No harm done; the row is untouched.
  4. Enqueue failure rollback. If the scheduler claims an issue (flips it to sending) but the enqueue call throws, it rolls the issue back to scheduled so the next tick retries. During that window you may briefly see sending, then scheduled again.
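
The four behaviours above can be sketched as one in-memory scheduler tick. This is an illustration under stated assumptions, not the service's code: issues are plain dicts, `enqueue` is a hypothetical callable standing in for the jobs client (`None` when SERVICE_JOBS_URL is unset), and the order in which due issues are claimed is assumed to be list order.

```python
from datetime import datetime, timedelta, timezone

MAX_PER_TICK = 25  # per-tick cap from the text

def run_tick(issues, enqueue, now):
    """Run one scan; return the ids that were enqueued on this tick."""
    if enqueue is None:
        return []  # jobs not configured: no-op, rows stay untouched
    due = [i for i in issues
           if i["status"] == "scheduled" and i["scheduled_for"] <= now]
    enqueued = []
    for issue in due[:MAX_PER_TICK]:  # overflow waits for a later tick
        # Claim step, standing in for UPDATE ... WHERE status='scheduled':
        # a row already flipped by another worker would be skipped here.
        if issue["status"] != "scheduled":
            continue
        issue["status"] = "sending"
        try:
            enqueue(issue["id"])
            enqueued.append(issue["id"])
        except Exception:
            issue["status"] = "scheduled"  # roll back so the next tick retries
    return enqueued

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
issues = [{"id": n, "status": "scheduled",
           "scheduled_for": now - timedelta(minutes=1)} for n in range(30)]

def flaky_enqueue(issue_id):
    if issue_id == 3:  # simulate one enqueue failure
        raise RuntimeError("enqueue failed")

first = run_tick(issues, flaky_enqueue, now)
second = run_tick(issues, flaky_enqueue, now)
print(len(first), second)
```

In this run the first tick claims ids 0 to 24 and rolls id 3 back; the second tick picks up the overflow (ids 25 to 29) and retries id 3.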

[!WARNING] Send-now does NOT roll back when jobs is unconfigured. Unlike the scheduler, POST /issues/:id/send flips the issue to sending before it checks for the jobs client. If SERVICE_JOBS_URL is unset, the route still returns 202 with job_id: null — but the issue is now stranded in sending permanently. There's no rollback on this path, no job was enqueued, and a sending issue can't be deleted (409). Configure SERVICE_JOBS_URL before using send-now, or you'll have to fix the stuck row out-of-band.
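
A short sketch of why the send-now path strands the issue, assuming the ordering described in the warning. The function name, the 400 check on a missing subject, and the response body shape (other than `job_id`) are assumptions made for illustration.

```python
def send_now(issue, enqueue):
    """Sketch of POST /issues/:id/send: flip to sending first, check jobs later."""
    if not issue.get("subject") or not issue.get("body_html"):
        return 400, {"error": "subject and body_html are required"}  # assumed shape
    issue["status"] = "sending"       # flipped before the jobs client is checked
    if enqueue is None:               # SERVICE_JOBS_URL unset
        return 202, {"job_id": None}  # nothing enqueued, and no rollback
    return 202, {"job_id": enqueue(issue["id"])}

issue = {"id": "iss-1", "status": "draft", "subject": "Hello", "body_html": "<p>Hi</p>"}
code, body = send_now(issue, None)
print(code, body, issue["status"])
```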

[!NOTE] The scheduler is a poll, not a cron. The race-safe claim is UPDATE … WHERE status='scheduled' — if two worker processes pick the same row, only one flips it and dispatches; the other updates zero rows and skips. Safe under blue/green deploys, but the trade-off is the up-to-30s latency.

If you need to cancel a scheduled send, POST /issues/:id/unschedule reverts it to draft and clears scheduled_for. It returns 409 if the issue isn't currently scheduled.

The Resend webhook: where opens/clicks/bounces/complaints come from

The send worker dispatches the batch and does not wait for delivery outcomes. Everything after "Resend accepted the message" arrives asynchronously through one endpoint:

POST /public/webhooks/resend

The body shape mirrors Resend: { type, data: { email_id | id, ... } }.

[!IMPORTANT] Signature verification happens at the gateway, not in this service. The gateway terminates Resend's svix-signed webhooks, verifies them, and forwards pre-verified events. The newsletters service trusts the gateway and does not re-verify. This is why the route lives under /public/* with no internal-token auth.

On each event the handler:

  1. Looks up the deliveries row by resend_message_id (the per-recipient id the send worker stored). The tenant is read from that row.
  2. Archives the raw event into newsletters.events — append-only, written when a delivery row matches. So if you ever need to debug, the raw payload is on record.
  3. Patches the delivery's state and the matching *_at timestamp.

Event-to-state mapping:

| Resend event | deliveries.state | Timestamp set | Error written |
| --- | --- | --- | --- |
| email.sent | sent | sent_at | (none) |
| email.delivered | delivered | delivered_at | (none) |
| email.opened | opened | opened_at | (none) |
| email.clicked | clicked | clicked_at | (none) |
| email.bounced | bounced | bounced_at | error ← data.error (or the whole data) |
| email.complained | complained | (none) | error ← the whole data |

A few honest limitations:

  • State is last-write-wins, not a count. deliveries.state is a single column. If a recipient opens then clicks, the row ends on clicked. There's no per-recipient open/click counter — just the latest state and the individual *_at timestamps.
  • No complained_at column. A complaint sets state='complained' and stamps the error blob, but there's no dedicated complaint timestamp on the delivery row.
  • Bounce and complaint store the error slightly differently. A bounce writes error = data.error (falling back to the whole data payload if data.error is absent); a complaint always writes the whole data blob. Minor, but worth knowing if you parse deliveries.error.
  • The same webhook also feeds nurture-sequence sends (nurturing.sends) when the message id matches there instead. That's covered in the nurturing docs; the vocabulary is the same.
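
The mapping and the first three limitations can be sketched as a small patch function. It is a simplified model, assuming a delivery row is a dict and that unmapped event types leave the row unchanged; `apply_event` is a hypothetical name.

```python
from datetime import datetime, timezone

# event type -> (new state, timestamp column or None)
EVENT_MAP = {
    "email.sent": ("sent", "sent_at"),
    "email.delivered": ("delivered", "delivered_at"),
    "email.opened": ("opened", "opened_at"),
    "email.clicked": ("clicked", "clicked_at"),
    "email.bounced": ("bounced", "bounced_at"),
    "email.complained": ("complained", None),  # there is no complained_at column
}

def apply_event(delivery, event, now):
    """Patch the delivery dict in place: last write wins on state."""
    mapping = EVENT_MAP.get(event["type"])
    if mapping is None:
        return delivery  # assumption: unmapped types are ignored
    state, ts_col = mapping
    data = event.get("data", {})
    delivery["state"] = state
    if ts_col:
        delivery[ts_col] = now
    if state == "bounced":
        delivery["error"] = data.get("error", data)  # data.error, else whole data
    elif state == "complained":
        delivery["error"] = data  # always the whole data blob
    return delivery

t1 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)
row = {"state": "delivered"}
apply_event(row, {"type": "email.opened", "data": {}}, t1)
apply_event(row, {"type": "email.clicked", "data": {}}, t2)
print(row["state"], row["opened_at"], row["clicked_at"])
```

After an open followed by a click, the row ends on `clicked` while keeping both timestamps, which is the last-write-wins behaviour described above.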

Hard signals flip subscription status and suppress the address

A bounce or a complaint is a hard signal — these are the events that damage sender reputation with the receiving provider (Gmail, Yahoo, etc.). When the webhook sees email.bounced or email.complained and the delivery has a subscription_id, it does two things beyond patching the delivery:

  1. Flips the subscription status to bounced or complained — but only on the one subscription this delivery belongs to. Since the normal audience query only includes status='active' subscribers, that address is excluded from the next send to this publication automatically. A sibling publication where the same email is still active is not touched by this status flip — the suppression list (below) is what protects siblings.
  2. Upserts the address into newsletters.suppression_list — a per-tenant "do not send" list, keyed (tenant_id, email), with reason set to bounced or complained.
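
A minimal sketch of those two steps, using dicts in place of the subscriptions table and newsletters.suppression_list. The function name is hypothetical, and overwriting an existing entry on conflict is an assumption about the upsert.

```python
HARD_SIGNALS = {"email.bounced": "bounced", "email.complained": "complained"}

def handle_hard_signal(event_type, delivery, subscriptions, suppression_list):
    """Flip the one subscription and upsert the tenant-scoped suppression entry."""
    reason = HARD_SIGNALS.get(event_type)
    sub_id = delivery.get("subscription_id")
    if reason is None or sub_id is None:
        return False  # not a hard signal, or no subscription to act on
    sub = subscriptions[sub_id]
    sub["status"] = reason  # only this subscription; siblings are untouched
    # Keyed (tenant_id, email); assignment models the upsert (assumed overwrite).
    suppression_list[(delivery["tenant_id"], sub["email"])] = {"reason": reason}
    return True

subscriptions = {
    "sub-1": {"email": "a@example.com", "status": "active"},  # publication A
    "sub-2": {"email": "a@example.com", "status": "active"},  # sibling publication
}
suppression_list = {}
delivery = {"tenant_id": "t1", "subscription_id": "sub-1"}
handle_hard_signal("email.bounced", delivery, subscriptions, suppression_list)
print(subscriptions, suppression_list)
```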

[!NOTE] The suppression-list schema allows a third reason: manual. The webhook only ever writes bounced / complained, but the table is built to also hold an operator-added entry (reason='manual'), and the send-worker audience filter explicitly accounts for that case. There's no route that writes manual today — it's a schema affordance, not a shipped feature.

The suppression list is what stops the address coming back through a side door:

  • On subscribe: the public subscribe route checks the suppression list first. A suppressed address gets 410 Gone with a clear "this address is on our suppression list and can't be re-subscribed" message — so a bounced address can't re-subscribe to a different newsletter in the same tenant and get re-bounced.
  • On send: the send worker filters the resolved audience against the suppression list as a belt-and-suspenders check, even though bounced/complained subs are already excluded by the status='active' filter. This covers the edge case where an address is still active on one publication but suppressed via a sibling.

[!WARNING] The suppression filter is best-effort, not a hard guarantee. If the suppression-list query errors during a send, the worker fails open — it returns the unfiltered audience and sends anyway, rather than blocking the whole send on a transient DB error. The subscribe-side check is similarly soft: a failed lookup there is swallowed and subscription proceeds. The status='active' exclusion is the primary guard; suppression is the secondary net.
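
The fail-open behaviour on the send side can be illustrated as follows. This is a sketch only: `load_suppressed` is a hypothetical callable standing in for the suppression-list query, and the audience is modelled as a list of email strings.

```python
def filter_audience(audience, tenant_id, load_suppressed):
    """Drop suppressed addresses; on a lookup error, return the audience unfiltered."""
    try:
        suppressed = load_suppressed(tenant_id)
    except Exception:
        return list(audience)  # fail open: a transient DB error does not block the send
    return [email for email in audience if email not in suppressed]

def working_lookup(tenant_id):
    return {"bad@example.com"}

def broken_lookup(tenant_id):
    raise RuntimeError("database unavailable")

audience = ["ok@example.com", "bad@example.com"]
print(filter_audience(audience, "t1", working_lookup))
print(filter_audience(audience, "t1", broken_lookup))
```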

[!NOTE] Suppression is per-tenant, not global across tenants. Someone who blocked Tenant A's mail may legitimately want Tenant B's. The reputation problem is per-sender-domain, so the scope follows the tenant.

When the suppression table was first added, the migration backfilled every existing bounced / complained subscription into the list (on conflict do nothing, tagged notes='backfilled…'), so addresses that went bad before the table existed are covered too.

You can also create these signals manually from the admin side:

  • POST /newsletters/:id/subscribers/:subId/unsubscribe — admin manual unsubscribe. Flips the subscription to unsubscribed, stamps unsubscribed_at, stores the reason under custom_fields.admin_unsubscribe_reason, and does not send the user-facing unsubscribe confirmation email (this is operator-initiated).
  • POST /newsletters/:id/subscribers/:subId/mark-complained — admin manual complaint flag (flips to complained).

Note these admin actions flip the subscription status; the suppression-list upsert path shown above is driven by the webhook.

Verified sender domains

If a newsletter sets a custom from_email, the domain in that address has to be verified before Nukipa will send from it. This is enforced at send time.

The pre-send gate

Before the worker creates a send row or calls Resend, checkSenderDomain() runs. Acceptance order, first match wins:

  1. No from_email → allowed. The gateway falls back to a platform-owned sender, which is always verified on the platform's Resend account.
  2. Host is on PLATFORM_OWNED_SENDER_DOMAINS (an env list, e.g. nukipa.com,kibert.de) → allowed. These are verified once at the platform level; no per-tenant row needed.
  3. A row exists in newsletters.verified_domains with status='verified' for this tenant and domain → allowed.
  4. Otherwise → the send is blocked.

On a block, the worker flips the issue to failed, writes a structured reason into metrics, and no send row is created (nothing was attempted at Resend). The reason is human-readable so the dashboard can show you exactly what to do:

| reason | Meaning |
| --- | --- |
| invalid_from_email | The from_email isn't a parseable address. |
| domain_not_added | The domain has no row. Add it under Newsletters → Settings → Sender domains. |
| domain_status_<status> | The domain row exists but isn't verified yet (e.g. domain_status_verifying). Publish the DNS records and click "Check now". |
| gate_schema_missing | In production, the verified_domains table or a column is missing (Postgres 42P01 / 42703). The gate fails closed. Apply the newsletters migration. |
| gate_query_failed | Any other DB error while verifying the domain. The send is blocked. |
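
The acceptance order and the first three reasons can be sketched like this. It is a simplified model rather than the real checkSenderDomain(): the address parsing is deliberately naive, lower-casing the host is an assumption, the verified_domains table is a dict keyed by (tenant, domain), and the database-error reasons are left out.

```python
PLATFORM_OWNED_SENDER_DOMAINS = {"nukipa.com", "kibert.de"}  # example env list

def check_sender_domain(from_email, tenant_id, verified_domains):
    """Return (allowed, reason); first match wins."""
    if not from_email:
        return True, None  # gateway falls back to the platform sender
    local, sep, host = from_email.rpartition("@")
    if not sep or not local or not host:
        return False, "invalid_from_email"
    host = host.lower()  # assumption: hosts compare case-insensitively
    if host in PLATFORM_OWNED_SENDER_DOMAINS:
        return True, None
    status = verified_domains.get((tenant_id, host))
    if status is None:
        return False, "domain_not_added"
    if status == "verified":
        return True, None
    return False, f"domain_status_{status}"

verified = {
    ("t1", "news.example.com"): "verified",
    ("t1", "pending.example.com"): "verifying",
}
for addr in (None, "hi@nukipa.com", "hi@news.example.com",
             "hi@pending.example.com", "hi@other.example.com", "not-an-address"):
    print(addr, check_sender_domain(addr, "t1", verified))
```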

[!WARNING] In production the gate fails closed if the verified_domains table or a column is missing (schema error codes 42P01 / 42703) — a dropped table or a permission glitch can't silently let unverified domains ship. You'll see reason: gate_schema_missing in that case. In dev/staging it fails open by default so a lagging migration doesn't block testing. Override with NEWSLETTERS_GATE_FAIL_OPEN_ON_SCHEMA (true/false).

[!NOTE] A gate-failed issue's metrics is not {sent, failed}. On a block, issues.metrics is overwritten with { error: 'sender_domain_not_verified', reason, from_email }. So GET /issues/:id/metrics on a failed issue returns those keys spread in instead of the dispatch counters — see the metrics section below.

Adding and verifying a domain

The flow mirrors Resend's own:

  1. POST /domains with { domain, region? } — inserts a row (status='pending'), then calls Resend's POST /domains through the gateway and mirrors the returned DKIM/SPF/DMARC records into dns_records. On success you get 201 with the verifying row. Publish those records at your DNS provider.
  2. If Resend provisioning fails after the row insert, the route does not error out — it returns 202 (not 201), leaves the row at status='pending' with last_error populated and warning: 'resend_provision_failed_retry_available'. The poller retries automatically, or "Check now" doubles as a retry-provision path (it re-runs POST /domains when the row never got a Resend id).
  3. A background poller checks pending/verifying/failed rows against Resend (TICK_MS = 2 minutes, with a first run ~5s after boot). It nudges Resend to re-run DNS verification and pulls the latest status in. DNS propagation usually takes 5–60 minutes, so the green "verified" pill appears on its own.
  4. POST /domains/:id/verify — the "Check now" button. Forces an immediate re-check instead of waiting for the next poll tick.

[!NOTE] The poller isn't a guarantee of "every row every 2 minutes." Each tick processes at most 30 rows (MAX_PER_TICK), and it skips any row checked within the last 60 seconds (MIN_CHECK_INTERVAL_MS). With a large backlog of pending domains, individual rows can wait longer than one tick. For a single domain you just added, "Check now" is the fast path.
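
To make the cap and the skip window concrete, here is a sketch of how one tick might choose rows. The column name `last_checked_at` and the list-order selection are assumptions; only the two constants and the polled statuses come from the text.

```python
from datetime import datetime, timedelta, timezone

MAX_PER_TICK = 30
MIN_CHECK_INTERVAL = timedelta(seconds=60)  # MIN_CHECK_INTERVAL_MS in the text
POLLED_STATUSES = {"pending", "verifying", "failed"}

def select_rows(rows, now):
    """Pick the rows one poller tick would check."""
    eligible = [
        r for r in rows
        if r["status"] in POLLED_STATUSES
        and (r["last_checked_at"] is None
             or now - r["last_checked_at"] >= MIN_CHECK_INTERVAL)
    ]
    return eligible[:MAX_PER_TICK]  # the rest wait for a later tick

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [{"id": n, "status": "verifying", "last_checked_at": None} for n in range(40)]
rows.append({"id": 40, "status": "pending",
             "last_checked_at": now - timedelta(seconds=10)})  # checked too recently
rows.append({"id": 41, "status": "verified", "last_checked_at": None})  # not polled
picked = select_rows(rows, now)
print(len(picked), picked[-1]["id"])
```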

Resend's status vocabulary is collapsed into a narrower lifecycle: verified → verified, failed → failed, anything else (including pending) → verifying, so an unknown status never silently parks a row.
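
That collapse is small enough to sketch in a few lines. The function name is hypothetical, and `some_future_status` is a made-up value used only to show the fallback.

```python
def collapse_status(resend_status: str) -> str:
    """Map a Resend domain status onto the narrower local lifecycle."""
    if resend_status in ("verified", "failed"):
        return resend_status
    return "verifying"  # pending and any unknown status land here

for status in ("verified", "failed", "pending", "some_future_status"):
    print(status, "->", collapse_status(status))
```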

Deleting a domain (DELETE /domains/:id) is guarded:

  • 409 domain_in_use if any of the tenant's newsletters has a from_email that contains @…<domain>. This is a substring (ilike '%@<domain>%') match, not an exact host — deleting maltego.com will also count a newsletter sending from x@mail.maltego.com. Change the from-address first.
  • 409 domain_in_flight if a queued/running send is attributed to that domain right now. Wait for it to finish (or cancel the issue) — pulling the domain mid-batch would orphan the rest.

Reputation: tracked, lightly enforced

Beyond the binary verified/not-verified gate, each verified domain carries a reputation snapshot. When you list domains (GET /domains), each row is joined to the newsletters.v_domain_reputation view, which aggregates deliveries (through sends) by the sender domain the send used.

[!NOTE] The join is best-effort. If the view is missing (pre-migration), GET /domains still returns the domain rows with reputation: null and only logs a warning for non-42P01 errors. Don't assume reputation is always populated.

The view reports two windows — 7 days and 30 days — but note the window is anchored on send start time (sends.started_at), not event time. A bounce that lands today on a send that started 8 days ago counts in the 30d window but not the 7d window, even though the bounce itself is fresh. Read "7d" as "deliveries from sends started in the last 7 days," not "events in the last 7 days."

| Field | Window | Notes |
| --- | --- | --- |
| sent_30d, delivered_30d, bounced_30d, complained_30d, opened_30d, clicked_30d | 30d | Raw counts. sent_30d is the count of all delivery rows for sends in the window (the rate denominator), not the count of state='sent' rows. |
| sent_7d, delivered_7d, bounced_7d, complained_7d | 7d | Raw counts. No opened/clicked for 7d; the view only computes those four. |
| delivery_rate_30d, bounce_rate_30d, complaint_rate_30d | 30d | Percentages, 2 decimals. |

Rates are null (not 0) when there are no sends in the window, so a fresh domain shows an empty value rather than a misleading "0% bounce". The view is security_invoker, so RLS scopes it to your tenant.
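
A rough sketch of the windowing and rate rules, not the view's actual SQL. It assumes each delivery carries its send's start time under a hypothetical `send_started_at` key and that bounces are counted by the delivery's current state.

```python
from datetime import datetime, timedelta, timezone

def rate(numerator, denominator):
    """Percentage with 2 decimals, or None when there are no sends in the window."""
    if denominator == 0:
        return None
    return round(100.0 * numerator / denominator, 2)

def window_counts(deliveries, now, days):
    """Count deliveries whose send started within the last `days` days."""
    cutoff = now - timedelta(days=days)
    in_window = [d for d in deliveries if d["send_started_at"] >= cutoff]
    return {
        "sent": len(in_window),  # all delivery rows: the rate denominator
        "bounced": sum(1 for d in in_window if d["state"] == "bounced"),
    }

now = datetime(2024, 3, 10, tzinfo=timezone.utc)
deliveries = [
    # The bounce may have landed today, but its send started 8 days ago.
    {"state": "bounced", "send_started_at": now - timedelta(days=8)},
    {"state": "delivered", "send_started_at": now - timedelta(days=2)},
]
w7 = window_counts(deliveries, now, 7)
w30 = window_counts(deliveries, now, 30)
print(w7, w30, rate(w30["bounced"], w30["sent"]), rate(0, 0))
```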

[!IMPORTANT] Enforcement nuance: reputation is tracked, not auto-enforced. Nukipa does not currently pause sends or block a domain because its bounce or complaint rate crossed a threshold — there is no such threshold in the code. The enforcement that does exist is binary and per-address: hard-signal events suppress the individual address (see above), and the send gate is verified/not-verified. The aggregate rates are there for you to watch. If a domain's bounce/complaint rate climbs, that's a manual signal to investigate your list, not something the platform throttles for you.

A send is attributed to a domain via sends.email_domain_id, set from the gate's verified-domain match. Sends from platform-owned domains (nukipa.com/kibert.de) and the system default sender leave it null — they fall into the "platform sender" bucket and don't roll up under a per-tenant verified domain.

Live metrics

There are two metric surfaces, and they don't drift — but they use the word "sent" to mean two different things, so read carefully.

  • issues.metrics — a small snapshot written once when the send completes: { sent, failed }. Here sent is the dispatch total: the count of messages Resend handed back a message id for (per-message id present → delivered; absent → failed; and if a whole batch returns null/shape-mismatch, the entire batch is marked failed). This is the dispatch outcome, not engagement. (For a gate-blocked issue this object is instead { error, reason, from_email } — see the sender-domain section.)
  • GET /issues/:id/metrics — returns metrics merged with a live block that is recomputed from deliveries on every call:
{
  "sent": 1200,
  "failed": 3,
  "live": {
    "delivered": 543, "opened": 540, "clicked": 96,
    "bounced": 12, "complained": 1, "unsubscribed": 0,
    "sent": 8, "failed": 3
  },
  "recipient_count": 1203
}

Because live is derived from the delivery rows at read time, it tracks webhook updates without a separate rollup job — call it again and it reflects the latest opens/clicks/bounces.

[!WARNING] Top-level sent and live.sent are NOT the same number and are not comparable. Top-level metrics.sent is the dispatch total — every message successfully handed off to Resend at send time (1200 in the example). live.sent is the count of deliveries still sitting in state='sent' — accepted by Resend but with no email.delivered / email.opened / etc. webhook advanced them yet (8 in the example). As delivery/open/click webhooks arrive, rows leave sent and the live.sent count shrinks toward zero. The two numbers diverging is expected, not an error.

[!NOTE] The live counts are mutually exclusive by current state, because each delivery has exactly one state. A recipient who was delivered then opened then clicked counts once, under clicked — not in delivered and opened too. So live.delivered is "delivered and nothing further has happened", not "total delivered". recipient_count is the total number of delivery rows for the issue (the true reach denominator); treat opens/clicks as floors, since engagement that advanced further isn't separately counted under the earlier state.
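
The relationship between the stored snapshot and the live block can be sketched as below. It assumes delivery rows are dicts with a single `state` key; `issue_metrics` is a hypothetical name for whatever the route does internally.

```python
from collections import Counter

LIVE_STATES = ("delivered", "opened", "clicked", "bounced",
               "complained", "unsubscribed", "sent", "failed")

def issue_metrics(stored_metrics, deliveries):
    """Merge the stored snapshot with counts recomputed from delivery rows."""
    counts = Counter(d["state"] for d in deliveries)
    live = {state: counts.get(state, 0) for state in LIVE_STATES}
    return {**stored_metrics, "live": live, "recipient_count": len(deliveries)}

deliveries = [
    {"state": "clicked"},    # was delivered and opened earlier; counted once
    {"state": "opened"},
    {"state": "delivered"},
    {"state": "sent"},       # accepted by Resend, no further webhook yet
    {"state": "failed"},
]
result = issue_metrics({"sent": 4, "failed": 1}, deliveries)
print(result)
```

Here the top-level `sent` is 4 (the dispatch total) while `live.sent` is 1, and the live counts add up to `recipient_count` because each row has exactly one state.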

For the per-recipient breakdown behind these numbers:

  • GET /issues/:id/deliveries?state=&q=&limit=&offset= — paginated delivery rows. Filter by state (delivered/opened/clicked/bounced/complained/unsubscribed/sent/failed/queued), search by email substring with q. limit ≤ 500.
  • GET /issues/:id/sends — the send-execution row(s), one per send-now, with rolled-up recipient_count / delivered / failed and timing.

Every issue is unsubscribe-compliant by construction

This isn't optional and you can't accidentally skip it. The send worker:

  • Injects an unsubscribe footer via ensureUnsubscribeFooter() — a no-op only if your body already contains {{unsubscribe_url}}, otherwise it appends a CASL/CAN-SPAM footer.
  • Mints a per-recipient unsubscribe JWT (it never expires, so old emails stay one-click-compliant) and sets List-Unsubscribe + List-Unsubscribe-Post: List-Unsubscribe=One-Click headers (RFC 8058).
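
A minimal sketch of both steps, leaving the JWT minting out. The footer markup is a placeholder invented for illustration (the real footer text is not shown in this article), and `unsubscribe_headers` is a hypothetical helper; the header names and the One-Click value follow RFC 8058.

```python
PLACEHOLDER = "{{unsubscribe_url}}"

def ensure_unsubscribe_footer(body_html: str) -> str:
    """No-op when the body already has the placeholder; otherwise append a footer."""
    if PLACEHOLDER in body_html:
        return body_html
    # Hypothetical footer markup, for illustration only.
    return body_html + '<p><a href="' + PLACEHOLDER + '">Unsubscribe</a></p>'

def unsubscribe_headers(unsubscribe_url: str) -> dict:
    """Headers that enable one-click unsubscribe in supporting mail clients."""
    return {
        "List-Unsubscribe": "<" + unsubscribe_url + ">",
        "List-Unsubscribe-Post": "List-Unsubscribe=One-Click",
    }

body = ensure_unsubscribe_footer("<p>Hello</p>")
print(body)
print(unsubscribe_headers("https://example.com/u/TOKEN"))
```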

These headers are part of what Gmail/Yahoo's bulk-sender rules (Feb 2024+) expect from large senders. What the code guarantees is that the headers are set on every message; the consequence of omitting them (throttling, rejection) is general deliverability behaviour on the receiving side, not something this service measures.

FAQ

My scheduled issue never sent. Why?

Most likely the jobs service isn't configured (SERVICE_JOBS_URL unset): the scheduler no-ops without it and the row stays in scheduled untouched. Otherwise check: is scheduled_for actually in the past, is the issue still status='scheduled', and did you schedule more than 25 issues for the same window? (The per-tick cap defers the overflow to later 30s ticks; it doesn't drop them.) The scheduler also retries on the next tick if an enqueue failed.

My send-now issue is stuck in sending and won't delete. Why?

You almost certainly ran send-now with SERVICE_JOBS_URL unset. That route flips the issue to sending before checking for the jobs client and doesn't roll back, so the issue is stranded with no job (and sending issues can't be deleted). Configure jobs and re-send through a fresh issue, or fix the stuck row out-of-band.

Why did my issue go to failed immediately with no recipients?

The sender-domain gate blocked it. Check issues.metrics.reason: most commonly domain_not_added, domain_status_<status>, or invalid_from_email, but it can also be gate_schema_missing (production, missing table/column) or gate_query_failed (other DB error). No send row is created on a gate block because nothing was sent. Add and verify the domain, or clear the custom from_email to use the platform sender.

A subscriber bounced. Will they get the next issue?

No. The webhook flipped that subscription to bounced and added the address to the tenant suppression list. They're excluded from the active audience and can't re-subscribe (the subscribe route returns 410). Note the status flip only touches the one subscription the bounce came from; the suppression list is what protects sibling publications.

Do opens and clicks update without a page reload / a rollup job?

The numbers themselves are recomputed from deliveries every time you call GET /issues/:id/metrics, so they're always current at read time. There's no scheduled rollup. (Whether the dashboard pushes that update to an open browser tab is a UI concern outside this service.)

Why do sent and live.sent disagree?

They mean different things. Top-level sent is the dispatch total (handed to Resend). live.sent is the subset of deliveries still in state='sent': accepted but not yet advanced by a delivered/opened/clicked webhook. live.sent falls toward zero as those webhooks arrive.

My bounce rate looks high. Will Nukipa stop sending from that domain?

No. Reputation rates are tracked and shown, but there's no automatic threshold that pauses sends. Per-address suppression is automatic; aggregate enforcement is on you. Watch bounce_rate_30d / complaint_rate_30d (windowed by send start time) and clean your list manually.

Why is the webhook unauthenticated?

It isn't, really: Resend's svix signature is verified at the gateway, which forwards only verified events. The service-level route trusts the gateway. It still archives every matched event into newsletters.events, so nothing is lost for debugging.

Served live from the platform · /docs/deliverability-and-metrics