
CRM: leads, lifecycle & AI classification

Nukipa includes a lightweight CRM. Every lead a customer's site captures (form submissions, imports, and API pushes) lands there, and the CRM tracks who those people are, where they are in the funnel, and what has happened to them. It's deliberately small: contacts are the developed part, with companies and deals as thin supporting tables (more on that at the end).

This article covers the data model you actually work with: the two parallel state axes on a contact (lifecycle stage vs disposition status), the AI fit/intent classification that scores incoming leads, the append-only activity timeline, and the current shape of companies and deals.

[!NOTE] The CRM is an internal service (@nukipa/crm, schema crm, loopback port 3010). You don't talk to it directly. Agents and the dashboard reach it through the platform — e.g. the MCP tools crm_list_contacts, crm_get_contact, crm_set_contact_stage. Field and enum names below are the real database names; the tool names are how you act on them.

Two state axes: stage vs status

A contact carries two independent state fields. This is the single most important thing to understand about the CRM, because it's easy to assume they're the same axis.

stage is the lifecycle — where the contact sits in the funnel. It's coarse and roughly monotonic.
status is the disposition — what's happening with them right now inside their current stage. It's the working state a salesperson flips as they handle the lead.

They are stored as separate columns, each with its own CHECK constraint, and they move independently. A contact in stage lead can have status working; an mql can be unqualified; an sql can be back in nurture.

Stage (lifecycle) — `stage`

lead → mql → sql → customer
                 ↘ disqualified

Stage	Meaning
`lead`	Default. Every new contact starts here.
`mql`	Marketing-qualified — looks like a fit, not yet sales-ready.
`sql`	Sales-qualified — handed to / accepted by sales.
`customer`	Closed-won. A terminal stage.
`disqualified`	Not a fit, ever. The other terminal stage.

The progression isn't enforced — there's no state machine rejecting lead → customer. The DB CHECK validates only that the value is one of the five. Promotion is a column update via the dedicated route (crm_set_contact_stage), which also stamps stage_changed_at and writes a stage_change activity.

[!NOTE] Moving a contact to customer or disqualified cancels any in-flight nurture enrollments (best-effort). The two terminal stages are treated as "this person is done in the funnel."
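The stage rules above can be sketched in TypeScript. This is a minimal illustration, not the service's code. It assumes an in-memory contact and two hypothetical callbacks, `writeActivity` and `cancelNurture`, which stand in for the activity insert and the nurture cancellation. The only validation is membership in the five values. Every move stamps `stage_changed_at` and records a `stage_change` activity, and only the two terminal stages trigger a best-effort cancel.

```typescript
// The five values the DB CHECK accepts. The order is informational only:
// nothing rejects a jump such as lead -> customer.
const STAGES = ["lead", "mql", "sql", "customer", "disqualified"] as const;
type Stage = (typeof STAGES)[number];

// Reaching either terminal stage cancels in-flight nurture (best-effort).
const TERMINAL_STAGES: ReadonlySet<Stage> = new Set<Stage>(["customer", "disqualified"]);

interface CrmContact {
  id: string;
  stage: Stage;
  stage_changed_at: string | null;
}

interface StageChangeBody {
  from: Stage;
  to: Stage;
  reason?: string;
}

// Hypothetical dependencies standing in for the real activity and nurture calls.
interface StageDeps {
  writeActivity: (contactId: string, kind: "stage_change", body: StageChangeBody) => void;
  cancelNurture: (contactId: string) => void;
}

function isStage(value: string): value is Stage {
  return (STAGES as readonly string[]).includes(value);
}

function setContactStage(contact: CrmContact, to: string, deps: StageDeps, reason?: string): CrmContact {
  // Mirrors the CHECK constraint: membership is the only validation.
  if (!isStage(to)) throw new Error(`invalid stage: ${to}`);
  const updated: CrmContact = { ...contact, stage: to, stage_changed_at: new Date().toISOString() };
  deps.writeActivity(contact.id, "stage_change", { from: contact.stage, to, reason });
  if (TERMINAL_STAGES.has(to)) {
    try {
      deps.cancelNurture(contact.id);
    } catch {
      // Best-effort: a failed cancel does not undo the stage move.
    }
  }
  return updated;
}
```

Because the guard only checks membership, calling `setContactStage` on a `lead` with `"customer"` succeeds without passing through `mql` or `sql`. This matches the unenforced progression described above.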

Status (disposition) — `status`

Status	Meaning
`new`	Default. Untouched since it arrived.
`working`	Someone is actively on it.
`contacted`	Outreach has gone out.
`qualified`	Confirmed worth pursuing.
`unqualified`	Confirmed not worth pursuing (the disposition counterpart to the `disqualified` stage).
`nurture`	Parked in a longer-term drip.
`converted`	Disposition-level "done / won."

Set via crm_set_contact_status, which writes a status_change activity. Two behaviours are tied to one specific status value, unqualified:

Setting status to unqualified with a reason also writes that reason to the contact's disqualified_reason column, keeping the contact row and the activity row in sync. No other status persists a reason onto the contact.
Setting status to unqualified cancels active nurture (best-effort). No other status transition does — a lead can be moved to working while a sequence keeps running, by design.
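The status route can be sketched the same way. The `writeActivity` and `cancelNurture` callbacks here are again hypothetical. Both side effects depend on `unqualified` alone. The reason is copied to `disqualified_reason` and nurture is cancelled best-effort. A move to `working` leaves any running sequence in place.

```typescript
const STATUSES = ["new", "working", "contacted", "qualified", "unqualified", "nurture", "converted"] as const;
type Status = (typeof STATUSES)[number];

interface ContactStatusRow {
  id: string;
  status: Status;
  disqualified_reason: string | null;
}

// Hypothetical dependencies standing in for the real activity and nurture calls.
interface StatusDeps {
  writeActivity: (contactId: string, kind: "status_change", body: { from: Status; to: Status; reason?: string }) => void;
  cancelNurture: (contactId: string) => void;
}

function isStatus(value: string): value is Status {
  return (STATUSES as readonly string[]).includes(value);
}

function setContactStatus(contact: ContactStatusRow, to: string, deps: StatusDeps, reason?: string): ContactStatusRow {
  if (!isStatus(to)) throw new Error(`invalid status: ${to}`);
  const updated: ContactStatusRow = { ...contact, status: to };
  // Only unqualified + a reason persists the reason onto the contact row.
  if (to === "unqualified" && reason) {
    updated.disqualified_reason = reason;
  }
  deps.writeActivity(contact.id, "status_change", { from: contact.status, to, reason });
  // Only unqualified cancels nurture; every other status leaves sequences running.
  if (to === "unqualified") {
    try {
      deps.cancelNurture(contact.id);
    } catch {
      // Best-effort: a failed cancel does not undo the status change.
    }
  }
  return updated;
}
```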

[!WARNING] disqualified (a stage value) and unqualified (a status value) are different things on different columns. They commonly travel together when you reject a lead, but nothing forces that. Filter on the right column for what you mean.

AI fit/intent classification

When a lead arrives, the CRM can score it with an LLM. The scoring is a two-number split (the migration calls it the "modern HubSpot v2 split"):

score_fit (0–100) — firmographic match. Does this contact look like a target buyer? Judged against the tenant's ICP and industry.
score_intent (0–100) — behavioural signal. Does their message and activity show demand for what the company actually sells? Calibrated against the tenant's products, USP, and industry.

Both are smallint columns with a 0–100 CHECK constraint, and both are nullable — a missing score is a real, meaningful state (see below), not a zero.

What triggers a classification

Trigger	How
New lead from a website form	The CMS form-submission path creates the contact with `classify: true`, which enqueues a classify job right after insert.
Manual	The "Classify" button in the lead side panel, or `crm_create_contact` with classification, or a direct re-classify call.
Agent	Any agent enqueuing the job for a contact id.

It's fire-and-forget: the enqueue returns a job_id immediately and a background worker (queue crm.classify-contact) does the work. If the jobs service isn't configured in an environment (SERVICE_JOBS_URL unset), the enqueue is a silent no-op and the rest of the CRM keeps serving reads and writes.
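Here is a sketch of the enqueue behaviour. It makes two assumptions. First, a hypothetical synchronous `postJob` transport replaces the real HTTP call to the jobs service. Second, the no-op returns `job_id: null`; the actual return value in that case isn't documented here. In a Node service, `env` would typically be `process.env`.

```typescript
interface EnqueueResult {
  job_id: string | null;
}

// Hypothetical transport: queues a job and returns its id immediately.
type PostJob = (baseUrl: string, queue: string, payload: { contact_id: string }) => string;

const CLASSIFY_QUEUE = "crm.classify-contact";

function enqueueClassify(
  contactId: string,
  env: Record<string, string | undefined>,
  postJob: PostJob,
): EnqueueResult {
  const baseUrl = env["SERVICE_JOBS_URL"];
  if (!baseUrl) {
    // Jobs service not configured: silent no-op; CRM reads and writes are unaffected.
    return { job_id: null };
  }
  // Fire-and-forget: the background worker on crm.classify-contact does the actual work.
  return { job_id: postJob(baseUrl, CLASSIFY_QUEUE, { contact_id: contactId }) };
}
```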

The pipeline

The worker runs five steps, reporting progress at each:

Load context (5%) — the contact plus its most recent 20 activities, for behavioural signal.
Load tenant grounding (20%) — pulls the tenant's context documents from the context service by kind: profile (company profile, preferring /profile/company.md), industry, product, icp (every ICP doc — global plus campaign-scoped — concatenated), and usp. Fetching by kind rather than by hardcoded path keeps it resilient to doc paths moving.
Ask the classifier (40%) — one strict-JSON LLM call. Default model claude-sonnet-4-6 (override with CRM_CLASSIFY_MODEL), temperature 0.2, JSON-object response format, 1500-token cap.
Parse + clamp (80%) — strips stray markdown fences, parses, then coerces every field. Out-of-range or wrong-typed values become null rather than throwing — calibrating an unreliable model is the worker's job, not the database's. (Malformed JSON that won't parse at all does throw and fails the run.)
Apply (95%) — writes a contact_classifications audit row, denormalises the scores onto the contact, and emits a classification activity with actor_kind = 'agent'.
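Steps 2 and 3 can be sketched as plain functions. Several details here are assumptions:

- The request field names (`response_format`, `max_tokens`) follow a common LLM-API shape rather than a documented contract.
- `ContextDoc` is a hypothetical shape for what the context service returns.
- Falling back to the first profile doc when `/profile/company.md` is absent is an assumption.
- The blank-line separator between concatenated ICP docs is an arbitrary choice.

```typescript
const DEFAULT_CLASSIFY_MODEL = "claude-sonnet-4-6";

interface ClassifierRequestConfig {
  model: string;
  temperature: number;
  response_format: { type: "json_object" };
  max_tokens: number;
}

// Step 3 settings: the model can be overridden per environment; the rest is fixed.
// An empty CRM_CLASSIFY_MODEL falls back to the default in this sketch.
function classifierConfig(env: Record<string, string | undefined>): ClassifierRequestConfig {
  return {
    model: env["CRM_CLASSIFY_MODEL"] || DEFAULT_CLASSIFY_MODEL,
    temperature: 0.2,
    response_format: { type: "json_object" },
    max_tokens: 1500,
  };
}

// Hypothetical shape of a context document fetched by kind (step 2).
interface ContextDoc {
  kind: "profile" | "industry" | "product" | "icp" | "usp";
  path: string;
  content: string;
}

// Prefer /profile/company.md; otherwise use the first profile doc, if any.
function pickProfile(docs: ContextDoc[]): string | null {
  const profiles = docs.filter((d) => d.kind === "profile");
  const preferred = profiles.find((d) => d.path === "/profile/company.md") ?? profiles[0];
  return preferred ? preferred.content : null;
}

// Every ICP doc (global and campaign-scoped) is concatenated into one block.
function concatIcp(docs: ContextDoc[]): string | null {
  const icp = docs.filter((d) => d.kind === "icp").map((d) => d.content);
  return icp.length > 0 ? icp.join("\n\n") : null;
}
```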

The classifier returns this envelope:

```json
{
  "score_fit": 78,
  "score_intent": 64,
  "is_spam": false,
  "suggested_stage": "mql",
  "suggested_status": "qualified",
  "next_best_action": "Book a discovery call",
  "factors": {
    "fit":    { "icp_match": "...", "reasoning": "..." },
    "intent": { "signals": ["..."], "reasoning": "..." }
  }
}
```

suggested_stage / suggested_status are validated against the same stage/status enums above (anything else becomes null). next_best_action is a short imperative string: the prompt asks for at most 12 words, the worker hard-truncates it to 200 characters, and a non-string value is coerced to null. factors is the model's free-text justification.
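The parse-and-coerce rules can be sketched as a standalone function. `normaliseClassification` is a hypothetical stand-in modelled on the worker's `normaliseResult`, not its source. Two choices here are assumptions: fractional scores count as wrong-typed instead of being rounded, and `factors` must be a plain object.

```typescript
const STAGE_VALUES: readonly string[] = ["lead", "mql", "sql", "customer", "disqualified"];
const STATUS_VALUES: readonly string[] = ["new", "working", "contacted", "qualified", "unqualified", "nurture", "converted"];

interface Classification {
  score_fit: number | null;
  score_intent: number | null;
  is_spam: boolean | null;
  suggested_stage: string | null;
  suggested_status: string | null;
  next_best_action: string | null;
  factors: Record<string, unknown> | null;
}

// An integer in 0..100 survives; anything else (out of range, wrong type) becomes null.
function toScore(v: unknown): number | null {
  return typeof v === "number" && Number.isInteger(v) && v >= 0 && v <= 100 ? v : null;
}

function toEnum(v: unknown, allowed: readonly string[]): string | null {
  return typeof v === "string" && allowed.includes(v) ? v : null;
}

function normaliseClassification(raw: string): Classification {
  // Strip an optional leading json code fence and a trailing fence.
  const text = raw.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");
  // Malformed JSON throws here, which fails the run.
  const parsed: unknown = JSON.parse(text);
  const o: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null ? (parsed as Record<string, unknown>) : {};
  const nba = o["next_best_action"];
  const factors = o["factors"];
  return {
    score_fit: toScore(o["score_fit"]),
    score_intent: toScore(o["score_intent"]),
    is_spam: typeof o["is_spam"] === "boolean" ? (o["is_spam"] as boolean) : null,
    suggested_stage: toEnum(o["suggested_stage"], STAGE_VALUES),
    suggested_status: toEnum(o["suggested_status"], STATUS_VALUES),
    // Hard-truncate to 200 characters; a non-string becomes null.
    next_best_action: typeof nba === "string" ? nba.slice(0, 200) : null,
    factors:
      typeof factors === "object" && factors !== null && !Array.isArray(factors)
        ? (factors as Record<string, unknown>)
        : null,
  };
}
```

Nulling a bad field, instead of rejecting the whole response, keeps the valid parts of an imperfect model answer. Only unparseable JSON fails the run.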

[!NOTE] suggested_stage and suggested_status are suggestions only. The classifier does not move the contact — it records what it would do. Acting on a suggestion is a separate, explicit crm_set_contact_stage / crm_set_contact_status call by a human or agent.

Missing context is a first-class outcome

If the tenant hasn't populated ICP / company-profile context (or the context service is unreachable), the worker does not guess. It returns score_fit = null and explains the gap in factors.fit.reasoning. The prompt explicitly instructs the model to be calibrated — not to score above 70 without explicit evidence in the inputs. So a null score_fit means "we couldn't responsibly score fit," not "bad fit." Treat it that way in filters and dashboards.
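Here is a minimal dashboard-side sketch that treats null as its own state. `fitBucket` and `genuinelyLowFit` are hypothetical helpers, and the 40-point cut-off is illustrative. Only the 70 threshold echoes the prompt's calibration rule.

```typescript
type FitBucket = "unscored" | "low" | "medium" | "high";

// Keeps null ("couldn't responsibly score") apart from genuinely low scores.
function fitBucket(scoreFit: number | null): FitBucket {
  if (scoreFit === null) return "unscored";
  if (scoreFit >= 70) return "high";
  if (scoreFit >= 40) return "medium";
  return "low";
}

interface ScoredContact {
  email: string;
  score_fit: number | null;
}

// Excludes nulls explicitly. Writing (c.score_fit ?? 0) < below instead would
// wrongly count every unscored lead as a bad fit.
function genuinelyLowFit(contacts: ScoredContact[], below = 40): ScoredContact[] {
  return contacts.filter((c) => c.score_fit !== null && c.score_fit < below);
}
```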

Denormalisation: latest run wins (best-effort)

Every classification run is an immutable row in crm.contact_classifications (model, both scores, is_spam, suggestions, factors, next_best_action, timestamp). That table is the audit trail — full history, newest first.

The latest run's score_fit and score_intent are also copied onto the crm.contacts row. This is purely so list views and filters stay cheap — you can sort the leads table by score_fit desc (there's an index for exactly that) without joining the classification history. Three things to know about the copy:

score_* on a contact are owned by the worker — they're absent from the create/upsert/PATCH allow-lists, so client writes to them are silently dropped (no error; the field is just ignored). A re-classify overwrites them with the newest values.
A score is only copied down when it's a real number — a run that returns null for fit leaves the existing denormalised value in place rather than clobbering it.
The denorm update is best-effort: its error is swallowed, so a failed copy does not fail the classification. The audit row and the classification activity can land while the contact's score_* lag behind the latest run. The contact_classifications table is the source of truth; the denormalised columns are a convenience that can occasionally be stale.
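The two copy rules can be sketched like this: only real numbers move, and errors are swallowed. `buildDenormPatch` and `applyDenorm` are hypothetical names, and `update` stands in for the contact-row write. Skipping the write when both scores are null is this sketch's choice, not documented behaviour.

```typescript
interface ClassificationRun {
  score_fit: number | null;
  score_intent: number | null;
}

type ScorePatch = Partial<Record<"score_fit" | "score_intent", number>>;

// Only real numbers are copied; a null leaves the contact's existing value untouched.
function buildDenormPatch(run: ClassificationRun): ScorePatch {
  const patch: ScorePatch = {};
  if (typeof run.score_fit === "number") patch.score_fit = run.score_fit;
  if (typeof run.score_intent === "number") patch.score_intent = run.score_intent;
  return patch;
}

// Best-effort copy: returns false on failure instead of throwing, so the
// classification (audit row + activity) is not failed by a stale denorm.
function applyDenorm(update: (patch: ScorePatch) => void, run: ClassificationRun): boolean {
  const patch = buildDenormPatch(run);
  if (Object.keys(patch).length === 0) return true; // nothing to copy
  try {
    update(patch);
    return true;
  } catch {
    return false;
  }
}
```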

[!WARNING] The apply step is not transactional. It does three sequential writes — the classification row, the denorm update, then the activity — with no surrounding transaction. If the worker fails between them you can end up with a partial state (e.g. an audit row but no denorm, or no activity). Cancellation doesn't help here either: see below.

Cancellation

The worker checks ctx.cancelSignal.aborted at three points: before the LLM call, before parsing, and before the apply step. A cancel that lands before the last of those checks stops the run before any writes (the handler just returns). Once the worker reaches the apply phase (95%), cancellation is not honoured. There's no check between "saving" and the three writes, so the run finishes persisting. Treat cancel as "stops cleanly only if it lands before persistence begins."
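The checkpoint placement can be illustrated with a synchronous sketch. `runClassify`, `Steps`, and the `"cancelled" | "applied"` outcome are hypothetical; the real handler simply returns on cancel. `CancelSignal` mirrors only the `aborted` flag of `ctx.cancelSignal`. The key point is structural: nothing is checked after the third checkpoint, so a cancel that arrives during the writes changes nothing.

```typescript
interface CancelSignal {
  readonly aborted: boolean;
}

// Hypothetical step functions; the last three are the non-transactional writes.
interface Steps {
  callLlm: () => string;
  parse: (raw: string) => unknown;
  writeAuditRow: (result: unknown) => void;
  denormScores: (result: unknown) => void;
  emitActivity: (result: unknown) => void;
}

type Outcome = "cancelled" | "applied";

function runClassify(signal: CancelSignal, steps: Steps, progress: (pct: number) => void): Outcome {
  progress(40);
  if (signal.aborted) return "cancelled"; // check 1: before the LLM call
  const raw = steps.callLlm();
  if (signal.aborted) return "cancelled"; // check 2: before parsing
  progress(80);
  const result = steps.parse(raw);
  if (signal.aborted) return "cancelled"; // check 3: before apply, the last chance
  progress(95);
  // No checks from here on: a late cancel is ignored and all three writes run.
  steps.writeAuditRow(result);
  try {
    steps.denormScores(result);
  } catch {
    // Best-effort denorm: swallowed, as described above.
  }
  steps.emitActivity(result);
  return "applied";
}
```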

Spam handling

is_spam is a boolean the classifier sets for obvious bot/junk submissions. A spam contact is not deleted — it stays in the CRM. The only behavioural difference: spam contacts are not enrolled in nurture.

The exact trigger matters. After applying the classification, the worker hands off to the nurture matcher unless is_spam === true:

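```typescript
// Hand off to the nurture matcher unless the classifier explicitly flagged spam.
if (result.is_spam !== true) {
  // Best-effort: swallow handoff errors so a saved classification never fails retroactively.
  await enqueueNurtureMatch(req, contactId).catch(() => {});
}
```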

So nurture is skipped only when is_spam is explicitly true. If the model returns a non-boolean (or omits the field), normaliseResult sets is_spam = null, and null !== true, so an unknown / unparseable spam verdict is treated as not-spam and does proceed to the matcher. There is no third "quarantine" state — it's spam (skip) or anything-else (enrol). The handoff itself is best-effort — a missing jobs service or a matcher failure never retroactively fails a classification that already saved.
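Isolated from the worker, the gate and the best-effort handoff reduce to a few lines. `nurtureHandoff` and its result labels are hypothetical, and `enqueue` stands in for `enqueueNurtureMatch` as a synchronous callback.

```typescript
type Handoff = "skipped_spam" | "enqueued" | "handoff_failed";

function nurtureHandoff(
  isSpam: boolean | null,
  contactId: string,
  enqueue: (contactId: string) => void,
): Handoff {
  // Only an explicit true skips; false and null (unknown verdict) both proceed.
  if (isSpam === true) return "skipped_spam";
  try {
    enqueue(contactId);
    return "enqueued";
  } catch {
    // Best-effort: the classification that already saved is left as is.
    return "handoff_failed";
  }
}
```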

The activity timeline

Every contact has an append-only timeline in crm.contact_activities. It's the contact's history — read it to understand a lead without reconstructing state. Newest-first, fetched via crm_get_contact_activities.

"Append-only" is enforced at the database: the table has select and insert RLS policies and deliberately no update or delete policy. There is no edit path and no delete path in the service either. Each insert also bumps the contact's last_activity_at (best-effort — a failed bump doesn't invalidate the activity row).

Activity kinds

The kind column is text not null — there is no DB CHECK on it (unlike stage, status, and actor_kind, which are all DB-constrained). The allow-list lives in app code (emitActivity rejects anything outside ACTIVITY_KINDS). These are the kinds the CRM service itself emits:

Kind	Emitted when	Body
`stage_change`	Lifecycle moved.	`{ from, to, reason }`
`status_change`	Disposition moved.	`{ from, to, reason }`
`owner_change`	Owner reassigned.	`{ from, to }`
`tag_add` / `tag_remove`	Tags mutated.	`{ tags: [...] }`
`note`	A free-text note.	`{ text }`, truncated to 4000 chars
`field_update`	A scalar PATCH.	`{ changed: [keys] }`
`classification`	An AI classification run completed.	scores, suggestions, `next_best_action`, model

Three more kinds are declared in the ACTIVITY_KINDS enum but not emitted by any code in the CRM service — they're reserved for external callers:

Reserved kind	Intended emitter
`form_submission`	The CMS form forwarder (not the CRM).
`sync_in` / `sync_out`	A future external-CRM sync worker (HubSpot/Salesforce). No sync worker ships today — these are aspirational.

The nurture kinds (nurture_match_result, nurture_send, nurture_cancel) are likewise declared here but written by the newsletters service, not the CRM.
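Here is a sketch of the kind allow-list and the append-only write. The `ACTIVITY_KINDS` contents come from the tables above. Everything else is hypothetical: `buildActivity`, `appendActivity`, the in-memory `timeline` array, and the `bumpLastActivity` callback. This is not the service's `emitActivity`, and the row shape omits columns this article doesn't list.

```typescript
const ACTIVITY_KINDS = [
  // Emitted by the CRM service itself.
  "stage_change", "status_change", "owner_change", "tag_add", "tag_remove",
  "note", "field_update", "classification",
  // Reserved for external callers.
  "form_submission", "sync_in", "sync_out",
  // Written by the newsletters service.
  "nurture_match_result", "nurture_send", "nurture_cancel",
] as const;
type ActivityKind = (typeof ACTIVITY_KINDS)[number];
type ActorKind = "user" | "system" | "agent";

interface ActivityRow {
  contact_id: string;
  kind: ActivityKind;
  actor_kind: ActorKind;
  body: Record<string, unknown>;
  created_at: string;
}

const NOTE_MAX_CHARS = 4000;

function buildActivity(contactId: string, kind: string, actorKind: ActorKind, body: Record<string, unknown>): ActivityRow {
  // App-level allow-list: the DB has no CHECK on kind, so this is the only guard.
  if (!(ACTIVITY_KINDS as readonly string[]).includes(kind)) throw new Error(`unknown activity kind: ${kind}`);
  let safeBody = body;
  const text = body["text"];
  if (kind === "note" && typeof text === "string") {
    safeBody = { ...body, text: text.slice(0, NOTE_MAX_CHARS) }; // notes truncated to 4000 chars
  }
  return { contact_id: contactId, kind: kind as ActivityKind, actor_kind: actorKind, body: safeBody, created_at: new Date().toISOString() };
}

// Append-only: there is no update or delete path, only this insert.
function appendActivity(timeline: ActivityRow[], bumpLastActivity: (contactId: string, at: string) => void, row: ActivityRow): void {
  timeline.push(row);
  try {
    bumpLastActivity(row.contact_id, row.created_at);
  } catch {
    // Best-effort: a failed last_activity_at bump does not invalidate the row.
  }
}
```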

actor_kind: who did it

Each activity records who acted. This column is DB-CHECK-constrained to one of three values:

user — a real person in the dashboard. The activity also stores their actor_user_id.
system — deterministic automation. Service callers (e.g. the CMS forwarding a public form submission, role service:cms) default to this, with a null actor_user_id.
agent — an LLM. The classify worker passes this explicitly; that's how a classification row is attributed to the model rather than to a person or to plain automation.
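Attribution can be sketched as a small resolver. Several parts are assumptions:

- the `Caller` shape;
- the `service:` prefix check, generalised from the documented `service:cms` role;
- the example role names `service:crm-worker` and `authenticated`;
- a null `actor_user_id` on agent rows.

```typescript
type ActorKind = "user" | "system" | "agent";

// Hypothetical caller shape: a service token or a signed-in dashboard user.
interface Caller {
  role: string;
  userId?: string;
}

interface Actor {
  actor_kind: ActorKind;
  actor_user_id: string | null;
}

function resolveActor(caller: Caller, explicitKind?: Exclude<ActorKind, "user">): Actor {
  if (caller.role.startsWith("service:")) {
    // Service callers default to system; the classify worker passes "agent" explicitly.
    return { actor_kind: explicitKind ?? "system", actor_user_id: null };
  }
  if (caller.userId) {
    // A real person in the dashboard: record who it was.
    return { actor_kind: "user", actor_user_id: caller.userId };
  }
  throw new Error("caller is neither a service nor a signed-in user");
}
```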

[!TIP] The timeline is published over Supabase Realtime, and the contact row is too (both added to supabase_realtime with replica identity full). The dashboard's lead side panel subscribes, so the timeline updates live as the classify worker writes — no reload. If you build on this data, prefer the realtime subscription over polling.

Companies & deals (basic in v1)

Contacts are the developed part of the CRM. Companies and deals exist but are intentionally thin today — enough to attach a contact to an organisation and track a stub opportunity, not a full sales-ops system.

Companies

crm.companies holds name, domain (citext, unique per tenant), industry, and size. A contact links to one via company_id.

The one piece of real logic: when a contact is created or edited with a company_name string (e.g. a visitor typed their company into a website form), the CRM resolves it to a company row — case-insensitive name match within the tenant, otherwise it inserts a new company — and links it. Without this, form leads would land with no company and show "—" in the list. Edge cases worth knowing: an empty/whitespace company_name resolves to null (clears the link), and a non-string value is treated as "no change" (leaves the existing link untouched). A new company is inserted with the name only — no domain.
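The company_name rules can be sketched against an in-memory company list. This is illustrative only. `resolveCompanyName`, `CompanyLink`, and `newId` are hypothetical. Trimming the input before matching is an assumption. `toLowerCase()` stands in for whatever case-insensitive comparison the service actually performs in SQL.

```typescript
interface Company {
  id: string;
  tenant_id: string;
  name: string;
  domain: string | null;
}

type CompanyLink = { kind: "unchanged" } | { kind: "set"; company_id: string | null };

function resolveCompanyName(companies: Company[], tenantId: string, companyName: unknown, newId: () => string): CompanyLink {
  // Non-string input means "no change": the existing link is left alone.
  if (typeof companyName !== "string") return { kind: "unchanged" };
  const name = companyName.trim();
  // Empty or whitespace-only input clears the link.
  if (name === "") return { kind: "set", company_id: null };
  const wanted = name.toLowerCase();
  // Case-insensitive match, scoped to the tenant.
  const existing = companies.find((c) => c.tenant_id === tenantId && c.name.toLowerCase() === wanted);
  if (existing) return { kind: "set", company_id: existing.id };
  // No match: insert a new company with the name only (no domain).
  const created: Company = { id: newId(), tenant_id: tenantId, name, domain: null };
  companies.push(created);
  return { kind: "set", company_id: created.id };
}
```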

The HTTP surface is a subset: list and create only. GET /companies/:id, PATCH, and DELETE are marked TODO in the code and not implemented yet.

Deals

crm.deals is, in the migration's own words, a stub. The table has name, amount, currency (default EUR), stage, contact_id, company_id, owner_user_id, and expected_close. But:

The surface is GET (list) and POST (create) only — no get-by-id, update, or delete (all TODO).
stage is a free-text column with no enum and no constraint. A new deal defaults to "new" (applied in the service layer, not the DB), but nothing validates or governs deal stages. There is no pipeline model.
Deals are not wired into classification, the activity timeline, or nurture.

[!NOTE] If you're evaluating Nukipa for deal/pipeline management: not yet. Deals today are a placeholder for "record an amount against a contact." The funnel logic lives entirely on the contact's stage/status, not on deals.

Reference: a contact's key fields

Field	Type	Notes
`email`	citext	Required. The natural key for upsert (per tenant).
`stage`	text + CHECK	`lead`/`mql`/`sql`/`customer`/`disqualified`. Default `lead`.
`status`	text + CHECK	`new`/`working`/`contacted`/`qualified`/`unqualified`/`nurture`/`converted`. Default `new`.
`disqualified_reason`	text	Set when status → `unqualified` with a reason.
`score_fit`	smallint 0–100	Denormalised from latest classification. Worker-owned, nullable; client writes silently dropped.
`score_intent`	smallint 0–100	Same.
`owner_user_id`	uuid	Null = unowned. Service-created leads land unowned.
`tags`	text[]	Mutated via the tags route.
`source` / `source_form_id`	text / uuid	Attribution, e.g. `form:contact`.
`utm_*`, `referrer_url`, `landing_url`, `ip_country`	text	First-class attribution columns.
`do_not_contact` / `email_opted_out`	boolean	Flipping either false→true cancels in-flight nurture; flipping back does not re-enrol.
`external_ids`	jsonb	Per-system upstream IDs for sync, e.g. `{ "hubspot": "..." }`.
`custom`	jsonb	Tenant-defined extension fields. Not directly filterable yet.
`last_activity_at` / `stage_changed_at`	timestamptz	Denormalised for list sorting.
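Two of the field rules in this table can be sketched in a PATCH-style helper. `PATCHABLE_FIELDS` is an illustrative subset, not the service's real allow-list. `pickPatchable` and `shouldCancelNurture` are hypothetical names.

```typescript
// Illustrative subset only. score_fit / score_intent are deliberately absent (worker-owned).
const PATCHABLE_FIELDS: ReadonlySet<string> = new Set([
  "source", "referrer_url", "landing_url", "ip_country",
  "do_not_contact", "email_opted_out", "custom", "external_ids",
]);

// Keeps allow-listed keys; anything else (e.g. score_fit) is dropped without an error.
function pickPatchable(input: Record<string, unknown>): Record<string, unknown> {
  const patch: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (PATCHABLE_FIELDS.has(key)) patch[key] = value;
  }
  return patch;
}

interface ContactFlags {
  do_not_contact: boolean;
  email_opted_out: boolean;
}

// Nurture is cancelled only when a flag flips false -> true; flipping back never re-enrols.
function shouldCancelNurture(before: ContactFlags, after: ContactFlags): boolean {
  const switchedOn = (was: boolean, now: boolean): boolean => !was && now;
  return (
    switchedOn(before.do_not_contact, after.do_not_contact) ||
    switchedOn(before.email_opted_out, after.email_opted_out)
  );
}
```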

FAQ

Why two state fields instead of one? Because lifecycle and disposition answer different questions. "Where are they in the funnel" (stage) is a coarse, mostly-forward axis. "What's the rep doing about them right now" (status) churns within a stage. Collapsing them loses information — you couldn't express "this sql is back in nurture."

A lead has score_fit: null. Is it a bad fit? No. Null means the classifier couldn't responsibly score fit — usually because the tenant hasn't populated ICP/company context. The reason is in factors.fit.reasoning. A genuinely bad fit gets a low number, not null.

Can I set score_fit myself? No. Scores are owned by the classification worker. Create/upsert/PATCH silently drop them — there's no error, the field is just ignored. Run a classification instead; the latest run's scores get denormalised onto the contact.

Does classification change the contact's stage or status? No. It produces suggested_stage / suggested_status and records them in the audit row, but it never moves the contact. Applying a suggestion is a separate explicit call.

If a contact's is_spam is unknown, does it get nurtured? Yes. Nurture is skipped only when is_spam === true. An unparseable or missing spam verdict normalises to null, which is treated as not-spam, so the contact is handed to the nurture matcher.

Can I edit or delete an activity? No. The timeline is append-only by database design — no update/delete policy, no service path. To correct the record, add a new activity (e.g. a note).

What model does classification use? claude-sonnet-4-6 by default, overridable per environment via CRM_CLASSIFY_MODEL. The model id is recorded on every classification row, so you always know which model produced a given score.

Can I run a real sales pipeline on deals? Not yet. Deals are a stub: create/list only, free-text stage with no validation, no pipeline logic, and not connected to classification or nurture. Use the contact's stage/status for funnel state today.

Served live from the platform · /docs/crm-lifecycle-and-classification