Writing posts: the editor, the AI writer, facts & citations
A post in Nukipa is a single markdown document with a small amount of structured metadata around it. You can write it entirely by hand, hand it to the AI writer, or do both — generate a draft, then edit it in place. This article covers the post lifecycle, what the editor stores, how the AI writer pipeline works (research, citations, fact spans), and how the fact-verification pass turns marked claims into checked verdicts.
The post lifecycle
Every post moves through a status field. Public reads only serve published posts, and they serve a snapshot, not the live draft you're editing.
| Status | Meaning |
|---|---|
| idea | Placeholder: title/topic exists, no body yet |
| draft | Being written or edited; not public |
| review | Draft finished by the writer, awaiting your review (this is where AI-generated posts land) |
| scheduled | Has a scheduled_for time; a cron sweep publishes it when due |
| published | Live on the site |
| archived | Retired |
The two important transitions:
- Publish snapshots the current draft (body, components, and sources) into an append-only post_versions row, then points the post's current_version_id at it. Public reads follow that pointer. So editing a published post changes the draft but not what visitors see until you publish again.
- Unpublish clears the pointer and drops the post back to draft. The public URL starts returning 404 immediately. Version history is kept.
[!NOTE] Publishing is always a manual click. The AI writer never auto-publishes: a generated post stops at review so you can read it first. The only automatic publish path is the scheduler: a cron job (cms.scheduled-publish, every 5 minutes) sweeps posts in scheduled whose scheduled_for time has passed and publishes them.
You can also restore any past version: it overwrites the live draft from the snapshot (body, components, sources all re-inserted with fresh ids) and leaves the post in draft. You re-publish to make the restore live.
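The publish and unpublish semantics above can be pictured with a small in-memory model. This is only a sketch: the Post class, its field layout, and the function names are hypothetical stand-ins for the real tables, and restore is left out.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    """Hypothetical in-memory stand-in for a post row plus its versions."""
    status: str = "draft"
    draft: dict = field(default_factory=dict)      # body, components, sources
    versions: list = field(default_factory=list)   # append-only snapshots
    current_version_id: Optional[int] = None


def publish(post: Post) -> None:
    # Snapshot the draft and point the post at the new snapshot.
    post.versions.append(copy.deepcopy(post.draft))
    post.current_version_id = len(post.versions) - 1
    post.status = "published"


def unpublish(post: Post) -> None:
    # Clear the pointer; version history is kept.
    post.current_version_id = None
    post.status = "draft"


def public_read(post: Post) -> Optional[dict]:
    # Public reads follow the pointer, never the live draft.
    if post.status != "published" or post.current_version_id is None:
        return None  # the public URL would answer 404
    return post.versions[post.current_version_id]
```

In this model, editing post.draft after publish leaves public_read unchanged until publish runs again, which matches the behavior described above.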
What the editor stores
The body is plain markdown with three kinds of embedded marker. Everything the renderer needs is keyed off these:
| Marker | What it is |
|---|---|
| {{component:UUID}} | A reference to a row in post_components: a CTA, callout, FAQ, chart, table, image, etc. |
| {{cite:N}} | A citation chip; N is the 1-based index into the post's post_sources |
| {{fact}}…{{/fact}} | A span wrapping a verifiable claim; each one becomes a row in post_facts |
The same markdown round-trips through the WYSIWYG editor (TipTap) without lossy conversion, and it's the same format the AI writer emits and the agents exchange. There is no separate "rich" representation — the markdown is the source of truth.
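As an illustration of how the three markers can be picked out of a body, here is a minimal regex scan. The patterns and the scan_markers name are assumptions made for this sketch, not the renderer's actual parser.

```python
import re

# Assumed patterns for the three marker kinds described above.
COMPONENT_RE = re.compile(r"\{\{component:([0-9a-fA-F-]{36})\}\}")
CITE_RE = re.compile(r"\{\{cite:(\d+)\}\}")
FACT_RE = re.compile(r"\{\{fact\}\}(.*?)\{\{/fact\}\}", re.DOTALL)


def scan_markers(body: str) -> dict:
    """Return the component ids, citation indexes, and fact claims in a body."""
    return {
        "components": COMPONENT_RE.findall(body),
        "cites": [int(n) for n in CITE_RE.findall(body)],
        "facts": [claim.strip() for claim in FACT_RE.findall(body)],
    }
```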
Components are stored separately because their content is a free-form JSON bag whose shape depends on the type. The body just holds the {{component:UUID}} placeholder; the actual content lives in post_components. The writer and the importer both validate against the component registry, which has 14 types:
callout, faq, steps, card, chart, data_table, comparison, process, image_carousel, image, map, widget, cta, contact_form.
See the components reference for the per-type content keys.
What you can hand-author vs generate
There's no hard line here — every field is editable by hand, and most can be generated.
| Part | Hand-author | Generate |
|---|---|---|
| Body markdown | Yes, in the editor | Yes, full draft via the AI writer |
| Title / slug / excerpt | Yes | Writer produces them; you can override |
| SEO object (seo) | Yes | Writer produces it (it's a required writer output, not user-supplied) |
| Citations / sources | You can add/edit source rows by hand | Writer fills them from web search |
| Fact spans | Wrap claims in {{fact}} yourself | Writer marks them as it writes |
| Components | Add/edit by hand | Writer can emit image, widget, cta blocks |
| Cover + inline images | Upload from your library | Generated async after the body is saved |
[!NOTE] SEO is shape-guarded on persist: the writer is told to treat seo as a required output, but the import step refuses anything that isn't a plain JSON object (a stray array or string is silently dropped rather than written to the column). In practice the writer returns an object, so this only bites on malformed payloads.
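The shape guard in the note above amounts to a type check before the write. A rough sketch, where guard_seo, build_patch, and the payload keys are hypothetical names chosen for illustration:

```python
def guard_seo(value):
    """Keep the value only if it is a plain JSON object (a dict here)."""
    return value if isinstance(value, dict) else None


def build_patch(payload: dict) -> dict:
    """Build the fields to persist; a malformed seo is silently left out."""
    patch = {"title": payload.get("title"), "body": payload.get("body")}
    seo = guard_seo(payload.get("seo"))
    if seo is not None:
        patch["seo"] = seo
    return patch
```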
If you author a body by hand (or paste one in) and want it to go through the same pipeline the writer uses — component extraction, citation folding, fact extraction, follow-up image jobs — you call the import-body path (cms_import_post_body) rather than a plain field update. A plain cms_update_post just writes the body text; it doesn't parse markers into rows.
The AI writer pipeline
Generating a single post enqueues the cms.generate-blog-post job (via cms_generate_post, optionally with a briefing). The worker runs a fixed pipeline. The UI subscribes to job progress, so you see the milestones live.
1. Resolve context. The worker pulls well-known company docs from the context wiki: company profile, industry, products, ICP, USP, writing-style/voice. If the post is tagged into a campaign, that campaign's writing-style addendum, profile addendum, and pinned documents are merged on top.
2. Uniqueness pre-check (see the anti-reference section below).
3. Research + draft through an agent loop, so the model can call tools mid-generation (next section).
4. Resolve citations from the model's web-search annotations into {{cite:N}} markers + a deduped sources array.
5. Extract components: {{component:type}} … {{/component}} blocks are validated against the component registry, replaced with {{component:UUID}} markers, and inserted. Invalid blocks are dropped with a logged warning.
6. Extract fact spans: every {{fact}}…{{/fact}} becomes a post_facts row, in document order.
7. Persist: the post body/title/excerpt/slug/seo are patched, and post_sources, post_facts, and post_components are each replaced wholesale with the new lists.
The post lands in review. Steps 4–7 are the shared importPostBody helper — the exact same code path as a hand-authored import, so the two flows can't diverge on marker handling.
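The component and fact extraction steps can be sketched roughly as follows. This is not the importPostBody helper itself: the function names are hypothetical, and the sketch assumes the block content between the component tags is JSON text.

```python
import json
import re
import uuid

# The 14 registry types listed earlier in this article.
REGISTRY = {
    "callout", "faq", "steps", "card", "chart", "data_table", "comparison",
    "process", "image_carousel", "image", "map", "widget", "cta", "contact_form",
}
BLOCK_RE = re.compile(r"\{\{component:([a-z_]+)\}\}(.*?)\{\{/component\}\}", re.DOTALL)
FACT_RE = re.compile(r"\{\{fact\}\}(.*?)\{\{/fact\}\}", re.DOTALL)


def extract_components(body: str):
    """Swap valid blocks for {{component:UUID}} markers; drop invalid ones."""
    components, warnings = [], []

    def swap(match):
        ctype, raw = match.group(1), match.group(2).strip()
        if ctype not in REGISTRY:
            warnings.append(f"dropped unknown component type: {ctype}")
            return ""
        try:
            content = json.loads(raw)  # assumption: content is JSON text
        except json.JSONDecodeError:
            warnings.append(f"dropped {ctype} block with unparseable content")
            return ""
        cid = str(uuid.uuid4())
        components.append({"id": cid, "type": ctype, "content": content})
        return "{{component:" + cid + "}}"

    return BLOCK_RE.sub(swap, body), components, warnings


def extract_facts(body: str):
    """One row per fact span, in document order, starting as unverified."""
    return [
        {"ordinal": i, "claim": m.group(1).strip(), "status": "unverified"}
        for i, m in enumerate(FACT_RE.finditer(body))
    ]
```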
The default model is claude-sonnet-4-6 (env-overridable via CMS_BLOG_WRITER_MODEL). After a successful run the worker also captures a transcript (system prompt, brief, each tool call, truncated tool results, a run summary) into post_writer_transcripts, which backs the editor's Writer tab. That capture is best-effort — a failed insert never fails generation.
Tools the writer can call
The writer runs inside an agent loop so the model can call tools and keep going:
| Tool | What it does |
|---|---|
| web_search | Anthropic's server-side web search. The model decides when to search. Citation annotations come back on the text and get folded into {{cite:N}}. This is the only thing that becomes a footnote citation. |
| search_context | Hybrid vector+lexical search over the company's context wiki: pinned campaign docs, the company profile, ICP/USP/products, prior published posts. Used for voice and prior-art lookup. |
| read_signals | Reads analytics/SEO signals. |
| submit_post | How the model delivers the finished post. It's called exactly once; the field shape (title, slug, excerpt, body, seo) is enforced by the tool's input schema rather than parsed out of free text. |
The system prompt instructs the writer to call search_context first (1–3 calls, to ground itself in voice and prior work), then web_search (3–8 searches) for the actual claims, and to only state facts it found via search.
Writer settings (the knobs)
The New Post / Generate Batch dialogs expose four knobs, passed through to the job verbatim. Defaults: reasoning_effort: medium, post_length: medium, force_widget: true, min_ctas: 1.
| Knob | Values | Effect |
|---|---|---|
| post_length | short / medium / long | Sets a target word count (600 / 1200 / 2200) and a separate max_tokens cap (4000 / 8000 / 16000). The word count is a soft target; the prompt says don't pad to hit it. |
| reasoning_effort | low / medium / high | Tunes tool depth, not extended thinking (the pinned SDK doesn't expose that). Higher effort = more web searches (4 / 8 / 12), more agent-loop iterations (4 / 6 / 10), and a cooler temperature (0.6 / 0.4 / 0.3). |
| force_widget | boolean | Hard rule: emit at least one {{component:widget}} block. |
| min_ctas | 0–3 | Hard rule: emit at least N {{component:cta}} blocks, placed at natural decision points in the prose (not bolted to the end). |
[!NOTE] reasoning_effort does not change which model runs or give it a "thinking budget" in the usual sense; it only changes how many tool round-trips the writer is allowed. If you want deeper research, raise it; if you want a fast, lightly-sourced draft, lower it.
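For reference, the knob values from the table can be written out as lookup tables. The numbers come from this article; the resolve_settings function and the output key names are hypothetical.

```python
DEFAULTS = {
    "reasoning_effort": "medium",
    "post_length": "medium",
    "force_widget": True,
    "min_ctas": 1,
}
# post_length -> (target word count, max_tokens cap)
LENGTH = {"short": (600, 4000), "medium": (1200, 8000), "long": (2200, 16000)}
# reasoning_effort -> (web searches, agent-loop iterations, temperature)
EFFORT = {"low": (4, 4, 0.6), "medium": (8, 6, 0.4), "high": (12, 10, 0.3)}


def resolve_settings(overrides=None) -> dict:
    """Merge overrides onto the defaults and expand them into concrete limits."""
    s = {**DEFAULTS, **(overrides or {})}
    if not 0 <= s["min_ctas"] <= 3:
        raise ValueError("min_ctas must be between 0 and 3")
    words, max_tokens = LENGTH[s["post_length"]]
    searches, iterations, temperature = EFFORT[s["reasoning_effort"]]
    return {
        "target_words": words,
        "max_tokens": max_tokens,
        "max_web_searches": searches,
        "max_iterations": iterations,
        "temperature": temperature,
        "force_widget": s["force_widget"],
        "min_ctas": s["min_ctas"],
    }
```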
Citations and sources
The citation model is worth reading carefully, because there are two separate link mechanisms and they behave differently.
Footnote citations come from web search, and only from web search. When the writer states a statistic, quote, or named claim that came from a search result, Anthropic returns a citation annotation ({ url, title, cited_text, … }) on that span of text. The pipeline walks those annotations in document order, finds each cited_text in the body, and inserts {{cite:N}} right after the matched span. Multiple sources backing the same claim collapse onto the same span ({{cite:1}}{{cite:2}}); the same URL reuses the same index.
Inline markdown links are not citations. If the writer (or you) writes [text](https://example.com), that stays inline and renders as a normal <a href> link. It is never hoisted into the footnote list. The two mechanisms are deliberately separate: footnotes are for sourcing a factual claim, inline links are for deliberate hyperlinks (a resource, your own pages, a related post).
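The footnote folding described above (walk annotations in document order, insert a chip after each matched span, reuse the index for a repeated URL) could look roughly like this. It is a simplified sketch: fold_citations is a hypothetical name, and annotations are assumed to be plain dicts with url, title, and cited_text keys.

```python
def fold_citations(body: str, annotations: list):
    """Insert {{cite:N}} after each cited span and build a deduped source list."""
    sources = []        # position + 1 is the 1-based citation index
    index_by_url = {}   # url -> citation index, so repeated URLs reuse it
    chips_at = {}       # end offset in the body -> citation indexes to insert
    cursor = 0
    for ann in annotations:
        pos = body.find(ann["cited_text"], cursor)
        if pos == -1:
            continue  # span not found at or after the cursor; skip it
        url = ann["url"]
        if url not in index_by_url:
            sources.append({"url": url, "title": ann.get("title", "")})
            index_by_url[url] = len(sources)
        end = pos + len(ann["cited_text"])
        chips = chips_at.setdefault(end, [])
        if index_by_url[url] not in chips:
            chips.append(index_by_url[url])
        cursor = pos  # stay on this span so co-citations land on it too
    out = body
    # Insert from the end backwards so earlier offsets stay valid.
    for end in sorted(chips_at, reverse=True):
        markers = "".join("{{cite:%d}}" % n for n in chips_at[end])
        out = out[:end] + markers + out[end:]
    return out, sources
```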
[!WARNING] The model is told never to hand-write a "Sources" / "References" section, footnote numbers like [1], or <cite> tags. If you're hand-authoring, follow the same rule: don't type citation markup. To add a real source by hand, add a post_sources row (it gets an idx) and reference it with {{cite:N}} in the body. The source list survives even if a source has no URL; the renderer just shows plain text instead of a link.
[!TIP] Source titles in the footnote list are derived from the URL (e.g.
hubspot.com — Content marketing pricing 2024), not from the inline anchor phrase. That's intentional: anchors are usually prose-flow ("per HubSpot") and make meaningless reference entries.
The orphan self-cite guard
One real failure mode: the model occasionally "cites" a sibling URL on the tenant's own domain by pattern-matching the current post's slug shape, pointing at a page that doesn't actually exist. So after citations resolve, the pipeline checks each source whose host is a tenant-owned domain. If its path doesn't resolve to a published post, it clears the URL, keeping the footnote text and the {{cite:N}} chip intact. The shared renderer then degrades that source to plain text, the same fallback it uses for any source without a URL. This is soft-fail: any DB hiccup leaves all sources alone. The reasoning is that shipping a footnote with no link beats failing the whole import over a broken one.
Two things narrow when this guard actually fires:
- It only runs for tenants with a registered custom domain. The check loads the tenant's rows from tenant_domains; if there are none, the guard is a complete no-op. A tenant living only on the platform subdomain never triggers it.
- It matches root-level slugs only. The guard inspects the first path segment of a self-host URL. Nested paths like /blog/<slug> are treated as "not a recognisable self-cite" and left intact, so a hallucinated /blog/fake-post link would not be caught.
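A minimal sketch of the guard, including the two narrowing conditions and the soft-fail behavior. The function name and the is_published callback are hypothetical; the callback stands in for the database lookup.

```python
from urllib.parse import urlparse


def guard_self_cites(sources: list, tenant_domains: list, is_published) -> list:
    """Clear the URL of self-cites that point at a non-existent root-level slug."""
    if not tenant_domains:
        return sources  # no registered custom domain: complete no-op
    owned = {d.lower() for d in tenant_domains}
    try:
        result = []
        for src in sources:
            cleaned = dict(src)
            url = src.get("url")
            if url:
                parsed = urlparse(url)
                host = (parsed.hostname or "").lower()
                segments = [s for s in parsed.path.split("/") if s]
                # Only root-level slugs are recognised; nested paths are left alone.
                if host in owned and len(segments) == 1 and not is_published(segments[0]):
                    cleaned["url"] = None  # footnote text and chip stay intact
            result.append(cleaned)
        return result
    except Exception:
        return sources  # soft-fail: any lookup error leaves all sources alone
```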
Fact spans and the verification pass
Facts and citations are different things. A citation says "this claim came from this source." A fact span says "this is a checkable claim — go verify it independently."
The writer wraps specific, verifiable claims (a statistic, a date, a research finding, a named quote) in {{fact}}…{{/fact}}, aiming for roughly 3–10 per article. It's told not to wrap opinions, recommendations, or vague statements. Each span becomes a post_facts row in document order (ordinal, 0-based), starting life as status='unverified' with just the claim text.
[!IMPORTANT] Fact spans and their verdicts are editorial-only. The public renderer (@nukipa/post-content) strips the {{fact}}…{{/fact}} wrappers and renders only the inner claim text, so visitors see a normal sentence. The status/confidence/reasoning verdicts live on post_facts and surface in the editor's facts tab; they never render on the live post.
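The stripping behavior is easy to picture as a single substitution. This is an illustrative sketch in Python, not the @nukipa/post-content implementation; strip_fact_markers is a hypothetical name.

```python
import re

FACT_WRAPPER_RE = re.compile(r"\{\{/?fact\}\}")


def strip_fact_markers(body: str) -> str:
    """Remove the opening and closing fact wrappers, keeping the claim text."""
    return FACT_WRAPPER_RE.sub("", body)
```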
Running the verification pass
The verifier is a separate job (cms.verify-facts), run two ways:
- Verify all (cms_verify_facts with no fact id): re-checks every claim still in unverified on the post. After a writer run this fires automatically (unless disabled via env), but only when the post has at least one fact span, so a freshly generated post with facts usually arrives with them already checked.
- Verify this claim (cms_verify_facts with a fact_id): re-checks a single card.
The verifier runs Claude Sonnet (default claude-sonnet-4-6, env-overridable via CMS_FACT_CHECK_MODEL) with web_search (its primary tool) and search_context (for claims about the company itself — "our 2024 ARR," "our SOC 2 report" — which rarely have a public source but do live in the wiki). It returns a verdict per claim, mapped back by index onto the post_facts rows.
A fact's status is one of four states:
| Status | Meaning |
|---|---|
| unverified | The starting state, and where a claim stays if the verifier returned nothing for it (it was skipped: couldn't search it, dropped it). Your signal to retry or fix it by hand. |
| verified | Reliable sources clearly confirm the claim: right number, right name, right context |
| contradicted | Reliable sources directly contradict it. (A stat corroborated with a different number counts as contradicted, not verified.) |
| unconfirmed | The verifier searched and found nothing conclusive either way. Also the fallback the verifier writes when the model returns an unrecognised status string. |
The key distinction: unverified means "no verdict was ever written" (the model skipped this one), while unconfirmed means "the model searched and came up inconclusive." When the verifier does return a result for a claim, that claim always lands on a terminal verdict — verified, contradicted, or unconfirmed — never back in unverified.
Each verdict carries a confidence (high/medium/low), a source_url, a source_title, and a one-or-two-sentence reasoning. The model name and a timestamp are stashed under the fact's metadata so re-runs are auditable.
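The mapping from verdicts back onto fact rows can be sketched as follows. The verdict's index key, the metadata key names, and the apply_verdicts function are assumptions for illustration; the status rules follow the table above.

```python
TERMINAL = {"verified", "contradicted", "unconfirmed"}


def apply_verdicts(facts: list, verdicts: list, model: str, now: str) -> list:
    """Write each returned verdict onto the fact row at the same index."""
    by_index = {v["index"]: v for v in verdicts}
    updated = []
    for i, fact in enumerate(facts):
        verdict = by_index.get(i)
        if verdict is None:
            updated.append(dict(fact))  # no verdict returned: stays unverified
            continue
        status = verdict.get("status")
        if status not in TERMINAL:
            status = "unconfirmed"  # fallback for an unrecognised status string
        updated.append({
            **fact,
            "status": status,
            "confidence": verdict.get("confidence"),
            "source_url": verdict.get("source_url"),
            "source_title": verdict.get("source_title"),
            "reasoning": verdict.get("reasoning"),
            # Hypothetical metadata keys recording which model ran and when.
            "metadata": {**fact.get("metadata", {}), "model": model, "checked_at": now},
        })
    return updated
```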
[!NOTE] The pass is rigorous on purpose: it's told not to mark something verified just because one SEO listicle agrees. A contradicted or unconfirmed verdict does not change your post; the verifier writes verdicts onto the fact rows, it never edits the body. Acting on a verdict (correcting the number, cutting the claim) is your call in the editor.
Inline image and widget generation
When the writer emits an image block with a prompt (and no asset_id/url) or a widget block, the import step fires a follow-up generation job for it. These are fire-and-forget: the enqueue itself is wrapped, so a failed enqueue is logged but never fails the import.
Each image and widget component carries a status lifecycle — pending → generating → ready, or error (with an error_message) if generation fails. So image generation is not fire-and-succeed: a component can land in error, and you'd see that in the editor rather than the post silently shipping without the image. Manually-curated images (picked from your library) skip the worker entirely — they arrive with asset_id + url already set.
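The fire-and-forget enqueue can be pictured like this. The component dict shape, the enqueue callback, and both function names are hypothetical; the selection rule (image with a prompt and no asset_id/url, or any widget) follows the description above.

```python
import logging

log = logging.getLogger("post_import")


def needs_generation(component: dict) -> bool:
    """Widgets always generate; images only when prompted and not yet resolved."""
    content = component.get("content", {})
    if component["type"] == "widget":
        return True
    if component["type"] == "image":
        return bool(content.get("prompt")) and not content.get("asset_id") and not content.get("url")
    return False


def enqueue_followups(components: list, enqueue) -> list:
    """Enqueue generation jobs; a failed enqueue is logged, never raised."""
    queued = []
    for comp in components:
        if not needs_generation(comp):
            continue
        try:
            enqueue(comp)
            queued.append(comp["id"])
        except Exception:
            log.warning("enqueue failed for component %s", comp["id"], exc_info=True)
    return queued
```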
The single-post anti-reference behavior
When you regenerate or write a single post, the writer actively tries not to repeat what you've already published.
Before drafting, the worker searches the cms_post mirror in the context wiki for posts similar to this one's title and briefing, excluding the current post by its slug. Any hit with cosine similarity ≥ 0.75 (the uniqueness threshold of 0.85 minus a 0.10 looser margin) gets pasted into the writer's prompt as an "Existing posts on this topic — write something distinct" block, with each match's similarity score and excerpt. The instruction is to pick a clearly different angle, audience, depth, or framing.
[!NOTE] The single-post flow only nudges — it never refuses. It assumes a regen usually points at a post you genuinely want to rewrite. The batch generator is stricter: there, a proposed angle scoring ≥ 0.85 against an existing post is hard-rejected outright before any writing happens. Same index, different policy.
This depends on prior posts being mirrored into the context wiki, which happens automatically on publish. If the context service isn't configured (e.g. a bare dev setup), the check degrades to a no-op and the writer runs without anti-references.
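The two policies differ only in threshold and consequence, which a short sketch makes concrete. The threshold arithmetic comes from this article; the hit shape and function names are hypothetical.

```python
UNIQUENESS_THRESHOLD = 0.85   # batch generator: hard reject at or above this
NUDGE_MARGIN = 0.10           # single-post flow uses a looser margin
NUDGE_THRESHOLD = round(UNIQUENESS_THRESHOLD - NUDGE_MARGIN, 2)  # 0.75


def anti_references(hits: list, current_slug: str) -> list:
    """Single-post flow: similar posts to show the writer as 'write something distinct'."""
    return [
        h for h in hits
        if h["slug"] != current_slug and h["similarity"] >= NUDGE_THRESHOLD
    ]


def batch_rejects(similarity: float) -> bool:
    """Batch flow: a proposed angle this close to an existing post is rejected."""
    return similarity >= UNIQUENESS_THRESHOLD
```

A hit scoring 0.76 would be pasted into the single-post prompt as an anti-reference, while the same score in the batch flow would still pass the stricter 0.85 check.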
FAQ
Why didn't my generated post auto-publish? It never does. The writer leaves it in review for you to read and click Publish. The only hands-off publish path is scheduling.
I edited a published post but the site still shows the old version. Public reads serve the snapshot at current_version_id. Editing changes the draft; you have to publish again to roll a new snapshot.
The writer added a [link](url) but it's not in my sources list. Bug? No. Inline markdown links are deliberate hyperlinks and stay inline. Only web-search annotations become footnote citations. The two are separate by design.
A footnote shows source text but isn't clickable. Its URL was cleared — most likely the orphan self-cite guard caught a hallucinated link to a non-existent page on your own domain. The footnote stays as plain text so the {{cite:N}} reference doesn't break. (Note: the guard only runs for tenants with a registered custom domain.)
Can I mark facts in a hand-written post? Yes — wrap them in {{fact}}…{{/fact}} and run the body through the import-body path so they're extracted into rows, then run Verify all. A plain field update won't parse the markers.
A fact came back unconfirmed — is that a failure? No. It means the verifier searched and couldn't find clear evidence either way. The claim is flagged for you to check, but it doesn't block publishing. (unconfirmed is distinct from unverified: the latter means the verifier never returned a verdict for that claim at all.)
Will my fact verdicts show up on the live post? No. Fact markers are stripped on render and verdicts are editorial-only — visitors see the claim text as a normal sentence, never the verified/contradicted/unconfirmed status.