Build a site from scratch

This is the greenfield path. You have a Nukipa tenant (you signed up; the dashboard provisioned it) with almost nothing in it: empty CMS, blank brand wiki, no USPs. You want a real Next.js marketing site at the end: a homepage, a blog wired to your CMS, and legal pages. This article walks the whole flow, names exactly which steps the agent does and which need you, and flags where the product is currently thin.

The flow is driven by the nukipa-site-from-scratch skill. It's a variant of nukipa-site-one-shot: same starter project, same blog wiring, same GitHub-deploy step — but the from-scratch variant does the research itself and writes what it learns back into your workspace. The intent is that the rest of the platform (campaigns, briefings, social, future rebuilds) then reads the same brand/USP/ICP context the site was built from, instead of you re-entering it. That's the design rationale; whether each of those surfaces actually consumes the seeded context is out of scope for this article.

[!NOTE] If your tenant already has rich content — a populated brand wiki, USPs, ICPs, published posts — use nukipa-site-one-shot instead. From-scratch exists specifically for a tenant that's minutes old and empty.

What you bring vs. what the agent does

You provide the bare minimum and a couple of decisions; the agent does the research, generation, and assembly.

| Step | Who | Notes |
| --- | --- | --- |
| Sign up / provision the tenant | You | The skill does not create the tenant. It's a one-line dashboard action. |
| Provide a company URL (or name) | You | A URL is the best signal. A name alone means the agent finds the URL first. |
| Deep research the business + competitors | Agent | Crawls your site, web-searches competitors, infers ICP/USPs. |
| Correct the research | You | One review pass after Phase 1. Don't skip it: research is fast but lossy. |
| Seed brand/USP/ICP/competitor context into the workspace | Agent | Via MCP tools, so it persists in the platform. |
| Decide whether to pre-generate starter posts | You | y/n, or pick a count (default 3, capped at 5). |
| Generate content + design, assemble on the starter | Agent | Parallel content + design agents, then mergers onto the starter. |
| Review the local preview, request changes | You | Real iteration loop on http://localhost:3000. |
| Push to GitHub + connect the repo | Agent | gh repo create to the nukipa-labs org, then nukipa_connect_github_repo. |

The agent never asks you to write code or touch the starter. Your job is the two decisions (corrections, post count) and the design/copy feedback.

Inputs

You need one of:

  • A company URL (most common, e.g. https://acme.com) — the best signal source.
  • A company name alone — rarer; the agent finds the URL via web search first.

Plus your Nukipa tenant slug. If a BRIEF.preload.md file is present in the working directory (the platform's "Hand off to Claude Code" flow drops one), it already carries tenant_slug, tenant_id, the tenant host, the Gateway URL, and a NUKIPA_TOKEN. If it's missing, the agent asks you for the URL or name before starting.

The whole run is roughly 15–25 minutes of wall time. Most of that is research and seeding; the actual site generation is about 5 minutes.

Phase 1: Deep research → BRIEF.md

A single research agent crawls the public web and writes a BRIEF.md. It:

  1. Resolves the canonical homepage (web-searching for it first if you gave a name).
  2. Discovers internal pages — tries /sitemap.xml, otherwise walks same-origin links to depth 2. Capped at ~30 pages.
  3. WebFetches each page and categorises it (marketing, about, case_study, blog_index, contact, legal).
  4. Web-searches 5–8 competitors from the inferred category, fetches each homepage, and synthesises what makes you different.
  5. Infers your ICP from testimonials/case studies/customer logos, extracts up to 3 USPs (each one sentence, each defensible from a real quote with a source URL), and reads brand voice, logo URL, primary color, and font.
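
The link-walk fallback in step 2 can be pictured as a bounded breadth-first traversal. The following is a minimal sketch, not the skill's actual crawler: `discoverPages`, `originOf`, `MAX_DEPTH`, and `MAX_PAGES` are illustrative names, `getLinks` is a hypothetical stand-in for "fetch the page and extract its links" (a real crawl would do that asynchronously), and the /sitemap.xml branch is left out.

```typescript
// Hypothetical sketch of the same-origin link walk: depth 2, capped at 30 pages.
const MAX_DEPTH = 2;
const MAX_PAGES = 30;

// Extract "scheme://host" from an absolute URL, or null if it is not http(s).
function originOf(url: string): string | null {
  const m = /^(https?:\/\/[^\/?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

function discoverPages(
  homepage: string,
  getLinks: (url: string) => string[], // assumption: returns absolute URLs
): string[] {
  const origin = originOf(homepage);
  if (origin === null) return [];

  const seen: { [url: string]: boolean } = {};
  seen[homepage] = true;
  const pages: string[] = [homepage];
  let frontier: string[] = [homepage];

  // Each pass follows links one level further from the homepage.
  for (let depth = 0; depth < MAX_DEPTH && pages.length < MAX_PAGES; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of getLinks(page)) {
        if (pages.length >= MAX_PAGES) break;
        // Skip repeats and anything on a different origin.
        if (seen[link] || originOf(link) !== origin) continue;
        seen[link] = true;
        pages.push(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return pages;
}
```

The cap is checked before every push, so a homepage with hundreds of links still yields at most 30 pages to fetch and categorise.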

The agent is told to cite as it goes and to surface gaps rather than fabricate. If it can't find a logo it writes Logo: [NOT FOUND — ask user]; if it can't find a real testimonial it writes // TODO: real testimonials rather than inventing "Sarah Chen at TechCorp". Every [NOT FOUND] and [TODO] is collected into an ## Open questions section at the end of BRIEF.md.
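
Collecting the gaps is essentially a scan for marker lines. A rough sketch, assuming the markers appear literally as `[NOT FOUND` and `TODO` in the brief text; the function names are illustrative and this is not the agent's actual implementation.

```typescript
// Hypothetical sketch: gather every gap marker in a brief into one section.
function collectOpenQuestions(brief: string): string[] {
  const marker = /\[NOT FOUND|\bTODO\b/; // assumption: markers are written literally
  const found: string[] = [];
  for (const raw of brief.split("\n")) {
    const line = raw.trim();
    if (marker.test(line)) found.push(line);
  }
  return found;
}

// Render the collected lines as the trailing "## Open questions" section.
function renderOpenQuestions(brief: string): string {
  const items = collectOpenQuestions(brief);
  if (items.length === 0) return "## Open questions\n\n(none)";
  return "## Open questions\n\n" + items.map((q) => "- " + q).join("\n");
}
```

A brief with no markers still gets the section, so an empty list reads as "nothing missing" rather than "the check did not run".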

Then it stops and asks you:

"I've gathered: <one-line summary>. Before we go further, anything you'd correct? (industry, ICP, USPs, voice, anything you saw and didn't like)"

This is the one mandatory pause. Your corrections get applied inline to BRIEF.md, which is the source of truth every downstream agent trusts verbatim — so a wrong industry guess here propagates. Catch it now.

[!WARNING] The research is real-internet research, not magic. If the site is sparse, the brand voice or ICP inference will be thin. Your correction pass is where that thinness gets fixed — read the ## Open questions section carefully.

Phase 2: Seed the workspace

This is the step that makes "from scratch" different from a one-off site build. Everything the agent researched gets written into the Nukipa platform via MCP tools, so it lives in the workspace rather than being stranded in a local markdown file.

[!NOTE] Workspace-scoped MCP tool calls need a workspace_id: the tenant UUID (same value as the X-Tenant-Id header). There's no "active workspace" concept; if the agent doesn't have the UUID it calls list_workspaces to get it. Two tools covered in this article are scoped differently: cms_generate_post (Phase 4) is keyed by post_id, and nukipa_connect_github_repo (Phase 13) takes only repo_url.

The agent seeds, roughly in order:

context_set_brand_theme({ workspace_id, primary_color, accent_color, font_family, … })
context_set_language({ workspace_id, language })          # from the scraped content's dominant language
add_company_usp({ label, description })                    # ×3, one per researched USP
add_company_icp({ label, description, industry, size, role })   # one per ICP cluster
context_discover_competitors({ workspace_id, … })          # platform-suggested
context_confirm_competitors({ workspace_id, competitor_ids })  # the ones it researched
update_company_profile({ description, mission, founding_year, … })
context_set_image_style({ workspace_id, … })               # only if BRIEF captured an aesthetic

After each batch the agent verifies with the matching get_* / list_* tool (e.g. list_company_usps) so a silently-failed insert surfaces here, not at build time. If no brand theme was extractable, the agent asks you for one or two primary colors before continuing.
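
The verify step amounts to a set difference between what was sent and what the list tool reports back. A minimal sketch, assuming the list tool returns items carrying the same `label` that was submitted (that response shape is an assumption, as are the names `Labelled` and `missingAfterSeed`):

```typescript
// Hypothetical sketch: find seeded items that did not land in the workspace.
interface Labelled {
  label: string;
}

function normalise(label: string): string {
  return label.trim().toLowerCase();
}

function missingAfterSeed(sent: Labelled[], listed: Labelled[]): string[] {
  const present = listed.map((item) => normalise(item.label));
  const missing: string[] = [];
  for (const item of sent) {
    if (present.indexOf(normalise(item.label)) === -1) missing.push(item.label);
  }
  return missing;
}
```

An empty result means the batch landed; a non-empty one names exactly which inserts to re-send before moving on.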

Once this phase is done, the context lives in the platform. A later run (one-shot or from-scratch) can read the brand/USP/ICP/competitor context straight from the workspace instead of re-crawling.

Phase 3: Synthesis

The agent reads BRIEF.md plus the now-fresh platform context and writes a PROJECT-PLAN.md: the pages to create, the section list per page, an aesthetic profile (chosen from the skill's industry → aesthetic mapping), the color palette and typography pairing, and the primary CTA goal. No pause here.

Phase 4: Optional starter posts

A site with an empty blog grid looks half-finished. The agent asks:

"Should I generate 3–5 starter blog posts based on the research, so the site doesn't ship empty? You can edit / publish / delete them freely once the site is up. (y/n, or pick a count)"

The default is 3; you can pick a different count, capped at 5. If you say yes, a seed-content agent picks topics that each demonstrate one USP and speak to one ICP cluster, shows you the list, then for each approved topic creates the post and kicks off the writer.
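
The count rule (default 3, cap 5) is easy to state in code. A small sketch; how the agent actually parses a free-text reply is not documented, so the handling of unparseable answers here (return `null` and ask again) is an assumption, as is the function name.

```typescript
// Hypothetical sketch of resolving the "y/n, or pick a count" answer.
const DEFAULT_POSTS = 3;
const MAX_POSTS = 5;

// Returns the number of posts to generate, 0 to skip the phase,
// or null when the answer could not be understood (assumption: re-ask).
function resolvePostCount(answer: string): number | null {
  const a = answer.trim().toLowerCase();
  if (a === "n" || a === "no") return 0;
  if (a === "y" || a === "yes") return DEFAULT_POSTS;
  const n = parseInt(a, 10);
  if (isNaN(n) || n < 0) return null;
  return Math.min(n, MAX_POSTS);
}
```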

Post generation is a two-step flow. cms_generate_post does not create a post; it kicks off the writer agent for an existing post. So the agent calls cms_create_post first to get a post_id, then:

cms_create_post({ workspace_id, title, slug, … })          # returns post_id
cms_generate_post({ post_id, briefing: "<one-line angle>", language })   # returns a job_id

cms_generate_post takes post_id (required), and optionally briefing, language, reasoning_effort, post_length, force_widget, min_ctas. It does not take topic, status, or workspace_id, and it returns only a job_id — the writer worker fills in body, excerpt, SEO, and sources asynchronously.

There is no check_job_status MCP tool. To watch a job, the agent polls the post itself with cms_get_post({ post_id }) and inspects its state — sleeping ~10s between checks, until the underlying job reports succeeded or failed. The tool description quotes a typical run of 1–3 minutes per post; the seed agent applies a hard cap of 3 minutes per post and then moves on. The generated posts use the context you just seeded, so they land in your brand voice.
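
The polling loop described above might look roughly like this. It is a sketch under stated assumptions: `getPost` is a hypothetical wrapper around cms_get_post, the `jobState` field and its values are invented for illustration (the real response shape is not documented here), and `sleep` is injected so the loop can be exercised without real waiting.

```typescript
// Hypothetical sketch: poll a post every ~10s, give up after 3 minutes.
type JobState = "pending" | "running" | "succeeded" | "failed";

interface PostSnapshot {
  jobState: JobState; // assumption: illustrative field name
}

const POLL_INTERVAL_MS = 10000;
const HARD_CAP_MS = 3 * 60 * 1000;

async function waitForPost(
  postId: string,
  getPost: (postId: string) => Promise<PostSnapshot>,
  sleep: (ms: number) => Promise<void>,
): Promise<"succeeded" | "failed" | "timed_out"> {
  // Elapsed time is approximated by summing sleep intervals.
  let waited = 0;
  while (true) {
    const post = await getPost(postId);
    if (post.jobState === "succeeded" || post.jobState === "failed") {
      return post.jobState;
    }
    if (waited >= HARD_CAP_MS) return "timed_out";
    await sleep(POLL_INTERVAL_MS);
    waited += POLL_INTERVAL_MS;
  }
}
```

In real use `sleep` would be a `setTimeout`-backed promise; a timed-out post is simply left behind so the remaining posts still get their turn.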

Non-happy paths the seed agent handles:

  • A failed job is retried once. If it fails a second time, the agent drops that one post and continues with the rest — it doesn't abort the whole phase. The final summary notes any gap.
  • If Phase 2 context isn't seeded yet, the writer produces generic output. The agent is told to confirm Phase 2 finished before generating.
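
The retry-once-then-drop rule can be sketched as below. `generate` is a hypothetical callback that resolves to `true` when a post's job succeeds; processing topics one at a time is an assumption, since the source does not say whether jobs run sequentially or in parallel.

```typescript
// Hypothetical sketch: one retry per failed post, then drop it and continue.
async function generateWithRetry(
  topics: string[],
  generate: (topic: string) => Promise<boolean>,
): Promise<{ done: string[]; dropped: string[] }> {
  const done: string[] = [];
  const dropped: string[] = [];
  for (const topic of topics) {
    let ok = await generate(topic);
    if (!ok) ok = await generate(topic); // the single retry
    if (ok) {
      done.push(topic);
    } else {
      dropped.push(topic); // reported in the final summary, phase continues
    }
  }
  return { done, dropped };
}
```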

Three constraints worth knowing:

  • Posts stay draft. Your first dashboard action is review → publish. The agent never publishes for you.
  • Cap is 5. More than five is a moderation burden, not a head-start.
  • Topics are evergreen and generic — no "2026 trends", no "we just launched X" (you haven't launched anything), no riding a named competitor's positioning.

Optionally the agent groups posts into a folder or two (cms_create_folder + cms_move_post_to_folder) with a name that fits your voice ("Insights", "Guides", "Cases").

If you say no, this phase is skipped entirely and the blog index renders its empty state ("No posts published yet. Check back soon.") until you write some.

Phases 5–10: Generate, assemble, polish

These run back to back. The only input they need from you is the effect pick in the first step. In short:

  • Wow-effect selection — the agent presents interactive-effect bundles (hero effect + scroll effect + extras) fitting the chosen aesthetic; you pick one.
  • Parallel craft — a content agent writes CONTENT.md (all copy, section ordering) and a design agent writes DESIGN.md (Tailwind tokens, fonts, effect specs), in parallel.
  • Assembly — the agent copies the starter into [tenant-slug]/, writes .env.local, runs npm install, then two mergers fill in the homepage/layout/legal pages and restyle the blog routes, in parallel, before npx next build.
  • Integration, legal, and proofread passes — a safety-net check of the platform wiring, a jurisdiction-aware review of Privacy/Terms/Imprint, and a build + AI-slop + accessibility QA pass.

What "the starter" gives you

The site isn't generated from a blank directory. The skill ships a pre-wired Next.js 15 project (App Router, TypeScript, Tailwind v4) that the agent copies verbatim, including the platform-contract files below.

| Path | Role |
| --- | --- |
| src/lib/nukipa.ts | The SDK wrapper. getNukipaClient() for server components, getMiddlewareClient(req) for middleware. Every API call goes through here. |
| src/middleware.ts | Fire-and-forget page-view ping via client.recordVisit(...). |
| src/app/blog/page.tsx | Blog index: listPosts({ limit: 50 }) → grid of <PostCard>. |
| src/app/blog/[slug]/page.tsx | Blog detail: getPostBySlug<PostBody> for the body. |
| src/components/GateForm.tsx | The form shown under a content-gated post; posts to /public/v1/forms/<slug>/submit. |
| src/components/NukipaFeedback.tsx + public/nukipa-widget.js | The floating feedback widget (see Phase 11). |
| scripts/sanitize.mjs | Strips smart quotes, em-dashes, NBSPs, zero-width chars, and stray LLM citation artefacts. |

[!NOTE] The two skills disagree on the wording here, so be precise about what's actually true. The shipped starter contains these files pre-wired. The blog merger does regenerate/restyle the two blog routes (references/blog-integration.md even tells the merger to "generate every file from scratch"), but the contract it must preserve is fixed: the same data flow (listPosts/getPostBySlug through src/lib/nukipa.ts) and the same body rendering through <PostBody>. Pick whatever visual styling you want; keep the data flow and body rendering identical.

The reason the blog detail page renders the body through <PostBody> (from @nukipa/post-renderer-react) and not a hand-rolled markdown renderer is that posts can contain interactive islands — CTA buttons that auto-track clicks, inline lead forms. <PostBody> hydrates those client-side and wires their analytics back through the SDK; rendering the markdown yourself collapses them to static markup that tracks nothing.

The starter's .env.example ships with blank NUKIPA_GATEWAY_URL on purpose, so the same skill bundle works from prod and staging. The agent fills .env.local from the BRIEF.preload.md "Gateway URL" line. src/lib/nukipa.ts throws at module load if the value is blank — that's intentional; pointing at the wrong environment silently serves the wrong tenant's data.
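
A fail-fast guard of this kind is short. The sketch below is not the shipped src/lib/nukipa.ts; `requireGatewayUrl` is an illustrative name, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical sketch of a fail-fast check on the Gateway URL.
function requireGatewayUrl(env: { [key: string]: string | undefined }): string {
  const value = (env["NUKIPA_GATEWAY_URL"] || "").trim();
  if (value === "") {
    throw new Error(
      "NUKIPA_GATEWAY_URL is blank. Set it in .env.local before starting the site.",
    );
  }
  return value;
}

// In a module this would run once at load time, for example:
//   const GATEWAY_URL = requireGatewayUrl(process.env);
```

Throwing at load time turns a misconfigured environment into an immediate, visible failure instead of a site that quietly talks to the wrong backend.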

Phase 11: Local preview + feedback loop

The agent runs:

cd [tenant-slug] && npm install && npm run dev

and tells you the site is at http://localhost:3000.

[!TIP] Treat this as a real iteration loop, not a courtesy check. Even if it "looks fine", the agent is instructed to surface 2–3 specific things it would improve. Ask for the changes — a second pass is usually where the gap between "looks fine" and "actually good" closes.

The starter ships a floating feedback widget (bottom-right bubble) you can use to leave inline comments and pin sections. It lives in a closed Shadow DOM so the design's CSS can't break it, and re-mounts itself (via a MutationObserver) if a design rewrite removes its host node. Behaviour to be aware of:

  • By default it's shown in production builds and hidden during npm run dev (override with NUKIPA_FEEDBACK_ENABLED=1 to test it locally, or NUKIPA_FEEDBACK_DISABLED=1 to force it off in production after go-live). These are the only two override env vars the widget reads.
  • It stores comments in your browser's localStorage and offers a "Copy all" button. The shipped widget has no backend round-trip at all — its own header says "No backend round-trip. Items accumulate in localStorage." You paste the comments back into the chat when you want the agent to iterate.
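
The visibility rule in the first bullet reduces to a small decision function. This is a sketch only: using `NODE_ENV` to tell production from dev, and letting the disable flag win when both flags are set, are assumptions rather than documented behaviour of the shipped widget.

```typescript
// Hypothetical sketch of the widget's show/hide decision.
function isFeedbackWidgetShown(env: {
  NODE_ENV?: string;
  NUKIPA_FEEDBACK_ENABLED?: string;
  NUKIPA_FEEDBACK_DISABLED?: string;
}): boolean {
  if (env.NUKIPA_FEEDBACK_DISABLED === "1") return false; // assumption: "off" wins
  if (env.NUKIPA_FEEDBACK_ENABLED === "1") return true;
  return env.NODE_ENV === "production"; // default: shown in prod, hidden in dev
}
```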

For an in-person review you iterate live as you talk. For an async hand-off, the agent pushes to GitHub first (next phase), shares the deployed URL, and you leave widget comments at your own pace.

[!NOTE] Older one-shot skill prose mentions a NUKIPA_FEEDBACK_ENDPOINT that POSTs to /public/v1/feedback. That is not in the shipped widget or .env.example — the widget has zero network code, and the only feedback env vars are NUKIPA_FEEDBACK_DISABLED / NUKIPA_FEEDBACK_ENABLED. Treat localStorage + "Copy all" as the whole story.

Phase 12: Extensions (optional)

Once you're happy with the homepage, blog, and legal pages, the agent asks what else to add. Options include a contact form, a products/services index, a competitor-comparison page, a lead-magnet tool, an events page, or a folder-driven page (/changelog, /case-studies, etc. — these reuse the blog CMS via folders rather than a new schema). Each picked extension is built by a dedicated merger agent. You can also pick nothing and skip straight to deploy.

Phase 13: Push to GitHub + connect

When you confirm the preview is good, the agent finalises and ships. First it sanitizes and re-builds:

cd [tenant-slug] && npm run sanitize && npm run build

The sanitizer is mandatory — it strips the smart quotes, em-dashes, non-breaking spaces, zero-width characters, and stray citation artefacts (fileciteturn0file2, <cite index="…">) that LLM-generated copy carries and that render as visible glitches in production. It prints an aggregate cleaned N count (number of files changed), and skips its own scripts/ directory. The starter files are already clean, so on a fresh run the only thing the count reflects is agent output that needed cleaning.
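
To make the character classes concrete, here is an illustrative sketch of that kind of cleanup. It is not the shipped scripts/sanitize.mjs: the replacement choices (straight quotes, a plain hyphen, a regular space) are assumptions, and only the `<cite …>` tag form of citation artefact is handled.

```typescript
// Hypothetical sketch of the text cleanup the sanitizer performs.
function sanitize(text: string): string {
  return text
    .replace(/[\u2018\u2019]/g, "'")            // curly single quotes
    .replace(/[\u201C\u201D]/g, '"')            // curly double quotes
    .replace(/\u2014/g, "-")                    // em-dash (replacement is an assumption)
    .replace(/\u00A0/g, " ")                    // non-breaking space
    .replace(/[\u200B\u200C\u200D\uFEFF]/g, "") // zero-width characters
    .replace(/<\/?cite\b[^>]*>/g, "");          // <cite index="..."> wrappers
}

// Count how many files would change, mirroring the "cleaned N" summary.
function countCleaned(files: { path: string; content: string }[]): number {
  let changed = 0;
  for (const file of files) {
    if (sanitize(file.content) !== file.content) changed++;
  }
  return changed;
}
```

Running the cleanup twice is harmless: already-clean text passes through unchanged, which is why untouched starter files never add to the count.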

Then it creates the repo and pushes:

cd [tenant-slug]
git init
git add .
git commit -m "Initial site for [tenant-name]"
gh repo create nukipa-labs/[tenant-slug]-site --source=. --push

The repo must be pushed to the nukipa-labs GitHub org. (The agent confirms gh auth status first; if you're not signed in it asks you to run gh auth login.)

Finally it records the repo on your tenant. Note the call takes only repo_url — there's no workspace_id here; the tenant is resolved from the request context:

nukipa_connect_github_repo({ repo_url: "https://github.com/nukipa-labs/[tenant-slug]-site" })

This tool accepts both the https://github.com/<owner>/<repo> and git@github.com:<owner>/<repo>.git forms and normalises to the https form for storage (a trailing .git is stripped). It's idempotent — calling it again replaces the recorded URL. There is deliberately no "create the repo for me" tool: the agent owns repo creation via gh, which keeps the platform from having to manage a GitHub token.
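
The normalisation described above can be sketched with two patterns, one per accepted form. `normaliseRepoUrl` is an illustrative name and this is not the platform's code; returning `null` for anything else is an assumption about how invalid input might be signalled.

```typescript
// Hypothetical sketch: accept https and SSH GitHub URLs, store the https form.
function normaliseRepoUrl(input: string): string | null {
  const s = input.trim();
  const https = /^https:\/\/github\.com\/([^\/\s]+)\/([^\/\s]+?)(?:\.git)?$/.exec(s);
  const ssh = /^git@github\.com:([^\/\s]+)\/([^\/\s]+?)(?:\.git)?$/.exec(s);
  const m = https || ssh;
  if (!m) return null; // assumption: unrecognised input is rejected
  // m[1] is the owner, m[2] the repo name with any trailing ".git" removed.
  return "https://github.com/" + m[1] + "/" + m[2];
}
```

Because both forms collapse to the same stored string, calling the tool again with either spelling of the same repo leaves the recorded URL unchanged.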

[!WARNING] Connecting the repo records the URL on your tenant row and enqueues a best-effort deploy job (deploy-tenant-site, described in code as a "Vercel reconciliation"). The enqueue is fire-and-forget: if pg-boss is unavailable it's silently skipped, and the connect call still succeeds. So a deploy job is fired — what's genuinely uncertain is whether that job runs end-to-end to produce a live URL. The MCP tool's own comments say the deployer is "intentionally not wired here yet" at the MCP layer. Don't expect a live URL to appear the instant the connect call returns.
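
The fire-and-forget behaviour can be illustrated as follows. This is a sketch of the pattern, not the platform's handler: `connectRepo`, `saveRepoUrl`, and `enqueue` are hypothetical names, and an unavailable queue is modelled by passing `null`.

```typescript
// Hypothetical sketch: record the repo, then enqueue a deploy without waiting on it.
function connectRepo(
  repoUrl: string,
  saveRepoUrl: (url: string) => void,
  enqueue: ((job: string, payload: { repoUrl: string }) => Promise<unknown>) | null,
): { ok: true; repoUrl: string } {
  saveRepoUrl(repoUrl); // the part the caller can rely on

  if (enqueue !== null) {
    try {
      // Not awaited: a failed or slow enqueue never affects the response.
      enqueue("deploy-tenant-site", { repoUrl }).catch(() => undefined);
    } catch (_err) {
      // A synchronous failure is swallowed too.
    }
  }
  return { ok: true, repoUrl };
}
```

Note what the return value does not contain: any signal about the deploy. Success here means only that the URL was recorded.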

What's thin right now, plainly

  • Deploy is best-effort and not proven end-to-end. The connect call enqueues deploy-tenant-site, but the MCP-layer comments flag the deployer as a stated TODO, and the enqueue is skipped entirely when pg-boss is down. The repo and the local preview are real deliverables today; an automatic live URL is the open question.
  • Research quality tracks the public web. Sparse source sites produce sparse briefs. The Phase 1 correction pass is doing real work, not a formality.
  • The feedback widget is local-only. localStorage + "Copy all", pasted back into chat. There is no platform-collected variant in the shipped code, despite older skill prose implying one.
  • Seeded context is one-directional. Phase 2 writes research into the workspace and never runs in the other direction. If you later edit those USPs/ICPs in the dashboard and rebuild, nothing is re-synced from the original research: the next build reads whatever is currently in the platform, so keep the dashboard as the source of truth after the first run.

FAQ

Do I have to let it generate blog posts? No. Phase 4 is opt-in. Skip it and the blog index shows its empty state ("No posts published yet. Check back soon.") until you write your own in the CMS.

Can I rebuild the site later without re-doing research? Yes — that's the point of Phase 2. Once the brand/USP/ICP/competitor context is seeded into the workspace, a later run (one-shot or from-scratch) reads it straight from the platform instead of re-crawling.

What gets committed to the repo — does my NUKIPA_TOKEN leak? No. .env.local (where the token and Gateway URL live for local dev) is gitignored, so the secret never reaches the repo. For a deployed build, the shipped next.config.mjs contains only images.remotePatterns — there is no env key by default. Baking the non-secret values (NUKIPA_GATEWAY_URL, NUKIPA_TENANT_HOST, NEXT_PUBLIC_NUKIPA_GATEWAY_URL) into next.config.mjs under an env key is a documented manual production step (see references/blog-integration.md), not something the starter does automatically. Those three are public-read endpoints/hosts, not secrets.

Why does it want a company URL instead of just asking me questions? A URL is the strongest single signal — it gives the agent your real copy, palette, font, testimonials, and existing pages to anchor on. With only a name, it has to find the URL first and the research is thinner. You can give a name alone, but expect more correction work in Phase 1.

The connect call succeeded but nothing deployed. Is that a bug? Not necessarily. Connecting records the repo URL on the tenant and enqueues a best-effort deploy-tenant-site job, but that deployer isn't confirmed to run end-to-end (and the enqueue is skipped if pg-boss is down). The repo and local preview are the reliable deliverables today; a live URL is not guaranteed yet.
