Architecture

The complete picture of how Amy's backend is built, why each piece was chosen, and where the seams are.

Goals (in priority order)

Backend developer experience is the product. Adding a new client surface (mobile, web, partner integration, AI agent) should feel like "install SDK, call function." If it doesn't, the backend has failed.
AI-agent-readable end-to-end. Docs, errors, schemas all designed to be consumed cleanly by LLMs. This is now a first-class integration vector, not an afterthought.
Durable agent runtime. A turn (the 2-7 minute multi-agent reasoning pipeline) survives worker restarts, Anthropic 5xxs, and code deploys mid-flight.
Cloudflare-native. Stick with the stack we're already paying for and understand.
No premature scale, no premature features. Beta is 10-100 users; design for that and clearly document where scale will bite.

Non-goals for v1

Multi-tenancy / organizations (one user per account).
Outbound webhooks for partners (we design the seam, don't build it).
BYOK / self-hosted (probably never).
Real-time multi-device collaboration.

Five principles

The schema is the contract. A single Zod schema → OpenAPI → SDK → docs. One source of truth, everything else generated.
Resources, not RPC. POST /v1/turns to start a turn, not POST /v1/createTurn.
Every async thing is a Workflow. Turns, lab parsing, future scheduled check-ins all run as Cloudflare Workflows for durable, resumable, observable execution.
Every write is idempotent. The Idempotency-Key header is honored on every POST/PATCH; results are cached in KV for 24h.
Every response carries a request_id. Logs and errors include it. Tracing is free if you wire it up from day one.
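
The idempotency principle can be sketched as a small wrapper around a write handler. This is a minimal sketch under stated assumptions, not Amy's actual middleware: `KVLike`, `withIdempotency`, `memoryKV`, and the `idem:` key prefix are hypothetical names, and an in-memory map stands in for Workers KV. Only the `Idempotency-Key` header and the 24h cache lifetime come from the principle above.

```typescript
// Hypothetical stand-in for the subset of a KV namespace this sketch needs.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

const DAY_SECONDS = 24 * 60 * 60; // results are cached for 24h

// Runs `handler` at most once per Idempotency-Key; a repeat replays the cached result.
async function withIdempotency<T>(
  kv: KVLike,
  idempotencyKey: string | undefined,
  handler: () => Promise<T>,
): Promise<T> {
  if (!idempotencyKey) return handler(); // no key supplied: nothing to deduplicate
  const cacheKey = `idem:${idempotencyKey}`; // assumed key pattern
  const cached = await kv.get(cacheKey);
  if (cached !== null) return JSON.parse(cached) as T;
  const result = await handler();
  await kv.put(cacheKey, JSON.stringify(result), { expirationTtl: DAY_SECONDS });
  return result;
}

// In-memory KV for local experiments (ignores the TTL).
function memoryKV(): KVLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => {
      store.set(key, value);
    },
  };
}
```

Two concurrent requests with the same key can both miss the cache in this sketch, because a get-then-put on KV is not a lock. If that race matters, it needs a stronger primitive than the one shown here.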

System diagram

            CLI (Bun)   Mobile (RN)   Web   Partners/AI Agents
                │           │          │          │
                └───────────┴────┬─────┴──────────┘
                                 │  HTTPS · SSE · (WebSocket later)
                                 ▼
                 ┌─────────────────────────────────────┐
                 │  @amy/sdk-{ts,py,swift}             │
                 │  Generated from OpenAPI; identical  │
                 │  ergonomics across languages.       │
                 └────────────────┬────────────────────┘
                                  ▼
                 ┌─────────────────────────────────────┐
                 │  amy.heyamy.xyz                     │
                 │  Cloudflare Worker · Hono ·         │
                 │  @hono/zod-openapi                  │
                 │                                     │
                 │  /v1/turns       /v1/sources        │
                 │  /v1/data        /v1/labs           │
                 │  /v1/memory      /v1/me             │
                 │  /webhooks/terra                    │
                 │  /openapi.json   /llms.txt          │
                 └──┬────────┬──────────┬──────────────┘
                    │        │          │
                    ▼        ▼          ▼
              ┌────────┐  ┌──────┐  ┌──────────┐
              │ D1     │  │ R2   │  │ KV       │
              │ rel.   │  │ blob │  │ idemp    │
              │ data   │  │      │  │ stream   │
              └────────┘  └──────┘  └──────────┘
                    ▲
                    │  step state read/written
                    │
            ┌───────┴──────────┐      ┌──────────────────────┐
            │ CF Workflows     │ ←──→ │ CF Queues            │
            │ · TurnWorkflow   │      │ · terra-events       │
            │ · LabParse       │      │ · workflow-dispatch  │
            └──────┬───────────┘      └──────────────────────┘
                   │
                   ▼
            Anthropic · Terra · OpenRouter · Clerk

Layer 1, Contracts

A single Zod schema package (packages/contracts/) defines every request, response, and event. Every route imports from it, the OpenAPI spec and the TypeScript SDK are generated from it, and the docs embed the spec.

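// packages/contracts/src/turns.ts
import { z } from "zod";
// MessageSchema and TurnResult are assumed to be defined elsewhere (not shown).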
export const TurnCreate = z.object({
  messages: z.array(MessageSchema),
  stream: z.boolean().default(true),
});

export const Turn = z.object({
  id: z.string().startsWith("turn_"),
  status: z.enum(["queued", "running", "completed", "failed"]),
  created_at: z.string().datetime(),
  result: TurnResult.optional(),
});

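// apps/api/src/routes/turns.ts: the Hono route consumes the contract
import { OpenAPIHono, createRoute } from "@hono/zod-openapi";
import { TurnCreate, Turn } from "@amy/contracts";

const app = new OpenAPIHono();

const createTurn = createRoute({
  method: "post",
  path: "/v1/turns",
  request: { body: { content: { "application/json": { schema: TurnCreate } } } },
  // 202 Accepted: the turn is queued and runs asynchronously in TurnWorkflow
  responses: { 202: { description: "Turn accepted", content: { "application/json": { schema: Turn } } } },
});

app.openapi(createTurn, async (c) => { /* dispatch to TurnWorkflow, return c.json(turn, 202) */ });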

// any client (CLI, mobile, web) — fully typed
import { Amy } from "@amy/sdk";
const amy = new Amy({ apiKey: process.env.AMY_API_KEY });

const turn = await amy.turns.create({ messages: [...] });
for await (const event of amy.turns.stream(turn.id)) {
  console.log(event.type, event);
}

Why this matters: Once the contract is in place, adding a new client is a 30-minute job. Without it, every client re-defines the same types and drifts.

Layer 2, API surface

Resource-oriented REST. Every resource follows the same shape: POST to create, GET /:id to read, GET to list, PATCH /:id to update, DELETE /:id to remove. Long-running creates return 202 Accepted with a status URL; everything else is 200/201/204.

POST   /v1/turns                       Start a turn → 202 { id, status: "queued", stream_url }
GET    /v1/turns/:id                   Status + result
GET    /v1/turns/:id/events            SSE stream of turn events
GET    /v1/turns                       List (cursor-paginated)

GET    /v1/me                          Current user + connections
PATCH  /v1/me                          Update profile                 (planned)

GET    /v1/sources                     List connected wearables
POST   /v1/sources/terra/connect       → { widget_url } for OAuth   (alias: POST /v1/connect)
DELETE /v1/sources/:id                 Disconnect (src_… id)

POST   /v1/labs/upload                 Upload PDF (multipart) → 200 { upload_id, terra_status }
GET    /v1/labs                        List uploads
GET    /v1/labs/:id                    Status + parsed biomarkers     (planned)

GET    /v1/data/sync?since=...         Delta sync (for offline clients; alias: GET /v1/sync)
GET    /v1/data/biomarkers             Timeseries query               (planned)
GET    /v1/data/summaries/:date        Daily summary                  (planned)

GET    /v1/memory                      Facts Amy remembers            (planned)
POST   /v1/memory                      Add a fact                     (planned)
DELETE /v1/memory/:id                  Remove a fact                  (planned)

POST   /webhooks/terra                 Ingest (HMAC-verified; aliases: /webhook/terra, /webhook, /terra, /)
GET    /cli/login                      CLI browser sign-in page (HTML)
POST   /v1/auth/cli-approve            Mint a 30-day Amy CLI JWT

GET    /reference                      Interactive API explorer (Scalar)
GET    /openapi.json                   Live OpenAPI 3.1
GET    /llms.txt                       AI-agent index
GET    /healthz                        Liveness

Endpoints marked (planned) are designed but not yet in /openapi.json; see the API reference for the shipped surface.
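
Because POST /v1/turns answers 202 before the turn has finished, a client that does not consume the SSE stream has to poll GET /v1/turns/:id. The following is a minimal sketch of that loop, assuming a hypothetical `waitForTurn` helper and an injected `getTurn` transport so it runs without a network. The poll interval and attempt cap are illustrative values, not documented defaults.

```typescript
type TurnStatus = "queued" | "running" | "completed" | "failed";

interface TurnView {
  id: string;
  status: TurnStatus;
}

// Injected transport: in real code this would GET /v1/turns/:id and parse the JSON body.
type GetTurn = (turnId: string) => Promise<TurnView>;

// Polls until the turn reaches a terminal status or the attempt cap is hit.
async function waitForTurn(
  getTurn: GetTurn,
  turnId: string,
  opts: { intervalMs: number; maxAttempts: number } = { intervalMs: 1000, maxAttempts: 600 },
): Promise<TurnView> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const turn = await getTurn(turnId);
    if (turn.status === "completed" || turn.status === "failed") return turn;
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error(`turn ${turnId} did not finish after ${opts.maxAttempts} polls`);
}
```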

Conventions:

Resource URIs are plural nouns.
IDs are typed prefixes: turn_…, lab_…, src_…, mem_…. Easy to grep, hard to confuse.
Cursor pagination on all list endpoints: ?cursor=…&limit=… → { data: [...], next_cursor: "..." }.
Idempotency on all writes: Idempotency-Key: <client-uuid> header.
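
The cursor convention lends itself to an async iterator on the client. This is a sketch rather than the SDK's implementation: `paginate` and `FetchPage` are hypothetical names, and it assumes the last page signals the end with a null `next_cursor`, which the convention above does not spell out.

```typescript
interface Page<T> {
  data: T[];
  next_cursor: string | null; // assumed to be null on the last page
}

// Injected transport: in real code this would call e.g. GET /v1/turns?cursor=...&limit=...
type FetchPage<T> = (cursor: string | null, limit: number) => Promise<Page<T>>;

// Walks every page in order and yields items one at a time.
async function* paginate<T>(fetchPage: FetchPage<T>, limit = 50): AsyncGenerator<T> {
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor, limit);
    yield* page.data;
    cursor = page.next_cursor;
  } while (cursor);
}
```

A caller would write `for await (const turn of paginate(fetchTurnsPage)) { ... }`, where `fetchTurnsPage` is whatever function performs the request.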

Errors follow one shape:

{ "error": { "code": "turn_not_found", "message": "...", "request_id": "req_...", "docs_url": "https://docs.heyamy.xyz/docs/concepts/errors#turn_not_found" } }
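
A single error shape means one parser on the client. The sketch below turns that envelope into a typed error. `AmyApiError`, `isAmyErrorBody`, and `toAmyError` are hypothetical names chosen for illustration; the four fields are the ones shown in the example above.

```typescript
interface AmyErrorBody {
  error: { code: string; message: string; request_id: string; docs_url: string };
}

// Hypothetical error class; the real SDK's class names may differ.
class AmyApiError extends Error {
  readonly status: number;
  readonly code: string;
  readonly requestId: string;
  readonly docsUrl: string;

  constructor(status: number, body: AmyErrorBody) {
    super(body.error.message);
    this.name = "AmyApiError";
    this.status = status;
    this.code = body.error.code;
    this.requestId = body.error.request_id;
    this.docsUrl = body.error.docs_url;
  }
}

// Narrowing check so a malformed body (e.g. an HTML gateway page) is not mistaken for an API error.
function isAmyErrorBody(value: unknown): value is AmyErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const err = (value as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const fields = err as Record<string, unknown>;
  return ["code", "message", "request_id", "docs_url"].every((k) => typeof fields[k] === "string");
}

// Returns a typed error for a well-formed envelope, or null when the body has another shape.
function toAmyError(status: number, body: unknown): AmyApiError | null {
  return isAmyErrorBody(body) ? new AmyApiError(status, body) : null;
}
```

Callers can then branch on the stable `code` field instead of matching on message text.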

Versioning via URL path (/v1/). When v2 ships, both run side by side; v1 is deprecated with at least 6 months' notice.

See API reference for the full schema of every endpoint.

Layer 3, Compute model

The Worker handles everything that fits in <5s of wall time. Anything longer is a Workflow.

TurnWorkflow

A turn is decomposed into discrete, retry-safe steps. Each step's output persists in the workflow's durable state. If step 5 fails on an Anthropic 5xx, the workflow resumes at step 5; steps 1-4 don't replay.

step 1   classify_vagueness     Sonnet         ~3s
step 2   route                  Sonnet         ~2s
step 3   rephrase_per_agent     Sonnet         ~2s
step 4   run_supporting_agents  Opus, parallel ~30–90s each
step 5   run_main_agent         Opus           ~30–120s
step 6   reflection             Sonnet         ~5s
step 7   validation_gates       deterministic + Critic  ~10s
step 8   synthesis              Opus, streams  ~20s
step 9   memory_extraction      Sonnet         ~3s
step 10  finalize               write Turn row, fire turn.completed event
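
The resume behaviour can be modelled in a few lines. In Cloudflare Workflows this role is played by `step.do(name, callback)`; the sketch below is not that API but an in-memory model of its memoizing semantics, so it can run anywhere. `durableSteps`, `runTurnSlice`, and the three-step slice are hypothetical and far smaller than the ten-step pipeline above.

```typescript
// Persisted outputs of finished steps, keyed by step name.
type StepState = Map<string, unknown>;

// Returns a `step` function: a step whose output is already stored is skipped
// and its stored output is returned instead of re-running the callback.
function durableSteps(state: StepState) {
  return async function step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (state.has(name)) return state.get(name) as T;
    const output = await fn(); // a throw here stores nothing for this step
    state.set(name, output);
    return output;
  };
}

// Hypothetical three-step slice of a turn. `log` records which callbacks actually ran.
async function runTurnSlice(
  state: StepState,
  callModel: (prompt: string) => Promise<string>,
  log: string[],
): Promise<string> {
  const step = durableSteps(state);
  const route = await step("route", async () => {
    log.push("route");
    return "sleep_agent";
  });
  const draft = await step("run_main_agent", async () => {
    log.push("run_main_agent");
    return callModel(`agent=${route}`);
  });
  return step("synthesis", async () => {
    log.push("synthesis");
    return `answer: ${draft}`;
  });
}
```

Running the slice once with a failing `callModel` and again with the same `state` re-executes only `run_main_agent` and the steps after it; `route` is served from the stored output.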

Free wins from Workflows:

Retry/resume on failure.
Observability: every step's input/output is visible in the CF dashboard.
Replay: re-run a turn with the same inputs for debugging.
Sleep + waitForEvent: paves the way for human-in-the-loop pauses ("Amy paused for your confirmation").

See Internals: Agent orchestration for the full step list and validation gate spec.

Streaming (the hardest call)

We want the "watch Amy think" UX from the CLI to work on every client.

v1 choice: SSE with KV-buffered events.

Each workflow step writes events to KV: stream:{turn_id}:{seq} and bumps a cursor counter.
GET /v1/turns/:id/events is an SSE Worker that polls KV every 250ms and forwards new events to the client.
Supports the Last-Event-ID header for resume after disconnect.
Works on curl, browser EventSource, React Native (with react-native-event-source), Swift URLSession, anything.

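Cost: up to ~250ms of added latency per event from the poll interval, plus any KV propagation delay (KV is eventually consistent). Negligible for LLM streaming, where tokens arrive in chunks anyway.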
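
One poll tick of the SSE endpoint can be sketched as three small functions. This is an illustration under assumptions, not the shipped handler: the `stream:{turn_id}:cursor` key name, the convention that `seq` starts at 1, and the `StreamEvent` fields are hypothetical; only the `stream:{turn_id}:{seq}` key pattern and the resume header come from the list above.

```typescript
interface StreamEvent {
  seq: number;
  type: string;
  data: unknown;
}

// Key pattern from the text: stream:{turn_id}:{seq}
const eventKey = (turnId: string, seq: number): string => `stream:${turnId}:${seq}`;

// Serialises one event as an SSE frame; the client echoes `id` back in Last-Event-ID.
function toSseFrame(e: StreamEvent): string {
  return `id: ${e.seq}\nevent: ${e.type}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Parses the Last-Event-ID header; a missing or invalid value means "from the start".
function resumeFrom(header: string | null): number {
  const n = Number.parseInt(header ?? "", 10);
  return Number.isInteger(n) && n >= 0 ? n : 0;
}

// One poll tick: read every event after `lastSeq`, up to the cursor counter.
async function readNewEvents(
  kv: { get(key: string): Promise<string | null> },
  turnId: string,
  lastSeq: number,
): Promise<StreamEvent[]> {
  const cursor = Number((await kv.get(`stream:${turnId}:cursor`)) ?? 0); // assumed cursor key
  const events: StreamEvent[] = [];
  for (let seq = lastSeq + 1; seq <= cursor; seq++) {
    const raw = await kv.get(eventKey(turnId, seq));
    if (raw === null) break; // not visible yet: stop here and retry on the next tick
    events.push(JSON.parse(raw) as StreamEvent);
  }
  return events;
}
```

Stopping at the first missing key keeps delivery in order: a later event is never sent ahead of one that has not become readable yet.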

Upgrade path: swap the KV poll for a Durable Object per active turn that brokers events over WebSocket. The HTTP surface stays identical, so clients don't notice the change.

See Concepts: Streaming for the full event type catalog and reconnect protocol.

Layer 4, Storage

Data	Store	Why
Users, sources, turns, biomarkers, daily summaries	D1 (SQLite)	Transactional, indexed, cheap
Lab PDFs, future audio recordings, exports	R2	S3-compatible (Terra likes this), zero egress fees
Idempotency keys, stream event buffer	KV	Short TTL, edge-cached
Long-term agent memory (facts)	D1 + Vectorize (later)	Start relational; add vector retrieval when fuzzy lookups appear
Workflow step state	Workflow runtime	Managed by CF

See Internals: Storage for the D1 schema, R2 layout, KV key patterns, and migration story.

Layer 5, Developer experience

This is the layer that makes everything else worth it.

Repo layout (Bun workspaces, free monorepo)

amy/
├── apps/
│   ├── api/            ← the Cloudflare Worker (was cloud/)
│   ├── cli/            ← the existing CLI (was src/)
│   ├── docs/           ← Fumadocs site → docs.heyamy.xyz
│   ├── mobile/         ← (later) React Native app
│   └── web/            ← (later) marketing + dashboard
├── packages/
│   ├── contracts/      ← Zod schemas, the source of truth
│   ├── sdk-ts/         ← generated TS SDK (npm: @amy/sdk)
│   ├── sdk-py/         ← (later) PyPI: amy-sdk
│   ├── agents/         ← the runTurn pipeline (used by apps/api)
│   └── eval/           ← offline evals on agents/
└── tooling/
    ├── openapi-gen/    ← script: routes → openapi.json
    └── llms-txt-gen/   ← script: docs → llms.txt

The CLI keeps working throughout. The first migration step is just extracting packages/contracts/ from the duplicated schemas: purely a refactor, with no behavior change.

SDKs

Language	How it's built	When it ships
TypeScript	Hono's `hc()` typed client wrapping fetch; its types are inferred at build time from the Worker's route definitions, the same definitions that produce the OpenAPI spec	Day one
Python	Fern free tier from OpenAPI	When the first Python user asks
Swift	Fern or Stainless	When the native iOS app starts

Every SDK ships with:

Fully typed methods (matching the contract package).
Automatic retries with exponential backoff for transient failures.
Auto-generated Idempotency-Key on writes (UUIDv4).
An async iterator for streaming endpoints.
Typed error classes with stable code fields.
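
The retry behaviour in that list could look roughly like the sketch below. All names (`withRetries`, `backoffDelay`, `isTransient`, `RetryOptions`) are hypothetical, and treating network errors, 429, and 5xx as transient is an assumption rather than a documented SDK policy. Jitter is left out to keep the example short.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before the retry that follows attempt `attempt` (1-based): base * 2^(attempt-1), capped.
function backoffDelay(attempt: number, o: RetryOptions): number {
  return Math.min(o.maxDelayMs, o.baseDelayMs * 2 ** (attempt - 1));
}

// Assumed policy: no status (network error), 429, and 5xx are worth retrying.
function isTransient(status: number | undefined): boolean {
  return status === undefined || status === 429 || status >= 500;
}

// Calls `call` until it succeeds, a non-transient error occurs, or attempts run out.
async function withRetries<T>(
  call: () => Promise<T>,
  statusOf: (err: unknown) => number | undefined,
  o: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= o.maxAttempts || !isTransient(statusOf(err))) throw err;
      await sleep(backoffDelay(attempt, o));
    }
  }
}
```

Retrying a write is only safe because the same Idempotency-Key is sent on every attempt, so the two SDK features are meant to be used together.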

See SDK reference.

Docs site

Fumadocs (open-source, Next.js, MDX) served at docs.heyamy.xyz. Sections mirror this directory:

Getting Started: 5 minutes to your first turn.
Concepts: turns, streaming, memory, webhooks, errors.
Guides: how-to articles, ordered.
Recipes: end-to-end builds, including the mobile app one.
API Reference: embedded Scalar component fed from /openapi.json.
SDK Reference: auto-generated from SDK source.

The AI-agent surface (the differentiator):

GET /llms.txt: llmstxt.org standard, indexes every doc page with a 1-line description.
GET /llms-full.txt: all docs concatenated, for one-shot context loading.
Every docs page available as raw markdown at <url>.md.
Every API error includes a docs_url pointing to the relevant page.
OpenAPI spec includes rich description, examples, and per-language x-codeSamples for every endpoint.

When Claude Code (or any agent) integrates with Amy: it fetches llms.txt, picks the relevant pages, fetches them as .md, and writes code. No HTML parsing, no scraping, no guessing.
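
The first two steps of that agent flow, reading the index and choosing pages, can be sketched without any network code. This assumes index entries use the llmstxt.org list form `- [Title](url): description`; `parseLlmsTxt` and `pickMarkdownUrls` are hypothetical helper names, and keyword matching stands in for whatever relevance judgement an agent would really apply.

```typescript
interface DocLink {
  title: string;
  url: string;
  description: string;
}

// Matches llms.txt list entries of the form "- [Title](url): description".
const LINK_LINE = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;

// Extracts every linked page from the index; headings and prose lines are ignored.
function parseLlmsTxt(text: string): DocLink[] {
  const links: DocLink[] = [];
  for (const line of text.split(/\r?\n/)) {
    const m = LINK_LINE.exec(line);
    if (m) links.push({ title: m[1], url: m[2], description: (m[3] ?? "").trim() });
  }
  return links;
}

// Picks pages whose title or description mentions any keyword,
// and maps each one to its raw-markdown URL (<url>.md).
function pickMarkdownUrls(links: DocLink[], keywords: string[]): string[] {
  const wanted = keywords.map((k) => k.toLowerCase());
  return links
    .filter((l) => wanted.some((k) => `${l.title} ${l.description}`.toLowerCase().includes(k)))
    .map((l) => (l.url.endsWith(".md") ? l.url : `${l.url}.md`));
}
```

The remaining step is a plain fetch of each returned URL, whose body is already markdown.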

Local dev loop

bun dev              # wrangler dev (api) + docs site + cli watch — all in one
bun test             # vitest across all packages
bun openapi          # regenerate openapi.json from routes
bun sdk:gen          # regenerate the TS SDK from the spec
bun docs:dev         # docs site hot reload

Edit a route → SDK types update in <2s → CLI/docs reflect it. That's the inner loop we're building toward.

Trade-offs (explicit)

Decision	Choice	Cost
Schema language	Zod	Not as expressive as TypeSpec for API design, but already familiar and used everywhere in the codebase
Initial SDK	`hc()` for TS only	Other languages delayed until first ask
Streaming	SSE + KV poll	~250ms latency vs WebSocket; trivial upgrade path
Orchestration	CF Workflows	Deeper Cloudflare lock-in (we were already deep)
Monorepo	Bun workspaces	More files to navigate; mitigated by a README per package
Docs site	Self-hosted Fumadocs	More setup than hosted Mintlify; zero recurring cost
Auth	Clerk	Vendor dependency; mitigated by hiding it behind our own `/v1/me`
Versioning	URL path (`/v1/`)	Less flexible than header-based; simpler to reason about
Adapters	Terra-first	Skips per-vendor OAuth dance; locked to Terra's coverage

Migration sketch (current state → v1)

You're closer than it looks. The order matters; steps 1-3 are pure refactors that don't change behavior but unlock everything else.

1. Carve out packages/contracts/ from the duplicated src/data/schema.ts and cloud/src/schema.ts. Both sides import from it. No behavior change.
2. Adopt @hono/zod-openapi in the existing Worker. Refactor routes to declare their schemas; auto-generate /openapi.json.
3. Move the CLI client to hc() from manual fetch. Delete the bespoke wrapper in src/cloud/client.ts.
4. Build TurnWorkflow. Wrap the existing runTurn from src/orchestrator/index.ts; step-decompose; persist state. Add POST /v1/turns and GET /v1/turns/:id/events.
5. Restructure to monorepo. src/ → apps/cli/, cloud/ → apps/api/. Bun workspaces.
6. Stand up apps/docs/ with Fumadocs. Hook Scalar to /openapi.json. Write the llms.txt generator.
7. Deploy amy.heyamy.xyz and docs.heyamy.xyz. First real-world v1 use.

After (1-3) the codebase is dramatically easier to work with even before (4) lands. After (4) the first mobile screen is buildable.

What we'll revisit at scale

These are seams to watch as the system grows. Each has a documented upgrade path; none requires a rewrite.

D1 database size limit (~10 GB per database). Past ~10k active users, shard by user_id range or move biomarker timeseries to ClickHouse or Tinybird.
KV-based streaming. Past ~100 concurrent active turns, the 250ms poll wastes reads. Cut over to Durable Object + WebSocket. The HTTP surface stays identical.
Workflow step counts. Cloudflare has limits on steps per workflow (~100 today). If a turn balloons past ~30 steps, decompose into sub-workflows.
Synthesis prompt size. Currently includes the full Fact Sheet. At some point, use Cloudflare Vectorize for selective retrieval.
Eval infrastructure. packages/eval/ needs to grow into a real regression harness against frozen agent traces. This is critical before any agent change ships.
Multi-region. Workers are already edge-distributed; D1 is single-region. When global latency starts to matter, evaluate D1 read replicas (in beta) or partition by user region.

Where to next

New to the system? Read Getting started.
Building a mobile app? Read Build a mobile app.
Looking for an endpoint? Read API reference.
Adding a new wearable? Read Add a new adapter.