Architecture
The complete picture of how Amy's backend is built, why each piece was chosen, and where the seams are.
Goals (in priority order)
- Backend developer experience is the product. Adding a new client surface (mobile, web, partner integration, AI agent) should feel like "install SDK, call function." If it doesn't, the backend has failed.
- AI-agent-readable end-to-end. Docs, errors, schemas all designed to be consumed cleanly by LLMs. This is now a first-class integration vector, not an afterthought.
- Durable agent runtime. A turn (the 2–7 minute multi-agent reasoning pipeline) survives worker restarts, Anthropic 5xxs, and code deploys mid-flight.
- Cloudflare-native. Stick with the stack we're already paying for and understand.
- No premature scale, no premature features. Beta is 10–100 users; design for that and clearly document where scale will bite.
Non-goals for v1
- Multi-tenancy / organizations (one user per account).
- Outbound webhooks for partners (we design the seam, don't build it).
- BYOK / self-hosted (probably never).
- Real-time multi-device collaboration.
Five principles
- The schema is the contract. A single Zod schema → OpenAPI → SDK → docs. One source of truth, everything else generated.
- Resources, not RPC. POST /v1/turns to start a turn, not POST /v1/createTurn.
- Every async thing is a Workflow. Turns, lab parsing, future scheduled check-ins all run as Cloudflare Workflows for durable, resumable, observable execution.
- Every write is idempotent. The Idempotency-Key header is honored on every POST/PATCH; results are cached in KV for 24h.
- Every response carries a request_id. Logs and errors include it. Tracing is free if you wire it up from day one.
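The idempotency principle can be illustrated with a small sketch. It assumes an in-memory Map in place of KV, and the names `IdempotencyStore`, `handleWrite`, and the `idem:` key prefix are hypothetical, not taken from Amy's codebase.

```typescript
// Sketch only: an in-memory Map stands in for Cloudflare KV.
// IdempotencyStore, handleWrite, and the "idem:" prefix are hypothetical names.
const TTL_MS = 24 * 60 * 60 * 1000; // results are cached for 24h

interface CachedResponse { status: number; body: unknown; expiresAt: number }

class IdempotencyStore {
  private entries = new Map<string, CachedResponse>();

  get(key: string, now: number): CachedResponse | undefined {
    const hit = this.entries.get(`idem:${key}`);
    if (hit && hit.expiresAt <= now) {
      this.entries.delete(`idem:${key}`); // expired: behave as if never seen
      return undefined;
    }
    return hit;
  }

  put(key: string, status: number, body: unknown, now: number): void {
    this.entries.set(`idem:${key}`, { status, body, expiresAt: now + TTL_MS });
  }
}

// Runs `write` at most once per key within the TTL; replays return the cached result.
function handleWrite(
  store: IdempotencyStore,
  idempotencyKey: string | undefined,
  write: () => { status: number; body: unknown },
  now: number = Date.now(),
): { status: number; body: unknown; replayed: boolean } {
  if (idempotencyKey) {
    const cached = store.get(idempotencyKey, now);
    if (cached) return { status: cached.status, body: cached.body, replayed: true };
  }
  const result = write();
  if (idempotencyKey) store.put(idempotencyKey, result.status, result.body, now);
  return { ...result, replayed: false };
}
```

A second POST with the same key returns the first response without running the write again. KV is eventually consistent, so a real implementation would still need to decide what happens when two requests with the same key race each other.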
System diagram
CLI (Bun)  Mobile (RN) Web  Partners/AI Agents
│           │          │          │
└───────────┴────┬─────┴──────────┘
                 │ HTTPS · SSE · (WebSocket later)
                 ▼
┌─────────────────────────────────────┐
│  @amy/sdk-{ts,py,swift}             │
│  Generated from OpenAPI; identical  │
│  ergonomics across languages.       │
└────────────────┬────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│  api.amy.health                     │
│  Cloudflare Worker · Hono ·         │
│  @hono/zod-openapi                  │
│                                     │
│  /v1/turns      /v1/sources         │
│  /v1/data       /v1/labs            │
│  /v1/memory     /v1/me              │
│  /webhooks/terra                    │
│  /openapi.json  /llms.txt           │
└──┬────────┬──────────┬──────────────┘
   │        │          │
   ▼        ▼          ▼
┌────────┐ ┌────┐ ┌──────────┐
│  D1    │ │ R2 │ │   KV     │
│  rel.  │ │blob│ │  idemp   │
│  data  │ │    │ │  stream  │
└────────┘ └────┘ └──────────┘
        ▲
        │ step state read/written
        │
┌───────┴──────────┐      ┌──────────────────────┐
│  CF Workflows    │ ←──→ │  CF Queues           │
│  · TurnWorkflow  │      │  · terra-events      │
│  · LabParse      │      │  · workflow-dispatch │
└──────┬───────────┘      └──────────────────────┘
       │
       ▼
Anthropic · Terra · OpenRouter · Clerk
The five layers
Layer 1 — Contract (the highest-use piece)
A single Zod schema package (packages/contracts/) defines every
request, response, and event. Every route imports from it; the OpenAPI
spec is auto-generated; the TypeScript SDK is auto-generated; the docs
embed the spec.
// packages/contracts/src/turns.ts
import { z } from "zod";
// MessageSchema and TurnResult are assumed to be defined elsewhere in this package.
export const TurnCreate = z.object({
  messages: z.array(MessageSchema),
  stream: z.boolean().default(true),
});
export const Turn = z.object({
  id: z.string().startsWith("turn_"),
  status: z.enum(["queued", "running", "completed", "failed"]),
  created_at: z.string().datetime(),
  result: TurnResult.optional(),
});
// apps/api/src/routes/turns.ts: Hono route consumes the contract
import { OpenAPIHono, createRoute } from "@hono/zod-openapi";
import { TurnCreate, Turn } from "@amy/contracts";
const app = new OpenAPIHono();
const createTurn = createRoute({
  method: "post",
  path: "/v1/turns",
  request: { body: { content: { "application/json": { schema: TurnCreate } } } },
  responses: {
    201: { description: "Turn created", content: { "application/json": { schema: Turn } } },
  },
});
app.openapi(createTurn, async (c) => { /* dispatch to TurnWorkflow, return Turn */ });
// any client (CLI, mobile, web): fully typed
import { Amy } from "@amy/sdk";
const amy = new Amy({ apiKey: process.env.AMY_API_KEY });
const turn = await amy.turns.create({ messages: [...] });
for await (const event of amy.turns.stream(turn.id)) {
console.log(event.type, event);
}
Why this matters: Once the contract is in place, adding a new client is a 30-minute job. Without it, every client re-defines the same types and drifts.
Layer 2 — API surface
Resource-oriented REST. Every resource follows the same shape: POST to
create, GET /:id to read, GET to list, PATCH /:id to update,
DELETE /:id to remove. Long-running creates return 202 Accepted with
a status URL; everything else is 200/201/204.
POST   /v1/turns                  Start a turn → 201 { id, status: "queued" }
GET    /v1/turns/:id              Status + result
GET    /v1/turns/:id/events       SSE stream of turn events
GET    /v1/turns                  List (cursor-paginated)
GET    /v1/me                     Current user
PATCH  /v1/me                     Update profile
GET    /v1/sources                List connected wearables
POST   /v1/sources/terra/connect  → { widget_url } for OAuth
DELETE /v1/sources/:provider      Disconnect
POST   /v1/labs                   Upload PDF (multipart) → 202 { id }
GET    /v1/labs                   List uploads
GET    /v1/labs/:id               Status + parsed biomarkers
GET    /v1/data/sync?cursor=...   Delta sync (for offline clients)
GET    /v1/data/biomarkers        Timeseries query
GET    /v1/data/summaries/:date   Daily summary
GET    /v1/memory                 Facts Amy remembers
POST   /v1/memory                 Add a fact
DELETE /v1/memory/:id             Remove a fact
POST   /webhooks/terra            Ingest (HMAC-verified)
POST   /v1/auth/cli/start         Device flow start
POST   /v1/auth/cli/approve       Device flow approve
GET    /openapi.json              Live OpenAPI 3.1
GET    /llms.txt                  AI-agent index
GET    /healthz                   Liveness
Conventions:
- Resource URIs are plural nouns.
- IDs carry typed prefixes: turn_…, lab_…, src_…, mem_…. Easy to grep, hard to confuse.
- Cursor pagination on all list endpoints: ?cursor=…&limit=… → { data: [...], next_cursor: "..." }.
- Idempotency on all writes: Idempotency-Key: <client-uuid> header.
- Errors follow one shape: { "error": { "code": "turn_not_found", "message": "...", "request_id": "req_...", "docs_url": "https://docs.amy.health/concepts/errors#turn_not_found" } }
- Versioning via URL path (/v1/). When v2 ships, both run side-by-side; v1 is deprecated with at least 6 months' notice.
See API reference for the full schema of every endpoint.
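The cursor pagination convention can be sketched as follows. This is a minimal illustration under stated assumptions: an array sorted by id stands in for a D1 query, the cursor is the base64 of the last id returned, and `next_cursor` is null on the final page. The names `listPage` and `listAll` are hypothetical.

```typescript
// Sketch only: a sorted array stands in for a D1 query.
// The cursor encoding (base64 of the last seen id) and the null
// next_cursor on the last page are assumptions.
interface Page<T> { data: T[]; next_cursor: string | null }

const encodeCursor = (lastId: string): string => btoa(lastId);
const decodeCursor = (cursor: string): string => atob(cursor);

function listPage<T extends { id: string }>(
  rows: T[], // assumed sorted ascending by id
  cursor: string | undefined,
  limit: number,
): Page<T> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const after = cursor ? decodeCursor(cursor) : "";
  // Fetch one extra row to learn whether another page exists.
  const batch = rows.filter((r) => r.id > after).slice(0, limit + 1);
  const data = batch.slice(0, limit);
  const hasMore = batch.length > limit;
  return { data, next_cursor: hasMore ? encodeCursor(data[data.length - 1]!.id) : null };
}

// Client side: follow next_cursor until it is null.
function listAll<T extends { id: string }>(rows: T[], limit: number): T[] {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = listPage(rows, cursor, limit);
    all.push(...page.data);
    cursor = page.next_cursor ?? undefined;
  } while (cursor);
  return all;
}
```

Keeping the cursor opaque to clients leaves room to change the underlying sort key later without breaking them.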
Layer 3 — Compute model
The Worker handles everything that fits in <5s of wall time. Anything
longer is a Workflow.
TurnWorkflow
A turn is decomposed into discrete, retry-safe steps. Each step's output persists in the workflow's durable state; if step 5 fails on an Anthropic 5xx, the workflow resumes at step 5 — steps 1–4 don't replay.
step 1   classify_vagueness     Sonnet                  ~3s
step 2   route                  Sonnet                  ~2s
step 3   rephrase_per_agent     Sonnet                  ~2s
step 4   run_supporting_agents  Opus, parallel          ~30–90s each
step 5   run_main_agent         Opus                    ~30–120s
step 6   reflection             Sonnet                  ~5s
step 7   validation_gates       deterministic + Critic  ~10s
step 8   synthesis              Opus, streams           ~20s
step 9   memory_extraction      Sonnet                  ~3s
step 10  finalize               write Turn row, fire turn.completed event
Free wins from Workflows:
- Retry/resume on failure.
- Observability — every step's input/output visible in the CF dashboard.
- Replay — re-run a turn with the same inputs for debugging.
- Sleep + waitForEvent — paves the way for human-in-the-loop pauses ("Amy paused for your confirmation").
See Internals: Agent orchestration for the full step list and validation gate spec.
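The resume behavior described for TurnWorkflow can be modeled with a toy runner. This is not the Cloudflare Workflows API: a Map stands in for the workflow's durable state, and `runSteps`, `StepState`, and the step bodies are hypothetical, chosen only to show that completed steps are skipped on retry.

```typescript
// Toy model of durable steps; this is NOT the Cloudflare Workflows API.
// StepState, runSteps, and the step bodies are hypothetical.
type StepState = Map<string, unknown>; // stands in for the workflow's durable state

interface Step { name: string; run: (prev: unknown) => unknown }

// Executes steps in order, skipping any whose output is already persisted.
function runSteps(steps: Step[], state: StepState): unknown {
  let prev: unknown = undefined;
  for (const step of steps) {
    if (state.has(step.name)) {
      prev = state.get(step.name); // completed earlier: reuse, do not replay
      continue;
    }
    prev = step.run(prev); // may throw; state keeps everything finished so far
    state.set(step.name, prev);
  }
  return prev;
}

// Three of the ten steps, with a simulated upstream outage on the last one.
const executed: string[] = [];
let upstreamHealthy = false;
const steps: Step[] = [
  { name: "classify_vagueness", run: () => { executed.push("classify_vagueness"); return "specific"; } },
  { name: "route", run: (v) => { executed.push("route"); return `route:${v}`; } },
  {
    name: "run_main_agent",
    run: (r) => {
      executed.push("run_main_agent");
      if (!upstreamHealthy) throw new Error("503 from model provider");
      return `answer for ${r}`;
    },
  },
];
```

Calling `runSteps(steps, state)` once while `upstreamHealthy` is false throws in the last step. Calling it again with the same `state` after the outage clears re-runs only `run_main_agent`.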
Streaming (the hardest call)
We want the "watch Amy think" UX from the CLI to work on every client.
v1 choice: SSE with KV-buffered events.
- Each workflow step writes events to KV: stream:{turn_id}:{seq} and bumps a cursor counter.
- GET /v1/turns/:id/events is an SSE Worker that polls KV every 250ms and forwards new events to the client.
- Supports Last-Event-Id header for resume after disconnect.
- Works on curl, browser EventSource, React Native (with react-native-event-source), Swift URLSession, anything.
Cost: ~250ms event latency. Negligible for LLM streaming where tokens come in chunks anyway.
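One poll tick of the SSE approach above could look roughly like this. It is a sketch under assumptions: a Map stands in for KV, sequence numbers start at 1, and the event type names and the helpers `readNewEvents`, `toSseFrame`, and `pollOnce` are hypothetical.

```typescript
// Sketch only: a Map stands in for KV; keys follow stream:{turn_id}:{seq}.
// TurnEvent, readNewEvents, toSseFrame, and pollOnce are hypothetical names,
// and sequence numbers are assumed to start at 1.
interface TurnEvent { type: string; data: unknown }
type EventBuffer = Map<string, TurnEvent>;

const eventKey = (turnId: string, seq: number): string => `stream:${turnId}:${seq}`;

// Returns events with seq > lastSeq, in order, stopping at the first gap.
function readNewEvents(
  buffer: EventBuffer,
  turnId: string,
  lastSeq: number,
): Array<{ seq: number; event: TurnEvent }> {
  const out: Array<{ seq: number; event: TurnEvent }> = [];
  for (let seq = lastSeq + 1; buffer.has(eventKey(turnId, seq)); seq++) {
    out.push({ seq, event: buffer.get(eventKey(turnId, seq))! });
  }
  return out;
}

// Formats one event as an SSE frame; the id field is what a client
// sends back as Last-Event-Id when it reconnects.
function toSseFrame(seq: number, event: TurnEvent): string {
  return `id: ${seq}\nevent: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// One poll tick. The Worker would run this every 250ms and write
// `frames` to the response stream.
function pollOnce(
  buffer: EventBuffer,
  turnId: string,
  lastEventId: string | null,
): { frames: string; lastSeq: number } {
  const parsed = lastEventId ? Number.parseInt(lastEventId, 10) : 0;
  const lastSeq = Number.isNaN(parsed) ? 0 : parsed;
  const fresh = readNewEvents(buffer, turnId, lastSeq);
  const frames = fresh.map(({ seq, event }) => toSseFrame(seq, event)).join("");
  const newest = fresh.length > 0 ? fresh[fresh.length - 1]!.seq : lastSeq;
  return { frames, lastSeq: newest };
}
```

Because each frame carries its sequence number as the SSE id, a reconnecting client resumes from the next event without any server-side session state.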
Upgrade path: swap the KV poll for a Durable Object per active turn that brokers events over WebSocket. The HTTP surface stays identical — clients don't notice the change.
See Concepts: Streaming for the full event type catalog and reconnect protocol.
Layer 4 — Storage
| Data | Store | Why |
|---|---|---|
| Users, sources, turns, biomarkers, daily summaries | D1 (SQLite) | Transactional, indexed, cheap |
| Lab PDFs, future audio recordings, exports | R2 | S3-compatible (Terra likes this), zero egress fees |
| Idempotency keys, stream event buffer | KV | Short TTL, edge-cached |
| Long-term agent memory (facts) | D1 + Vectorize (later) | Start relational; add vector retrieval when fuzzy lookups appear |
| Workflow step state | Workflow runtime | Managed by CF |
See Internals: Storage for the D1 schema, R2 layout, KV key patterns, and migration story.
Layer 5 — Developer experience
This is the layer that makes everything else worth it.
Repo layout (Bun workspaces — free monorepo)
amy/
├── apps/
│   ├── api/          ← the Cloudflare Worker (was cloud/)
│   ├── cli/          ← the existing CLI (was src/)
│   ├── docs/         ← Fumadocs site → docs.amy.health
│   ├── mobile/       ← (later) React Native app
│   └── web/          ← (later) marketing + dashboard
├── packages/
│   ├── contracts/    ← Zod schemas, the source of truth
│   ├── sdk-ts/       ← generated TS SDK (npm: @amy/sdk)
│   ├── sdk-py/       ← (later) PyPI: amy-sdk
│   ├── agents/       ← the runTurn pipeline (used by apps/api)
│   └── eval/         ← offline evals on agents/
└── tooling/
    ├── openapi-gen/  ← script: routes → openapi.json
    └── llms-txt-gen/ ← script: docs → llms.txt
The CLI keeps working throughout. The first migration step is just
extracting packages/contracts/ from the duplicated schemas: purely a
refactor, no behavior change.
SDKs
| Language | How it's built | When it ships |
|---|---|---|
| TypeScript | Hono's hc() typed client wrapping fetch, generated at build time from the OpenAPI spec | Day one |
| Python | Fern free tier from OpenAPI | When the first Python user asks |
| Swift | Fern or Stainless | When the native iOS app starts |
Every SDK ships with:
- Fully typed methods (matching the contract package).
- Automatic retries with exponential backoff for transient failures.
- Auto-generated Idempotency-Key on writes (UUIDv4).
- An async iterator for streaming endpoints.
- Typed error classes with stable code fields.
See SDK reference.
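The retry and idempotency behavior listed above might be sketched like this. It is not the actual @amy/sdk internals: `withRetries`, `backoffDelayMs`, `TransientError`, and the default delays are hypothetical, and the sketch assumes a runtime with a global Web Crypto `crypto.randomUUID()` (Workers, Bun, browsers, recent Node).

```typescript
// Sketch only: withRetries, backoffDelayMs, and TransientError are hypothetical,
// not the actual @amy/sdk internals. Default delays are illustrative.
class TransientError extends Error {}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetries<T>(
  call: (idempotencyKey: string) => Promise<T>,
  opts: { maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 4;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  // One key shared by every attempt, so the server can deduplicate a retried write.
  const idempotencyKey = crypto.randomUUID();
  for (let attempt = 0; ; attempt++) {
    try {
      return await call(idempotencyKey);
    } catch (err) {
      const retryable = err instanceof TransientError;
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Generating the key once, outside the loop, is the important design choice: a retry that reaches the server after the first attempt actually succeeded is answered from the idempotency cache instead of creating a second resource.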
Docs site
Fumadocs (open-source, Next.js, MDX)
served at docs.amy.health. Sections mirror this directory:
- Getting Started: 5 minutes to your first turn.
- Concepts: turns, streaming, memory, webhooks, errors.
- Guides: how-to articles, ordered.
- Recipes: end-to-end builds, including the mobile app one.
- API Reference: embedded Scalar component fed from /openapi.json.
- SDK Reference: auto-generated from SDK source.
The AI-agent surface (the differentiator):
- GET /llms.txt: llmstxt.org standard, indexes every doc page with a 1-line description.
- GET /llms-full.txt: all docs concatenated, for one-shot context loading.
- Every docs page available as raw markdown at <url>.md.
- Every API error includes a docs_url pointing to the relevant page.
- OpenAPI spec includes rich description, examples, and per-language x-codeSamples for every endpoint.
When Claude Code (or any agent) integrates with Amy: it fetches
llms.txt, picks the relevant pages, fetches them as .md, and writes
code. No HTML parsing, no scraping, no guessing.
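The agent flow just described can be sketched as a small parser. It assumes llms.txt uses markdown link lines of the form `- [title](url): description`, as in the llmstxt.org format. The sample content, the streaming page URL, and the helper names are made up for illustration.

```typescript
// Sketch only: parses the link-list lines of an llms.txt file.
// The sample content below is made up for illustration.
interface DocLink { title: string; url: string; description: string }

function parseLlmsTxt(text: string): DocLink[] {
  // Matches "- [Title](https://example/page): optional description"
  const linkLine = /^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;
  const links: DocLink[] = [];
  for (const raw of text.split("\n")) {
    const m = linkLine.exec(raw.trim());
    if (m) links.push({ title: m[1]!, url: m[2]!, description: m[3] ?? "" });
  }
  return links;
}

// Picks pages whose title or description mentions a keyword and maps
// each one to its raw markdown URL (<url>.md).
function relevantMarkdownUrls(links: DocLink[], keyword: string): string[] {
  const k = keyword.toLowerCase();
  return links
    .filter((l) => `${l.title} ${l.description}`.toLowerCase().includes(k))
    .map((l) => (l.url.endsWith(".md") ? l.url : `${l.url}.md`));
}

const sample = `# Amy

> Personal health agent API.

## Docs

- [Streaming](https://docs.amy.health/concepts/streaming): SSE event catalog and reconnect protocol
- [Errors](https://docs.amy.health/concepts/errors): Error codes and shapes
`;
```

An agent would fetch the URLs returned by `relevantMarkdownUrls(parseLlmsTxt(sample), "sse")` and load them into context as plain markdown.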
Local dev loop
bun run dev       # wrangler dev (api) + docs site + cli watch, all in one
bun run test      # vitest across all packages (plain `bun test` would invoke Bun's built-in runner instead)
bun run openapi   # regenerate openapi.json from routes
bun run sdk:gen   # regenerate the TS SDK from the spec
bun run docs:dev  # docs site hot reload
Edit a route → SDK types update in <2s → CLI/docs reflect it. That's the
inner loop we're building toward.
Trade-offs (explicit)
| Decision | Choice | Cost |
|---|---|---|
| Schema language | Zod | Not as expressive as TypeSpec for API design, but already familiar and used everywhere in the codebase |
| Initial SDK | hc() for TS only | Other languages delayed until first ask |
| Streaming | SSE + KV poll | ~250ms latency vs WebSocket; trivial upgrade path |
| Orchestration | CF Workflows | Deeper Cloudflare lock-in (we were already deep) |
| Monorepo | Bun workspaces | More files to navigate; mitigated by a README per package |
| Docs site | Self-hosted Fumadocs | More setup than hosted Mintlify; zero recurring cost |
| Auth | Clerk | Vendor dependency; mitigated by hiding it behind our own /v1/me |
| Versioning | URL path (/v1/) | Less flexible than header-based; simpler to reason about |
| Adapters | Terra-first | Skips per-vendor OAuth dance; locked to Terra's coverage |
Migration sketch (current state → v1)
You're closer than it looks. The order matters; steps 1–3 are pure refactors that don't change behavior but unlock everything else.
1. Carve out packages/contracts/ from the duplicated src/data/schema.ts and cloud/src/schema.ts. Both sides import from it. No behavior change.
2. Adopt @hono/zod-openapi in the existing Worker. Refactor routes to declare their schemas; auto-generate /openapi.json.
3. Move the CLI client to hc() from manual fetch. Delete the bespoke wrapper in src/cloud/client.ts.
4. Build TurnWorkflow. Wrap the existing runTurn from src/orchestrator/index.ts; step-decompose; persist state. Add POST /v1/turns and GET /v1/turns/:id/events.
5. Restructure to monorepo. src/ → apps/cli/, cloud/ → apps/api/. Bun workspaces.
6. Stand up apps/docs/ with Fumadocs. Hook Scalar to /openapi.json. Write the llms.txt generator.
7. Deploy api.amy.health and docs.amy.health. First real-world v1 use.
After (1–3) the codebase is dramatically easier to work with even before (4) lands. After (4) the first mobile screen is buildable.
What we'll revisit at scale
These are seams to watch as the system grows. Each has a documented upgrade path; none requires a rewrite.
- D1 size limits (~10 GB per database). Past ~10k active users, shard by user_id range or move biomarker timeseries to ClickHouse or Tinybird.
- KV-based streaming. Past ~100 concurrent active turns, the 250ms poll wastes reads. Cut over to Durable Object + WebSocket. The HTTP surface stays identical.
- Workflow step counts. Cloudflare has limits on steps per workflow (~100 today). If a turn balloons past ~30 steps, decompose into sub-workflows.
- Synthesis prompt size. Currently includes the full Fact Sheet. At some point use Vectorize for selective retrieval.
- Eval infrastructure. packages/eval/ needs to grow into a real regression harness against frozen agent traces. This is critical before any agent change ships.
- Multi-region. Workers are already edge-distributed; D1 is single-region. When global latency starts to matter, evaluate D1 read replicas (in beta) or partition by user region.
Where to next
- New to the system? Read Getting started.
- Building a mobile app? Read Build a mobile app.
- Looking for an endpoint? Read API reference.
- Adding a new wearable? Read Add a new adapter.
Amy — Documentation
The complete reference for Amy, the personal health agent. Backend on Cloudflare, SDK in TypeScript, and recipes for shipping clients on top of it.
API Reference
Every endpoint, every shape. The authoritative source is the live OpenAPI spec at /openapi.json on any deployed backend — this page mirrors it in human-readable form.