Internals: Runtime

How Amy's cloud actually executes on Cloudflare. One Worker (cloud/) ties together D1, R2, KV, Queues, and Cron. There are no Workflows yet; async work runs via a Queue consumer in the same Worker. This page maps every binding, request path, and limit, grounded in cloud/wrangler.toml and cloud/src/.

Note: Architecture describes a TurnWorkflow and Workflows-based long-running step state. That is the target. The currently deployed runtime is the simpler shape documented here: a single Worker with a Queue consumer and two crons. The migration sketch lives in architecture.md → "Migration sketch".



The deploy unit

One Worker, one bundle. amy-cloud ships as a single Cloudflare Worker script: the HTTP routes, Queue consumer, and Cron handler are all exports of the same module (cloud/src/index.ts, lines 86-96):

export default {
  fetch: app.fetch,                          // HTTP requests (Hono router)
  async queue(batch, env, ctx)  { ... },     // Queue consumer for terra-events
  async scheduled(controller, env, ctx) { }  // Cron triggers
};

That means a single wrangler deploy from cloud/ atomically replaces all three handlers. There is no separate "API service" and "worker service"; there is one Worker with three entry points.

| Property | Value |
| --- | --- |
| Worker name | amy-cloud (prod) / amy-cloud-dev (env=dev) |
| Entry | cloud/src/index.ts |
| compatibility_date | 2026-05-12 |
| compatibility_flags | nodejs_compat (needed by @clerk/backend) |
| Routing framework | Hono with the cors and logger middleware |
| Deploy command | cd cloud && bunx wrangler deploy |
| Local dev | wrangler dev (Bun is the install / script runner, not the runtime) |
| Tail | wrangler tail (or the cloud:tail script in cloud/package.json) |

Environments

Only one named environment exists: env.dev (amy-cloud-dev), which inherits all bindings from the top of wrangler.toml and overrides only ENVIRONMENT=development. Production is the unnamed default.

There is no staging environment today. If you need one, add an [env.staging] block in wrangler.toml mirroring the dev pattern.


Bindings

Every binding is declared in wrangler.toml and typed in cloud/src/types.ts:

export interface Env {
  DB: D1Database;                       // [[d1_databases]] binding = "DB"
  LAB_REPORTS: R2Bucket;                // [[r2_buckets]] binding = "LAB_REPORTS"
  TERRA_EVENTS: Queue<QueueMessage>;    // [[queues.producers]] binding = "TERRA_EVENTS"
  CACHE: KVNamespace;                   // [[kv_namespaces]] binding = "CACHE"
  // ... secrets and vars ...
}

D1: DB → amy-db

[[d1_databases]]
binding = "DB"
database_name = "amy-db"
database_id = "f2c1dc51-6237-46b8-a86d-d6a52b988a42"
migrations_dir = "migrations"

The relational source of truth. Schema lives in cloud/migrations/ (see storage.md for the full layout). Migrations are applied with wrangler d1 migrations apply amy-db --remote (or --local for the local dev sqlite). The numeric prefix on each .sql file is the migration order.

R2: LAB_REPORTS → amy-lab-reports

[[r2_buckets]]
binding = "LAB_REPORTS"
bucket_name = "amy-lab-reports"

Blob storage for uploaded lab PDFs/images. Layout, lifecycle, and Terra access are documented in storage.md → R2.

Queue: TERRA_EVENTS → terra-events

[[queues.producers]]
binding = "TERRA_EVENTS"
queue = "terra-events"

[[queues.consumers]]
queue = "terra-events"
max_batch_size = 10
max_batch_timeout = 5
max_retries = 5
dead_letter_queue = "terra-events-dlq"

The Worker is both producer (the webhook handler enqueues; the cron drain re-enqueues) and consumer (the queue() export runs consumeTerraEvents). Messages carry the integer ID of a raw_events row plus an optional request_id for tracing:

// cloud/src/types.ts
export interface QueueMessage {
  rawEventId: number;
  request_id?: string;
}

DLQ: messages that still fail after 5 retries land on terra-events-dlq. The DLQ has no consumer today; it's a parking lot. See data-pipeline.md → Dead-letter queue for the manual triage procedure.
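The ack/retry contract described above can be sketched roughly as follows. This is an illustration, not the code in cloud/src/queue/consumer.ts: Msg and Batch are pared-down stand-ins for the Workers queue message types, and processOne is a hypothetical placeholder for the load-and-normalize step.

```typescript
// Pared-down stand-ins for the Workers Queue types (assumption: only the
// members used here; the real interfaces carry more fields).
interface QueueMessage {
  rawEventId: number;
  request_id?: string;
}
interface Msg<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface Batch<T> {
  messages: Msg<T>[];
}

// Hypothetical per-message processor: resolves on success, throws on failure.
type ProcessOne = (body: QueueMessage) => Promise<void>;

// Settle each message individually so one failure does not force the whole
// batch to be redelivered.
async function consumeBatch(
  batch: Batch<QueueMessage>,
  processOne: ProcessOne,
): Promise<{ acked: number; retried: number }> {
  let acked = 0;
  let retried = 0;
  for (const msg of batch.messages) {
    try {
      await processOne(msg.body);
      msg.ack();
      acked++;
    } catch {
      // Redelivered until max_retries is exhausted, then routed to the DLQ.
      msg.retry();
      retried++;
    }
  }
  return { acked, retried };
}
```

Settling messages one at a time keeps a single poisoned raw_events row from dragging nine healthy ones through the retry budget with it.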

KV: CACHE

[[kv_namespaces]]
binding = "CACHE"
id = "b9b96afdc02c4d42b859d9a741f4959c"

Reserved for short-TTL caches and idempotency keys. The current code does not actively use it (there are no env.CACHE.get/put calls in cloud/src/); it's wired up for the upcoming idempotency middleware and the SSE stream buffer described in architecture.md → Streaming. Treat the binding as forward-compatible plumbing.

Cron

[triggers]
crons = ["*/5 * * * *", "0 3 * * *"]

Two schedules, dispatched by cloud/src/cron.ts:

| Schedule | Handler | What it does |
| --- | --- | --- |
| */5 * * * * | drainStuckEvents | Re-enqueues up to 50 raw_events rows where processed_at IS NULL and received_at > now - 24h, excluding rows explicitly marked skipped:* or no_connection_for_terra_user. Safety net for queue.send hiccups and consumer crashes. |
| 0 3 * * * | reconcileRecent | For each active terra_connections row, calls requestBackfill for activity / sleep / daily / body over the last 7 days. Terra streams chunks back via webhook → ingest pipeline. Catches dropped webhooks. |
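As a rough sketch of how the two schedules above could be dispatched and how the drain filter behaves, consider the following. It is not the contents of cloud/src/cron.ts: CronEvent stands in for the Workers scheduled controller, and the RawEventRow shape (including the assumption that the skip markers live in process_error and that received_at is available as epoch milliseconds) is hypothetical.

```typescript
// Minimal stand-in for the Workers scheduled controller (assumption: only
// the cron expression is needed to pick a job).
interface CronEvent {
  cron: string;
}

type Job = "drainStuckEvents" | "reconcileRecent";

// Map each cron expression from wrangler.toml to the job it triggers.
const SCHEDULES: Record<string, Job> = {
  "*/5 * * * *": "drainStuckEvents",
  "0 3 * * *": "reconcileRecent",
};

function pickJob(event: CronEvent): Job | undefined {
  return SCHEDULES[event.cron];
}

// Hypothetical row shape; the real columns are defined in cloud/migrations/.
interface RawEventRow {
  processed_at: string | null;
  received_at_ms: number;
  process_error: string | null;
}

const DRAIN_WINDOW_MS = 24 * 60 * 60 * 1000;

// A row is re-enqueued only if it is unprocessed, younger than 24h, and not
// explicitly marked as skipped or unmatched.
function isStuck(row: RawEventRow, nowMs: number): boolean {
  if (row.processed_at !== null) return false;
  if (row.received_at_ms <= nowMs - DRAIN_WINDOW_MS) return false;
  const marker = row.process_error ?? "";
  return !marker.startsWith("skipped:") && marker !== "no_connection_for_terra_user";
}
```

In the real job the same filter would presumably be a WHERE clause with LIMIT 50 rather than an in-memory predicate; the function form is only meant to make the conditions explicit.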

Secrets

Provided by wrangler secret put (or .dev.vars for wrangler dev):

| Secret | Used by |
| --- | --- |
| CLERK_SECRET_KEY | clerkAuth middleware (cloud/src/middleware/clerk.ts) |
| CLERK_PUBLISHABLE_KEY | Injected into the hosted CLI sign-in page HTML (routes/auth.ts) |
| TERRA_DEV_ID | Every Terra API call |
| TERRA_API_KEY | Every Terra API call |
| TERRA_WEBHOOK_SECRET | verifyTerraSignature (cloud/src/lib/hmac.ts) |
| AMY_JWT_SECRET | HS256 sign/verify of long-lived CLI JWTs (cloud/src/lib/amy-jwt.ts) |
| AMY_ADMIN_KEY | Gates /admin/* routes |

Local dev pulls these from cloud/.dev.vars (gitignored). Use bun run sync-dev-vars (in cloud/) to regenerate .dev.vars from the root .env.

Vars

[vars]
ENVIRONMENT = "production"

Just one. Used in /healthz and admin responses; the dev environment overrides it to "development". Anything that varies between environments should go here, not in code.


Request flow

┌────────────────────────────────────────────────────────────────────────────┐
│  CLIENT                                                                    │
│  CLI · mobile (future) · Terra (webhook) · cron (Cloudflare-internal)      │
└─────────────────────────────────────┬──────────────────────────────────────┘
                                      │ HTTPS
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│  Cloudflare edge: amy-cloud Worker (single isolate)                        │
│                                                                            │
│  ┌──── fetch() ───────────────────────────────────────────────────────┐    │
│  │  Hono router (cloud/src/index.ts)                                  │    │
│  │   1. requestId middleware → x-amy-request-id (UUID) on c + resp    │    │
│  │   2. hono/logger middleware                                        │    │
│  │   3. CORS for /v1/*                                                │    │
│  │   4. Route dispatch:                                               │    │
│  │       /, /webhook, /webhook/terra, /terra → handleTerraWebhook     │    │
│  │       /v1/me, /v1/connect, /v1/labs,                               │    │
│  │       /v1/sync, /v1/import      → clerkAuth → handler              │    │
│  │       /admin/*                  → x-admin-key check → handler      │    │
│  │       /cli/login, /v1/auth/*    → public / clerkAuth as needed     │    │
│  │       /healthz                  → 200                              │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                            │
│  ┌──── queue() ──────────────────────────────────────────────────────┐     │
│  │  consumeTerraEvents (cloud/src/queue/consumer.ts)                 │     │
│  │   for each msg in batch (max 10, max 5s wait):                    │     │
│  │     load raw_events row → normalizeEvent (dispatch by event_type) │     │
│  │     write to D1 normalized tables → mark processed_at             │     │
│  │     ack | retry (5 max) | DLQ                                     │     │
│  └───────────────────────────────────────────────────────────────────┘     │
│                                                                            │
│  ┌──── scheduled() ──────────────────────────────────────────────────┐     │
│  │  handleCron (cloud/src/cron.ts)                                   │     │
│  │    */5  → drainStuckEvents                                        │     │
│  │    3am  → reconcileRecent (per-user 7-day Terra backfill)         │     │
│  └───────────────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────┬──────────────────────────────────────┘
                                      │
       ┌──────────────┬───────────────┼───────────────┬──────────────┐
       ▼              ▼               ▼               ▼              ▼
   ┌───────┐      ┌──────┐      ┌─────────┐     ┌─────────┐    ┌─────────┐
   │  D1   │      │  R2  │      │ Queues  │     │   KV    │    │ Terra   │
   │ DB    │      │ LAB_ │      │  TERRA_ │     │  CACHE  │    │ API     │
   │       │      │ REPS │      │  EVENTS │     │         │    │         │
   └───────┘      └──────┘      └─────────┘     └─────────┘    └─────────┘

Step-by-step: a Terra wearable webhook

This is the load-bearing path. Everything else is variations on it.

  1. Terra POST lands on https://<worker>/ (or /webhook, /webhook/terra, /terra; all four mount the same handler because Terra's dashboard Host field is hostname-only). See cloud/src/index.ts lines 46-49.

  2. requestId middleware assigns c.req.header("x-amy-request-id") ?? crypto.randomUUID() and echoes it in the response header. This id threads through every log line and the Queue message body.

  3. handleTerraWebhook (cloud/src/routes/webhook-terra.ts):

    • Reads the raw body bytes.
    • verifyTerraSignature(raw, header, TERRA_WEBHOOK_SECRET): an HMAC-SHA256 check on ${t}.${raw} with a 5-minute replay window. Failures return 401 invalid_signature with { reason: ... }.
    • JSON-parses the body. Detects lab-report shape structurally (no type, has upload_id, has data array).
    • Computes dedup_key = sha256Hex(raw).
    • INSERT INTO raw_events ... RETURNING id. The UNIQUE (event_type, terra_user_id, dedup_key) index makes duplicate deliveries a no-op (the catch branch returns { ok: true, duplicate: true }).
    • c.executionCtx.waitUntil(env.TERRA_EVENTS.send({ rawEventId, request_id })): a fire-and-forget queue publish so the HTTP response can return immediately.
    • Returns 200 { ok: true, raw_event_id, type, request_id }.

    Total wall time at the edge: ~50ms (HMAC + one INSERT + a waitUntil'd queue write).

  4. Queue consumer wakes up within seconds. consumeTerraEvents loads the row, dispatches by event_type to cloud/src/normalize/*, writes to D1 normalized tables, and updates raw_events.processed_at. See data-pipeline.md for the full normalize logic.

  5. Logger writes both to console.log (captured by Workers observability) and to D1 trace_events via waitUntil. The request_id ties the journey end-to-end so /admin/traces?request_id=<uuid> returns every step.
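The verification, dedup, and shape-detection steps in the walkthrough above can be sketched as below. This is an illustration under stated assumptions, not the code in cloud/src/lib/hmac.ts: the header is assumed to look like t=<unix seconds>,v1=<hex digest>, the reason strings are invented, the body is handled as a string rather than raw bytes, and node:crypto is used for brevity (available under nodejs_compat) where the real helper may well use Web Crypto.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

const REPLAY_WINDOW_SECONDS = 5 * 60;

type VerifyResult = { ok: true } | { ok: false; reason: string };

// Assumed header format: "t=<unix seconds>,v1=<hex HMAC-SHA256>".
function verifyTerraSignature(
  raw: string,
  header: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): VerifyResult {
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const idx = piece.indexOf("=");
    if (idx > 0) parts.set(piece.slice(0, idx).trim(), piece.slice(idx + 1).trim());
  }
  const t = parts.get("t");
  const v1 = parts.get("v1");
  if (!t || !v1) return { ok: false, reason: "malformed_header" };

  // Reject timestamps outside the replay window in either direction.
  const ts = Number(t);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > REPLAY_WINDOW_SECONDS) {
    return { ok: false, reason: "stale_timestamp" };
  }

  // The signed payload is the timestamp, a dot, and the raw body.
  const expected = createHmac("sha256", secret).update(`${t}.${raw}`).digest();
  const given = Buffer.from(v1, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return { ok: false, reason: "signature_mismatch" };
  }
  return { ok: true };
}

// Dedup key: hex SHA-256 of the raw body, so byte-identical redeliveries collide.
function sha256Hex(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

// Structural lab-report check: no type, has upload_id, has a data array.
function isLabReportShape(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return !("type" in b) && "upload_id" in b && Array.isArray(b.data);
}
```

Comparing digests with timingSafeEqual rather than === avoids leaking how many leading bytes matched, and hashing the raw body (not the parsed JSON) means the dedup key is insensitive to any re-serialization differences.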

Auth flow on /v1/* routes

client → Authorization: Bearer <token>
         │
         ▼
   clerkAuth middleware (cloud/src/middleware/clerk.ts)
         │
         ├── looksLikeAmyToken(token)? → verifyAmyToken (HS256, AMY_JWT_SECRET)
         │     └── if valid → set userId = claims.sub, return
         │
         └── otherwise → verifyToken from @clerk/backend (RS256 JWKS)
               └── if valid → set userId = claims.sub, set email
         │
         ▼
   if neither → 401 invalid_token
   else → insert users (id, email) on conflict do nothing  ← lazy upsert
   next()

Two token shapes coexist:

  • Clerk session JWTs (RS256, 60-second expiry), used by the browser page that mounts <SignIn>.
  • Amy JWTs (HS256, 30-day expiry), minted by POST /v1/auth/cli-approve after a Clerk-authenticated browser handshake, so the CLI can hold a long-lived credential without a browser. Format is unsurprising: header {alg:HS256,typ:JWT}, claims {sub, iat, exp, v:1}.

The lazy users upsert means the first authed call from a brand-new sign-up implicitly bootstraps the row that every other table FKs into. There is no explicit /v1/users create endpoint.
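To make the Amy token shape concrete, here is a minimal HS256 mint/verify sketch using the header and claims listed above. It is not the code in cloud/src/lib/amy-jwt.ts: in particular, deciding looksLikeAmyToken by inspecting the header's alg field is an assumption, and node:crypto is used for brevity.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface AmyClaims {
  sub: string;
  iat: number;
  exp: number;
  v: 1;
}

const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

const encode = (value: unknown): string =>
  Buffer.from(JSON.stringify(value)).toString("base64url");

const sign = (data: string, secret: string): string =>
  createHmac("sha256", secret).update(data).digest("base64url");

function mintAmyToken(sub: string, secret: string, nowSeconds: number): string {
  const header = encode({ alg: "HS256", typ: "JWT" });
  const claims: AmyClaims = { sub, iat: nowSeconds, exp: nowSeconds + THIRTY_DAYS_SECONDS, v: 1 };
  const payload = encode(claims);
  return `${header}.${payload}.${sign(`${header}.${payload}`, secret)}`;
}

// Assumption: an Amy token is recognized by its HS256 header; Clerk's are RS256.
function looksLikeAmyToken(token: string): boolean {
  try {
    const head = token.split(".")[0] ?? "";
    const header = JSON.parse(Buffer.from(head, "base64url").toString("utf8"));
    return header.alg === "HS256";
  } catch {
    return false;
  }
}

function verifyAmyToken(token: string, secret: string, nowSeconds: number): AmyClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3 || !looksLikeAmyToken(token)) return null;
  const [header, payload, signature] = parts as [string, string, string];

  // Constant-time comparison of the presented and recomputed signatures.
  const expected = Buffer.from(sign(`${header}.${payload}`, secret));
  const given = Buffer.from(signature);
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;

  let claims: AmyClaims;
  try {
    claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as AmyClaims;
  } catch {
    return null;
  }
  if (!claims || claims.v !== 1 || typeof claims.exp !== "number" || claims.exp <= nowSeconds) {
    return null;
  }
  return claims;
}
```

Checking the cheap token shape first lets the middleware skip the Clerk JWKS path entirely for CLI traffic, which is also why CLI calls avoid the JWKS cold-start cost.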


Cold start

Workers V8 isolates start cold whenever a region hasn't served traffic for this script. For amy-cloud the cold start cost is dominated by:

| Cost | Amount | Notes |
| --- | --- | --- |
| Module evaluation | ~10-30ms | Hono + @clerk/backend + ~30 source files. |
| First D1 query | ~50-80ms | Adds the connection latency on top of the query. |
| First Clerk JWKS fetch | ~100-300ms | verifyToken lazily pulls JWKS the first time. Cached in-isolate after. |

In practice the first authed /v1/me after an idle period takes ~300-500ms; warm calls are 80-150ms. The webhook path skips Clerk (HMAC-only auth), so a cold webhook pays only module evaluation and the first D1 query from the table above (roughly 60-110ms), never the JWKS fetch.

There's no warm-up keep-alive today. The /healthz endpoint is cheap enough to use as one if monitoring polls it every 30s.


Per-resource limits

These limits matter and are where things break first. All are current as of the 2026-05-12 compatibility date.

Worker (paid plan)

| Limit | Where it bites |
| --- | --- |
| 30s CPU time per request | Fine for HTTP. Long-running ingestion is offloaded to the Queue consumer, which runs in a separate Worker invocation per batch with its own 30s budget. |
| 128 MB memory | Hit only if a single Terra batch payload is huge. The largest raw_events.payload observed is ~200 KB; batches of 10 = ~2 MB. Headroom is fine. |
| 1,000 sub-requests per request | Webhook path: 1 D1 insert + 1 queue send = 2. me route: 1 Terra listSubscriptions + N D1 upserts + 1 select = N + 2. Watch the upsert loop as a user's connection count grows. |
| 6 simultaneous outbound connections | Caps concurrent fetches. The 3am cron makes N × 4 requestBackfill calls (Promise.all across the four types, sequential across connections), so at most 4 are in flight at once; it is bounded by the per-request sub-request cap, not the connection limit. |

D1

| Limit | Where it bites |
| --- | --- |
| 10 GB per database | The raw_events table is the bloat risk: every webhook payload is stored verbatim. At 10 events × 5 KB × 100 days/user × 10 users that's only ~50 MB, but if Terra spams large_request_processing chunks during a multi-year backfill, watch the row count. |
| 100k writes/day on free tier | Not relevant; we're on paid. |
| 2 MB row size | Lab PDFs DO NOT land in D1; they go to R2. The raw_events.payload for a normal Terra event is ~5-50 KB, well under the cap. |
| Statement size: ~100 KB | Fine for our INSERTs. The biomarkers_wide view does MAX(CASE WHEN code = ...) for ~25 markers: a wide projection but short text. |
| SQLite single-writer | All ingest goes through the Queue consumer with max_batch_size=10. There's no fan-out write storm. |
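The raw_events sizing estimate in the table above is simple multiplication, sketched here so the inputs can be varied. The function and field names are hypothetical, and reading "10 events" as events per user per day is an interpretation of the figure, not something the table states.

```typescript
interface GrowthInputs {
  eventsPerUserPerDay: number;
  avgPayloadKb: number;
  days: number;
  users: number;
}

const D1_CAP_MB = 10_000; // 10 GB per database, in decimal megabytes

// Estimated raw_events payload volume in MB (payload bytes only; indexes and
// the other columns add overhead on top).
function estimateRawEventsMb(i: GrowthInputs): number {
  return (i.eventsPerUserPerDay * i.avgPayloadKb * i.days * i.users) / 1000;
}

const today = estimateRawEventsMb({ eventsPerUserPerDay: 10, avgPayloadKb: 5, days: 100, users: 10 });
const shareOfCap = today / D1_CAP_MB; // fraction of the 10 GB limit consumed
```

With the table's inputs this yields 50 MB, about 0.5% of the cap; the same inputs at 1,000 users would be roughly half the database, which is where a retention policy for raw_events starts to matter.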

R2

| Limit | Where it bites |
| --- | --- |
| 5 GiB per single-part upload (5 TiB per object via multipart) | Lab PDFs are capped at 10 MB upstream (routes/labs.ts:27). No issue. |
| Class A operations (write/list/delete): $4.50/M | Lab uploads are infrequent. |
| Class B operations (read): $0.36/M | Reads are only from Terra's OCR pass and admin debug; trivial. |
| No egress fees | Why R2 was chosen over S3. |

KV

| Limit | Where it bites |
| --- | --- |
| 25 MB per value | Stream events are tiny (~1 KB each). Fine. |
| 512-byte key size | Patterns like stream:turn_xxx:00042 are ~25 bytes. |
| 60s minimum TTL | The planned idempotency cache uses 24h. |
| Eventually consistent reads | First-write-then-read in the same region usually sees the value within ~1s, but cross-region can lag. Don't use KV for strong consistency. |

Queues

| Limit | Where it bites |
| --- | --- |
| max_batch_size: 10 | Set in wrangler.toml. Higher batches risk exceeding the consumer's 30s CPU on a slow normalize. |
| max_batch_timeout: 5s | Wait up to 5s to fill a batch before consuming. |
| max_retries: 5 | After 5 redeliveries the message goes to the DLQ. Combined with the cron drain, this means a Terra event has effectively unlimited retries within the first 24 hours. |
| Single in-flight batch per consumer instance | One isolate processes one batch at a time. Cloudflare auto-scales consumer concurrency under load. |
| Message body 128 KB | Our messages are { rawEventId, request_id }, under 100 bytes. |

Cron

| Limit | Where it bites |
| --- | --- |
| 30s CPU per scheduled invocation | drainStuckEvents is bounded by DRAIN_LIMIT = 50 and one D1 query + one sendBatch. Fast. reconcileRecent is O(connections × 4) Terra HTTP calls; at ≤10 users this is ~40 fetches, well within budget. |
| No fan-out | Each cron tick is one Worker invocation per cron expression. |

Logging and observability

Two layers, both written from cloud/src/lib/logger.ts:

1. console.log → Cloudflare Workers observability

[observability]
enabled = true

Every log.info(...) does console.log(JSON.stringify(rec)). Cloudflare's built-in observability captures these and surfaces them in:

  • wrangler tail: live tail of console output.
  • cloud:tail script (bun run scripts/cloud-logs.ts --tail): a formatted wrapper around wrangler tail.
  • The Workers dashboard "Logs" tab.

This is the fastest way to see what just happened in production. There's no retention beyond Cloudflare's default (~24h).

2. D1 trace_events table → durable, queryable history

The same logger also INSERTs each record into D1:

-- cloud/migrations/0002_observability.sql
create table trace_events (
  id integer primary key autoincrement,
  ts text not null default (datetime('now')),
  request_id text, user_id text, terra_user_id text, raw_event_id integer,
  level text not null, event text not null, message text, data text,
  duration_ms integer, error_name text, error_stack text
);

Inserts are wrapped in ctx.waitUntil(...) when an ExecutionContext is available (HTTP handlers) so they don't add latency to the response path. Queue and cron callers omit ctx and let the promise dangle; Workers hold the isolate for pending I/O up to a limit, which is fine at our scale.
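The dual-write behaviour can be sketched like this. It is a simplified stand-in for cloud/src/lib/logger.ts, not a copy of it: InsertFn is a hypothetical wrapper around the D1 INSERT, WaitUntilCtx mirrors only the waitUntil member of ExecutionContext, and the record is assumed to be JSON-serializable.

```typescript
interface LogRecord {
  level: string;
  event: string;
  request_id?: string;
  data?: unknown;
}

// Only the part of ExecutionContext this sketch needs.
interface WaitUntilCtx {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical: performs the INSERT INTO trace_events for one record.
type InsertFn = (rec: LogRecord) => Promise<void>;

function emit(rec: LogRecord, insert: InsertFn, ctx?: WaitUntilCtx): void {
  // Layer 1: synchronous console output, captured by Workers observability.
  console.log(JSON.stringify(rec));

  // Layer 2: durable D1 row. Failures (sync or async) are caught so the
  // logger itself never rejects.
  const pending = Promise.resolve()
    .then(() => insert(rec))
    .catch((err: unknown) => {
      console.error("trace_events_insert_failed", String(err));
    });

  // With a ctx, the runtime keeps the invocation alive until the insert
  // settles; without one, the promise is left to settle on its own.
  if (ctx) ctx.waitUntil(pending);
}
```

Because the console line is written before the insert is even started, a failed or lost D1 write still leaves a trace in the live log stream.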

Query via the admin endpoints in cloud/src/routes/admin.ts:

# All trace events for one webhook (the full journey):
curl -H "x-admin-key: $AMY_ADMIN_KEY" \
  "https://api.amy.health/admin/traces?request_id=<uuid>"

# Errors only across the last 24h:
curl -H "x-admin-key: $AMY_ADMIN_KEY" \
  "https://api.amy.health/admin/traces?level=error"

# Recent activity for a single user:
curl -H "x-admin-key: $AMY_ADMIN_KEY" \
  "https://api.amy.health/admin/traces?user_id=user_2abc..."

The Logger.start(event) pattern emits a <event>_start row and returns a { end, fail } pair; calling .end(extra) writes a <event>_complete row with duration_ms set. That gives you free per-step timings:

const t = log.start("normalize");
const result = await normalizeEvent(env, row);
t.end({ ok: result.ok, rows: result.rows });
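One way the start/end pair could be implemented is sketched below. The sink callback, the injectable clock, and the <event>_failed suffix are assumptions made for illustration; only the _start and _complete row names come from the description above.

```typescript
interface TraceRow {
  event: string;
  duration_ms?: number;
  data?: unknown;
  error_name?: string;
}

// Hypothetical sink: in the real logger this would write to console and D1.
type Sink = (row: TraceRow) => void;

function start(event: string, sink: Sink, now: () => number = Date.now) {
  const startedAt = now();
  sink({ event: `${event}_start` });
  return {
    // Emits <event>_complete with the elapsed time since start().
    end(extra?: Record<string, unknown>): void {
      sink({ event: `${event}_complete`, duration_ms: now() - startedAt, data: extra });
    },
    // Emits a failure row; the "_failed" suffix is an assumed naming choice.
    fail(err: unknown): void {
      sink({
        event: `${event}_failed`,
        duration_ms: now() - startedAt,
        error_name: err instanceof Error ? err.name : "Error",
      });
    },
  };
}
```

Capturing startedAt in the closure means callers never handle timestamps themselves, so every step that uses the pattern reports duration_ms the same way.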

Observability hooks

| Hook | Where |
| --- | --- |
| x-amy-request-id response header | Set on every response by middleware/request-id.ts. Clients should log this for support reports. |
| /admin/healthz | Top-line counters: users, active_connections, raw_events_total/unprocessed/errored/last_hour, daily_summary, activities, biomarkers, lab_uploads. The ok field flips false at >100 unprocessed or >20 errored. |
| /admin/dlq | Recent rows where process_error IS NOT NULL (excluding skipped:*). |
| /admin/raw-events/:id | Full row including the verbatim Terra payload; invaluable for repro. |
| /admin/user/:userId | Per-user snapshot of every counter + recent traces. |
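The ok rule for /admin/healthz reduces to two threshold checks. A minimal sketch, assuming the counters are exposed under the field names raw_events_unprocessed and raw_events_errored (the actual response keys may differ):

```typescript
// Assumed subset of the /admin/healthz counters.
interface HealthCounters {
  raw_events_unprocessed: number;
  raw_events_errored: number;
}

const MAX_UNPROCESSED = 100;
const MAX_ERRORED = 20;

// ok flips false at >100 unprocessed or >20 errored, so the thresholds
// themselves still count as healthy.
function isHealthy(c: HealthCounters): boolean {
  return c.raw_events_unprocessed <= MAX_UNPROCESSED && c.raw_events_errored <= MAX_ERRORED;
}
```

An external monitor only needs to alert on ok === false rather than re-deriving the thresholds from the individual counters.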

Logging gotchas

  • The logger never throws. D1 insert failures are caught and re-routed to a console.error("trace_events_insert_failed", ...).
  • data is JSON.stringify'd via a safeJson helper that falls back to String(x) if stringify fails. You can put anything in.
  • The logger emits before the waitUntil resolves. If the Worker is killed mid-request (rare), the D1 insert can be lost, but console.log was already flushed.
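A helper with the fallback behaviour described for safeJson might look like the following. This is a guess at the shape, not the actual helper; in particular, treating an undefined result from JSON.stringify as a failure is an added assumption.

```typescript
// Serialize anything to a string without throwing.
function safeJson(x: unknown): string {
  try {
    // JSON.stringify returns undefined (not a string) for undefined,
    // functions, and symbols, so treat that as a failed stringify too.
    const s = JSON.stringify(x);
    return s === undefined ? String(x) : s;
  } catch {
    // Circular structures and BigInt values throw; fall back to String().
    return String(x);
  }
}
```

The fallback output for a circular object is just "[object Object]", so it preserves the log row rather than the data; callers who care about the contents should still pass plain objects.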

Where to next

  • The end-to-end ingest path (webhook → normalize → D1) lives in data-pipeline.md.
  • The 9-step agent pipeline (which currently runs in the CLI, not the Worker, but will move into a TurnWorkflow) is in agent-orchestration.md.
  • D1 schema, R2 layout, KV key patterns, and migration story are in storage.md.
  • The target architecture (Workflows, SSE streaming, /v1/turns) is in architecture.md. Start there before adding long-running endpoints to this Worker.
