Turns

A turn is one round-trip of the agent: the user asks, Amy answers, and every quantitative claim in the answer has been checked. It's the unit of work the backend dispatches, persists, streams, and bills.

A turn is what you POST to start, what you GET to read, and what you subscribe to with SSE. If you understand turns, you understand 80% of the API.


What a turn is

   user message ──► [ turn_01HX... ] ──► answer  +  fact_sheet  +  trace

A turn carries one user message into the multi-agent pipeline and emerges with:

  • a plain-language answer (result.answer),
  • a fact sheet: every number the answer cites, with provenance (result.fact_sheet),
  • a trace of which agents ran, which validation gates fired, and what each cost,
  • new memory entries extracted from the exchange.

A turn is a Cloudflare Workflow under the hood. That means:

  • Durable. A worker restart, an Anthropic 5xx, or a code deploy mid-turn doesn't lose the turn; it resumes after the last successful step.
  • Observable. Every step's input/output is visible in the CF dashboard.
  • Replayable. Same inputs → same step decomposition → easy debugging.

You always reference a turn by its ID:

turn_01HX2K3M4N5P6Q7R8S9T0V1W2X

That ID is stable forever. The Turn row is kept indefinitely; the SSE event replay buffer is kept for 1 hour after completion (see Streaming).
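
Because the ID is a ULID under a typed prefix, a client can sanity-check an ID and recover its creation time without a network call. The sketch below assumes the standard ULID layout (26 Crockford base32 characters, the first 10 encoding a millisecond timestamp). isTurnId and turnCreatedAtMs are hypothetical helper names, not part of Amy's SDK.

```typescript
// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
const TURN_ID_RE = /^turn_[0-9A-HJKMNP-TV-Z]{26}$/;

// Hypothetical helper: true if the string looks like a turn ID.
function isTurnId(id: string): boolean {
  return TURN_ID_RE.test(id);
}

// Hypothetical helper: decode the millisecond timestamp carried in the
// first 10 characters after the prefix (assumes standard ULID layout).
function turnCreatedAtMs(id: string): number {
  if (!isTurnId(id)) throw new Error("not a turn ID: " + id);
  const start = "turn_".length;
  let ms = 0;
  for (let i = start; i < start + 10; i++) {
    ms = ms * 32 + CROCKFORD.indexOf(id.charAt(i));
  }
  return ms;
}
```

Since the timestamp leads the ID, sorting IDs as plain strings also sorts them by creation time; decoding is only needed when you want the actual instant.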


The lifecycle

A turn moves through four statuses. Two are transient (queued, running); two are terminal (completed, failed).

stateDiagram-v2
  [*] --> queued: POST /v1/turns
  queued --> running: workflow dispatched
  running --> running: pipeline step N → N+1
  running --> completed: synthesis + verification passed
  running --> failed: unrecoverable error
  completed --> [*]
  failed --> [*]

| Status | Meaning | Has result? | Has error? |
| --- | --- | --- | --- |
| queued | The turn row exists; the workflow hasn't started its first step yet. Usually <1s. | no | no |
| running | One of the 9 pipeline steps is executing. May last 2-7 minutes. | no | no |
| completed | Synthesis finished and numeric verification passed. result is populated. | yes | no |
| failed | An unrecoverable error occurred (validation runner crashed, upstream timed out past the retry budget, the model returned a malformed plan after the cap). | no | yes |

You will almost always see running for the bulk of a turn's life. Subscribe to /events to watch it progress; poll GET /v1/turns/:id if you only need the terminal state.

A turn cannot transition backwards. There is no paused state in v1; future versions may add human-in-the-loop pauses.
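
The state diagram above is small enough to encode as a lookup table, which is handy for client-side assertions and for deciding when to stop polling. This is a minimal sketch: the status names come from this page, while NEXT, isTerminal, and canTransition are illustrative names.

```typescript
type TurnStatus = "queued" | "running" | "completed" | "failed";

// Allowed transitions, copied from the state diagram.
const NEXT: Record<TurnStatus, readonly TurnStatus[]> = {
  queued: ["running"],
  running: ["running", "completed", "failed"],
  completed: [],
  failed: [],
};

// A status is terminal when nothing can follow it.
function isTerminal(status: TurnStatus): boolean {
  return NEXT[status].length === 0;
}

// True if the lifecycle permits moving from `from` to `to`.
function canTransition(from: TurnStatus, to: TurnStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}
```

A poll loop can stop as soon as isTerminal(turn.status) is true; seeing a transition that canTransition rejects would suggest a stale read rather than a real state change.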


The pipeline (high level)

Internally, a turn decomposes into up to 9 steps. Each step is a discrete workflow operation whose output persists: if step N fails, the turn resumes at step N, and steps 1 through N−1 do not replay.

 1  classify_vagueness     ~3s     pick the right path
 2  route                  ~2s     main + supporting agents
 3  rephrase_per_agent     ~2s     focused sub-questions
 4  run_supporting_agents  ~30–90s each (parallel)
 5  run_main_agent         ~30–120s
 6  reflection             ~5s     do we need follow-ups?
 7  validation_gates       ~10s    7 gates + Critic + Assessment
 8  synthesis              ~20s    streams to /events
 9  memory_extraction      ~3s     durable facts written

Two pre-step branches matter:

  • Vagueness = high ("is anything interesting in my data?") skips the standard route and runs the Hypothesis Investigator first. The Investigator scans the user's data, ranks candidate hypotheses, and feeds the top-K through the validator before answering.
  • Routing returns no agent ("thanks!" and similar): the turn falls back to a plain conversational reply, with one cheap LLM call, no pipeline, and no validator.

For the full step-by-step contract (what each step reads, writes, and guarantees), see Internals: Agent orchestration.
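
The resume rule (persisted outputs, no replay of earlier steps) can be illustrated with a toy model. This is not the Cloudflare Workflows API and not Amy's orchestrator; Step, runPipeline, and the in-memory store are assumptions made purely to show the semantics.

```typescript
// A step reads the outputs persisted so far and returns its own output.
type Step = {
  name: string;
  run: (prior: Record<string, unknown>) => unknown;
};

// Run steps in order, persisting each output in `store`.
// When called again with the same store, steps that already have a
// persisted output are skipped, so execution resumes at the first
// step that has not succeeded yet.
function runPipeline(steps: Step[], store: Record<string, unknown>): void {
  for (const step of steps) {
    if (Object.prototype.hasOwnProperty.call(store, step.name)) {
      continue; // already persisted: do not replay
    }
    const output = step.run(store); // may throw; earlier outputs stay in store
    store[step.name] = output;
  }
}
```

If the second of three steps throws on the first call, calling runPipeline again with the same store runs only the second and third steps; the first step's output is read from the store instead of being recomputed.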

Why this matters for clients

You don't need to call any of these directly. But understanding the shape explains things you'll observe:

  • Why does agent.thought start streaming ~10 seconds in? Steps 1-3 run first; the first sub-agent doesn't begin until step 4.
  • Why is the answer short for a vague question, then suddenly long? High vagueness routes through the Investigator first; its briefing is the answer for that turn (steps 5-8 don't run).
  • Why does validator output appear before the final answer? Step 7 always runs before step 8. Synthesis is constrained to the fact sheet, which doesn't exist until validation completes.

The Turn object

The full schema, lifted from the API reference:

{
  "id": "turn_01HX2K3M4N5P6Q7R8S9T0V1W2X",
  "status": "completed",
  "created_at": "2026-05-25T10:00:00Z",
  "completed_at": "2026-05-25T10:03:42Z",
  "messages": [
    { "role": "user", "content": "Is my sleep score drop meaningful?" }
  ],
  "result": {
    "answer": "Short answer: no — by the most defensible read of your own data, this isn't a clinically meaningful decline...",
    "fact_sheet": [
      { "claim": "ds-001.effect", "value": -0.374, "unit": null,
        "source": "data_science", "n": 87, "window": "last 90 days" },
      { "claim": "ds-001.ci_low", "value": -0.42, "unit": null,
        "source": "data_science", "n": 87, "window": "last 90 days" }
    ],
    "agents_used": ["data_science", "domain_expert"],
    "validator": {
      "findings_total": 3,
      "findings_validated": 2,
      "findings_conditional": 0,
      "findings_rejected": 1
    },
    "cost_usd": 0.1288,
    "duration_ms": 222000
  },
  "error": null
}

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Typed prefix turn_…, ULID under the prefix. Sortable by creation time. |
| status | enum | queued · running · completed · failed. |
| created_at | ISO-8601 | When the POST landed. |
| completed_at | ISO-8601 or null | Set when status flips to completed or failed. |
| messages | Message[] | The exact conversation you sent. Last message is always from user. |
| result.answer | string | Markdown. May be multi-paragraph. Safe to render directly. |
| result.fact_sheet | FactEntry[] | Every number the answer cites, with provenance. See What "completed" guarantees. |
| result.agents_used | string[] | data_science, domain_expert, health_coach, investigator. |
| result.validator | object | Aggregate validation counts. |
| result.cost_usd | number | Total LLM spend across all steps. |
| result.duration_ms | number | Wall time from the workflow's first step to the final memory write. Excludes queue time; see What "completed" guarantees. |
| error | object or null | Populated when status === "failed". See Errors. |

result is null while status is not completed. error is non-null only when status === "failed".
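
The null rules above map naturally onto a discriminated union, so the compiler can enforce them for you. The types below are a sketch inferred from the example payload and field table on this page, not an official SDK definition: FactEntry's fields are taken from the sample, the optional verdict tag is an assumption, and code is the only error field this page names.

```typescript
type Message = { role: "user" | "assistant"; content: string };

interface FactEntry {
  claim: string;
  value: number;
  unit: string | null;
  source: string;
  n: number;
  window: string;
  verdict?: string; // assumption: present on conditional findings
}

interface TurnResult {
  answer: string;
  fact_sheet: FactEntry[];
  agents_used: string[];
  validator: {
    findings_total: number;
    findings_validated: number;
    findings_conditional: number;
    findings_rejected: number;
  };
  cost_usd: number;
  duration_ms: number;
}

interface TurnBase { id: string; created_at: string; messages: Message[] }

interface PendingTurn extends TurnBase {
  status: "queued" | "running"; completed_at: null; result: null; error: null;
}
interface CompletedTurn extends TurnBase {
  status: "completed"; completed_at: string; result: TurnResult; error: null;
}
interface FailedTurn extends TurnBase {
  status: "failed"; completed_at: string; result: null; error: { code: string };
}

type Turn = PendingTurn | CompletedTurn | FailedTurn;

// Narrowing on `status` gives typed access to result or error.
function answerOrThrow(turn: Turn): string {
  switch (turn.status) {
    case "completed":
      return turn.result.answer;
    case "failed":
      throw new Error("turn failed: " + turn.error.code);
    default:
      throw new Error("turn still " + turn.status);
  }
}
```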

The Message shape

{ "role": "user" | "assistant", "content": "string" }

content is plain text in v1. Future versions may add tool calls, attachments, and structured content; new fields will be additive.


Patterns

One-shot question

The simplest possible turn. No history, no streaming, no extras.

curl -X POST https://api.amy.health/v1/turns \
  -H "Authorization: Bearer $AMY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{ "role": "user", "content": "What is my average HRV this month?" }],
    "stream": false
  }'

With "stream": false the request blocks until the turn completes (up to ~7 minutes) and returns the full Turn object directly. No SSE, no polling.

Use this for batch jobs, scripts, and anywhere wall-time latency isn't the user's problem.

Conversational

Pass the entire conversation in messages on each turn. Amy doesn't maintain server-side conversation state; you do.

const history = [
  { role: "user",      content: "How was my sleep last week?" },
  { role: "assistant", content: "Average 7.2h, REM was lower than your..." },
  { role: "user",      content: "Why might REM have dropped?" },
];

const turn = await amy.turns.create({ messages: history });

The last message must be from user. Sending an assistant message last returns 400 invalid_request.

Memory (next section) is separate from conversation: memory persists across turns automatically; messages do not.
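
Since the client owns the conversation, two small helpers cover most of the bookkeeping: a pre-flight check that mirrors the last-message rule, and a function that extends the history for the next turn. Both names are hypothetical, and treating an empty array as invalid is an assumption that follows from the rule that the last message must be from user.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Client-side check for the documented rule. Returns a reason string
// when the array would be rejected, or null when it looks valid.
function checkMessages(messages: Message[]): string | null {
  if (messages.length === 0) return "messages is empty";
  if (messages[messages.length - 1].role !== "user") {
    return "last message must be from user";
  }
  return null;
}

// Build the history for the next turn without mutating the old one:
// append Amy's previous answer, then the user's follow-up question.
function nextHistory(history: Message[], answer: string, question: string): Message[] {
  const added: Message[] = [
    { role: "assistant", content: answer },
    { role: "user", content: question },
  ];
  return history.concat(added);
}
```

Running checkMessages before each POST turns a 400 invalid_request into a local error that is cheaper to debug.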

With / without memory injection

By default every turn injects the user's persistent memory (goals, preferences, prior insights) into the agent context. To opt out:

const turn = await amy.turns.create({
  messages: [...],
  context: { include_memory: false }
});

When to skip memory injection:

| Reason | Example |
| --- | --- |
| You're running an evaluation and need isolated turns. | Regression suite over a frozen agent. |
| The question is intentionally context-free. | "Explain HRV in 3 sentences." |
| You're testing a memory-extraction change. | Want to see what Amy would remember without re-using prior memory. |

The sibling flag, include_biomarkers: false, drops the latest lab snapshot from context. Both flags default to true.

Blocking vs streaming

Use streaming whenever a user is watching:

const turn = await amy.turns.create({ messages: [...] });  // stream: true by default

for await (const event of amy.turns.stream(turn.id)) {
  if (event.type === "agent.thought")  render(event.delta);
  if (event.type === "turn.completed") finish(event.result);
}

Use blocking when nothing's watching:

const turn = await amy.turns.create({ messages: [...], stream: false });
// turn.status === "completed", turn.result is populated.

See Streaming for the full SSE protocol.


Cost expectations

Turn cost varies by routing. The bands below were measured with Claude Sonnet for routing and reflection and Opus for the heavy agents; expect your cost_usd to land in them for a typical user with 60-180 days of data.

| Turn shape | Typical duration_ms | Typical cost_usd |
| --- | --- | --- |
| Conversational fallback ("thanks!") | 1,500-3,000 | $0.001-$0.003 |
| Direct data lookup ("avg RHR?") | 60,000-90,000 | $0.08-$0.15 |
| Data + domain ("is LDL 124 a concern?") | 90,000-180,000 | $0.10-$0.25 |
| Coaching ("plan my deep sleep recovery") | 120,000-240,000 | $0.15-$0.35 |
| Investigator (vague: "anything to worry about?") | 180,000-360,000 | $0.30-$0.90 |
| Reflection follow-ups added | +30,000-90,000 | +$0.05-$0.20 |

A cost_warning event fires at AMY_COST_WARN_USD (default $0.50), but the turn does not abort; the warning is advisory. Set per-user caps server-side if you need a hard ceiling.

Concurrency cap is 3 in-flight turns per user. Exceeding it returns 429 concurrency_limit_exceeded (see Errors).
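
A client that fans out many turns can avoid 429s by tracking in-flight turns itself. The class below is a hypothetical in-memory limiter, assuming a single client process; the default cap of 3 comes from this page, and TurnSlots is not part of Amy's SDK.

```typescript
// Hypothetical per-user slot counter for a single client process.
class TurnSlots {
  private inFlight: Record<string, number> = {};

  constructor(private readonly cap: number = 3) {}

  // Reserve a slot before POSTing. A false result means: wait,
  // rather than risk a 429 concurrency_limit_exceeded.
  tryAcquire(userId: string): boolean {
    const used = this.inFlight[userId] || 0;
    if (used >= this.cap) return false;
    this.inFlight[userId] = used + 1;
    return true;
  }

  // Call once the turn reaches completed or failed.
  release(userId: string): void {
    const used = this.inFlight[userId] || 0;
    this.inFlight[userId] = used > 0 ? used - 1 : 0;
  }
}
```

Release the slot only on a terminal status. Closing the SSE subscription does not stop the turn, so it still counts against the cap.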


What "completed" guarantees

When status === "completed", the following invariants hold:

  1. result.answer is non-empty. Synthesis ran and emitted text.
  2. Every quantitative claim in result.answer is in result.fact_sheet. The fact sheet is the deterministic source of truth: synthesis is constrained to it, and a post-synthesis numeric verification pass flags any digit in the rendered reply that doesn't trace back to it. (A number traces back when it matches one ledger entry within a tolerance of 2% relative or 0.05 absolute, which accommodates rounding and ratio derivations.)
  3. Every fact sheet entry survived validation. Findings the Critic rejected do not enter the fact sheet. Conditional findings appear with a verdict: "conditional" tag and the answer is required to hedge.
  4. agents_used is the actual set of agents the orchestrator ran. Not the candidate set, the executed set.
  5. cost_usd is the sum of every LLM call charged to this turn, including failed retries. Use this for per-user billing or per-feature accounting.
  6. duration_ms is wall time from the workflow's first step beginning to the final memory write completing. It excludes queue time (queue → first step is typically <1s but can spike).

The flip side: if status === "failed", none of these hold. result is null; error.code tells you why. See Errors.
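
The tolerance in invariant 2 can be reproduced client-side, for example to audit that a rendered number is backed by the fact sheet. This sketch assumes a number passes when it is within either tolerance of at least one fact sheet value; the server's exact rule may differ, and tracesBack is an illustrative name.

```typescript
const REL_TOL = 0.02; // 2% relative
const ABS_TOL = 0.05; // 0.05 absolute

// Assumption: a cited number "traces back" if it is within EITHER
// tolerance of at least one value in the ledger.
function tracesBack(cited: number, ledger: number[]): boolean {
  return ledger.some((value) => {
    const diff = Math.abs(cited - value);
    return diff <= ABS_TOL || diff <= REL_TOL * Math.abs(value);
  });
}

// Values from the example fact sheet on this page.
const ledger = [-0.374, -0.42];
tracesBack(-0.37, ledger); // true: within 0.05 absolute of -0.374
tracesBack(0.5, ledger);   // false: matches no entry
tracesBack(101, [100]);    // true: within 2% relative
```

The absolute tolerance matters for small effect sizes like the ones in the sample, where 2% of the value is smaller than ordinary rounding.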


Common mistakes

Polling GET /v1/turns/:id instead of streaming

You'll pay for the latency twice: once in your poll loop, once in the KV-backed event buffer that exists anyway. Subscribe to /events once; the iterator handles reconnects for you. Polling is only for non-interactive use (cron jobs, retries after disconnect).

Treating queued or running as terminal

Both are transient. If you read a turn and status !== "completed" && status !== "failed", the work isn't done. result is null by design, not by bug.

Sending an assistant message last

messages[messages.length - 1].role must be "user". Sending an assistant message last looks like you're trying to make Amy continue its own thought, which isn't supported in v1. Returns 400 invalid_request.

Expecting deterministic results between turns

Two identical messages arrays will not produce byte-identical answers. The LLM is stochastic; the validator is deterministic but operates on LLM-emitted findings that vary by run. Use the fact sheet for deterministic claim-level comparisons, not the answer text.
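
A claim-level comparison between two runs might look like the sketch below. It assumes claim identifiers such as ds-001.effect are stable across runs, which this page does not guarantee; diffFactSheets and the tol parameter are hypothetical.

```typescript
type Fact = { claim: string; value: number };

// Compare two fact sheets claim by claim. Returns claims whose values
// differ by more than `tol`, followed by claims present in only one run.
function diffFactSheets(a: Fact[], b: Fact[], tol: number = 0): string[] {
  const has = (o: object, k: string): boolean =>
    Object.prototype.hasOwnProperty.call(o, k);

  const inA: Record<string, number> = {};
  for (const f of a) inA[f.claim] = f.value;

  const inB: Record<string, boolean> = {};
  const changed: string[] = [];
  for (const f of b) {
    inB[f.claim] = true;
    if (!has(inA, f.claim) || Math.abs(inA[f.claim] - f.value) > tol) {
      changed.push(f.claim);
    }
  }
  for (const f of a) {
    if (!has(inB, f.claim)) changed.push(f.claim);
  }
  return changed;
}
```

An empty result means the two runs agree on every claim within the tolerance, regardless of how differently the answer text is worded.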

Forgetting that conversation state is client-side

Amy does not store conversations. The Turn row stores the messages you sent for that turn, nothing more. If you want a multi-turn chat, your client maintains the array.

Reading result.fact_sheet.length to gauge confidence

A short fact sheet can mean a focused, high-confidence answer (one metric, one validated number) or a low-confidence answer where nothing survived validation. Read result.validator.findings_validated / findings_total for confidence, not length.
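
As a minimal sketch, the ratio can be computed from result.validator as follows. The field names come from the Turn object on this page; validatedShare is a hypothetical helper, and returning null for zero findings is a design choice so callers don't mistake "nothing evaluated" for 0% or 100%.

```typescript
interface ValidatorCounts {
  findings_total: number;
  findings_validated: number;
  findings_conditional: number;
  findings_rejected: number;
}

// Share of findings that passed validation outright, or null when
// the validator evaluated nothing.
function validatedShare(v: ValidatorCounts): number | null {
  if (v.findings_total === 0) return null;
  return v.findings_validated / v.findings_total;
}
```

For the example turn on this page (3 findings, 2 validated, 1 rejected) the share is 2/3.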

Assuming the SSE replay window is forever

It's 1 hour after completed_at in v1. After that, the events are GC'd from KV. The Turn row stays forever; only the live event stream expires. Persist events client-side if you need them long-term.

Cancelling a turn

There's no DELETE /v1/turns/:id for in-flight turns in v1. Once a turn is queued, it runs to completion or failure. You can close your SSE subscription, but the backend keeps working (and keeps charging). Use the per-user concurrency cap (3) as your back-pressure mechanism.

