Turns
A turn is one round-trip of the agent: the user asks, Amy answers, and every quantitative claim in the answer has been checked. It's the unit of work the backend dispatches, persists, streams, and bills.
A turn is what you POST to start, what you GET to read, and what you
subscribe to with SSE. If you understand turns, you understand 80% of the
API.
Quick navigation
- What a turn is
- The lifecycle
- The pipeline (high level)
- The Turn object
- Patterns
- Cost expectations
- What "completed" guarantees
- Common mistakes
What a turn is
user message ──► [ turn_01HX... ] ──► answer + fact_sheet + trace
A turn carries one user message into the multi-agent pipeline and emerges with:
- a plain-language answer (result.answer),
- a fact sheet listing every number the answer cites, with provenance (result.fact_sheet),
- a trace of which agents ran, which validation gates fired, and what each cost,
- new memory entries extracted from the exchange.
A turn is a Cloudflare Workflow under the hood. That means:
- Durable. A worker restart, an Anthropic 5xx, or a code deploy mid-turn doesn't lose the turn; it resumes at the last successful step.
- Observable. Every step's input/output is visible in the CF dashboard.
- Replayable. Same inputs → same step decomposition → easy debugging.
You always reference a turn by its ID:
turn_01HX2K3M4N5P6Q7R8S9T0V1W2X
That ID is stable forever. The Turn row is kept indefinitely; the SSE event replay buffer is kept for 1 hour after completion (see Streaming).
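Since the Turn object reference describes the ID as a ULID under the turn_ prefix, the creation time can in principle be read back out of the ID. The sketch below assumes the suffix is a standard 26-character ULID whose first 10 Crockford base32 characters encode milliseconds since the Unix epoch; that is an assumption about the encoding, and turnCreatedAtMs and sortByCreation are hypothetical helpers, not SDK methods.

```typescript
// Sketch only: assumes the part after "turn_" is a standard 26-character ULID.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: decode the millisecond timestamp embedded in a turn ID.
function turnCreatedAtMs(turnId: string): number {
  const prefix = "turn_";
  if (turnId.indexOf(prefix) !== 0) {
    throw new Error(`not a turn ID: ${turnId}`);
  }
  const ulid = turnId.slice(prefix.length).toUpperCase();
  if (ulid.length !== 26) {
    throw new Error(`expected a 26-character ULID, got ${ulid.length}`);
  }
  let ms = 0;
  for (let i = 0; i < 10; i++) {
    const digit = CROCKFORD.indexOf(ulid.charAt(i));
    if (digit < 0) throw new Error(`invalid ULID character: ${ulid.charAt(i)}`);
    // Plain arithmetic keeps the 48-bit value exact; bitwise operators
    // would truncate it to 32 bits.
    ms = ms * 32 + digit;
  }
  return ms;
}

// ULIDs sort lexicographically in creation order, so a plain string sort
// is enough to order turn IDs by creation time.
function sortByCreation(ids: string[]): string[] {
  return [...ids].sort();
}
```

Wrapping the result in new Date(...) gives a timestamp that should be close to created_at, if the assumption about the encoding holds.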
The lifecycle
A turn moves through four statuses. Two (queued, running) are transient;
two (completed, failed) are terminal.
stateDiagram-v2
[*] --> queued: POST /v1/turns
queued --> running: workflow dispatched
running --> running: pipeline step N → N+1
running --> completed: synthesis + verification passed
running --> failed: unrecoverable error
completed --> [*]
failed --> [*]
| Status | Meaning | Has result? | Has error? |
|---|---|---|---|
| queued | The turn row exists; the workflow hasn't started its first step yet. Usually <1s. | no | no |
| running | One of the 9 pipeline steps is executing. May last 2-7 minutes. | no | no |
| completed | Synthesis finished and numeric verification passed. result is populated. | yes | no |
| failed | An unrecoverable error occurred (validation runner crashed, upstream timed out past the retry budget, the model returned a malformed plan after the cap). | no | yes |
You will almost always see running for the bulk of a turn's life.
Subscribe to /events to watch it progress; poll GET /v1/turns/:id if
you only need the terminal state.
A turn cannot transition backwards. There is no paused state in v1;
future versions may add human-in-the-loop pauses.
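The state diagram can be mirrored client-side as a small transition table, which makes the "no backwards transitions" rule checkable and gives a polling loop a clear stop condition. This is a minimal sketch: the type and function names are illustrative, and fetchTurn is a hypothetical callback standing in for whatever wraps GET /v1/turns/:id in your client.

```typescript
type TurnStatus = "queued" | "running" | "completed" | "failed";

// Allowed forward transitions, copied from the state diagram.
const NEXT: Record<TurnStatus, readonly TurnStatus[]> = {
  queued: ["running"],
  running: ["running", "completed", "failed"],
  completed: [],
  failed: [],
};

// A status is terminal when nothing follows it.
function isTerminal(status: TurnStatus): boolean {
  return NEXT[status].length === 0;
}

function canTransition(from: TurnStatus, to: TurnStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// Poll until the turn reaches a terminal status.
// fetchTurn is a hypothetical callback; inject your own HTTP client.
async function waitForTerminal(
  fetchTurn: () => Promise<{ status: TurnStatus }>,
  intervalMs = 5000,
): Promise<TurnStatus> {
  for (;;) {
    const { status } = await fetchTurn();
    if (isTerminal(status)) return status;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The 5-second default interval is an arbitrary choice for the sketch, not a documented recommendation.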
The pipeline (high level)
Internally, a turn decomposes into up to 9 steps. Each step is a discrete workflow operation: its output persists, and if step N fails the workflow resumes at step N; steps 1 through N−1 do not replay.
1 classify_vagueness ~3s pick the right path
2 route ~2s main + supporting agents
3 rephrase_per_agent ~2s focused sub-questions
4 run_supporting_agents ~30–90s each (parallel)
5 run_main_agent ~30–120s
6 reflection ~5s do we need follow-ups?
7 validation_gates ~10s 7 gates + Critic + Assessment
8 synthesis ~20s streams to /events
9 memory_extraction ~3s durable facts written
Two pre-step branches matter:
- Vagueness = high ("is anything interesting in my data?") skips the standard route and runs the Hypothesis Investigator first. The Investigator scans the user's data, ranks candidate hypotheses, and feeds the top-K through the validator before answering.
- Routing returns no agent: this triggers a plain conversational fallback, which is one cheap LLM call with no pipeline and no validator. (For "thanks!" and similar.)
For the full step-by-step contract (what each step reads, writes, and guarantees), see Internals: Agent orchestration.
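The resume rule for pipeline steps (a failed step N restarts at step N, and steps 1 through N−1 are not replayed) can be illustrated with an in-memory store standing in for the workflow engine's durable state. This sketches the semantics only: Step, runWithResume, and the Map-based store are hypothetical names and are not the Cloudflare Workflows API.

```typescript
// Illustrative only: an in-memory stand-in for durable step storage.
type Step = { name: string; run: (prev: unknown) => unknown };

function runWithResume(
  steps: Step[],
  store: Map<string, unknown>, // persisted outputs, keyed by step name
): unknown {
  let prev: unknown = undefined;
  for (const step of steps) {
    if (store.has(step.name)) {
      // Completed in an earlier attempt: reuse the stored output, do not re-run.
      prev = store.get(step.name);
      continue;
    }
    // May throw. Nothing is stored for a failed step, so the next attempt
    // starts again at exactly this step.
    prev = step.run(prev);
    store.set(step.name, prev);
  }
  return prev;
}
```

Calling runWithResume a second time with the same store after a failure skips every step whose output was already recorded, which is the behavior the paragraph on step persistence describes.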
Why this matters for clients
You don't need to call any of these directly. But understanding the shape explains things you'll observe:
- Why does agent.thought start streaming ~10 seconds in? Steps 1-3 run first; the first sub-agent doesn't begin until step 4.
- Why is the answer short for a vague question, then suddenly long? High vagueness routes through the Investigator first; its briefing is the answer for that turn (steps 5-8 don't run).
- Why does validator output appear before the final answer? Step 7 always runs before step 8. Synthesis is constrained to the fact sheet, which doesn't exist until validation completes.
The Turn object
The full schema, lifted from the API reference:
{
"id": "turn_01HX2K3M4N5P6Q7R8S9T0V1W2X",
"status": "completed",
"created_at": "2026-05-25T10:00:00Z",
"completed_at": "2026-05-25T10:03:42Z",
"messages": [
{ "role": "user", "content": "Is my sleep score drop meaningful?" }
],
"result": {
"answer": "Short answer: no — by the most defensible read of your own data, this isn't a clinically meaningful decline...",
"fact_sheet": [
{ "claim": "ds-001.effect", "value": -0.374, "unit": null,
"source": "data_science", "n": 87, "window": "last 90 days" },
{ "claim": "ds-001.ci_low", "value": -0.42, "unit": null,
"source": "data_science", "n": 87, "window": "last 90 days" }
],
"agents_used": ["data_science", "domain_expert"],
"validator": {
"findings_total": 3,
"findings_validated": 2,
"findings_conditional": 0,
"findings_rejected": 1
},
"cost_usd": 0.1288,
"duration_ms": 222000
},
"error": null
}
| Field | Type | Notes |
|---|---|---|
| id | string | Typed prefix turn_…, ULID under the prefix. Sortable by creation time. |
| status | enum | queued · running · completed · failed. |
| created_at | ISO-8601 | When the POST landed. |
| completed_at | ISO-8601 \| null | Set when status flips to completed or failed. |
| messages | Message[] | The exact conversation you sent. Last message is always from user. |
| result.answer | string | Markdown. May be multi-paragraph. Safe to render directly. |
| result.fact_sheet | FactEntry[] | Every number the answer cites, with provenance. See What "completed" guarantees. |
| result.agents_used | string[] | data_science, domain_expert, health_coach, investigator. |
| result.validator | object | Aggregate validation counts. |
| result.cost_usd | number | Total LLM spend across all steps. |
| result.duration_ms | number | Wall time from the workflow's first step to the final memory write. Excludes queue time. |
| error | object \| null | Populated when status === "failed". See Errors. |
result is null while status is not completed. error is
non-null only when status === "failed".
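The rule that result and error depend on status maps naturally onto a discriminated union, which lets the compiler enforce it. The field names below follow the Turn object shown above, but the union itself is an illustrative typing written for this page, not the SDK's published types; TurnError is assumed to carry at least a code field, and the validator, messages, and timestamp fields are omitted for brevity.

```typescript
interface FactEntry {
  claim: string;
  value: number;
  unit: string | null;
  source: string;
  n: number;
  window: string;
}

interface TurnResult {
  answer: string;
  fact_sheet: FactEntry[];
  agents_used: string[];
  cost_usd: number;
  duration_ms: number;
}

// Assumption: the error object exposes at least a code.
interface TurnError {
  code: string;
}

// status decides which of result and error can be non-null.
type Turn =
  | { id: string; status: "queued" | "running"; result: null; error: null }
  | { id: string; status: "completed"; result: TurnResult; error: null }
  | { id: string; status: "failed"; result: null; error: TurnError };

function describeTurn(turn: Turn): string {
  switch (turn.status) {
    case "completed":
      return turn.result.answer; // narrowed: result is non-null here
    case "failed":
      return `failed: ${turn.error.code}`; // narrowed: error is non-null here
    default:
      return `still ${turn.status}`;
  }
}
```

With this typing, reading turn.result.answer without first checking status is a compile-time error rather than a runtime null dereference.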
The Message shape
{ "role": "user" | "assistant", "content": "string" }
content is plain text in v1. Future versions may add tool calls,
attachments, and structured content; new fields will be additive.
Patterns
One-shot question
The simplest possible turn. No history, no streaming, no extras.
curl -X POST https://api.amy.health/v1/turns \
  -H "Authorization: Bearer $AMY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{ "role": "user", "content": "What is my average HRV this month?" }],
    "stream": false
  }'
With "stream": false the request blocks until the turn completes
(up to ~7 minutes) and returns the full Turn object directly. No SSE,
no polling.
Use this for batch jobs, scripts, and anywhere wall-time latency isn't the user's problem.
Conversational
Pass the entire conversation in messages on each turn. Amy doesn't
maintain server-side conversation state; you do.
const history = [
  { role: "user", content: "How was my sleep last week?" },
  { role: "assistant", content: "Average 7.2h, REM was lower than your..." },
  { role: "user", content: "Why might REM have dropped?" },
];
const turn = await amy.turns.create({ messages: history });
The last message must be from user. Sending an assistant message
last returns 400 invalid_request.
Memory (next section) is separate from conversation: memory persists across turns automatically; messages do not.
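Because the client owns the conversation array, it helps to have one place that appends each exchange and checks the "last message is from user" rule before sending. The helpers below are a minimal sketch; nextHistory and assertSendable are hypothetical names and not part of the SDK.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Fold a completed turn's answer into the history, then append the
// next user message. Returns a new array and leaves the input untouched.
function nextHistory(
  history: Message[],
  answer: string,
  nextQuestion: string,
): Message[] {
  return [
    ...history,
    { role: "assistant", content: answer },
    { role: "user", content: nextQuestion },
  ];
}

// Mirror the server-side rule locally so a bad array fails before the POST.
function assertSendable(messages: Message[]): void {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "user") {
    throw new Error("last message must be from user");
  }
}
```

A typical loop would call assertSendable on the array, create the turn, and then build the next array with nextHistory using result.answer from the completed turn.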
With / without memory injection
By default every turn injects the user's persistent memory (goals, preferences, prior insights) into the agent context. To opt out:
const turn = await amy.turns.create({
  messages: [...],
  context: { include_memory: false }
});
When to skip memory injection:
| Reason | Example |
|---|---|
| You're running an evaluation and need isolated turns. | Regression suite over a frozen agent. |
| The question is intentionally context-free. | "Explain HRV in 3 sentences." |
| You're testing a memory-extraction change. | Want to see what Amy would remember without re-using prior memory. |
The companion flag, include_biomarkers: false, drops the latest lab snapshot
from context. Both flags default to true.
Blocking vs streaming
Use streaming whenever a user is watching:
const turn = await amy.turns.create({ messages: [...] }); // stream: true by default
for await (const event of amy.turns.stream(turn.id)) {
  if (event.type === "agent.thought") render(event.delta);
  if (event.type === "turn.completed") finish(event.result);
}
Use blocking when nothing's watching:
const turn = await amy.turns.create({ messages: [...], stream: false });
// On success, turn.status === "completed" and turn.result is populated.
See Streaming for the full SSE protocol.
Cost expectations
Costs vary by routing. The figures below were measured with Claude Sonnet for
routing/reflection and Opus for the heavy agents. Expect cost_usd to land in
these bands for a typical user with 60-180 days of data.
| Turn shape | Typical duration_ms | Typical cost_usd |
|---|---|---|
| Conversational fallback ("thanks!") | 1,500-3,000 | $0.001-$0.003 |
| Direct data lookup ("avg RHR?") | 60,000-90,000 | $0.08-$0.15 |
| Data + domain ("is LDL 124 a concern?") | 90,000-180,000 | $0.10-$0.25 |
| Coaching ("plan my deep sleep recovery") | 120,000-240,000 | $0.15-$0.35 |
| Investigator (vague: "anything to worry about?") | 180,000-360,000 | $0.30-$0.90 |
| Reflection follow-ups added | +30,000-90,000 | +$0.05-$0.20 |
A cost_warning event fires at AMY_COST_WARN_USD (default $0.50), but
the turn does not abort; the warning is advisory. Set per-user
caps server-side if you need a hard ceiling.
Concurrency cap is 3 in-flight turns per user. Exceeding it returns
429 concurrency_limit_exceeded (see Errors).
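A client that fans out many turns can track in-flight counts locally to avoid predictable 429 responses. The class below is a sketch under the documented cap of 3 in-flight turns per user; TurnSlots is a hypothetical name, the server remains the source of truth, and a 429 can still occur if other clients share the same user.

```typescript
// Client-side guard: counts in-flight turns per user against a cap.
class TurnSlots {
  private readonly inFlight = new Map<string, number>();
  private readonly cap: number;

  constructor(cap = 3) {
    this.cap = cap;
  }

  // Returns true and reserves a slot if the user is under the cap.
  tryAcquire(userId: string): boolean {
    const current = this.inFlight.get(userId) ?? 0;
    if (current >= this.cap) return false;
    this.inFlight.set(userId, current + 1);
    return true;
  }

  // Call when a turn reaches completed or failed.
  release(userId: string): void {
    const current = this.inFlight.get(userId) ?? 0;
    if (current <= 1) this.inFlight.delete(userId);
    else this.inFlight.set(userId, current - 1);
  }
}
```

Call tryAcquire before POST /v1/turns and release once the turn is terminal; when tryAcquire returns false, queue the request locally instead of sending it.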
What "completed" guarantees
When status === "completed", the following invariants hold:
- result.answer is non-empty. Synthesis ran and emitted text.
- Every quantitative claim in result.answer is in result.fact_sheet. The fact sheet is the deterministic source of truth: synthesis is constrained to it, and a post-synthesis numeric verification pass flags any digit in the rendered reply that doesn't trace back. (Each number must match one ledger entry, within a tolerance of 2% relative or 0.05 absolute, to accommodate rounding and ratio derivations.)
- Every fact sheet entry survived validation. Findings the Critic rejected do not enter the fact sheet. Conditional findings appear with a verdict: "conditional" tag and the answer is required to hedge.
- agents_used is the actual set of agents the orchestrator ran. Not the candidate set, the executed set.
- cost_usd is the sum of every LLM call charged to this turn, including failed retries. Use this for per-user billing or per-feature accounting.
- duration_ms is wall time from the workflow's first step beginning to the final memory write completing. It excludes queue time (queue → first step is typically <1s but can spike).
The flip side: if status === "failed", none of these hold. result
is null; error.code tells you why. See Errors.
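The numeric tolerance quoted in the invariants (2% relative or 0.05 absolute) can be reproduced client-side if you want to spot-check an answer against its fact sheet. The sketch below assumes "or" means a number passes when either bound is satisfied; the real verifier may differ in detail, and both function names are hypothetical.

```typescript
// Assumption: a cited number passes if EITHER the absolute or the
// relative bound holds against a fact sheet value.
function withinTolerance(cited: number, ledger: number): boolean {
  const absDiff = Math.abs(cited - ledger);
  // Relative error is undefined against zero, so only the absolute bound applies.
  const relDiff = ledger === 0 ? Infinity : absDiff / Math.abs(ledger);
  return absDiff <= 0.05 || relDiff <= 0.02;
}

// A cited number is traceable if at least one fact sheet value matches it.
function isTraceable(cited: number, factValues: number[]): boolean {
  return factValues.some((v) => withinTolerance(cited, v));
}
```

Under this reading, an answer that rounds -0.374 to -0.37 passes on the absolute bound, and one that cites 87 against a ledger value of 88 passes on the relative bound.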
Common mistakes
Polling GET /v1/turns/:id instead of streaming
You'll pay for the latency twice: once in your poll loop, once in the
KV-backed event buffer that exists anyway. Subscribe to /events
once; the iterator handles reconnects for you. Polling is only for
non-interactive use (cron jobs, retries after disconnect).
Treating queued or running as terminal
Both are transient. If you read a turn and status !== "completed" && status !== "failed", the work isn't done. result is null by design,
not by bug.
Sending an assistant message last
messages[messages.length - 1].role must be "user". Sending an
assistant message last looks like you're trying to make Amy continue
its own thought, which isn't supported in v1. Returns 400 invalid_request.
Expecting deterministic results between turns
Two identical messages arrays will not produce byte-identical answers.
The LLM is stochastic; the validator is deterministic but operates on
LLM-emitted findings that vary by run. Use the fact sheet for
deterministic claim-level comparisons, not the answer text.
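A claim-level comparison between two runs might look like the following. It matches entries on the claim key shown in the Turn object; diffFactSheets and its tolerance parameter are hypothetical, and whether claim keys stay stable across runs is an assumption you should verify for your own data.

```typescript
type Fact = { claim: string; value: number };

// Compare two fact sheets claim by claim and describe each difference.
function diffFactSheets(a: Fact[], b: Fact[], tolerance = 0): string[] {
  const right = new Map<string, number>();
  for (const f of b) right.set(f.claim, f.value);

  const seen = new Set<string>();
  const diffs: string[] = [];
  for (const f of a) {
    seen.add(f.claim);
    const other = right.get(f.claim);
    if (other === undefined) {
      diffs.push(`${f.claim}: only in first`);
    } else if (Math.abs(other - f.value) > tolerance) {
      diffs.push(`${f.claim}: ${f.value} vs ${other}`);
    }
  }
  for (const f of b) {
    if (!seen.has(f.claim)) diffs.push(`${f.claim}: only in second`);
  }
  return diffs;
}
```

An empty result means the two runs agree on every claim within the chosen tolerance, regardless of how differently the answer text is worded.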
Forgetting that conversation state is client-side
Amy does not store conversations. The Turn row stores the messages
you sent for that turn, nothing more. If you want a multi-turn chat,
your client maintains the array.
Reading result.fact_sheet.length to gauge confidence
A short fact sheet can mean a focused, high-confidence answer (one
metric, one validated number) or a low-confidence answer where nothing
survived validation. Read result.validator.findings_validated / findings_total for confidence, not length.
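The ratio suggested above is simple to compute, with one edge case: a turn with no findings at all. The sketch below returns null in that case rather than 0 or NaN; validatedRatio is a hypothetical helper, and how you treat conditional findings is left to you.

```typescript
type ValidatorCounts = {
  findings_total: number;
  findings_validated: number;
  findings_conditional: number;
  findings_rejected: number;
};

// Returns null when nothing was evaluated, so callers cannot mistake
// "no findings" for "zero confidence" or divide by zero.
function validatedRatio(v: ValidatorCounts): number | null {
  return v.findings_total === 0
    ? null
    : v.findings_validated / v.findings_total;
}
```

For the example Turn object earlier on this page (2 validated out of 3 total), the ratio is two thirds.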
Assuming the SSE replay window is forever
It's 1 hour after completed_at in v1. After that, the events are
GC'd from KV. The Turn row stays forever; only the live event stream
expires. Persist events client-side if you need them long-term.
Cancelling a turn
There's no DELETE /v1/turns/:id for in-flight turns in v1. Once a
turn is queued, it runs to completion or failure. You can close your
SSE subscription, but the backend keeps working (and keeps charging).
Use the per-user concurrency cap (3) as your back-pressure mechanism.
Where to next
- Streaming, the SSE event catalog and reconnect protocol.
- Memory, what Amy remembers across turns.
- Errors, every error code and how to recover.
- API reference: Turns, the full endpoint reference.
- Internals: Agent orchestration, the per-step contract for the 9-step pipeline.
- SDK: TypeScript, typed amy.turns methods.