tl;dr. query() is the entire entry point.
it takes a prompt (string or async iterable) and an options bag, and
returns an async iterable of typed events — assistant messages, tool
uses, tool results, the final summary. you write a for await
loop and discriminate on event.type. that's the loop.
one function, one shape
most agent libraries hand you a class. Agent, Runner,
Executor — pick a metaphor. the claude agent sdk hands you a
function. query() takes a config and returns an async iterable.
you iterate. each iteration is a structured event. there's no
.run(), no .start(), no event emitter to
subscribe to. for await is the entire api surface for
consumption.
import { query } from "@anthropic-ai/claude-agent-sdk";
const run = query({
  prompt: "list the top-level files in the current directory",
  options: { allowedTools: ["Glob"] },
});
for await (const event of run) {
  // event is a discriminated union; narrow on .type
  if (event.type === "assistant") {
    for (const block of event.message.content) {
      if (block.type === "text") {
        process.stdout.write(block.text);
      }
    }
  }
}
behind that loop is a fixed lifecycle. the sdk sends your prompt to the model, the model decides whether to answer or call a tool, the sdk runs any tools the model wanted, feeds the results back, and repeats until the model emits a final answer or a stopping condition fires. each step in that lifecycle becomes one event in your iterator.
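to make that lifecycle concrete, here is a minimal sketch of the same loop written as an async generator. everything in it is an assumption for illustration: the ModelReply and LoopEvent types, the injected model and runTool functions, and the agentLoop name are hypothetical stand-ins, not the sdk's internals. it leaves out the system event and streaming deltas.

```typescript
// Local stand-in types for illustration; NOT the sdk's real definitions.
type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

type LoopEvent =
  | { type: "assistant"; reply: ModelReply }
  | { type: "user"; toolResult: string }
  | { type: "result"; subtype: "success" | "error_max_turns"; turns: number };

// Hypothetical model and tool runner, injected so the sketch runs offline.
type Model = (transcript: string[]) => Promise<ModelReply>;
type ToolRunner = (tool: string, input: string) => Promise<string>;

async function* agentLoop(
  prompt: string,
  model: Model,
  runTool: ToolRunner,
  maxTurns: number,
): AsyncGenerator<LoopEvent> {
  const transcript = [prompt];
  for (let turn = 1; turn <= maxTurns; turn++) {
    // One model turn becomes one "assistant" event.
    const reply = await model(transcript);
    yield { type: "assistant", reply };
    if (reply.kind === "answer") {
      // Final answer: emit the summary and stop.
      yield { type: "result", subtype: "success", turns: turn };
      return;
    }
    // The model asked for a tool: run it and feed the result back.
    const toolResult = await runTool(reply.tool, reply.input);
    transcript.push(toolResult);
    yield { type: "user", toolResult };
  }
  // Stopping condition: the turn ceiling was reached without a final answer.
  yield { type: "result", subtype: "error_max_turns", turns: maxTurns };
}
```

with a fake model that calls one tool and then answers, iterating agentLoop yields assistant, user, assistant, result, which is the same ordering the real event stream follows.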
five event types, in order
- system: fires once at the start. carries the resolved configuration: model id, cwd, tool list, mcp servers, permission mode. assert on this in tests; it's how you know the sdk parsed your config the way you meant it.
- assistant: one per model turn. the message.content is an array of blocks: text, tool_use, sometimes both. you stream text to the user; the sdk handles the tool_use blocks.
- user: the sdk packaging a tool result back to the model as a user-role message. you don't produce these; you observe them. useful for telemetry: "what did Read return on that call?"
- stream_event: partial deltas during a model turn (token-level streaming). only emitted if you set includePartialMessages: true. off by default; turn it on for chat uis, leave it off for batch.
- result: fires once at the end. subtype tells you why the loop stopped: success, error_max_turns, error_during_execution. plus duration, turn count, token usage, cost.
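a sketch of how you might handle all five types in one place. the AgentEvent union below is a simplified local stand-in, not the sdk's exported type: the field names other than type and subtype (model, text, toolResult, delta) are assumptions made to keep the example self-contained, and describe is a hypothetical helper.

```typescript
// Simplified stand-ins for the five event shapes listed above.
// Fields other than `type` and `subtype` are assumptions for this sketch.
type AgentEvent =
  | { type: "system"; model: string }
  | { type: "assistant"; text: string }
  | { type: "user"; toolResult: string }
  | { type: "stream_event"; delta: string }
  | { type: "result"; subtype: "success" | "error_max_turns" | "error_during_execution" };

function describe(event: AgentEvent): string {
  // Switching on the discriminant narrows `event` inside each case.
  switch (event.type) {
    case "system":
      return `configured with model ${event.model}`;
    case "assistant":
      return `assistant said: ${event.text}`;
    case "user":
      return `tool returned: ${event.toolResult}`;
    case "stream_event":
      return `partial: ${event.delta}`;
    case "result":
      return `stopped: ${event.subtype}`;
    default: {
      // Exhaustiveness check: this stops type-checking if a variant is unhandled.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

the never assignment in the default branch is a design choice: if a sixth event type is ever added to the union, the compiler flags the switch instead of letting the new event fall through silently.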
if you break out of the for await early, the
iterator's return() is called and the run is cancelled
cleanly, with no dangling api calls. that's a real api guarantee, not a
convention.
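the language half of that guarantee is easy to see with a plain async generator. this sketch does not use the sdk: fakeRun and takeTwo are hypothetical names, and the finally block stands in for whatever cleanup a real run performs when return() is called.

```typescript
// A stand-in event source (not the sdk). It would yield forever if never stopped.
async function* fakeRun(log: string[]): AsyncGenerator<number> {
  try {
    for (let i = 1; ; i++) yield i;
  } finally {
    // Runs when the consumer breaks out, because `for await` calls return().
    log.push("cleaned up");
  }
}

async function takeTwo(log: string[]): Promise<number[]> {
  const seen: number[] = [];
  for await (const n of fakeRun(log)) {
    seen.push(n);
    if (seen.length === 2) break; // leaving the loop early triggers return()
  }
  return seen;
}
```

calling takeTwo(log) resolves to [1, 2], and by the time it resolves log contains "cleaned up": the loop awaits return() before moving on, so cleanup has finished before the next statement runs.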
two prompt shapes
the simple form takes a string — one prompt, one run. the streaming form takes an async iterable that yields user messages, which lets you keep the loop alive across multiple user turns. the second form is what you reach for when wrapping the sdk in a chat ui, a repl, or any long-running session.
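in a ui, messages arrive from event handlers rather than a fixed script, so one way to build that async iterable is a small push queue. this is a sketch under assumptions: MessageQueue is a hypothetical helper, the UserMessage type is a local stand-in that mirrors the message shape used in this lesson rather than the sdk's exported type, and it supports a single consumer only.

```typescript
// Stand-in for the user-message shape used in this lesson; not the sdk's exported type.
type UserMessage = { type: "user"; message: { role: "user"; content: string } };

class MessageQueue implements AsyncIterable<UserMessage> {
  private pending: UserMessage[] = [];
  private waiter: (() => void) | null = null;
  private closed = false;

  // Called by the ui whenever the user sends something.
  push(content: string): void {
    this.pending.push({ type: "user", message: { role: "user", content } });
    this.wake();
  }

  // Called when the session ends; lets the consumer's loop finish.
  close(): void {
    this.closed = true;
    this.wake();
  }

  private wake(): void {
    const w = this.waiter;
    this.waiter = null;
    if (w) w();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<UserMessage> {
    while (true) {
      // Drain everything queued so far, in arrival order.
      while (this.pending.length > 0) yield this.pending.shift()!;
      if (this.closed) return;
      // Nothing queued: sleep until push() or close() wakes us.
      await new Promise<void>((resolve) => {
        this.waiter = () => resolve();
      });
    }
  }
}
```

the intended use is to pass a MessageQueue instance as the prompt and call push() from your input handler, which assumes the prompt parameter accepts any async iterable of user messages, as described above.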
// streaming-input mode: pass an async iterable of user messages.
// useful for repls, chat uis, anything that's not "one prompt → one answer".
async function* messages() {
  yield { type: "user", message: { role: "user", content: "hi" } };
  // pause briefly before sending the next message; a fixed timer does not
  // wait for the first turn to finish, so a real ui would await user input here
  await new Promise((r) => setTimeout(r, 100));
  yield { type: "user", message: { role: "user", content: "list 3 files" } };
}
const run = query({ prompt: messages(), options: { allowedTools: ["Glob"] } });
for await (const ev of run) console.log(ev.type);
stopping the loop
runs end one of three ways: the model emits a final answer (success), it
hits the maxTurns ceiling (error_max_turns), or you cancel
explicitly. run.interrupt() is the cancel — equivalent to
pressing esc in claude code, but exposed as a method so you
can wire it to a button, a timeout, or a websocket disconnect.
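// caveat: the sdk reference lists interrupt() under streaming-input mode;
// check that it applies to a string prompt in your sdk version.
const run = query({ prompt: "...", options: { allowedTools: ["Bash"] } });
setTimeout(() => run.interrupt(), 5000); // hard stop after 5s
for await (const ev of run) {
  if (ev.type === "result") console.log(ev.subtype); // "error_during_execution" if interrupted
}
what to notice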
- events are read-only. you can't mutate an assistant message in flight. to influence the loop you reach for permissions, hooks, or sub-agents — covered in the next four lessons.
- errors don't throw, they emit. most failures (rate limit, tool error, max turns) come back as a result event with a non-success subtype. only network- and config-level failures throw out of the iterator.
- one query = one transcript. the loop builds the conversation in memory. for multi-session conversations, resume a prior session id rather than stitching transcripts yourself.
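because failures can arrive on two paths (an emitted result event or a thrown exception), a caller usually wants one place that folds both into a single outcome. a minimal sketch, assuming simplified local event types: RunEvent, Outcome, and settle are hypothetical names for illustration, not part of the sdk.

```typescript
// Simplified local types; the real result event carries more fields.
type ResultSubtype = "success" | "error_max_turns" | "error_during_execution";
type RunEvent =
  | { type: "assistant"; text: string }
  | { type: "result"; subtype: ResultSubtype };

type Outcome =
  | { status: "ok" }
  | { status: "failed"; reason: ResultSubtype } // emitted: non-success result
  | { status: "crashed"; error: unknown }       // thrown out of the iterator
  | { status: "incomplete" };                   // stream ended with no result

async function settle(run: AsyncIterable<RunEvent>): Promise<Outcome> {
  try {
    for await (const ev of run) {
      if (ev.type !== "result") continue;
      // Emitted path: inspect the subtype instead of expecting an exception.
      return ev.subtype === "success"
        ? { status: "ok" }
        : { status: "failed", reason: ev.subtype };
    }
    return { status: "incomplete" };
  } catch (error) {
    // Thrown path: network- or config-level failures land here.
    return { status: "crashed", error };
  }
}
```

keeping "failed" and "crashed" as separate statuses preserves the distinction the first bullet draws: a failed run still produced a result you can log and inspect, while a crashed one never got that far.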