the query loop — your first agent · claude agent sdk

query() as an async generator. message events, tool-use events, and the shape of one full agent turn.

tl;dr. query() is the entire entry point. it takes a prompt (string or async iterable) and an options bag, and returns an async iterable of typed events — assistant messages, tool uses, tool results, the final summary. you write a for await loop and discriminate on event.type. that's the loop.

one function, one shape

most agent libraries hand you a class. Agent, Runner, Executor — pick a metaphor. the claude agent sdk hands you a function. query() takes a config and returns an async iterable. you iterate. each iteration is a structured event. there's no .run(), no .start(), no event emitter to subscribe to. for await is the entire api surface for consumption.

the smallest useful loop

import { query } from "@anthropic-ai/claude-agent-sdk";

const run = query({
  prompt: "list the top-level files in the current directory",
  options: { allowedTools: ["Glob"] },
});

for await (const event of run) {
  // event is a discriminated union — narrow on .type
  if (event.type === "assistant") {
    for (const block of event.message.content) {
      if (block.type === "text") {
        process.stdout.write(block.text);
      }
    }
  }
}

behind that loop is a fixed lifecycle. the sdk sends your prompt to the model, the model decides whether to answer or call a tool, the sdk runs any tools the model wanted, feeds the results back, and repeats — until the model emits a final answer or a stopping condition fires. each step in that lifecycle becomes one event in your iterator.
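a minimal sketch of that lifecycle, with a stubbed model standing in for the real thing. nothing here is sdk code: Model, Tools, LoopEvent and runLoop are hypothetical names invented for illustration, and the real sdk's internals may look quite different. it only shows the cycle described above: ask the model, run a tool if requested, feed the result back, stop on a final answer or a turn ceiling.

```typescript
// toy model of the agent loop. hypothetical names throughout; not the sdk's implementation.
type ToolCall = { name: string; input: string };
type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool"; call: ToolCall };
type LoopEvent =
  | { type: "assistant"; reply: ModelReply }
  | { type: "user"; toolResult: string }
  | { type: "result"; subtype: "success" | "error_max_turns"; turns: number };

type Model = (transcript: string[]) => ModelReply;
type Tools = Record<string, (input: string) => string>;

function* runLoop(
  prompt: string,
  model: Model,
  tools: Tools,
  maxTurns: number,
): Generator<LoopEvent> {
  const transcript = [prompt];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = model(transcript); // 1. ask the model
    yield { type: "assistant", reply };
    if (reply.kind === "answer") {
      // 2. a final answer ends the loop
      yield { type: "result", subtype: "success", turns: turn };
      return;
    }
    // 3. run the requested tool and feed its output back
    const tool = tools[reply.call.name];
    const output = tool ? tool(reply.call.input) : `unknown tool: ${reply.call.name}`;
    transcript.push(output);
    yield { type: "user", toolResult: output };
  }
  // 4. stopping condition: the turn ceiling was reached
  yield { type: "result", subtype: "error_max_turns", turns: maxTurns };
}

// stub model: call Glob once, then answer using whatever came back
const stubModel: Model = (t) =>
  t.length === 1
    ? { kind: "tool", call: { name: "Glob", input: "*" } }
    : { kind: "answer", text: `found: ${t[t.length - 1]}` };

const events = Array.from(runLoop("list files", stubModel, { Glob: () => "a.ts b.ts" }, 5));
const eventTypes = events.map((e) => e.type);
```

with the stub above, eventTypes comes out as assistant, user, assistant, result: one tool round trip, then the final answer.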

five event types, in order

system: fires once at the start. carries the resolved configuration: model id, cwd, tool list, mcp servers, permission mode. assert on this in tests; it's how you know the sdk parsed your config the way you meant it.
assistant: one per model turn. message.content is an array of blocks: text, tool_use, sometimes both. you stream text to the user; the sdk handles the tool_use blocks.
user: the sdk packaging a tool result back to the model as a user-role message. you don't produce these; you observe them. useful for telemetry: "what did Read return on that call?"
stream_event: partial deltas during a model turn (token-level streaming). only emitted if you set includePartialMessages: true. off by default; turn it on for chat uis, leave it off for batch.
result: fires once at the end. subtype tells you why the loop stopped: success, error_max_turns, error_during_execution. also carries duration, turn count, token usage, cost.
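a sketch of the narrowing step over those five types. the shapes below (Block, SketchEvent, describe) are simplified stand-ins written for this example and carry only the fields named in the list; the sdk exports its own, richer message types, so treat the field names here as assumptions and check them against the reference.

```typescript
// simplified stand-in types; the real sdk types have more fields.
type Block = { type: "text"; text: string } | { type: "tool_use"; name: string };
type SketchEvent =
  | { type: "system"; model: string; tools: string[] }
  | { type: "assistant"; message: { content: Block[] } }
  | { type: "user"; message: { content: unknown } }
  | { type: "stream_event"; event: unknown }
  | {
      type: "result";
      subtype: "success" | "error_max_turns" | "error_during_execution";
      num_turns: number;
    };

// one line of human-readable output per event, chosen by narrowing on .type
function describe(ev: SketchEvent): string {
  switch (ev.type) {
    case "system":
      return `init: ${ev.model}, tools=${ev.tools.join(",")}`;
    case "assistant": {
      const parts = ev.message.content.map((b) =>
        b.type === "text" ? b.text : `[tool_use ${b.name}]`,
      );
      return `assistant: ${parts.join(" ")}`;
    }
    case "user":
      return "tool result returned to the model";
    case "stream_event":
      return "partial delta";
    case "result":
      return `done: ${ev.subtype} after ${ev.num_turns} turn(s)`;
    default: {
      // compile-time exhaustiveness check: a new event type makes this line fail to type-check
      const unreachable: never = ev;
      return unreachable;
    }
  }
}
```

the never assignment in the default branch is the design choice worth copying: if a later sdk version adds an event type to the union, the compiler points at every switch that needs updating.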

iterator semantics matter. if you break out of the for await early, the iterator's return() is called and the run is cancelled cleanly — no dangling api calls. that's a real api guarantee, not a convention.
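the language-level half of that claim can be checked without the sdk. in the sketch below, ticks() is a plain async generator (a hypothetical stand-in, not the sdk's iterator): breaking out of for await makes the runtime call its return(), which runs the finally block. what the sdk does inside its own cleanup is up to the sdk; this only shows the hook it has available.

```typescript
const log: string[] = [];

// endless async generator with a cleanup step
async function* ticks(): AsyncGenerator<number> {
  try {
    for (let i = 0; ; i++) yield i;
  } finally {
    // runs when the consumer breaks, because for await calls return() on exit
    log.push("cleanup");
  }
}

async function consumeTwo(): Promise<number[]> {
  const seen: number[] = [];
  for await (const n of ticks()) {
    seen.push(n);
    if (seen.length === 2) break; // early exit triggers ticks().return()
  }
  return seen;
}
```

after consumeTwo() resolves, log contains exactly one "cleanup" entry even though the generator never reached a natural end.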

two prompt shapes

the simple form takes a string — one prompt, one run. the streaming form takes an async iterable that yields user messages, which lets you keep the loop alive across multiple user turns. the second form is what you reach for when wrapping the sdk in a chat ui, a repl, or any long-running session.

streaming-input — multi-turn sessions

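// streaming-input mode: pass an async iterable of user messages.
// useful for repls, chat uis, anything that's not "one prompt → one answer".
async function* messages() {
  // `as const` keeps the literal types, so the objects can match the sdk's user-message type
  yield { type: "user" as const, message: { role: "user" as const, content: "hi" } };
  // placeholder pause standing in for "the user types the next message".
  // a fixed 100ms does not actually wait for the first turn to finish.
  await new Promise((r) => setTimeout(r, 100));
  yield { type: "user" as const, message: { role: "user" as const, content: "list 3 files" } };
}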

const run = query({ prompt: messages(), options: { allowedTools: ["Glob"] } });
for await (const ev of run) console.log(ev.type);
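in a chat ui or repl the messages arrive from outside (a keypress, a websocket frame) rather than from a scripted generator. one way to bridge that is a small push queue that exposes itself as an async iterable. MessageQueue and UserMessage below are hypothetical helpers written for this example, not sdk exports; the message shape copies the one used above, and the real sdk type may require additional fields.

```typescript
// hypothetical helper: turns push-style input into an async iterable.
// assumes a single consumer.
type UserMessage = { type: "user"; message: { role: "user"; content: string } };

class MessageQueue {
  private buffered: UserMessage[] = [];
  private wake: (() => void) | null = null;
  private closed = false;

  // call from your ui when the user submits text
  push(content: string): void {
    if (this.closed) return; // ignore input after close
    this.buffered.push({ type: "user", message: { role: "user", content } });
    this.notify();
  }

  // call when the session is over; already-buffered messages are still delivered
  close(): void {
    this.closed = true;
    this.notify();
  }

  private notify(): void {
    const wake = this.wake;
    this.wake = null;
    if (wake) wake();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<UserMessage> {
    while (true) {
      const next = this.buffered.shift();
      if (next) {
        yield next;
        continue;
      }
      if (this.closed) return;
      // nothing buffered: sleep until push() or close() wakes us
      await new Promise<void>((resolve) => {
        this.wake = resolve;
      });
    }
  }
}
```

under the assumption that the sdk accepts any async iterable of user messages, an instance could be passed as the prompt in place of messages(), with push() wired to the input box and close() to the end of the session.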

stopping the loop

runs end in one of three ways: the model emits a final answer (success), it hits the maxTurns ceiling (error_max_turns), or you cancel explicitly. run.interrupt() is the cancel: equivalent to pressing esc in claude code, but exposed as a method so you can wire it to a button, a timeout, or a websocket disconnect.

hard stop after 5 seconds

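import { query } from "@anthropic-ai/claude-agent-sdk";

// note: the sdk reference describes interrupt() as available in streaming-input mode.
// with a string prompt like this one, check how your sdk version behaves.
const run = query({ prompt: "...", options: { allowedTools: ["Bash"] } });

// hard stop after 5s; keep the handle so the timer can be cleared
const timer = setTimeout(() => run.interrupt(), 5000);

try {
  for await (const ev of run) {
    if (ev.type === "result") console.log(ev.subtype); // "error_during_execution" if interrupted
  }
} finally {
  clearTimeout(timer); // don't leave the timer pending if the run ends early
}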
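for the button and websocket-disconnect cases, one option is to route every cancel source through a standard AbortSignal. the sketch below assumes only that the run object has an interrupt() method returning a promise; Interruptible and interruptOn are hypothetical names for this example, not sdk exports.

```typescript
// the only thing this helper needs from a run: an interrupt() method.
// returning a promise is an assumption made for this sketch.
interface Interruptible {
  interrupt(): Promise<void>;
}

// calls run.interrupt() when the signal aborts.
// returns a detach function to call once the run has finished.
function interruptOn(signal: AbortSignal, run: Interruptible): () => void {
  const onAbort = () => {
    // swallow errors from interrupting a run that has already ended
    void run.interrupt().catch(() => {});
  };
  if (signal.aborted) {
    onAbort();
    return () => {};
  }
  signal.addEventListener("abort", onAbort, { once: true });
  return () => signal.removeEventListener("abort", onAbort);
}
```

usage would look like creating one AbortController per session, calling controller.abort() from the stop button or the socket's close handler, and calling the returned detach function in a finally block after the for await loop.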

what to notice

events are read-only. you can't mutate an assistant message in flight. to influence the loop you reach for permissions, hooks, or sub-agents — covered in the next four lessons.
errors don't throw — they emit. most failures (rate limit, tool error, max turns) come back as a result event with a non-success subtype. only network- and config-level failures throw out of the iterator.
one query = one transcript. the loop builds the conversation in memory. for multi-session conversations, resume a prior session id rather than stitching transcripts yourself.
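because failures mostly arrive as a result event rather than an exception, callers that prefer exceptions need a small adapter. a minimal sketch, assuming only that result events carry type and subtype as described above; AnyEvent, ResultEvent, RunFailed and finalResult are hypothetical names for this example, and the event types are simplified stand-ins for the sdk's own.

```typescript
// simplified stand-in event types; only the fields this helper reads.
type ResultEvent = { type: "result"; subtype: string; num_turns?: number };
type AnyEvent = ResultEvent | { type: "system" | "assistant" | "user" | "stream_event" };

class RunFailed extends Error {
  readonly subtype: string;
  constructor(subtype: string) {
    super(`run ended with ${subtype}`);
    this.name = "RunFailed";
    this.subtype = subtype;
  }
}

// drains the event stream and returns the result event,
// converting any non-success subtype into a thrown RunFailed.
async function finalResult(events: AsyncIterable<AnyEvent>): Promise<ResultEvent> {
  for await (const ev of events) {
    if (ev.type !== "result") continue;
    if (ev.subtype !== "success") throw new RunFailed(ev.subtype);
    return ev;
  }
  throw new Error("stream ended without a result event");
}
```

network- and config-level failures that throw out of the iterator pass straight through this helper unchanged, so a single try/catch around finalResult() sees both kinds of failure.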