tl;dr. the Agent tool lets the parent
spawn a sub-agent: a fresh context window, a narrower toolset, and
a focused job. only the sub-agent's summary returns to the
parent. it's the technique that keeps long runs from drowning in tool
output and lets you parallelise unrelated research.
why isolation
a single agent loop accumulates context. every tool result, every file read, every search hit lands in the parent's transcript. for a fast one-shot that's fine; for a multi-hour run it's death by paper-cuts — the model spends more and more tokens re-reading its own scrollback before it can think. sub-agents fix this by delegating: the sub-agent reads the file, the sub-agent gets the 1000-line search result, the sub-agent's whole context goes in the bin once it's done, and the parent only ever sees a one-paragraph summary.
same pattern as a process boundary. parent forks, child does the work, child exits, parent reads stdout. except here the "process" is another claude turn and "stdout" is a markdown summary. you can scope the sub-agent's toolset independently of the parent's, and gate its calls with the same permission rules.
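to make the accounting concrete, here is a back-of-envelope sketch of what ends up in the parent's transcript with and without delegation. the token counts and the 150-token summary size are made-up illustrative numbers, and both function names are hypothetical; nothing here calls the sdk.

```typescript
// hypothetical token counts for five tool results in one long run
const toolResultTokens = [12000, 800, 4500, 9000, 1500];

// assumed size of a one-paragraph summary (illustrative only)
const SUMMARY_TOKENS = 150;

// single loop: every tool result lands in the parent's transcript
function parentTokensInline(results: number[]): number {
  return results.reduce((sum, tokens) => sum + tokens, 0);
}

// delegated: each result stays in a sub-agent context that is thrown away,
// so the parent only carries one summary per delegation
function parentTokensDelegated(results: number[], summaryTokens: number): number {
  return results.length * summaryTokens;
}

console.log(parentTokensInline(toolResultTokens));
console.log(parentTokensDelegated(toolResultTokens, SUMMARY_TOKENS));
```

delegation is not free: the sub-agents still spend tokens reading those results. the win is that the parent's context, the one that has to survive the whole run, stays small.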
defining sub-agents inline
the cleanest path: declare them on the options.agents
bag. each entry is a spec — description, allowed tools, system
prompt, optional model override. the parent gets the
Agent tool wired up automatically and can dispatch
them by name.
import { query } from "@anthropic-ai/claude-agent-sdk";

const run = query({
  prompt: "find every TODO in src/, group them by file, and write a markdown summary",
  options: {
    allowedTools: ["Agent", "Glob", "Read", "Write"],
    agents: {
      "todo-finder": {
        description: "search a single file for TODO comments and return a list",
        tools: ["Read"],
        prompt: `read the given file. return a json array of { line, text } for every line containing 'TODO'. nothing else.`,
        model: "haiku",
      },
    },
  },
});
note the model: "haiku" on the sub-agent. wrapping cheap, fast
sub-agents in a more capable parent is the pattern that
cuts both latency and cost, usually with little loss in quality. the parent
decides what to do; the sub-agent does the rote work.
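when a sub-agent does rote work like this, it can help to keep a deterministic reference around for spot-checking what the model returns. the sketch below is one reading of what the todo-finder prompt asks for; findTodos and TodoHit are hypothetical names, and the 1-based line numbering is an assumption the prompt does not spell out.

```typescript
// shape the todo-finder prompt asks for: one entry per matching line
interface TodoHit {
  line: number; // 1-based line number (assumption; the prompt doesn't say)
  text: string; // the matching line, trimmed
}

// plain-code reference for "every line containing 'TODO'"
function findTodos(source: string): TodoHit[] {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((hit) => hit.text.includes("TODO"));
}

const sample = [
  "const a = 1; // TODO: rename",
  "export default a;",
  "// TODO remove before release",
].join("\n");

console.log(JSON.stringify(findTodos(sample), null, 2));
```

comparing a handful of sub-agent answers against a reference like this is a cheap way to find out whether the smaller model is actually following the prompt.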
defining sub-agents on disk
for sub-agents you reuse across runs, drop a markdown file under
.claude/agents/. the sdk loads any matching file and
exposes it the same way as an inline definition. version-controlled,
diff-able, no inline string-walls in your application code.
// .claude/agents/code-reviewer.md
// ----------------------------------
// ---
// name: code-reviewer
// description: review a single typescript file for obvious issues
// tools: Read, Grep
// model: sonnet
// ---
//
// you are a typescript code reviewer. for the file you're given:
// 1. read it
// 2. flag any console.log, any 'any' type, any TODO older than the file's
//    age would suggest, and any function over 50 lines.
// 3. return a brief markdown list of issues. nothing else.
const run = query({
  prompt: "review every src/api/*.ts file",
  options: { allowedTools: ["Agent", "Glob"] },
  // the SDK auto-loads any .claude/agents/*.md as available sub-agents.
});

if a sub-agent definition omits tools, it inherits the
parent's full set. that's a footgun on long-running agents: always
list the smallest toolset the sub-agent needs.
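to see how a definition file maps onto a spec, and where the inheritance footgun comes from, here is a minimal sketch of a frontmatter reader. it is not the sdk's loader: parseAgentFile, effectiveTools, and the AgentFile shape are hypothetical, and the parser only handles the flat key: value frontmatter shown above.

```typescript
interface AgentFile {
  description: string;
  tools?: string[]; // undefined when the frontmatter has no tools line
  model?: string;
  prompt: string; // markdown body after the frontmatter
}

// split "---\n<frontmatter>\n---\n<body>" and read flat key: value pairs
function parseAgentFile(markdown: string): AgentFile {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("missing frontmatter block");

  const fields = new Map<string, string>();
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
    }
  }

  const description = fields.get("description");
  if (!description) throw new Error("description is required");

  const tools = fields.get("tools");
  return {
    description,
    tools: tools ? tools.split(",").map((t) => t.trim()).filter(Boolean) : undefined,
    model: fields.get("model"),
    prompt: match[2].trim(),
  };
}

// models the rule from the text: no tools listed means the parent's full set
function effectiveTools(agent: AgentFile, parentTools: string[]): string[] {
  return agent.tools ?? parentTools;
}

const reviewer = parseAgentFile(
  "---\ndescription: review a single typescript file\ntools: Read, Grep\nmodel: sonnet\n---\n\nyou are a typescript code reviewer.",
);
const unscoped = parseAgentFile("---\ndescription: does everything\n---\n\ndo the task.");

console.log(effectiveTools(reviewer, ["Agent", "Bash", "Write"]));
console.log(effectiveTools(unscoped, ["Agent", "Bash", "Write"]));
```

the second agent never asked for Bash or Write, but under the inheritance rule it gets them anyway, which is the argument for always writing the tools line.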
parallel fan-out
sub-agents shine on embarrassingly-parallel work. a parent agent can
dispatch three or four Agent calls in one turn; the sdk
runs them concurrently and stitches their summaries back into the
parent's next turn. for research, scraping, and audit-style tasks
this is the move that turns a 4-minute serial run into a 70-second
parallel one.
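the arithmetic behind that speed-up is simple: a serial run costs the sum of the sub-agent durations, a fully concurrent one costs roughly the slowest. the sketch below uses made-up durations chosen to match the 4-minute and 70-second figures; serialSeconds and parallelSeconds are hypothetical helpers, and the model ignores the parent's own turns, rate limits, and any cap on concurrency.

```typescript
// serial: sub-agents run one after another, so durations add up
function serialSeconds(durations: number[]): number {
  return durations.reduce((sum, d) => sum + d, 0);
}

// fully parallel: wall-clock time is bounded by the slowest sub-agent
function parallelSeconds(durations: number[]): number {
  return durations.length === 0 ? 0 : Math.max(...durations);
}

// hypothetical per-sub-agent durations, in seconds
const durations = [70, 60, 55, 55];

console.log(serialSeconds(durations)); // 240 seconds, i.e. 4 minutes
console.log(parallelSeconds(durations)); // 70 seconds
```

the gain shrinks when one sub-agent dominates: with durations of 200, 20, and 20 seconds, going parallel only saves 40 seconds, so it pays to split work into pieces of similar size.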
// dispatching three sub-agents in parallel: each gets its own context
// window, only the summary returns to the parent. the parent stays small.
const run = query({
  prompt: `research three competitors (linear.app, height.app, shortcut.com).
    for each, summarise pricing tiers and main differentiators in 5 bullets.
    use one research-agent invocation per competitor, in parallel.`,
  options: {
    allowedTools: ["Agent", "WebFetch", "WebSearch"],
    agents: {
      "research-agent": {
        description: "research one company; return 5 bullets",
        tools: ["WebFetch", "WebSearch"],
        prompt: "research the company at the given url. return exactly 5 bullets.",
      },
    },
  },
});

what survives the handoff
only the sub-agent's final response reaches the parent. not the tool calls, not the intermediate reasoning, not the file contents it read along the way. that's a feature: it forces you to think about what the parent actually needs to know, and to write sub-agent prompts that produce summaries fit for that need.
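as a toy model of that boundary, the sketch below treats a sub-agent run as a list of events and keeps only the last piece of assistant text. SubAgentEvent and handoff are hypothetical names invented for illustration; they are not the sdk's message types.

```typescript
// simplified stand-in for what happens inside a sub-agent's context
type SubAgentEvent =
  | { kind: "tool_call"; name: string; input: string }
  | { kind: "tool_result"; output: string }
  | { kind: "assistant_text"; text: string };

type AssistantText = Extract<SubAgentEvent, { kind: "assistant_text" }>;

// only the final assistant text crosses back to the parent
function handoff(events: SubAgentEvent[]): string {
  const last = [...events]
    .reverse()
    .find((e): e is AssistantText => e.kind === "assistant_text");
  return last ? last.text : "";
}

const subAgentRun: SubAgentEvent[] = [
  { kind: "assistant_text", text: "reading the file first." },
  { kind: "tool_call", name: "Read", input: "src/api/users.ts" },
  { kind: "tool_result", output: "...800 lines of source..." },
  { kind: "assistant_text", text: "- 2 console.log calls\n- 1 function over 50 lines" },
];

console.log(handoff(subAgentRun));
```

everything the parent might need later (file paths, line numbers, caveats) has to be in that final text, so the sub-agent's prompt should ask for it explicitly.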
what to notice
- sub-agents are stateless across calls. two invocations of the same sub-agent share nothing. if you want continuity, pass it in the input each time; there is no implicit memory.
- match model size to the job. a parent on sonnet/opus dispatching haiku sub-agents for grunt work is the right default. don't pay for opus tokens to count TODO comments.
- deny Agent itself for sub-agents. unless you really want recursive sub-agent trees, leave Agent off the sub-agent's tool list. one level deep is almost always plenty.
- name sub-agents like cli commands. "todo-finder", "code-reviewer", "research-agent": verbs and nouns. the model picks the right one from its name; vague names cause vague dispatches.
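a couple of these rules are mechanical enough to check before a run starts. the sketch below is one possible lint pass over an agents map; lintAgents and SubAgentSpec are hypothetical, and the kebab-case pattern is just one way to encode "name it like a cli command".

```typescript
// local stand-in for the fields the checks need (not the sdk's own type)
interface SubAgentSpec {
  description: string;
  tools?: string[];
}

function lintAgents(agents: Record<string, SubAgentSpec>): string[] {
  const problems: string[] = [];
  for (const [name, spec] of Object.entries(agents)) {
    // at least two lowercase words joined by hyphens, e.g. "todo-finder"
    if (!/^[a-z]+(-[a-z]+)+$/.test(name)) {
      problems.push(`${name}: use a kebab-case name like "todo-finder"`);
    }
    if (!spec.tools) {
      // an omitted list inherits the parent's set, which may include Agent
      problems.push(`${name}: no tools listed, so it inherits the parent's set`);
    } else if (spec.tools.includes("Agent")) {
      problems.push(`${name}: lists Agent, which allows recursive sub-agents`);
    }
  }
  return problems;
}

console.log(
  lintAgents({
    "todo-finder": { description: "find TODO comments in one file", tools: ["Read"] },
    helper: { description: "helps", tools: ["Read", "Agent"] },
  }),
);
```

running a check like this in ci keeps a vague name or an accidental Agent grant from slipping into a long-running job.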