sub-agents — context isolation · claude agent sdk

the Agent tool, when to spawn one, and what survives the handoff back. keeping the parent context clean.

tl;dr. the Agent tool lets the parent spawn a sub-agent — a fresh context window, a narrower toolset, and a focused job. only the sub-agent's summary returns to the parent. the technique that keeps long runs from drowning in tool output and lets you parallelise unrelated research.

why isolation

a single agent loop accumulates context. every tool result, every file read, every search hit lands in the parent's transcript. for a fast one-shot that's fine; for a multi-hour run it's death by paper-cuts — the model spends more and more tokens re-reading its own scrollback before it can think. sub-agents fix this by delegating: the sub-agent reads the file, the sub-agent gets the 1000-line search result, the sub-agent's whole context goes in the bin once it's done, and the parent only ever sees a one-paragraph summary.

same pattern as a process boundary. parent forks, child does the work, child exits, parent reads stdout. except here the "process" is another claude turn and "stdout" is a markdown summary. you can scope the sub-agent's toolset independently of the parent's, and gate its calls with the same permission rules.

defining sub-agents inline

the cleanest path: declare them on the options.agents bag. each entry is a spec — description, allowed tools, system prompt, optional model override. the parent gets the Agent tool wired up automatically and can dispatch them by name.

inline sub-agents — defined where they're used

import { query } from "@anthropic-ai/claude-agent-sdk";

const run = query({
  prompt: "find every TODO in src/, group them by file, and write a markdown summary",
  options: {
    allowedTools: ["Agent", "Glob", "Read", "Write"],
    agents: {
      "todo-finder": {
        description: "search a single file for TODO comments and return a list",
        tools: ["Read"],
        prompt: `read the given file. return a json array of { line, text } for every line containing 'TODO'. nothing else.`,
        model: "haiku",
      },
    },
  },
});

note the model: "haiku" on the sub-agent. cheap, fast sub-agents wrapped by a more capable parent is the pattern that drops both latency and cost without losing quality. the parent decides what to do; the sub-agent does the rote work.

defining sub-agents on disk

for sub-agents you reuse across runs, drop a markdown file under .claude/agents/. the sdk loads any matching file and exposes it the same way as an inline definition. version-controlled, diff-able, no inline string-walls in your application code.

.claude/agents/code-reviewer.md auto-loads at run time

// .claude/agents/code-reviewer.md
// ----------------------------------
// ---
// description: review a single typescript file for obvious issues
// tools: Read, Grep
// model: sonnet
// ---
//
// you are a typescript code reviewer. for the file you're given:
// 1. read it
// 2. flag any console.log, any 'any' type, any TODO older than the file's
//    age would suggest, and any function over 50 lines.
// 3. return a brief markdown list of issues. nothing else.

const run = query({
  prompt: "review every src/api/*.ts file",
  options: { allowedTools: ["Agent", "Glob"] },
  // the SDK auto-loads any .claude/agents/*.md as available sub-agents.
});

tool inheritance is opt-out, not opt-in. if a sub-agent doesn't list tools, it inherits the parent's full set. that's a footgun on long-running agents — always list the smallest toolset the sub-agent needs.

parallel fan-out

sub-agents shine on embarrassingly-parallel work. a parent agent can dispatch three or four Agent calls in one turn; the sdk runs them concurrently and stitches their summaries back into the parent's next turn. for research, scraping, and audit-style tasks this is the move that turns a 4-minute serial run into a 70-second parallel one.

three competitors, one parent turn, parallel sub-agent runs

// dispatching three sub-agents in parallel — each gets its own context
// window, only the summary returns to the parent. the parent stays small.
const run = query({
  prompt: `research three competitors (linear.app, height.app, shortcut.com).
for each, summarise pricing tiers and main differentiators in 5 bullets.
use one research-agent invocation per competitor — in parallel.`,
  options: {
    allowedTools: ["Agent", "WebFetch", "WebSearch"],
    agents: {
      "research-agent": {
        description: "research one company; return 5 bullets",
        tools: ["WebFetch", "WebSearch"],
        prompt: "research the company at the given url. return exactly 5 bullets.",
      },
    },
  },
});

what survives the handoff

only the sub-agent's final response reaches the parent. not the tool calls, not the intermediate reasoning, not the file contents it read along the way. that's a feature: it forces you to think about what the parent actually needs to know, and to write sub-agent prompts that produce summaries fit for that need.

what to notice

sub-agents are stateless across calls.two invocations of the same sub-agent share nothing. if you want continuity, pass it in the input each time — there is no implicit memory.
match model size to the job.a parent on sonnet/opus dispatching haiku sub-agents for grunt work is the right default. don't pay for opus tokens to count TODO comments.
deny Agent itself for sub-agents.unless you really want recursive sub-agent trees, leave Agent off the sub-agent's tool list. one level deep is almost always plenty.
name sub-agents like cli commands."todo-finder", "code-reviewer", "research-agent" — verbs and nouns. the model picks the right one from its name; vague names cause vague dispatches.