shipping — cli, server, ide · claude agent sdk

packaging an agent as a cli, embedding it in a backend, or driving it from an ide. the diff to production.

tl;dr. the same query() call ships three different ways: a one-shot cli, a server endpoint streaming sse, an ide extension routing permissions through editor ux. the agent loop doesn't change. what changes is the surface around it — settings inheritance, permission ux, error handling, and what you do with the events.

the only thing that's different per surface

if you've gotten this far, you've seen every primitive the sdk exposes. shipping isn't another primitive — it's a question of how those primitives compose with the surface around them. three surfaces, three sets of correct answers.

cli — the developer surface

a script. argv parsing, an env file, no auth, a tty for stdout. the surface where you'd run a refactor, a one-shot review, a release note generator. the agent runs as the developer, with the developer's settings.

a one-shot cli — settings come from the project on disk

#!/usr/bin/env node
// scripts/agent.ts — a one-shot cli wrapper.
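// usage: ./agent.ts "do the thing"
// note: the node shebang assumes a node version that strips typescript types natively; otherwise run the file with tsx.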
import { query } from "@anthropic-ai/claude-agent-sdk";

const prompt = process.argv.slice(2).join(" ");
if (!prompt) { console.error("usage: agent.ts <prompt>"); process.exit(1); }

const run = query({
  prompt,
  options: {
    permissionMode: "acceptEdits",
    settingSources: ["project"],
    allowedTools: ["Read", "Edit", "Glob", "Grep", "Bash"],
  },
});

for await (const ev of run) {
  if (ev.type === "assistant") {
    for (const b of ev.message.content) {
      if (b.type === "text") process.stdout.write(b.text);
    }
  }
  if (ev.type === "result" && ev.subtype !== "success") {
    process.stderr.write(`agent ended: ${ev.subtype}\n`);
    process.exit(2);
  }
}

settingSources: ["project"]. project conventions matter; user-level claude code preferences usually don't.
acceptEdits is fine here. the developer is at the terminal. ask-style permissions create unwanted prompts; trust the rules in the project's .claude/settings.json.
exit codes matter. ci reads the process exit status ($?) to decide pass or fail. derive the exit code from the result event's subtype.
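the exit-code bullet above can be sketched as a small lookup. this is a minimal sketch, not the sdk's own mapping: exitCodeFor is a hypothetical helper, the numeric codes are arbitrary, and the error subtype names other than "success" are assumptions to check against the sdk's result message type.

```typescript
// Hypothetical mapping from a result subtype to a process exit code.
// "success" appears in the script above; the two error subtype names are
// assumptions and should be verified against the sdk's types.
const EXIT_CODES = new Map<string, number>([
  ["success", 0],
  ["error_max_turns", 3],
  ["error_during_execution", 4],
]);

function exitCodeFor(subtype: string): number {
  // Unknown subtypes fall back to 2, the generic failure code the script uses.
  return EXIT_CODES.get(subtype) ?? 2;
}
```

in the loop, this would replace the hard-coded process.exit(2) with process.exit(exitCodeFor(ev.subtype)), so a ci job can tell "ran out of turns" apart from "crashed".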

server — the production surface

an http endpoint, probably streaming sse. running on a vps, in a container, behind a load balancer. zero developer context — the agent has to be self-contained. this is where most permission bugs ship to production.

hono + sse — the canonical streaming agent endpoint

// server-side: an http endpoint that streams sse to the browser.
// the http surface is yours; the agent is the same query() call.
import { Hono } from "hono";
import { streamSSE } from "hono/streaming";
import { query } from "@anthropic-ai/claude-agent-sdk";

const app = new Hono();

app.post("/agent", async (c) => {
  const { prompt } = await c.req.json();
  return streamSSE(c, async (stream) => {
    const run = query({
      prompt,
      options: {
        permissionMode: "default",
        // SERVER-SIDE: never inherit user/local settings. project only.
        settingSources: ["project"],
        // canUseTool is your bouncer for anything destructive.
        canUseTool: async (name, input) =>
          name === "Bash"
            ? { behavior: "deny", message: "no shell from the http agent" }
            : { behavior: "allow", updatedInput: input },
      },
    });

    for await (const ev of run) {
      await stream.writeSSE({ data: JSON.stringify(ev), event: ev.type });
    }
  });
});

export default { fetch: app.fetch, port: 8787 };
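because the endpoint above is a POST, the browser's EventSource (which only issues GET requests) can't consume it directly; a client would typically read the fetch response body and split it into frames itself. the following is a minimal sketch of that framing step, assuming the server emits event: and data: lines separated by a blank line, as hono's writeSSE does. parseSseBuffer and SseFrame are hypothetical names, and the parser ignores id: and retry: fields.

```typescript
// One parsed server-sent event: the `event:` name and the joined `data:` lines.
interface SseFrame {
  event: string;
  data: string;
}

// Splits a text buffer into complete frames (each terminated by a blank line)
// and returns the unparsed remainder so the caller can prepend it to the next chunk.
function parseSseBuffer(buffer: string): { frames: SseFrame[]; rest: string } {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const parts = normalized.split("\n\n");
  const rest = parts.pop() ?? ""; // the last piece is incomplete (or empty)
  const frames: SseFrame[] = [];
  for (const part of parts) {
    let event = "message"; // default event name in the sse format
    const data: string[] = [];
    for (const line of part.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) frames.push({ event, data: data.join("\n") });
  }
  return { frames, rest };
}
```

a caller would decode each chunk from response.body, append it to the leftover rest, call parseSseBuffer, and JSON.parse each frame's data to recover the agent event.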

never inherit user / local settings. a server has no developer and no ~/.claude/. the only valid settingSources is ["project"], or omit it.
canUseTool is mandatory. on a server, "ask the user" is meaningless, and the absence of a callback means defaults. write a deliberate one, even if it just denies Bash.
timeout the run. pass an abortController in options and abort it from a setTimeout, or call run.interrupt() once a deadline passes. an agent that loops in production is an unbounded api bill.
isolate the working dir. set cwd to a temp folder per request. don't let one user's run see another user's files.
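the last two bullets (a deadline and a per-request working directory) can be bundled into one small guard object. this is a sketch under stated assumptions: createRunGuard and RunGuard are hypothetical names, the "agent-run-" prefix is arbitrary, and passing the controller as options.abortController assumes the sdk accepts an option of that name.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Per-request guard: an isolated working directory plus a wall-clock deadline.
interface RunGuard {
  cwd: string; // pass as options.cwd
  abortController: AbortController; // pass as options.abortController (assumed option name)
  dispose: () => void; // call in a finally block once the stream ends
}

function createRunGuard(timeoutMs: number): RunGuard {
  // A fresh temp directory per request, so runs cannot see each other's files.
  const cwd = mkdtempSync(join(tmpdir(), "agent-run-"));
  const abortController = new AbortController();
  // Abort the run if it is still going after timeoutMs.
  const timer = setTimeout(() => abortController.abort(), timeoutMs);
  return {
    cwd,
    abortController,
    dispose: () => {
      clearTimeout(timer);
      rmSync(cwd, { recursive: true, force: true });
    },
  };
}
```

in the handler, the guard would be created before query(), its cwd and abortController spread into options, and dispose() called in a finally block so the timer and the temp directory are cleaned up even when the client disconnects.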

ide — the editor surface

a vs code extension, a jetbrains plugin, a custom editor. the agent has rich ux: showInputBox, showWarningMessage, output channels, diff views. permission ux belongs in the editor, not in stdout.

vs code — canUseTool routes through showWarningMessage

// vs code extension: drive the same sdk, surface events in an output channel.
// the agent doesn't know it's in an editor; the editor knows it has an agent.
import * as vscode from "vscode";
import { query } from "@anthropic-ai/claude-agent-sdk";

export function activate(ctx: vscode.ExtensionContext) {
  ctx.subscriptions.push(
    vscode.commands.registerCommand("scalable.agent.run", async () => {
      const prompt = await vscode.window.showInputBox({ prompt: "agent prompt" });
      if (!prompt) return;
      const out = vscode.window.createOutputChannel("scalable agent");
      out.show();

      const run = query({
        prompt,
        options: {
          cwd: vscode.workspace.workspaceFolders?.[0]?.uri.fsPath,
          permissionMode: "acceptEdits",
          settingSources: ["project", "user"],
          // route ask-permission through the editor, not stdin.
          canUseTool: async (name, input) => {
            const ok = await vscode.window.showWarningMessage(
              `Run ${name}? ${JSON.stringify(input).slice(0, 80)}`,
              "allow", "deny",
            );
            return ok === "allow"
              ? { behavior: "allow", updatedInput: input }
              : { behavior: "deny", message: "user denied" };
          },
        },
      });

      for await (const ev of run) {
        if (ev.type === "assistant") {
          for (const b of ev.message.content) {
            if (b.type === "text") out.append(b.text);
          }
        }
      }
    }),
  );
}

canUseTool is your permission ux. the user is one click away. surface tool calls in the editor's native dialogs, not in a terminal.
cwd is the workspace root. vscode.workspace.workspaceFolders?.[0]?.uri.fsPath anchors the agent to the open workspace, which prevents accidental writes elsewhere.
project + user settings ok here. unlike a server, an ide extension runs as the developer. inheriting user-level claude code preferences is part of why they installed the extension.
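prompting for every tool call gets noisy fast, so an extension might only raise a dialog for tools that can change something. the sketch below separates that decision from the vs code api so it can be tested on its own. verdictFor and describeToolCall are hypothetical helpers, and treating Read, Glob, and Grep as safe to auto-allow is an assumption about your own policy, not an sdk rule.

```typescript
type Verdict = "allow" | "ask";

// Assumption: these tools only read from the workspace, so no dialog is needed.
const READ_ONLY_TOOLS = new Set(["Read", "Glob", "Grep"]);

// Decides whether a tool call can proceed silently or needs a user prompt.
function verdictFor(toolName: string): Verdict {
  return READ_ONLY_TOOLS.has(toolName) ? "allow" : "ask";
}

// Builds the dialog text: the tool name plus a truncated preview of its input.
function describeToolCall(toolName: string, input: unknown, max = 80): string {
  const raw = JSON.stringify(input) ?? "";
  const preview = raw.length > max ? raw.slice(0, max) + "..." : raw;
  return `Run ${toolName}? ${preview}`;
}
```

inside canUseTool, a verdict of "allow" would return the allow result immediately, and "ask" would pass describeToolCall(name, input) to showWarningMessage as in the listing above.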

what doesn't change

everything you've learned about tool selection, permissions, hooks, sub-agents, system prompts, and skills applies on all three surfaces. the agent loop doesn't care whether the next event is going to a tty, a websocket, or a webview. that's the point of the sdk: one loop, many surfaces.

operational checklist

api key handling. always env-driven (ANTHROPIC_API_KEY). for ide extensions, store via the editor's secret storage; never check in.
cost guardrails. read result.total_cost_usd and result.usage on every run. log the totals; cap with rate-limiting at the surface, not in the loop.
observability. the event stream is the trace. ship it to your logs as ndjson, with a session id field, and you've got per-turn replay for free.
model pinning. production agents pin a specific model id (e.g. claude-sonnet-4-6). never use the family alias for prod: ship the explicit version, bump on a calendar.
replay your own runs. save the prompt + the events to disk. when an agent does something weird, replaying with the same model and same input is the fastest path to a fix.
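the observability and cost bullets above can share one small logging helper. this is a rough sketch: AgentEvent is a deliberately loose stand-in for the sdk's event types, and the session, ts, and event field names in the log line are hypothetical choices for this example. only total_cost_usd on result events comes from the checklist.

```typescript
// Loose shape of the events this sketch cares about; real sdk events carry more fields.
interface AgentEvent {
  type: string;
  total_cost_usd?: number; // present on result events, per the checklist above
  [key: string]: unknown;
}

// Serializes one event as a single ndjson line tagged with a session id.
// The event is nested under `event` so its own fields never collide with the tags.
function toNdjsonLine(session: string, ev: AgentEvent, now: Date = new Date()): string {
  return JSON.stringify({ session, ts: now.toISOString(), event: ev }) + "\n";
}

// Sums the cost reported by the result events in a recorded run.
function totalCostUsd(events: AgentEvent[]): number {
  return events
    .filter((ev) => ev.type === "result")
    .reduce((sum, ev) => sum + (ev.total_cost_usd ?? 0), 0);
}
```

appending each line to a per-session file gives you both the trace and the replay input: the saved events can be read back line by line with JSON.parse and fed to totalCostUsd for a per-run cost log.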

the harness is the product. in eight lessons we've never taught you a prompt-engineering trick or a clever model parameter. the value of the sdk is the harness: permissions, hooks, sub-agents, the event stream. when an agent behaves well in production, that's almost always why.