extensions — wrapping the base loop · agentic patterns

memory injection, retrieval, safety, telemetry. the pattern equivalent of middleware.

the middleware pattern, for agents

extensions are the middleware of the agent world. they wrap the base loop — the thing that takes a message, runs the model, maybe calls a skill, returns a response — and layer in everything that's around the core job: memory injection, retrieval, retries, rate limits, caching, telemetry, safety filters, token budgeting. if you've written express middleware or a rails interceptor, this is that, for agent turns.

the reason to call them a pattern, not just "some code you wrap things with": treating them as composable (ctx, next) ⇒ result objects is what lets you add and remove concerns without touching the loop, and lets you reason about them independently.

composition — retrieval, telemetry, safety

wrap a base loop with three extensions — click run

ready

stdout

error

three small rules keep extensions clean and debuggable:

one concern per extension. retrieval is one thing, telemetry is another, safety is a third. an extension that does two things becomes the extension that always needs changing.
mutate ctx, don't replace it. extensions share a ctx object. adding to it is fine (see how withRetrieval writes to ctx.retrieved); swapping it for a new object breaks the chain because downstream extensions already have references to the old one.
call next exactly once. zero calls short-circuits the loop (sometimes you want this — a cache hit, a hard block). two calls runs the agent twice and produces duplicate telemetry. one is the contract.

order matters — the onion

observe the onion — before → before → before → base → after → after → after

ready

stdout

error

outer extensions see everything — including what inner extensions did. telemetry usually goes outermost so it captures retries; safety often goes just inside retrieval so it can inspect what the model produced with context. there's no universal stack, just the one you document for your team.

why this is worth the layer. the alternative is mixing retrieval logic into the agent's own prompt construction, then mixing telemetry into the tool-call code, then mixing safety into the response handler. six months later nobody can reason about the turn because it has twelve responsibilities in one function. extensions are the pattern that keeps the core boring. the agents i've had to put on call for are the ones where the turn function grew responsibilities; the ones i still trust are the ones where every cross-cutting concern lives in its own layer.

when it breaks

hidden state across extensions. extension A stashes something on ctx that extension C relies on. now the order of A and C is load-bearing and invisible. document the contract or use a typed ctx with explicit fields.
retry extensions that don't know about idempotency. a naive retry layer will re-send the same tool call that already charged a card. either make the skills idempotent or make the retry layer aware of which skills are safe to replay.
safety at the wrong layer. moderation run before the model doesn't catch the model's own output. moderation run after tool calls doesn't stop the tool call that already happened. pick the point in the turn that actually corresponds to what you're trying to prevent.

next: personas — the bundle of prompt, voice, allowed skills, and guardrails that makes one loop behave like many different agents.