lesson 06 / 07 · 13 min

loops — react, reflection, and when to stop

the control flow inside a single agent turn. stopping criteria, budget, and failure modes.

the control flow inside a turn

loops are the internal cycle a single agent runs to solve a problem: think, act, observe, repeat, until it has an answer or has run out of budget. the react pattern — reasoning plus acting — is the canonical shape. everything else (reflection, plan-and-execute, tree-of-thought) is a variation on who gets to think, when, and how many times.
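
the think, act, observe cycle can be sketched in a few lines. this is a minimal illustration, not a production loop: `scripted_model` is a hypothetical stand-in for a real model call, and `word_count` is a made-up tool, so every name here is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    action: Optional[str] = None   # tool name; None when the model answers instead
    action_input: str = ""
    answer: Optional[str] = None   # set only on the final step


# hypothetical tool registry: one toy tool that counts words
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def scripted_model(transcript: List[str]) -> Step:
    """stand-in for a model call: act once, then answer from the observation."""
    prefix = "observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return Step(
            thought="i should count the words with a tool",
            action="word_count",
            action_input="think act observe repeat",
        )
    return Step(thought="the tool gave me the count",
                answer=observations[-1][len(prefix):])


def react(question: str,
          model: Callable[[List[str]], Step],
          tools: Dict[str, Callable[[str], str]],
          max_turns: int = 4) -> Tuple[Optional[str], List[str]]:
    transcript = [f"question: {question}"]
    for _ in range(max_turns):
        step = model(transcript)                           # think
        transcript.append(f"thought: {step.thought}")
        if step.answer is not None:
            return step.answer, transcript                 # answered: stop
        tool = tools.get(step.action)
        observation = (tool(step.action_input) if tool
                       else f"unknown tool: {step.action}")  # act
        transcript.append(f"action: {step.action}({step.action_input})")
        transcript.append(f"observation: {observation}")   # observe
    return None, transcript                                # out of budget


answer, transcript = react("how many words?", scripted_model, TOOLS)
print(answer)
print("\n".join(transcript))
```

the loop has exactly two exits: the model produces an answer, or `max_turns` runs out and the caller gets `None` plus the transcript.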

the pattern worth internalising is not a specific loop, but the discipline around any loop: every one needs an explicit budget, an explicit stopping rule, and observability that lets you see what the loop actually did on a bad day.
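
one way to get that observability is a structured trace that records every step and the reason the loop stopped. the sketch below is an assumption about what such a trace could look like; the `Trace` class and its field names (`turn`, `kind`, `detail`, `t_ms`) are hypothetical, not taken from any particular framework.

```python
import json
import time


class Trace:
    """append-only record of what a loop did, one event per step."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.events = []

    def record(self, turn: int, kind: str, detail: str) -> None:
        # kind is one of "thought", "action", "observation", "stop"
        elapsed_ms = round((self._clock() - self._start) * 1000)
        self.events.append(
            {"turn": turn, "kind": kind, "detail": detail, "t_ms": elapsed_ms}
        )

    def to_jsonl(self) -> str:
        # one json object per line, easy to grep and to ship to a log store
        return "\n".join(json.dumps(event) for event in self.events)


trace = Trace()
trace.record(1, "action", "call_api GET /orders")
trace.record(1, "observation", "http 500")
trace.record(2, "stop", "no_progress")
print(trace.to_jsonl())
```

recording the stop reason as its own event is the part that pays off on a bad day: it answers "why did this conversation end here" without replaying the run.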

react — think, act, observe, repeat

[interactive demo: a two-turn react loop]

four things this loop gets right:

stopping rules — watch the no-progress rule fire

[interactive demo: a stuck loop, where the no-progress rule stops it before maxTurns]

a real production loop has several stopping rules: turn budget, wall clock, token ceiling, and the one teams most often skip: no-progress. that's the rule that says "if the agent has taken the same action three times in a row, stop". it catches the loop that burns $40 on a single conversation because the model decided a flaky api must be retried. i learned this the expensive way the first time an upstream provider started returning 500s mid-request.
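
the four rules can be sketched as a single check that runs before every turn. this is an illustration under stated assumptions: the `Budget` fields, the default numbers, and the 500-token cost per call are all made up, and `max_turns` plays the role of the demo's maxTurns.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, tool input)


@dataclass(frozen=True)
class Budget:
    max_turns: int = 8
    max_seconds: float = 30.0
    max_tokens: int = 20_000
    max_repeats: int = 3   # identical consecutive actions that count as no progress


def stop_reason(budget: Budget, turns: int, elapsed: float,
                tokens: int, actions: List[Action]) -> Optional[str]:
    """return the name of the rule that fired, or None to keep going."""
    if turns >= budget.max_turns:
        return "turn_budget"
    if elapsed >= budget.max_seconds:
        return "wall_clock"
    if tokens >= budget.max_tokens:
        return "token_ceiling"
    tail = actions[-budget.max_repeats:]
    if len(tail) == budget.max_repeats and len(set(tail)) == 1:
        return "no_progress"
    return None


def run(next_action: Callable[[List[Action]], Tuple[Action, int]],
        budget: Budget,
        clock: Callable[[], float] = time.monotonic) -> Tuple[str, List[Action]]:
    started = clock()
    actions: List[Action] = []
    tokens = 0
    while True:
        reason = stop_reason(budget, len(actions), clock() - started, tokens, actions)
        if reason is not None:
            return reason, actions
        action, cost = next_action(actions)   # hypothetical agent step
        actions.append(action)
        tokens += cost


# a stuck agent: retries the same failing call forever
def stuck_agent(history: List[Action]) -> Tuple[Action, int]:
    return ("call_api", "GET /orders"), 500


reason, actions = run(stuck_agent, Budget())
print(reason, len(actions))
```

with these defaults the stuck agent is stopped after three identical calls, well before the eight-turn budget. returning the rule's name rather than a bare boolean keeps the stop reason available for logging.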

reflection — the one extra loop worth adding

[interactive demo: critique and revise, a single pass of reflection]

reflection is a second pass where a critic — the same persona or a different one — reviews the draft and, if it finds a problem, the author revises. for many classes of task (long-form writing, reasoning-heavy answers, code generation) one reflection pass buys a meaningful quality bump for a predictable cost. two passes usually don't. three passes are a signal to redesign something upstream.
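
a single critique-and-revise pass can be sketched like this. `stub_author` and `stub_critic` are hypothetical stand-ins for model calls, and the convention that the critic returns `None` when it finds no problem is an assumption made for the example.

```python
from typing import Callable, Optional, Tuple

Author = Callable[[str, Optional[str], Optional[str]], str]  # (task, draft, critique) -> draft
Critic = Callable[[str, str], Optional[str]]                 # (task, draft) -> critique or None


def reflect(task: str, author: Author, critic: Critic,
            max_passes: int = 1) -> Tuple[str, int]:
    """draft once, then allow at most max_passes rounds of critique and revision."""
    draft = author(task, None, None)
    revisions = 0
    for _ in range(max_passes):
        critique = critic(task, draft)
        if critique is None:          # critic is satisfied: stop early
            break
        draft = author(task, draft, critique)
        revisions += 1
    return draft, revisions


# stand-ins for model calls: the first draft has a bug, the revision fixes it
def stub_author(task: str, draft: Optional[str], critique: Optional[str]) -> str:
    if critique is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def stub_critic(task: str, draft: str) -> Optional[str]:
    return "uses subtraction instead of addition" if "a - b" in draft else None


final, revisions = reflect("write add(a, b)", stub_author, stub_critic)
print(final, revisions)
```

`max_passes` defaults to 1 to match the point above: the cost of reflection stays predictable because the number of extra model calls is bounded up front.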

budgets are a design choice, not a safety net. a budget of 20 turns with a 60-second wall clock is a different product than a budget of 4 turns and 10 seconds. pick the numbers deliberately. the budget shapes what the loop can attempt.

when it breaks

next: evals — how you know any of this is actually working and not just demo-ing well.
