the control flow inside a turn
loops are the internal cycle a single agent runs to solve a problem: think, act, observe, repeat, until it has an answer or has run out of budget. the react pattern — reasoning plus acting — is the canonical shape. everything else (reflection, plan-and-execute, tree-of-thought) is a variation on who gets to think, when, and how many times.
the pattern worth internalising is not a specific loop, but the discipline around any loop: every one needs an explicit budget, an explicit stopping rule, and observability that lets you see what the loop actually did on a bad day.
react — think, act, observe, repeat
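a minimal sketch of a loop with this shape, in typescript. everything named here (Step, Turn, Model, Skill, runLoop, the stopReason values) is hypothetical and for illustration only, not any particular framework's api. the model call is kept synchronous for brevity; a real one would be async.

```typescript
// hypothetical types for illustration; not tied to any real framework.
type Step =
  | { final: true; answer: string }
  | { final: false; thought: string; skill: string; input: string };

type Turn = { step: Step; observation?: string };
type Model = (question: string, history: readonly Turn[]) => Step;
type Skill = (input: string) => string;

type LoopResult = {
  answer: string | null;
  history: Turn[];
  stopReason: "final" | "max_turns";
};

function runLoop(
  question: string,
  model: Model,
  skills: Record<string, Skill>,
  maxTurns = 8, // explicit budget: a number you can change
): LoopResult {
  const history: Turn[] = []; // history is a plain value: log it, replay it

  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(question, history); // think

    if (step.final) {
      // "final" is a typed flag, not a regex over the reasoning text
      history.push({ step });
      return { answer: step.answer, history, stopReason: "final" };
    }

    // own-property check so names like "toString" do not resolve by accident
    const known = Object.prototype.hasOwnProperty.call(skills, step.skill);
    const skill = known ? skills[step.skill] : undefined;
    if (skill === undefined) {
      // a hallucinated skill name fails loud
      throw new Error("unknown skill: " + step.skill);
    }

    const observation = skill(step.input); // act
    history.push({ step, observation }); // observe
  }

  // budget spent without a final answer
  return { answer: null, history, stopReason: "max_turns" };
}
```

the return value carries the full history and a stop reason, so the caller can log both without re-deriving them.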
four things this loop gets right:
- the budget is explicit. maxTurns is a number you can change. an agent with no budget can, and eventually will, run forever. the second-worst outage is the one you couldn't stop; the worst is the one you couldn't measure.
- history is a value. the loop builds an array of turn objects. every reasoning step, every action, every observation is inspectable, loggable, replayable. a trace of that array shows you the whole sequence.
- "final" is a flag, not a parse. the loop checks a typed field. it does not regex the model's reasoning for "I believe I am done". structured output is one of the cheapest reliability wins you have.
- unknown skill throws. a model that hallucinates a skill name should fail loud, not get a silent "skill not found" that it then reasons around for three more turns.
stopping rules — watch the no-progress rule fire
a real production loop has several stopping rules: turn budget, wall clock, token ceiling, and the one teams most often skip, no-progress. that's the rule that says "if the agent has taken the same action three times in a row, stop". it catches the loop that burns $40 on a single conversation because the model decided a flaky api must be retried. i learned this the expensive way the first time an upstream provider started returning 500s mid-request.
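the four rules can be sketched as one function that the loop calls before every turn. the names (StopReason, Limits, LoopState, checkStop) and the idea of reducing each action to a string key are assumptions made for this example; the thresholds are whatever you pass in, not recommendations.

```typescript
// hypothetical names throughout; limits are placeholders.
type StopReason =
  | "final"
  | "turn_budget"
  | "wall_clock"
  | "token_ceiling"
  | "no_progress";

type Limits = {
  maxTurns: number;
  maxMillis: number;
  maxTokens: number;
  maxRepeats: number; // identical consecutive actions allowed before stopping
};

type LoopState = {
  final: boolean;
  turns: number;
  elapsedMillis: number;
  tokensUsed: number;
  actions: string[]; // one key per action taken, e.g. "http_get:/orders/7"
};

// returns the reason to stop, or null to keep going
function checkStop(state: LoopState, limits: Limits): StopReason | null {
  if (state.final) return "final";
  if (state.turns >= limits.maxTurns) return "turn_budget";
  if (state.elapsedMillis >= limits.maxMillis) return "wall_clock";
  if (state.tokensUsed >= limits.maxTokens) return "token_ceiling";

  // no-progress: the last n actions are all the same key
  const n = limits.maxRepeats;
  if (n > 0 && state.actions.length >= n) {
    const recent = state.actions.slice(-n);
    const allSame = recent.every((a) => a === recent[0]);
    if (allSame) return "no_progress";
  }
  return null;
}
```

with maxRepeats set to 3, three identical keys in a row return "no_progress" while an interleaved sequence keeps going. how you build the key (skill name plus normalised input, for instance) decides what counts as "the same action".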
reflection — the one extra loop worth adding
reflection is a second pass where a critic (the same persona or a different one) reviews the draft and, if it finds a problem, the author revises. for many classes of task (long-form writing, reasoning-heavy answers, code generation) one reflection pass buys a meaningful quality bump for a predictable cost. a second pass usually adds little. needing a third is a signal to redesign something upstream.
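a sketch of a bounded reflection pass, under stated assumptions: author and critic stand in for model calls, and every name (Critique, shipIt, writeWithReflection, maxPasses) is hypothetical. the critic returns a typed verdict so that "good enough" is something it can actually say.

```typescript
// hypothetical shapes; author and critic stand in for model calls.
type Critique = { shipIt: boolean; notes: string };
type Author = (task: string, previousDraft?: string, notes?: string) => string;
type Critic = (task: string, draft: string) => Critique;

type Reflected = { text: string; reviews: number; approved: boolean };

function writeWithReflection(
  task: string,
  author: Author,
  critic: Critic,
  maxPasses = 1, // one reflection pass by default
): Reflected {
  let draft = author(task);
  let reviews = 0;

  while (reviews < maxPasses) {
    const critique = critic(task, draft);
    reviews++;
    if (critique.shipIt) {
      // the critic has explicit permission to accept the draft
      return { text: draft, reviews, approved: true };
    }
    draft = author(task, draft, critique.notes); // revise using the notes
  }

  // budget spent: return the last revision without another review
  return { text: draft, reviews, approved: false };
}
```

the cost is predictable because the number of calls is bounded: at most maxPasses critic calls and maxPasses + 1 author calls.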
when it breaks
- unbounded retries on transient errors. an upstream api 500s, the loop retries inside the skill, the agent also retries at the loop level, and the extension wraps the whole thing in another retry. at three attempts per layer, one flake is now 3 × 3 × 3 = 27 calls. cap retries at one layer.
- reflection that never says "good enough". critique prompts that optimise for "find a flaw" always find one. the critique needs an explicit bar and permission to say "ship it". without that, reflection is just expensive procrastination.
- silent truncation of history. at turn 9 the prompt hits the context window, someone trims the oldest turns silently, and the agent "forgets" the user's name and starts over. summarise explicitly, and keep the fact that truncation happened visible to the loop.
- no visibility into why the loop stopped. did we return because the agent said final? because of the turn cap? because of the wall clock? log the reason every time. the distribution of stop-reasons is one of the best health metrics you have.
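of the failure modes above, silent truncation is the easiest to show in code. a minimal sketch, assuming a hypothetical Msg shape and a summarise callback you supply (in practice probably a cheap model call): old messages are replaced by one explicit summary message, and the result says whether truncation happened.

```typescript
// hypothetical message shape for illustration.
type Msg = { role: "user" | "agent" | "summary"; text: string };
type Compacted = { messages: Msg[]; truncated: boolean; dropped: number };

function compactHistory(
  messages: Msg[],
  keepLast: number,
  summarise: (dropped: Msg[]) => string, // supplied by the caller
): Compacted {
  const keep = Math.max(0, keepLast);
  if (messages.length <= keep) {
    // nothing to trim: return a copy and say so
    return { messages: messages.slice(), truncated: false, dropped: 0 };
  }

  const cut = messages.length - keep;
  const dropped = messages.slice(0, cut);

  // the marker stays in the prompt, so the model can see that context was cut
  const marker: Msg = {
    role: "summary",
    text: "[" + dropped.length + " earlier messages summarised] " + summarise(dropped),
  };

  return {
    messages: [marker].concat(messages.slice(cut)),
    truncated: true,
    dropped: dropped.length,
  };
}
```

the truncated and dropped fields are there for the loop and the logs: they make the trim an event you can count rather than something you infer later from a confused agent.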
next: evals — how you know any of this is actually working and not just demo-ing well.