observability — reading the trace · agentcore

spans, tool-call boundaries, cost attribution. the parts cloudwatch doesn't show you.

what you get for free

in every lesson so far, the trace panel has been showing you exactly what agentcore's observability surface looks like. the runtime traces each invocation; memory traces reads and writes; gateway traces tool calls; identity traces principal reads and token issuance; browser and code interpreter trace their own events. you haven't written a single line of observability code.

in production, these events become opentelemetry spans. cloudwatch gets them automatically, and because they're standard otel they can also go to x-ray, datadog, honeycomb, or anywhere else that speaks otel. the trace you see here is the same trace your on-call engineer sees at 2am.

anatomy of an automatic trace

every primitive traces itself

read it top to bottom. app.init constructs the app, and each primitive emits its own *.init. app.invoke opens a span for this invocation; every event nested under it is attached as a child span. app.yield and app.done close the span. in cloudwatch this becomes a waterfall you can click through.
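to make the nesting concrete, here is a minimal sketch that turns a flat event list into that waterfall by following parent links. the TraceEvent shape, the ids, and the memory.read and gateway.call names are hypothetical, and the *.init events are assumed to nest under app.init. real spans carry more fields and may nest differently.

```typescript
// Hypothetical event shape for illustration. A real otel span carries more
// (trace id, timestamps, status); only the parent link matters for nesting.
interface TraceEvent {
  spanId: string;
  parentId?: string; // absent on root spans
  name: string;
}

// Group events by parent, then walk from the roots, indenting one level per
// generation. Siblings keep the order in which they were emitted.
function renderWaterfall(events: TraceEvent[]): string[] {
  const children = new Map<string | undefined, TraceEvent[]>();
  for (const e of events) {
    const siblings = children.get(e.parentId) ?? [];
    siblings.push(e);
    children.set(e.parentId, siblings);
  }
  const lines: string[] = [];
  const walk = (parentId: string | undefined, depth: number): void => {
    for (const e of children.get(parentId) ?? []) {
      lines.push("  ".repeat(depth) + e.name);
      walk(e.spanId, depth + 1);
    }
  };
  walk(undefined, 0);
  return lines;
}

// Sample trace; memory.read and gateway.call are made-up child event names.
const sample: TraceEvent[] = [
  { spanId: "1", name: "app.init" },
  { spanId: "2", parentId: "1", name: "memory.init" },
  { spanId: "3", parentId: "1", name: "gateway.init" },
  { spanId: "4", name: "app.invoke" },
  { spanId: "5", parentId: "4", name: "memory.read" },
  { spanId: "6", parentId: "4", name: "gateway.call" },
  { spanId: "7", parentId: "4", name: "app.yield" },
  { spanId: "8", parentId: "4", name: "app.done" },
];

console.log(renderWaterfall(sample).join("\n"));
```

the root spans (app.init, app.invoke) print flush left and everything attached to them prints one level in, which is the same shape the cloudwatch waterfall draws.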

adding your own spans

automatic tracing covers the primitives. your agent's logic — the retrieval step, the reranking, the generation call, the custom business rule — doesn't trace itself. that's what span is for.

three custom spans around a rag pipeline

each await span(name, attrs, fn) emits a span.start event with the attributes you pass, runs fn, then emits span.end when it returns. the attributes (model name, topK, anything) become searchable tags in cloudwatch. "show me every invocation where the reranker model was cohere-rerank-3 and latency was over 500ms" is a query you can actually write.
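a minimal sketch of what a helper with the span(name, attrs, fn) shape could do, assuming an in-memory events array in place of the real otel exporter. the pipeline steps are stubs, example-llm is a placeholder model name, and slowReranks is a hypothetical local stand-in for the cloudwatch query quoted above.

```typescript
// Attribute values are kept to scalars so they can become searchable tags.
type Attrs = Record<string, string | number | boolean>;

interface SpanEvent {
  kind: "span.start" | "span.end";
  name: string;
  attrs: Attrs;
  durationMs?: number; // only set on span.end
}

// Hypothetical in-memory sink standing in for the otel exporter.
const events: SpanEvent[] = [];

// Emit span.start, run fn, emit span.end with the elapsed time.
// Error handling is left out here: if fn throws, no span.end is emitted.
async function span<T>(name: string, attrs: Attrs, fn: () => Promise<T>): Promise<T> {
  events.push({ kind: "span.start", name, attrs });
  const startedAt = Date.now();
  const result = await fn();
  events.push({ kind: "span.end", name, attrs, durationMs: Date.now() - startedAt });
  return result;
}

// A stubbed rag pipeline with three custom spans. The step bodies and the
// "example-llm" model name are placeholders, not real calls.
async function answer(question: string): Promise<string> {
  const docs = await span("retrieve", { topK: 3 }, async () => ["doc-a", "doc-b", "doc-c"]);
  const ranked = await span("rerank", { model: "cohere-rerank-3" }, async () => docs.slice(0, 2));
  const draft = `answer to "${question}" using ${ranked.join(", ")}`;
  return span("generate", { model: "example-llm" }, async () => draft);
}

// The query from the text, run locally: cohere-rerank-3 reranks slower than minMs.
function slowReranks(minMs: number): SpanEvent[] {
  return events.filter(
    (e) => e.kind === "span.end" && e.attrs.model === "cohere-rerank-3" && (e.durationMs ?? 0) > minMs,
  );
}

answer("what is agentcore?").then((reply) => {
  console.log(reply);
  console.log(`${slowReranks(500).length} rerank spans over 500ms`);
});
```

because the attributes ride along on span.end together with the duration, "model is x and latency is over y" becomes a plain filter over one kind of record.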

what happens when spans fail

a key error inside a span

span emits span.error when fn throws, records the exception type and message, then lets the error propagate. in production this becomes a marked error span in the waterfall, plus a cloudwatch metric you can alert on. you don't need try/catch to get error observability — you need it only when you want to recover.
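a sketch of a span-style helper with the error path described above: emit span.error with the exception's name and message, then rethrow. the lesson doesn't say whether the real runtime also emits span.end after span.error, so this sketch doesn't. the events sink, config, readKey, and readKeyOr are made up for illustration.

```typescript
type Attrs = Record<string, string | number | boolean>;

interface SpanEvent {
  kind: "span.start" | "span.end" | "span.error";
  name: string;
  attrs: Attrs;
  errorType?: string; // only on span.error
  errorMessage?: string; // only on span.error
}

// Hypothetical in-memory sink standing in for the otel exporter.
const events: SpanEvent[] = [];

async function span<T>(name: string, attrs: Attrs, fn: () => Promise<T>): Promise<T> {
  events.push({ kind: "span.start", name, attrs });
  try {
    const result = await fn();
    events.push({ kind: "span.end", name, attrs });
    return result;
  } catch (err) {
    // Record the exception type and message, then rethrow it unchanged.
    const e = err instanceof Error ? err : new Error(String(err));
    events.push({ kind: "span.error", name, attrs, errorType: e.name, errorMessage: e.message });
    throw err;
  }
}

const config: { [key: string]: string | undefined } = { region: "us-east-1" };

// No try/catch here: a missing key is still recorded as span.error.
async function readKey(key: string): Promise<string> {
  return span("config.read", { key }, async () => {
    const value = config[key];
    if (value === undefined) throw new RangeError(`missing key: ${key}`);
    return value;
  });
}

// try/catch only where we actually want to recover.
async function readKeyOr(key: string, fallback: string): Promise<string> {
  try {
    return await readKey(key);
  } catch {
    return fallback;
  }
}

readKeyOr("model", "default-model").then((value) => {
  console.log(value); // the fallback, since "model" is not in config
  console.log(events.map((e) => e.kind).join(" -> "));
});
```

the span records the failure either way; readKeyOr's try/catch only decides whether the caller recovers.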

spans and memory events are two different things. spans are for performance and diagnosis: where did time go, what failed. memory events are for agent continuity: what did the user say, what did the agent reply. don't mix them. don't put user prompts in span attributes, and don't put span timings in memory.
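one way to keep the two apart, sketched with hypothetical in-memory stand-ins (memory, spans, handleTurn) rather than real agentcore apis: the text of the turn goes to memory, while the span only gets sizes and timing.

```typescript
// Hypothetical stand-ins: `memory` for agentcore memory events,
// `spans` for exported span records.
interface MemoryEvent {
  role: "user" | "assistant";
  text: string;
}

interface SpanRecord {
  name: string;
  attrs: Record<string, string | number | boolean>;
  durationMs: number;
}

const memory: MemoryEvent[] = [];
const spans: SpanRecord[] = [];

function handleTurn(userText: string, reply: (text: string) => string): string {
  const startedAt = Date.now();
  const answer = reply(userText);
  // Continuity: what was said goes to memory, with no timings attached.
  memory.push({ role: "user", text: userText }, { role: "assistant", text: answer });
  // Diagnosis: sizes and timing go to the span, never the text itself.
  spans.push({
    name: "turn",
    attrs: { promptChars: userText.length, replyChars: answer.length },
    durationMs: Date.now() - startedAt,
  });
  return answer;
}

handleTurn("book a table for two", (text) => `done: ${text}`);
```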

what cloudwatch adds on top

traces: the waterfall view. every span is nested under its parent, with timings drawn to scale. click a span to see its attributes and exception details. one view per invocation.
metrics: aggregated counters and histograms, such as p50/p95/p99 latency per span name, error rates per tool, and memory event throughput. no extra instrumentation: metrics are derived from the same spans.
logs: anything you console.log (or write through the context-scoped log) ends up in cloudwatch logs, tagged with the trace id. jumping from a trace to its logs is one click.
cost attribution: model tokens, sandbox minutes, and memory storage are all billed per invocation and visible in the same trace. you can answer "which agent flow costs the most per turn" without leaving the console.
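since the metrics are derived from spans, here is a rough sketch of that derivation: group finished spans by name, then compute nearest-rank p50/p95/p99 and an error rate per group. the FinishedSpan shape is hypothetical, and cloudwatch's own percentile calculation may differ from nearest-rank.

```typescript
// Hypothetical finished-span record; a real export carries more fields.
interface FinishedSpan {
  name: string;
  durationMs: number;
  error: boolean;
}

interface SpanMetrics {
  count: number;
  p50: number;
  p95: number;
  p99: number;
  errorRate: number;
}

// Nearest-rank percentile over an ascending array: the smallest sample with
// at least p% of samples at or below it. Integer math avoids float drift.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function metricsByName(spans: FinishedSpan[]): Map<string, SpanMetrics> {
  const groups = new Map<string, FinishedSpan[]>();
  for (const s of spans) {
    const group = groups.get(s.name) ?? [];
    group.push(s);
    groups.set(s.name, group);
  }
  const out = new Map<string, SpanMetrics>();
  groups.forEach((group, name) => {
    const sorted = group.map((s) => s.durationMs).sort((a, b) => a - b);
    out.set(name, {
      count: group.length,
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
      errorRate: group.filter((s) => s.error).length / group.length,
    });
  });
  return out;
}

// 100 tool calls taking 1..100 ms, the first five failing; one memory read.
const sample: FinishedSpan[] = [
  ...Array.from({ length: 100 }, (_, i) => ({ name: "gateway.call", durationMs: i + 1, error: i < 5 })),
  { name: "memory.read", durationMs: 12, error: false },
];

console.log(metricsByName(sample).get("gateway.call"));
```

nothing here needs extra instrumentation: every number comes from fields the spans already have.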

what to actually watch

three signals worth an alarm: app.error rate (errors per invocation); p95 of the app.invoke → app.done span (total latency); tool call failure rate (filtered on gateway.error). everything else can wait until it matters.
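a sketch of the three signals computed over a batch of invocations. in practice you'd configure these as cloudwatch alarms rather than compute them yourself; the InvocationSummary shape and the thresholds in the usage line are hypothetical examples, not recommendations.

```typescript
// Hypothetical per-invocation summary pulled from a trace.
interface InvocationSummary {
  totalMs: number; // app.invoke -> app.done
  appError: boolean; // an app.error event occurred
  toolCalls: number; // gateway tool calls made
  toolFailures: number; // of those, how many emitted gateway.error
}

interface AlarmThresholds {
  errorRate: number;
  p95Ms: number;
  toolFailureRate: number;
}

// Returns the names of the signals that are over their threshold.
function evaluate(invocations: InvocationSummary[], t: AlarmThresholds): string[] {
  const n = invocations.length;
  if (n === 0) return [];
  const errorRate = invocations.filter((i) => i.appError).length / n;
  const sorted = invocations.map((i) => i.totalMs).sort((a, b) => a - b);
  const p95 = sorted[Math.ceil((95 * n) / 100) - 1]; // nearest-rank p95
  const calls = invocations.reduce((sum, i) => sum + i.toolCalls, 0);
  const failures = invocations.reduce((sum, i) => sum + i.toolFailures, 0);
  const toolFailureRate = calls === 0 ? 0 : failures / calls;

  const firing: string[] = [];
  if (errorRate > t.errorRate) firing.push("app.error rate");
  if (p95 > t.p95Ms) firing.push("p95 latency");
  if (toolFailureRate > t.toolFailureRate) firing.push("tool failure rate");
  return firing;
}

// 20 invocations: 100..2000 ms, one app.error, 4 failed tool calls out of 40.
const batch: InvocationSummary[] = Array.from({ length: 20 }, (_, i) => ({
  totalMs: (i + 1) * 100,
  appError: i === 0,
  toolCalls: 2,
  toolFailures: i < 4 ? 1 : 0,
}));

console.log(evaluate(batch, { errorRate: 0.02, p95Ms: 5000, toolFailureRate: 0.05 }));
```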

next: shipping this to real aws.