scalable systems handle growth without breaking
← back · 2 min read ·

prompt caching costs less than you think

with numbers from a real workload — not a benchmark.

prompt caching is the cheapest infrastructure win in the anthropic api right now. i didn’t believe it until i ran the numbers on a real workload.

the workload

an internal coding agent with a 28k-token system prompt — tools, conventions, examples — and a per-turn user message of roughly 600 tokens. before caching: every turn paid for 28k input tokens at full rate. with caching: the first turn writes the cache (10% premium), every following turn within five minutes reads at 10% of the input rate.

the math

across a typical 6-turn session:

the cache ttl is 5 minutes. that ruins it for anything where users disappear and come back. but for a chat session where someone’s actively iterating, it’s almost free.

the gotchas

the cost of adding caching to an existing claude api integration is roughly twenty lines of code (cache_control: { type: "ephemeral" } on the relevant content blocks). the cost of not adding it is about 70-90% of your input bill, depending on workload shape.

run the numbers on yours. mine surprised me.

scalable labs·cvr 30091604·github·linkedin·hello@scalable.dk