Thariq
@trq212
It is often said in engineering that "Cache Rules Everything Around Me", and the same rule holds for agents.
Long-running agentic products like Claude Code are made feasible by prompt caching, which allows us to reuse computation from previous round trips and significantly decrease latency and cost.
What is prompt caching, how does it work and how do you implement it technically?
At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and lets us offer more generous rate limits on our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if it drops too low.
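
Alerting on a hit rate only works if the rate is defined consistently. The sketch below assumes it is the share of prompt tokens served from cache, computed from the three token counts the Anthropic Messages API reports in each response's `usage` object (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). The `Usage` dataclass, the aggregation over a window, and the 0.80 threshold are illustrative assumptions, not Claude Code's actual monitoring.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Field names mirror the Anthropic Messages API `usage` object.
    input_tokens: int                 # prompt tokens neither read from nor written to cache
    cache_creation_input_tokens: int  # prompt tokens written to the cache
    cache_read_input_tokens: int      # prompt tokens served from the cache


def cache_hit_rate(usages: list[Usage]) -> float:
    """Share of all prompt tokens across `usages` that were cache reads.

    An empty window reports 0.0.
    """
    read = sum(u.cache_read_input_tokens for u in usages)
    total = sum(
        u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens
        for u in usages
    )
    return read / total if total else 0.0


# Hypothetical threshold; a real value would come from your own baseline.
HIT_RATE_ALERT_THRESHOLD = 0.80


def should_alert(usages: list[Usage], threshold: float = HIT_RATE_ALERT_THRESHOLD) -> bool:
    """True when the aggregate hit rate over a window falls below `threshold`."""
    return cache_hit_rate(usages) < threshold
```

Aggregating over a window of requests, rather than alerting per request, avoids paging on the expected miss at the start of every new session.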
These are the (often unintuitive) lessons we've learned from optimizing prompt caching at scale.
## Lay Out Your Prompt for Caching
Prompt caching works by prefix matching: the API caches everything from the start of the request up to each `cache_control` breakpoint. This means the order you put things in matters enormously; you want as many of your requests as possible to share a prefix.
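
To make prefix matching concrete, here is a toy in-memory model. It is purely illustrative and not how the API implements caching: each breakpoint stores a hash of every block before it, and a later request is served from the longest stored prefix it reproduces exactly. `prefix_key` and `ToyPrefixCache` are hypothetical names.

```python
import hashlib
import json


def prefix_key(blocks: list[str], end: int) -> str:
    """Hash of blocks[:end]; any change to an earlier block changes the key."""
    payload = json.dumps(blocks[:end]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ToyPrefixCache:
    """Toy model of prefix-matched caching (not the API's real implementation).

    A breakpoint `bp` means "cache the prefix blocks[:bp]".
    """

    def __init__(self) -> None:
        self._stored: set[str] = set()

    def process(self, blocks: list[str], breakpoints: list[int]) -> int:
        """Return how many leading blocks were served from cache, then
        store the prefix at every breakpoint of this request."""
        hit = 0
        for bp in sorted(breakpoints, reverse=True):  # try the longest prefix first
            if prefix_key(blocks, bp) in self._stored:
                hit = bp
                break
        for bp in breakpoints:
            self._stored.add(prefix_key(blocks, bp))
        return hit


cache = ToyPrefixCache()
cache.process(["system", "tools", "user: hi"], [2, 3])              # cold: 0 blocks reused
cache.process(["system", "tools", "user: hi", "asst: ok", "user: more"], [3, 5])  # reuses 3
cache.process(["system v2", "tools", "user: hi"], [2, 3])           # first block changed: 0
```

Changing the very first block misses on every breakpoint, while appending new messages still reuses everything before them. That asymmetry is why ordering matters.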
The best way to do this is to put static content first and dynamic content last. For Claude Code, this looks like:
1. Static system prompt & tools (globally cached)
2. CLAUDE.md (cached within a project)
3. Session context (cached within a session)
4. Conversation messages
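
One way to express this layering in a single request is sketched below, using the Anthropic Messages API request shape (`tools`, `system` as a list of text blocks, `messages`) with a `cache_control` breakpoint at the end of each layer. The API builds its cache prefix in the order tools, system, messages, so tool definitions sit ahead of the static system prompt automatically. Placing CLAUDE.md and the session context in separate system blocks is an assumption for illustration, and `build_request` is a hypothetical helper.

```python
def _ephemeral() -> dict:
    # Fresh dict per block so no two blocks share (and accidentally mutate) one object.
    return {"type": "ephemeral"}


def build_request(model: str, static_system: str, tools: list[dict],
                  project_memory: str, session_context: str,
                  messages: list[dict], max_tokens: int = 1024) -> dict:
    """Assemble a Messages API payload ordered from most to least shared."""
    system = [
        # Layer 1: static system prompt (tools precede it in the prefix); same for everyone.
        {"type": "text", "text": static_system, "cache_control": _ephemeral()},
        # Layer 2 (hypothetical placement): CLAUDE.md contents; same within a project.
        {"type": "text", "text": project_memory, "cache_control": _ephemeral()},
        # Layer 3: session context; same for the rest of this session.
        {"type": "text", "text": session_context, "cache_control": _ephemeral()},
    ]
    # Layer 4: conversation. Copy so the caller's history is not mutated, then mark
    # the final block so the next turn can read this whole prefix from cache.
    msgs = [dict(m) for m in messages]
    last = msgs[-1]
    if isinstance(last["content"], str):
        blocks = [{"type": "text", "text": last["content"]}]
    else:
        blocks = [dict(b) for b in last["content"]]
    blocks[-1]["cache_control"] = _ephemeral()
    msgs[-1] = {**last, "content": blocks}
    return {
        "model": model,
        "max_tokens": max_tokens,
        "tools": list(tools),  # caller must supply these in a stable order
        "system": system,
        "messages": msgs,
    }
```

The returned dict can be passed as keyword arguments to `client.messages.create(...)` in the official Python SDK. The API caps the number of breakpoints per request (four at the time of writing), and this layout uses all of them.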
This way we maximize how many sessions share cache hits.
But this can be surprisingly fragile! Ways we've broken this ordering before include putting a detailed timestamp in the static system prompt, ordering tool definitions non-deterministically, and updating tool parameters (e.g. which agents the AgentTool can call), among others.
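
Because these breakages come from the prefix changing silently, one defensive pattern is to give tools a canonical order and fingerprint the exact cacheable prefix each turn, flagging any change mid-session. This is a minimal sketch under that assumption: `canonical_tools`, `prefix_fingerprint`, and `PrefixGuard` are hypothetical names, and hashing JSON only approximates what the server actually compares.

```python
from __future__ import annotations

import hashlib
import json


def canonical_tools(tools: list[dict]) -> list[dict]:
    """Return tool definitions in a stable order (by name) so that
    registration order can never reshuffle the cached prefix."""
    return sorted(tools, key=lambda t: t["name"])


def prefix_fingerprint(system_text: str, tools: list[dict]) -> str:
    """Hash exactly what will be sent, in the order it will be sent."""
    payload = json.dumps({"tools": tools, "system": system_text}, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PrefixGuard:
    """Flags turns where the cacheable prefix changed within a session."""

    def __init__(self) -> None:
        self._last: str | None = None

    def check(self, system_text: str, tools: list[dict]) -> bool:
        """Return True if the prefix matches the previous turn's (or this is the first turn)."""
        fp = prefix_fingerprint(system_text, tools)
        unchanged = self._last is None or fp == self._last
        self._last = fp
        return unchanged
```

The fingerprint deliberately hashes tools in the order they will be sent rather than sorting inside the hash: sorting there would hide exactly the reshuffle the guard is meant to catch.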
## Use Messages for Updates
There may be times when the information in your prompt becomes out of date, for example if it includes the current time or if the user changes a file. It may be tempting to update the prompt, but that would cause a cache miss and could end up being quite expensive for the user.
Consider whether you can pass this information in via messages on the next turn instead. In Claude Code, we add a `<system-reminder>` tag to the next user message or tool result with the updated information for the model (e.g. "it is now Wednesday"), which helps preserve the cache.
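
A minimal sketch of this pattern, assuming Messages API-style message dicts: the update is appended as a text block to the final user turn, so the system prompt and all earlier messages stay byte-for-byte identical and remain cacheable. The `<system-reminder>` tag comes from the thread; the helper name `with_system_reminder` and the exact wrapping are assumptions.

```python
def with_system_reminder(messages: list[dict], reminder: str) -> list[dict]:
    """Return a copy of `messages` with `reminder` appended to the final user turn.

    Earlier messages are reused unchanged, so the cached prefix is untouched.
    The final turn may be plain user text or tool results; both are "user" role.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("expected the conversation to end with a user turn")
    last = messages[-1]
    content = last["content"]
    blocks = [{"type": "text", "text": content}] if isinstance(content, str) else list(content)
    # Hypothetical wrapping; only the tag name is taken from the thread.
    blocks.append({"type": "text", "text": f"<system-reminder>{reminder}</system-reminder>"})
    return messages[:-1] + [{**last, "content": blocks}]
```

Compared with editing the system prompt, this costs a few extra tokens at the end of the conversation but keeps every earlier breakpoint a cache hit.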
## Don't Change Models Mid-Session
Prompt caches are unique to each model, which can make the math of prompt caching quite unintuitive.
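
A rough worked example of why, using hypothetical prices: assume a 100,000-token context already cached on a model costing $3 per million input tokens, and a cheaper model at $1 per million. The multipliers below (1.25× the base price to write a prefix into the cache, 0.1× to read it back) are assumptions modeled on Anthropic's published pricing at the time of writing; check current pricing before relying on them.

```python
# Assumed multipliers relative to the base input price; check current pricing.
CACHE_WRITE_MULTIPLIER = 1.25  # writing a prefix into the cache
CACHE_READ_MULTIPLIER = 0.10   # reading a prefix back from the cache


def prefix_cost(prefix_tokens: int, base_price_per_mtok: float, cache_hit: bool) -> float:
    """Dollar cost of sending a prompt prefix once, ignoring output tokens."""
    multiplier = CACHE_READ_MULTIPLIER if cache_hit else CACHE_WRITE_MULTIPLIER
    return prefix_tokens * base_price_per_mtok * multiplier / 1_000_000


context_tokens = 100_000  # hypothetical session length
# Stay on the (hypothetical) $3/MTok model whose cache is already warm.
stay = prefix_cost(context_tokens, 3.00, cache_hit=True)
# Switch to a (hypothetical) $1/MTok model: its cache is cold, so the whole prefix is rewritten.
switch = prefix_cost(context_tokens, 1.00, cache_hit=False)
print(f"stay: ${stay:.3f}  switch: ${switch:.3f}")
```

Under these assumptions, staying costs about $0.03 of prefix input for the turn while switching costs about $0.125, so the "cheaper" model is roughly four times as expensive on that turn. Even skipping the write premium and sending the prefix uncached at $1 per million would cost $0.10.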