| Thread Navigator

Thread Truncated (Cap Enforced)

Only the first 20 tweets are unrolled into slides to ensure reliable PDF exporting and high server performance.

Canvas & Ratio

Choose your destination platform format

Layout Template

Choose a content structure for your slides

Preset Themes

Typography & Sizing

Font Family

Title Font Size36px

Body Font Size18px

Header & Footer Size12px

Brand Kit Customization

AGENCY

Configure brand assets for headers & footers

MULTI-PROFILES (AGENCY)

Active Brand Profile

Show Brand Watermark

Brand Watermark Text

Social Handle

Brand Logo URL (PNG) AGENCY

SAVE PRESETS (AGENCY)

Save current as Preset

Outro Slide CTA

Customize your closing call-to-action slide

CTA Title

CTA Message & Emojis

Custom CTA Buttons

Background Pattern

Source Content

Build Your Carousel

Drag and drop any post card below onto a slide, or use the quick buttons to insert content/images instantly!

Drag Post #1

Thariq

@trq212

It is often said in engineering that "Cache Rules Everything Around Me", and the same rule holds for agents.

Apply Image

Drag Post #2

Thariq

@trq212

Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost.

Drag Post #3

Thariq

@trq212

What is prompt caching, how does it work and how do you implement it technically? <a target="_blank" href="https://x.com/RLanceMartin/status/2024573404888911886" color="blue">Read more in @RLanceMartin's piece on prompt caching and our new auto-caching launch.</a>

Drag Post #4

Thariq

@trq212

At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they're too low.

Drag Post #5

Thariq

@trq212

These are the (often unintuitive) lessons we've learned from optimizing prompt caching at scale.

Drag Post #6

Thariq

@trq212

## Lay Out Your Prompt for Caching

Drag Post #7

Thariq

@trq212

Apply Image

Drag Post #8

Thariq

@trq212

Prompt caching works by prefix matching — the API caches everything from the start of the request up to each cache_control breakpoint. This means the order you put things in matters enormously, you want as many of your requests to share a prefix as possible.

Drag Post #9

Thariq

@trq212

The best way to do this is static content first, dynamic content last. For Claude Code this looks like:

Drag Post #10

Thariq

@trq212

1. Static system prompt & Tools (globally cached)

Drag Post #11

Thariq

@trq212

1. Claude.MD (cached within a project)

Drag Post #12

Thariq

@trq212

1. Session context (cached within a session)

Drag Post #13

Thariq

@trq212

1. Conversation messages

Drag Post #14

Thariq

@trq212

This way we maximize how many sessions share cache hits.

Drag Post #15

Thariq

@trq212

But this can be surprisingly fragile! Examples of reasons we’ve broken this ordering before include: putting an in-depth timestamp in the static system prompt, shuffling tool order definitions non-deterministically, updating parameters of tools (e.g. what agents the AgentTool can call), etc.

Drag Post #16

Thariq

@trq212

## Use Messages for Updates

Drag Post #17

Thariq

@trq212

There may be times when the information you put in your prompt becomes out of date, for example if you have the time or if the user changes a file. It may be tempting to update the prompt, but that would result in a cache miss and could end up being quite expensive for the user.

Drag Post #18

Thariq

@trq212

Consider if you can pass in this information via messages in the next turn instead. In Claude Code, we add a <system-reminder> tag in the next user message or tool result with the updated information for the model (e.g. it is now Wednesday), which helps preserve the cache.

Drag Post #19

Thariq

@trq212

## Don't change Models Mid-Session

Drag Post #20

Thariq

@trq212

Prompt caches are unique to models and this can make the math of prompt caching quite unintuitive.