Gemini Introduced Context Caching for AI - It's 4x Cheaper but No One Will Use It

Google just announced Context Caching for the Gemini 1.5 Flash and 1.5 Pro models. Let's break down the numbers and see who it's really for:

Rob Balian

CTO

📊 The Basics

For Gemini 1.5 Flash:

  • Non-cached input: $0.35 / 1M tokens (already super cheap compared to $5 / 1M tokens for GPT-4o)

  • Cached input: $0.0875 / 1M tokens - that's 4x cheaper than non-cached!

  • Cache storage cost: $1.00 / 1M tokens per hour

  • Max context window: still 1M tokens, whether the input is cached or not
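To make the pricing concrete, here's a minimal Python sketch of the hourly bill using only the Flash rates listed above. The function name `hourly_cost` and the example workload are hypothetical, and it assumes a single cache that is stored once and reused by every message.

```python
# Gemini 1.5 Flash prices quoted above, in USD per 1M tokens.
NON_CACHED_INPUT = 0.35
CACHED_INPUT = 0.0875
STORAGE_PER_HOUR = 1.00  # per 1M cached tokens, per hour


def hourly_cost(cached_tokens, non_cached_tokens, messages_per_hour):
    """Return (token_cost, storage_cost, total) in USD for one hour.

    Assumes the cache is stored once and reused by every message.
    """
    per_message = (cached_tokens * CACHED_INPUT
                   + non_cached_tokens * NON_CACHED_INPUT) / 1_000_000
    token_cost = per_message * messages_per_hour
    storage_cost = cached_tokens * STORAGE_PER_HOUR / 1_000_000
    return token_cost, storage_cost, token_cost + storage_cost


# Hypothetical workload: 100k cached tokens, 500 fresh tokens, 60 messages/hour.
tokens, storage, total = hourly_cost(100_000, 500, 60)
print(f"tokens ${tokens:.4f} + storage ${storage:.4f} = ${total:.4f}/hour")
```

Output tokens are left out on purpose, since caching only changes what you pay for input.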

🎣 So What's the Cache?

Sounds great, right? 4x savings for my prompts! But here's the catch:

  • 🚫 Minimum cacheable context: 32k tokens. What?! That's like 50 pages of text. The average chatbot context is under about 4k tokens

  • Most AI applications, including chatbots, use nowhere near 32k tokens per interaction. And the chat responses themselves can't be cached because they change constantly.

So Who's It For?

  • ✅ Specialized apps with massive, static contexts

  • ❌ Most chatbots and general AI applications

What About RAG Applications?

Retrieval-Augmented Generation (RAG) apps might seem like a perfect fit, but there's a twist:

  • Typical RAG retrieves ~4,000 tokens per query

  • This falls far short of the 32k minimum for caching

  • Caching the entire knowledge base could be prohibitively expensive
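Here's a rough way to see why RAG and caching don't mix, sketched in Python with the Flash prices from The Basics. It asks: how big can a cached context get before it costs more than just sending ~4,000 retrieved tokens uncached with every message? The function name and the message rates are assumptions for illustration, and the per-query overhead is ignored because it is paid at the non-cached rate either way.

```python
NON_CACHED_INPUT = 0.35   # USD per 1M tokens
CACHED_INPUT = 0.0875     # USD per 1M tokens
STORAGE_PER_HOUR = 1.00   # USD per 1M cached tokens per hour
MIN_CACHEABLE_TOKENS = 32_000


def max_break_even_cache(retrieved_tokens, messages_per_hour):
    """Largest cached context (in tokens) that costs no more per hour than
    sending `retrieved_tokens` uncached with every message."""
    # Both sides are priced per 1M tokens, so that factor cancels out.
    uncached_per_hour = retrieved_tokens * NON_CACHED_INPUT * messages_per_hour
    cost_per_cached_token = CACHED_INPUT * messages_per_hour + STORAGE_PER_HOUR
    return uncached_per_hour / cost_per_cached_token


# Hypothetical traffic levels, from quiet to very busy.
for rate in (10, 100, 1_200, 100_000):
    limit = max_break_even_cache(4_000, rate)
    print(f"{rate:>7} msgs/hour: break-even cache size {limit:,.0f} tokens "
          f"(cacheable: {limit >= MIN_CACHEABLE_TOKENS})")
```

Because cached input is a quarter of the non-cached price, the break-even cache can never exceed 4x the retrieved size, so it tops out just under 16,000 tokens, about half the 32k minimum.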

The Takeaway

  • Caching is overkill for most: If you're not regularly using 32k+ token prompts, you won't see savings.

  • Storage costs add up: $1/hour per 1M tokens cached. That's $720/month to keep just one full 1M-token context cached around the clock.

  • Management overhead: Implementing and maintaining caches isn't trivial.
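For anyone who wants to check the storage math, here's a small Python sketch based on the Flash prices quoted earlier. It reproduces the $720/month figure and estimates how many messages per hour it takes before the per-token discount covers the storage fee. The helper names are made up, a 30-day month is assumed, and the break-even only applies to a prompt you would otherwise send uncached in full with every message.

```python
NON_CACHED_INPUT = 0.35   # USD per 1M tokens
CACHED_INPUT = 0.0875     # USD per 1M tokens
STORAGE_PER_HOUR = 1.00   # USD per 1M cached tokens per hour

HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def monthly_storage_cost(cached_tokens):
    """USD to keep `cached_tokens` cached around the clock for a month."""
    return cached_tokens / 1_000_000 * STORAGE_PER_HOUR * HOURS_PER_MONTH


def break_even_messages_per_hour():
    """Messages/hour at which the input discount equals the storage fee.

    The cached token count cancels out, so the result does not depend on it.
    """
    return STORAGE_PER_HOUR / (NON_CACHED_INPUT - CACHED_INPUT)


print(f"1M tokens cached for a month: ${monthly_storage_cost(1_000_000):.2f}")
print(f"Break-even rate: {break_even_messages_per_hour():.2f} messages/hour")
```

So storage itself is rarely the deal-breaker once you already have a 32k+ prompt with steady traffic. The hard part is having that prompt in the first place.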

While Context Caching could be revolutionary for niche, large-context applications, it's likely irrelevant for the vast majority of AI use cases.

I hope that Google and OpenAI start including caching automatically behind the scenes to drive token prices down, instead of offering it as a service that doesn't seem to work for most of us.

You Want the Numbers? Here You Go:

Scenario 1: Customer Support Chatbot with a Massive 32k Context Window

Without Caching:

  • Tokens per Message: 32,000 (context)

  • Cached Tokens per Message: 0

  • Non-Cached Tokens per Message: 32,000

  • Messages per Hour: 1,200

  • Token Cost per Hour: $13.44

  • Storage Cost per Hour: $0.00

  • Total Cost per Hour: $13.44

With Caching:

  • Tokens per Message: 32,000 (cached)

  • Cached Tokens per Message: 32,000

  • Non-Cached Tokens per Message: 0

  • Messages per Hour: 1,200

  • Token Cost per Hour: $3.36

  • Storage Cost per Hour: $0.032

  • Total Cost per Hour: $3.392 (75% cost DECREASE)

Scenario 2: Customer Support Chatbot with Reasonable Context Window of 4k

Without Caching:

  • Tokens per Message: 4,000 (context) + 1,000 (overhead) = 5,000

  • Cached Tokens per Message: 0

  • Non-Cached Tokens per Message: 5,000

  • Messages per Hour: 1,200

  • Token Cost per Hour: $2.10

  • Storage Cost per Hour: $0.00

  • Total Cost per Hour: $2.10

With Caching (adhering to the minimum 32k cached context window):

  • Tokens per Message: 32,000 (cached) + 1,000 (overhead) = 33,000

  • Cached Tokens per Message: 32,000

  • Non-Cached Tokens per Message: 1,000

  • Messages per Hour: 1,200

  • Token Cost per Hour: $3.36 (cached input) + $0.42 (overhead input) = $3.78

  • Storage Cost per Hour: $0.032

  • Total Cost per Hour: $3.812 (+82% cost INCREASE)

Scenario 3: Retrieval-Augmented Generation (RAG) Model

Without Caching:

  • Tokens per Message: 4,000 (retrieved) + 1,000 (overhead) = 5,000

  • Cached Tokens per Message: 0

  • Non-Cached Tokens per Message: 5,000

  • Messages per Hour: 1,200

  • Token Cost per Hour: $2.10

  • Storage Cost per Hour: $0.00

  • Total Cost per Hour: $2.10

With Caching (entire 300k-token knowledge base cached):

  • Tokens per Message: 300,000 (cached) + 1,000 (overhead) = 301,000

  • Cached Tokens per Message: 300,000

  • Non-Cached Tokens per Message: 1,000

  • Messages per Hour: 1,200

  • Token Cost per Hour: $31.50 (cached input) + $0.42 (overhead input) = $31.92

  • Storage Cost per Hour: $0.30

  • Total Cost per Hour: $32.22 (roughly 1,430% cost INCREASE)
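If you'd like to play with the RAG scenario yourself, here's a short Python sketch that recomputes it and tries a couple of smaller knowledge bases. The 300k row matches Scenario 3. The 100k and 200k sizes are hypothetical, as is the `cost_per_hour` helper.

```python
NON_CACHED_INPUT = 0.35   # USD per 1M tokens
CACHED_INPUT = 0.0875     # USD per 1M tokens
STORAGE_PER_HOUR = 1.00   # USD per 1M cached tokens per hour
MESSAGES_PER_HOUR = 1_200
OVERHEAD_TOKENS = 1_000


def cost_per_hour(cached_tokens, non_cached_tokens):
    """Total hourly cost in USD: input tokens plus cache storage."""
    tokens = MESSAGES_PER_HOUR * (
        cached_tokens * CACHED_INPUT + non_cached_tokens * NON_CACHED_INPUT
    ) / 1_000_000
    storage = cached_tokens * STORAGE_PER_HOUR / 1_000_000
    return tokens + storage


# Baseline: 4,000 retrieved tokens plus overhead, nothing cached.
baseline = cost_per_hour(0, 4_000 + OVERHEAD_TOKENS)

# 300k is the scenario above; the smaller sizes are hypothetical.
for kb_tokens in (100_000, 200_000, 300_000):
    cached = cost_per_hour(kb_tokens, OVERHEAD_TOKENS)
    print(f"{kb_tokens:>7,} cached tokens: ${cached:6.2f}/hour "
          f"({cached / baseline:.1f}x the ${baseline:.2f} baseline)")
```

Even a knowledge base a third of the size comes out several times more expensive than plain retrieval at this message rate.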

© Reprompt AI