Gemini context caching on Reprompt

We're not using context caching, but it sure is cool

Gemini Introduced Context Caching for AI - It's 4x Cheaper but No One Will Use It

June 28, 2024

Google just announced Context Caching for the Gemini 1.5 Flash and Gemini 1.5 Pro models. Let's break down the numbers and see who it's really for:

📊 The Basics

For Gemini 1.5 Flash:

  • Non-cached input: $0.35 / 1M tokens (already super cheap compared to $5 / 1M tokens for GPT-4o)
  • Cached input: $0.0875 / 1M tokens - that's 4x cheaper than non-cached!
  • Cache storage cost: $1.00 / 1M tokens per hour
  • Max context window: still 1M tokens, whether the input is cached or not
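
To make these rates concrete, here is a minimal Python sketch of what they imply per request. The function and constant names are made up for illustration, only input tokens are counted (output tokens are ignored), and storage is assumed to be billed on the cached token count for every hour the cache exists.

```python
# Gemini 1.5 Flash list prices quoted above (USD per 1M tokens).
NON_CACHED_INPUT_PER_M = 0.35
CACHED_INPUT_PER_M = 0.0875
STORAGE_PER_M_PER_HOUR = 1.00


def request_input_cost(cached_tokens, non_cached_tokens):
    """Input cost in USD of a single request (output tokens not included)."""
    return (
        cached_tokens * CACHED_INPUT_PER_M
        + non_cached_tokens * NON_CACHED_INPUT_PER_M
    ) / 1_000_000


def storage_cost(cached_tokens, hours):
    """Cost in USD of keeping cached_tokens in the cache for the given hours."""
    return cached_tokens * STORAGE_PER_M_PER_HOUR * hours / 1_000_000


# A full 1M-token prompt, sent once without caching and once from the cache.
print(f"non-cached request:  ${request_input_cost(0, 1_000_000):.4f}")
print(f"cached request:      ${request_input_cost(1_000_000, 0):.4f}")
print(f"one hour of storage: ${storage_cost(1_000_000, 1):.4f}")
```

The point of separating the two functions: the per-request discount only helps if it outweighs a storage fee that keeps running whether or not any requests arrive.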

🎣 So What's the Cache?

Sounds great, right? 4x savings for my prompts! But here's the catch:

  • 🚫 Minimum cacheable context: 32k tokens. What?! That's like 50 pages of text. The average chatbot context is under 4k tokens.
  • Most AI applications, including chatbots, use nowhere near 32k tokens per interaction. And the chat responses themselves can't be cached because they change constantly.

So Who's It For?

  • ✅ Specialized apps with massive, static contexts
  • ❌ Most chatbots and general AI applications

What About RAG Applications?

Retrieval-Augmented Generation (RAG) apps might seem like a perfect fit, but there's a twist:

  • Typical RAG retrieves ~4,000 tokens per query
  • This falls far short of the 32k minimum for caching
  • Caching the entire knowledge base could be prohibitively expensive

The Takeaway

  • Caching is overkill for most: If you're not regularly using 32k+ token prompts, you won't see savings.
  • Storage costs add up: $1/hour per 1M tokens cached. That's $720/month (24 hours x 30 days) to keep just one full 1M-token context cached.
  • Management overhead: Implementing and maintaining caches isn't trivial.
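
The storage figure, and the request rate at which caching starts to pay for itself, both fall out of the listed prices. The sketch below is a back-of-the-envelope calculation, not an official formula: it assumes a 30-day month, and it assumes the cached tokens would otherwise have been sent as regular input on every request. The function names are hypothetical.

```python
NON_CACHED_INPUT_PER_M = 0.35   # USD per 1M input tokens
CACHED_INPUT_PER_M = 0.0875     # USD per 1M cached input tokens
STORAGE_PER_M_PER_HOUR = 1.00   # USD per 1M cached tokens per hour
HOURS_PER_MONTH = 24 * 30       # assumption: 30-day month


def monthly_storage_cost(cached_tokens):
    """USD to keep cached_tokens stored around the clock for one month."""
    return cached_tokens / 1_000_000 * STORAGE_PER_M_PER_HOUR * HOURS_PER_MONTH


def break_even_requests_per_hour():
    """Requests per hour at which per-request savings equal the storage fee.

    Savings and storage both scale with the cached token count, so the
    count cancels out and the answer is the same for any cache size.
    """
    saving_per_m = NON_CACHED_INPUT_PER_M - CACHED_INPUT_PER_M
    return STORAGE_PER_M_PER_HOUR / saving_per_m


print(f"Storing 1M tokens for a month: ${monthly_storage_cost(1_000_000):.2f}")
print(f"Break-even: {break_even_requests_per_hour():.2f} requests/hour")
```

Under these assumptions, a cache that is reused fewer than about four times an hour costs more to store than it saves, and that only becomes a question once the prompt already clears the 32k minimum.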

While Context Caching could be revolutionary for niche, large-context applications, it's likely irrelevant for the vast majority of AI use cases.

I hope that Google and OpenAI start applying caching automatically behind the scenes to drive token prices down, instead of offering it as a separate service that doesn't seem to work for most applications.

You Want the Numbers? Here You Go:

Scenario 1: Customer Support Chatbot with a Massive 32k Context Window

Without Caching:

  • Tokens per Message: 32,000 (context)
  • Cached Tokens per Message: 0
  • Non-Cached Tokens per Message: 32,000
  • Messages per Hour: 1,200
  • Token Cost per Hour: $13.44
  • Storage Cost per Hour: $0.00
  • Total Cost per Hour: $13.44

With Caching:

  • Tokens per Message: 32,000 (cached)
  • Cached Tokens per Message: 32,000
  • Non-Cached Tokens per Message: 0
  • Messages per Hour: 1,200
  • Token Cost per Hour: $3.36
  • Storage Cost per Hour: $0.032
  • Total Cost per Hour: $3.392 (75% cost DECREASE)
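
The Scenario 1 totals can be re-derived in a few lines. This is only a sanity-check sketch: the variable names are illustrative, and only input tokens and cache storage are counted.

```python
# Rates for Gemini 1.5 Flash as quoted in this post (USD per 1M tokens).
NON_CACHED_RATE = 0.35
CACHED_RATE = 0.0875
STORAGE_RATE_PER_HOUR = 1.00

context_tokens = 32_000
messages_per_hour = 1_200

tokens_per_hour = context_tokens * messages_per_hour  # 38.4M tokens

without_cache = tokens_per_hour / 1e6 * NON_CACHED_RATE
with_cache = (
    tokens_per_hour / 1e6 * CACHED_RATE               # cached input
    + context_tokens / 1e6 * STORAGE_RATE_PER_HOUR    # one hour of storage
)
change = (with_cache - without_cache) / without_cache

print(f"Without caching: ${without_cache:.3f}/hour")
print(f"With caching:    ${with_cache:.3f}/hour")
print(f"Change:          {change:+.1%}")
```

The change comes out just under 75%, because the storage fee eats a small slice of the 4x token discount.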

Scenario 2: Customer Support Chatbot with Reasonable Context Window of 4k

Without Caching:

  • Tokens per Message: 4,000 (context) + 1,000 (overhead) = 5,000
  • Cached Tokens per Message: 0
  • Non-Cached Tokens per Message: 5,000
  • Messages per Hour: 1,200
  • Token Cost per Hour: $2.10
  • Storage Cost per Hour: $0.00
  • Total Cost per Hour: $2.10

With Caching (adhering to the minimum 32k cached context window):

  • Tokens per Message: 32,000 (cached) + 1,000 (overhead) = 33,000
  • Cached Tokens per Message: 32,000
  • Non-Cached Tokens per Message: 1,000
  • Messages per Hour: 1,200
  • Token Cost per Hour: $3.36 (cached input) + $0.42 (overhead input) = $3.78
  • Storage Cost per Hour: $0.032
  • Total Cost per Hour: $3.812 (+82% cost INCREASE)

Scenario 3: Retrieval-Augmented Generation (RAG) Model

Without Caching:

  • Tokens per Message: 4,000 (retrieved) + 1,000 (overhead) = 5,000
  • Cached Tokens per Message: 0
  • Non-Cached Tokens per Message: 5,000
  • Messages per Hour: 1,200
  • Token Cost per Hour: $2.10
  • Storage Cost per Hour: $0.00
  • Total Cost per Hour: $2.10

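With Caching (caching the entire knowledge base, assumed here to be 300k tokens, instead of retrieving per query):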

  • Tokens per Message: 300,000 (cached) + 1,000 (overhead) = 301,000
  • Cached Tokens per Message: 300,000
  • Non-Cached Tokens per Message: 1,000
  • Messages per Hour: 1,200
  • Token Cost per Hour: $31.50 (cached input) + $0.42 (overhead input) = $31.92
  • Storage Cost per Hour: $0.30
  • Total Cost per Hour: $32.22 (roughly 1,400% cost INCREASE, about 15x)
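
A quick sketch of the Scenario 3 arithmetic, which also shows where the money goes. The helper name and its parameters are invented for this illustration, and output tokens are ignored.

```python
NON_CACHED_RATE = 0.35        # USD per 1M input tokens
CACHED_RATE = 0.0875          # USD per 1M cached input tokens
STORAGE_RATE_PER_HOUR = 1.00  # USD per 1M cached tokens per hour


def hourly_breakdown(cached_tokens, non_cached_tokens, messages_per_hour):
    """Hypothetical helper: split one hour's bill into its three parts."""
    return {
        "cached_input": cached_tokens * messages_per_hour / 1e6 * CACHED_RATE,
        "non_cached_input": non_cached_tokens * messages_per_hour / 1e6 * NON_CACHED_RATE,
        "storage": cached_tokens / 1e6 * STORAGE_RATE_PER_HOUR,
    }


# Plain RAG: 4,000 retrieved + 1,000 overhead tokens, nothing cached.
rag = hourly_breakdown(cached_tokens=0, non_cached_tokens=5_000, messages_per_hour=1_200)
# Cached knowledge base: 300,000 cached + 1,000 overhead tokens.
cached_kb = hourly_breakdown(cached_tokens=300_000, non_cached_tokens=1_000, messages_per_hour=1_200)

rag_total = sum(rag.values())
cached_total = sum(cached_kb.values())

for name, cost in cached_kb.items():
    print(f"{name:>17}: ${cost:.2f}")
print(f"Plain RAG total:  ${rag_total:.2f}")
print(f"Cached KB total:  ${cached_total:.2f} ({cached_total / rag_total:.1f}x)")
```

The breakdown makes the culprit visible: storage is a rounding error here, while the cached input itself dominates, because cached tokens are still billed on every request, just at a quarter of the price.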