We're not using context caching, but it sure is cool
June 28, 2024
Google just announced Context Caching for the Gemini 1.5 Flash and Pro models. Let's break down the numbers and see who it's really for:
For Gemini 1.5 Flash:
- Cached input tokens are billed at roughly a quarter of the normal input-token rate, a 4x discount.
- On top of that, there's a storage fee, billed per million cached tokens for every hour the cache is kept alive.
- The cached context has to be at least 32k tokens to qualify.
Sounds great, right? 4x savings for my prompts! But here's the catch: the discount only applies to a cached context of at least 32k tokens, and the hourly storage fee keeps accruing whether or not you send a single request.
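Here's a back-of-the-envelope comparison. The dollar figures are my assumptions based on the June 2024 list prices for Gemini 1.5 Flash as I understand them (about $0.35 per million regular input tokens, $0.0875 per million cached tokens, and $1.00 per million cached tokens per hour of storage), so treat this as a sketch rather than a quote:

```python
# Rough break-even sketch for Gemini 1.5 Flash context caching.
# Prices are assumptions (June 2024 list prices as I understand them),
# in dollars per million tokens.
INPUT_PER_M = 0.35         # regular input tokens
CACHED_PER_M = 0.0875      # cached input tokens (~4x cheaper)
STORAGE_PER_M_HOUR = 1.00  # cache storage, per million tokens per hour

def cost_without_cache(prefix_tokens: int, requests: int) -> float:
    """Every request re-sends the full shared prefix at the normal rate."""
    return requests * prefix_tokens / 1e6 * INPUT_PER_M

def cost_with_cache(prefix_tokens: int, requests: int, hours: float) -> float:
    """Requests read the prefix from the cache; the cache itself costs rent."""
    reads = requests * prefix_tokens / 1e6 * CACHED_PER_M
    rent = prefix_tokens / 1e6 * STORAGE_PER_M_HOUR * hours
    return reads + rent

# A 32k-token prefix (the minimum you're allowed to cache), kept alive for 1 hour.
for requests in (1, 10, 100, 1000):
    uncached = cost_without_cache(32_000, requests)
    cached = cost_with_cache(32_000, requests, hours=1.0)
    print(f"{requests:>5} requests/hr: ${uncached:.4f} uncached vs ${cached:.4f} cached")
```

Under these assumptions, a 32k-token prefix has to be reused about four times an hour before the cache beats simply resending it, and most apps don't even have a 32k-token shared prefix in the first place.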
Retrieval-Augmented Generation (RAG) apps might seem like a perfect fit, but there's a twist: the chunks you retrieve change with every query, so requests rarely share the large, identical prefix that the cache needs.
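A minimal sketch of what I mean, with a hypothetical retriever standing in for whatever vector store you use: the only stable part of a typical RAG prompt is the short system preamble, while the bulk that would be worth caching is rebuilt on every request.

```python
# Hypothetical RAG prompt assembly; `retriever` is a stand-in for your vector store.
SYSTEM = "You are a support assistant. Answer only from the provided context."

def build_prompt(query: str, retriever) -> str:
    # These chunks differ from query to query, so the big block below is
    # never byte-identical across requests -- a context cache can't reuse it.
    chunks = retriever.top_k(query, k=8)
    context = "\n\n".join(chunks)
    # Only SYSTEM (a few hundred tokens at most) is shared, far below the 32k minimum.
    return f"{SYSTEM}\n\nContext:\n{context}\n\nQuestion: {query}"
```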
While Context Caching could be revolutionary for niche, large-context applications, it's likely irrelevant for the vast majority of AI use cases.
I hope that Google and Sam start including caching automatically behind the scenes to drive token prices down, instead of offering it as a service that doesn't seem to work for anyone.
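To be concrete about what "offering it as a service" means: here's roughly what opting in looks like with the google-generativeai Python SDK, as I understand its caching module today (double-check the current docs, since names and minimums may shift). You create the cache yourself, pick a TTL, and point a model at it; nothing happens automatically.

```python
# Hypothetical sketch using the google-generativeai SDK's caching module.
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# The cached content must clear the ~32k-token minimum to be accepted.
big_shared_prefix = open("product_manual.txt").read()  # placeholder document

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching requires a pinned model version
    display_name="manual-cache",
    system_instruction="Answer questions using the attached manual.",
    contents=[big_shared_prefix],
    ttl=datetime.timedelta(hours=1),      # storage is billed over this lifetime
)

# Every call through this model reuses the cached prefix at the discounted rate.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("How do I reset the device?")
print(response.text)
```

The ttl is where the storage bill comes from: the cache is metered for however long it stays alive, whether or not you ever query it.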