docs(js/plugins): add more context cache docs
cabljac committed Nov 18, 2024
1 parent 4db6af2 commit 754f8f1
Showing 3 changed files with 170 additions and 24 deletions.
75 changes: 68 additions & 7 deletions docs/plugins/google-genai.md
@@ -172,6 +172,9 @@ const llmResponse = await ai.generate({
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
  },
  prompt: 'Describe Pierre’s transformation throughout the novel.',
});
```
@@ -210,20 +213,78 @@ const llmResponse = await ai.generate({
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
  },
  prompt: 'Analyze the relationship between Pierre and Natasha.',
});
```

### Benefits of Context Caching
1. **Improved Performance**: Reduces the need for repeated processing of large inputs.
2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token consumption.
3. **Better Latency**: Speeds up response times for repeated or related queries.

### Caching other modes of content

The Gemini models are multimodal, and other modes of content can be cached as well.

For example, to cache a long piece of video content, you must first upload it using the file manager from the Google AI SDK:

```ts
import { GoogleAIFileManager } from '@google/generative-ai/server';
```

```ts
const fileManager = new GoogleAIFileManager(
  process.env.GOOGLE_GENAI_API_KEY
);

// Upload video to Google AI using the Gemini Files API
const uploadResult = await fileManager.uploadFile(videoFilePath, {
  mimeType: 'video/mp4', // Adjust according to the video format
  displayName: 'Uploaded Video for Analysis',
});

const fileUri = uploadResult.file.uri;
```

Now you may configure the cache in your calls to `ai.generate`:

```ts
const analyzeVideoResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [
        {
          media: {
            url: fileUri, // Use the uploaded file URL
            contentType: 'video/mp4',
          },
        },
      ],
    },
    {
      role: 'model',
      content: [
        {
          text: 'This video seems to contain several key moments. I will analyze it now and prepare to answer your questions.',
        },
      ],
      // Everything up to (and including) this message will be cached.
      metadata: {
        cache: true,
      },
    },
  ],
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions support context caches
  },
  model: gemini15Flash,
  prompt: query,
});
```

### Supported Models for Context Caching

Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context caching. If an unsupported model is used, an error will be raised, indicating that caching cannot be applied.
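
If you want to guard against that, you can wrap the call in a `try/catch`. A minimal sketch, assuming the call rejects with a standard `Error` (the exact error message is not specified here):

```ts
import { genkit } from 'genkit';
import { googleAI, gemini15Flash } from '@genkit-ai/googleai';

const ai = genkit({ plugins: [googleAI()] });

try {
  const response = await ai.generate({
    messages: [
      {
        role: 'user',
        content: [{ text: 'Here is the relevant text from War and Peace.' }],
      },
      {
        role: 'model',
        content: [{ text: 'Ready to answer questions about the text.' }],
        metadata: { cache: true }, // cache everything up to this message
      },
    ],
    model: gemini15Flash,
    config: {
      version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
    },
    prompt: 'Summarize the cached context.',
  });
  console.log(response.text);
} catch (err) {
  // An unsupported model or version raises an error instead of caching.
  console.error('Context caching failed:', err);
}
```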

---

By leveraging context caching, you can significantly enhance the performance and scalability of applications involving large datasets or conversational interactions.

### Further Reading

See more information regarding context caching on Google AI in their [documentation](https://ai.google.dev/gemini-api/docs/caching?lang=node).
92 changes: 91 additions & 1 deletion docs/plugins/vertex-ai.md
@@ -421,4 +421,94 @@ See the code samples for:

* [Vertex Vector Search + BigQuery](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-bigquery)
* [Vertex Vector Search + Firestore](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-firestore)
* [Vertex Vector Search + a custom DB](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom)

## Context Caching

The Vertex AI Genkit plugin supports **Context Caching**, which allows models to reuse previously cached content and optimize token usage when working with large inputs. This is especially useful for conversational flows, or for scenarios where the model references a large document consistently across multiple requests.

### How to Use Context Caching

To enable context caching, first make sure your model supports it. For example, `gemini15Flash` and `gemini15Pro` support context caching, and you must specify version `001`.

You can define a caching mechanism in your application like this:

```ts
import { genkit } from 'genkit';
import { vertexAI, gemini15Flash } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [vertexAI()],
});

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: 'Here is the relevant text from War and Peace.' }],
    },
    {
      role: 'model',
      content: [
        {
          text: 'Based on War and Peace, here is some analysis of Pierre Bezukhov’s character.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache this message for 5 minutes
        },
      },
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions currently support context caching
  },
  prompt: 'Describe Pierre’s transformation throughout the novel.',
});
```

In this setup:
- **`messages`**: Allows you to pass conversation history.
- **`metadata.cache.ttlSeconds`**: Specifies the time-to-live (TTL) for caching a specific response.
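
The examples in these docs mark the cache point in two ways: a boolean (`cache: true`, as in the Google AI examples) and an explicit TTL (`cache: { ttlSeconds: 300 }`, as above). A minimal sketch of the two forms side by side, assuming the `MessageData` type exported by `genkit` and that both forms are interchangeable across plugins:

```ts
import type { MessageData } from 'genkit';

// The model message that closes the cached prefix; everything up to and
// including this message is cached.
const cacheWithDefaultTtl: MessageData = {
  role: 'model',
  content: [{ text: 'Ready to answer questions about the text.' }],
  metadata: { cache: true }, // boolean form: cache with the default TTL
};

const cacheForFiveMinutes: MessageData = {
  role: 'model',
  content: [{ text: 'Ready to answer questions about the text.' }],
  metadata: { cache: { ttlSeconds: 300 } }, // TTL form: cache for 5 minutes
};
```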

### Example: Leveraging Large Texts with Context

For applications referencing long documents, such as *War and Peace* or *Lord of the Rings*, you can structure your queries to reuse cached contexts:

```ts
import fs from 'fs/promises';

const textContent = await fs.readFile('path/to/war_and_peace.txt', 'utf-8');

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: textContent }], // Include the large text as context
    },
    {
      role: 'model',
      content: [
        {
          text: 'This analysis is based on the provided text from War and Peace.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache the response to avoid reloading the full text
        },
      },
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions currently support context caching
  },
  prompt: 'Analyze the relationship between Pierre and Natasha.',
});
```

### Benefits of Context Caching
1. **Improved Performance**: Reduces the need for repeated processing of large inputs.
2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token consumption.
3. **Better Latency**: Speeds up response times for repeated or related queries.

### Supported Models for Context Caching

Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context caching, and currently only with version `001`. If an unsupported model or version is used, an error is raised indicating that caching cannot be applied.
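
If you pin the version in several places, it may help to centralize it. A small sketch using a hypothetical helper (the name and shape are ours, not part of the plugin):

```ts
import { genkit } from 'genkit';
import { vertexAI, gemini15Flash } from '@genkit-ai/vertexai';

const ai = genkit({ plugins: [vertexAI()] });

// Hypothetical helper: returns the config that context caching currently
// requires, per the version guidance above.
const cachingConfig = () => ({ version: 'gemini-1.5-flash-001' });

const response = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: 'Here is the relevant text from War and Peace.' }],
    },
    {
      role: 'model',
      content: [{ text: 'Ready to analyze the provided text.' }],
      metadata: { cache: { ttlSeconds: 300 } }, // cache up to this message
    },
  ],
  model: gemini15Flash,
  config: cachingConfig(),
  prompt: 'Continue the analysis.',
});
```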

### Further Reading

See more information regarding context caching on Vertex AI in their [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview).
27 changes: 11 additions & 16 deletions js/pnpm-lock.yaml

Some generated files are not rendered by default.
