docs(js/plugins): add more context cache docs
cabljac committed Nov 18, 2024
1 parent 4db6af2 commit 754f8f1
Showing 3 changed files with 170 additions and 24 deletions.
75 changes: 68 additions & 7 deletions docs/plugins/google-genai.md
@@ -172,6 +172,9 @@ const llmResponse = await ai.generate({
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
  },
  prompt: 'Describe Pierre’s transformation throughout the novel.',
});
```
@@ -210,20 +213,78 @@ const llmResponse = await ai.generate({
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
  },
  prompt: 'Analyze the relationship between Pierre and Natasha.',
});
```

### Benefits of Context Caching
1. **Improved Performance**: Reduces the need for repeated processing of large inputs.
2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token consumption.
3. **Better Latency**: Speeds up response times for repeated or related queries.

### Caching other modes of content

The Gemini models are multimodal, and other modes of content can be cached as well.

For example, to cache a long piece of video content, you must first upload it using the file manager from the Google AI SDK:

```ts
import { GoogleAIFileManager } from '@google/generative-ai/server';
```

```ts
const fileManager = new GoogleAIFileManager(
  process.env.GOOGLE_GENAI_API_KEY
);

// Upload video to Google AI using the Gemini Files API
const uploadResult = await fileManager.uploadFile(videoFilePath, {
  mimeType: 'video/mp4', // Adjust according to the video format
  displayName: 'Uploaded Video for Analysis',
});

const fileUri = uploadResult.file.uri;
```

Now you may configure the cache in your calls to `ai.generate`:

```ts
const analyzeVideoResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [
        {
          media: {
            url: fileUri, // Use the uploaded file URL
            contentType: 'video/mp4',
          },
        },
      ],
    },
    {
      role: 'model',
      content: [
        {
          text: 'This video seems to contain several key moments. I will analyze it now and prepare to answer your questions.',
        },
      ],
      // Everything up to (and including) this message will be cached.
      metadata: {
        cache: true,
      },
    },
  ],
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions support context caches
  },
  model: gemini15Flash,
  prompt: query,
});
```

### Supported Models for Context Caching

Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context caching. If an unsupported model is used, an error will be raised, indicating that caching cannot be applied.
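
If you want to guard against that, you can wrap the call in a `try/catch`. A minimal sketch, assuming the call rejects with a standard `Error` (the exact error message is not specified here):

```ts
import { genkit } from 'genkit';
import { googleAI, gemini15Flash } from '@genkit-ai/googleai';

const ai = genkit({ plugins: [googleAI()] });

try {
  const response = await ai.generate({
    messages: [
      {
        role: 'user',
        content: [{ text: 'Here is the relevant text from War and Peace.' }],
      },
      {
        role: 'model',
        content: [{ text: 'Ready to answer questions about the text.' }],
        metadata: { cache: true }, // cache everything up to this message
      },
    ],
    model: gemini15Flash,
    config: {
      version: 'gemini-1.5-flash-001', // Only 001 currently supports context caching
    },
    prompt: 'Summarize the cached context.',
  });
  console.log(response.text);
} catch (err) {
  // An unsupported model or version raises an error instead of caching.
  console.error('Context caching failed:', err);
}
```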

---

By leveraging context caching, you can significantly enhance the performance and scalability of applications involving large datasets or conversational interactions.

### Further Reading

See more information regarding context caching on Google AI in their [documentation](https://ai.google.dev/gemini-api/docs/caching?lang=node).
92 changes: 91 additions & 1 deletion docs/plugins/vertex-ai.md
@@ -421,4 +421,94 @@ See the code samples for:

* [Vertex Vector Search + BigQuery](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-bigquery)
* [Vertex Vector Search + Firestore](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-firestore)
* [Vertex Vector Search + a custom DB](https://github.com/firebase/genkit/tree/main/js/testapps/vertexai-vector-search-custom)

## Context Caching

The Vertex AI Genkit plugin supports **Context Caching**, which allows models to reuse previously cached content and optimize token usage when working with large inputs. This is especially useful for conversational flows, or for scenarios where the model references a large document consistently across multiple requests.

### How to Use Context Caching

To enable context caching, first make sure your model supports it. For example, `gemini15Flash` and `gemini15Pro` support context caching, and you must specify version `001`.

You can define a caching mechanism in your application like this:

```ts
import { genkit } from 'genkit';
import { vertexAI, gemini15Flash } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [vertexAI()],
});

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: 'Here is the relevant text from War and Peace.' }],
    },
    {
      role: 'model',
      content: [
        {
          text: 'Based on War and Peace, here is some analysis of Pierre Bezukhov’s character.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache this message for 5 minutes
        },
      },
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions currently support context caching
  },
  prompt: 'Describe Pierre’s transformation throughout the novel.',
});
```

In this setup:
- **`messages`**: Allows you to pass conversation history.
- **`metadata.cache.ttlSeconds`**: Specifies the time-to-live (TTL) for caching a specific response.
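
The examples in these docs mark the cache point in two ways: a boolean (`cache: true`, as in the Google AI examples) and an explicit TTL (`cache: { ttlSeconds: 300 }`, as above). A minimal sketch of the two forms side by side, assuming the `MessageData` type exported by `genkit` and that both forms are interchangeable across plugins:

```ts
import type { MessageData } from 'genkit';

// The model message that closes the cached prefix; everything up to and
// including this message is cached.
const cacheWithDefaultTtl: MessageData = {
  role: 'model',
  content: [{ text: 'Ready to answer questions about the text.' }],
  metadata: { cache: true }, // boolean form: cache with the default TTL
};

const cacheForFiveMinutes: MessageData = {
  role: 'model',
  content: [{ text: 'Ready to answer questions about the text.' }],
  metadata: { cache: { ttlSeconds: 300 } }, // TTL form: cache for 5 minutes
};
```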

### Example: Leveraging Large Texts with Context

For applications referencing long documents, such as *War and Peace* or *Lord of the Rings*, you can structure your queries to reuse cached contexts:

```ts
import fs from 'fs/promises';

const textContent = await fs.readFile('path/to/war_and_peace.txt', 'utf-8');

const llmResponse = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: textContent }], // Include the large text as context
    },
    {
      role: 'model',
      content: [
        {
          text: 'This analysis is based on the provided text from War and Peace.',
        },
      ],
      metadata: {
        cache: {
          ttlSeconds: 300, // Cache the response to avoid reloading the full text
        },
      },
    },
  ],
  model: gemini15Flash,
  config: {
    version: 'gemini-1.5-flash-001', // Only 001 versions currently support context caching
  },
  prompt: 'Analyze the relationship between Pierre and Natasha.',
});
```

### Benefits of Context Caching
1. **Improved Performance**: Reduces the need for repeated processing of large inputs.
2. **Cost Efficiency**: Decreases API usage for redundant data, optimizing token consumption.
3. **Better Latency**: Speeds up response times for repeated or related queries.

### Supported Models for Context Caching

Only specific models, such as `gemini15Flash` and `gemini15Pro`, support context caching, and currently only with version `001`. If an unsupported model or version is used, an error is raised indicating that caching cannot be applied.
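
If you pin the version in several places, it may help to centralize it. A small sketch using a hypothetical helper (the name and shape are ours, not part of the plugin):

```ts
import { genkit } from 'genkit';
import { vertexAI, gemini15Flash } from '@genkit-ai/vertexai';

const ai = genkit({ plugins: [vertexAI()] });

// Hypothetical helper: returns the config that context caching currently
// requires, per the version guidance above.
const cachingConfig = () => ({ version: 'gemini-1.5-flash-001' });

const response = await ai.generate({
  messages: [
    {
      role: 'user',
      content: [{ text: 'Here is the relevant text from War and Peace.' }],
    },
    {
      role: 'model',
      content: [{ text: 'Ready to analyze the provided text.' }],
      metadata: { cache: { ttlSeconds: 300 } }, // cache up to this message
    },
  ],
  model: gemini15Flash,
  config: cachingConfig(),
  prompt: 'Continue the analysis.',
});
```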

### Further Reading

See more information regarding context caching on Vertex AI in their [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview).
27 changes: 11 additions & 16 deletions js/pnpm-lock.yaml

Some generated files are not rendered by default.
