docs(firestore-multimodal-genai): update docs (#602)
cabljac authored Nov 29, 2024
1 parent 31963ff commit 81cfd8d
Showing 3 changed files with 30 additions and 10 deletions.
10 changes: 6 additions & 4 deletions firestore-multimodal-genai/POSTINSTALL.md
@@ -40,13 +40,15 @@ For Vertex AI, the list of models is [here](https://cloud.google.com/vertex-ai/d

#### Multimodal Prompts

Many of the Gemini models accept multimodal prompts. This extension allows for multimodal prompting with images using this model.
Many of the Gemini models accept multimodal prompts. This extension allows for multimodal prompting with images using such models. Note that this feature is not supported for models such as `gemini-1.0-pro`, which do not allow multimodal prompts.

On installation you may pick an `image` field. The image field must be the Cloud Storage URL of an object (e.g `gs://my-bucket.appspot.com/filename.png`). This image will then be provided as part of the prompt to Gemini Pro Vision.
On installation you may pick an `image` field. The image field must be the Cloud Storage URL of an object (e.g. `gs://my-bucket.appspot.com/filename.png`). This image will then be provided as part of the prompt.
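
As an illustration of the expected image field value, the following is a minimal sketch of how a client might check a `gs://` URL before writing it to a document. The helper `parse_gs_url` and the `StorageObject` type are hypothetical names introduced here; they are not part of the extension.

```python
from typing import NamedTuple


class StorageObject(NamedTuple):
    """Bucket and object path taken from a gs:// URL (hypothetical helper type)."""
    bucket: str
    path: str


def parse_gs_url(url: str) -> StorageObject:
    """Split a Cloud Storage URL into bucket and object path.

    Raises ValueError if the value is not of the form gs://<bucket>/<object>.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URL: {url!r}")
    # Everything up to the first "/" is the bucket; the rest is the object path.
    bucket, _, path = url[len(prefix):].partition("/")
    if not bucket or not path:
        raise ValueError(f"URL must name both a bucket and an object: {url!r}")
    return StorageObject(bucket=bucket, path=path)


if __name__ == "__main__":
    print(parse_gs_url("gs://my-bucket.appspot.com/filename.png"))
```

Rejecting malformed values on the client side is only one possible design; it keeps documents that cannot be resolved to a Storage object out of the collection in the first place.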

Note that Google AI requires prompts to have both an image and text part, whereas Vertex AI allows gemini-pro-vision to be prompted with text only as well.
##### Gemini Pro Vision (deprecated)

If you have selected to use the Gemini Pro Vision model (deprecated) and have Google AI as a provider then any document handled by the extension must contain an image field.
This extension has historically supported calls to the (now deprecated) Gemini Pro Vision model on Google AI and Vertex AI APIs.

For the Gemini Pro Vision models, Google AI requires prompts to have both an image and a text part, whereas Vertex AI also allows gemini-pro-vision to be prompted with text only. If you have selected the Gemini Pro Vision model (deprecated) with Google AI as the provider, any document handled by the extension must contain an image field.

The Gemini Pro Vision API has a limit on image sizes. For Google AI this limit is currently 1MB, and for Vertex AI this limit is 4MB. This extension compresses and resizes images that fall above this limit.
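
The size limits above can be sketched as a simple lookup. This is an illustration only, not the extension's implementation: the provider keys `"google-ai"` and `"vertex-ai"` and the function `exceeds_limit` are hypothetical, and 1MB is assumed to mean 1024 × 1024 bytes, which the documentation does not specify.

```python
# Assumption: "MB" means 1024 * 1024 bytes; the docs do not define the unit.
MB = 1024 * 1024

# Limits quoted in the documentation; the provider keys are hypothetical labels.
IMAGE_SIZE_LIMITS = {
    "google-ai": 1 * MB,
    "vertex-ai": 4 * MB,
}


def exceeds_limit(size_bytes: int, provider: str) -> bool:
    """Return True if an image of this size is above the provider's quoted limit."""
    return size_bytes > IMAGE_SIZE_LIMITS[provider]


if __name__ == "__main__":
    two_mb = 2 * MB
    for provider in IMAGE_SIZE_LIMITS:
        print(provider, exceeds_limit(two_mb, provider))
```

Under these assumptions a 2MB image would be above the Google AI limit but within the Vertex AI limit.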

15 changes: 12 additions & 3 deletions firestore-multimodal-genai/PREINSTALL.md
@@ -66,11 +66,20 @@ For Vertex AI, the list of models is [here](https://cloud.google.com/vertex-ai/d

#### Multimodal Prompts

This extension supports providing multimodal prompts. To use this feature, select the Gemini Pro Vision model on installation, and provide an Image Field parameter. The Image Field parameter should be the name of a document field in firestore.
Many Gemini models, such as **Gemini 1.5 Flash**, support multimodal prompts, allowing both text and image inputs. This feature is not supported by text-only models like `gemini-1.0-pro`.

When you select these options, any document handled by the extension must contain an image field. The image field must be a string, and can either be the Cloud Storage URL of an object (e.g `gs://my-bucket.appspot.com/filename.png`). This image will then be provided as part of the prompt to Gemini Pro Vision.
**Image Field Configuration:**
During installation, you may specify an **Image Field**. This installation parameter is a string which corresponds to a field in Cloud Firestore documents.

The Gemini Pro Vision API has a limit on image sizes. For Google AI this limit is currently 1MB, and for Vertex AI this limit is 4MB. This extension will compress and resize images that fall above this limit.
When you set this parameter, any document handled by the extension must contain an image field. The image field must be a string containing the Cloud Storage URL of an object (e.g. `gs://my-bucket.appspot.com/filename.png`).

##### Gemini Pro Vision (deprecated)

This extension has historically supported calls to the (now deprecated) Gemini Pro Vision model on Google AI and Vertex AI APIs.

For the Gemini Pro Vision models, Google AI requires prompts to have both an image and a text part, whereas Vertex AI also allows gemini-pro-vision to be prompted with text only. If you have selected the Gemini Pro Vision model (deprecated) with Google AI as the provider, any document handled by the extension must contain an image field.
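
The provider rule for Gemini Pro Vision can be expressed as a small pre-write check. The sketch below is an assumption-laden illustration: the provider labels, the default field name `image`, and the functions `requires_image` and `can_process` are hypothetical and do not come from the extension.

```python
def requires_image(provider: str, model: str) -> bool:
    """Per the rule above: Gemini Pro Vision on Google AI needs an image part."""
    # "google-ai" is a hypothetical provider label used only in this sketch.
    return provider == "google-ai" and model == "gemini-pro-vision"


def can_process(doc: dict, provider: str, model: str, image_field: str = "image") -> bool:
    """Return True if the document satisfies the image requirement for this setup."""
    value = doc.get(image_field)
    has_image = isinstance(value, str) and bool(value)
    return has_image or not requires_image(provider, model)


if __name__ == "__main__":
    text_only = {"prompt": "Describe a sunset."}
    print(can_process(text_only, "vertex-ai", "gemini-pro-vision"))
    print(can_process(text_only, "google-ai", "gemini-pro-vision"))
```

A check like this could run before a document is written, so that text-only documents are not sent to a configuration that would reject them.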

The Gemini Pro Vision API has a limit on image sizes. For Google AI this limit is currently 1MB, and for Vertex AI this limit is 4MB. This extension compresses and resizes images that fall above this limit.

### Troubleshooting timeout/PROCESSING errors

15 changes: 12 additions & 3 deletions firestore-multimodal-genai/README.md
@@ -74,11 +74,20 @@ For Vertex AI, the list of models is [here](https://cloud.google.com/vertex-ai/d

#### Multimodal Prompts

This extension supports providing multimodal prompts. To use this feature, select the Gemini Pro Vision model on installation, and provide an Image Field parameter. The Image Field parameter should be the name of a document field in firestore.
Many Gemini models, such as **Gemini 1.5 Flash**, support multimodal prompts, allowing both text and image inputs. This feature is not supported by text-only models like `gemini-1.0-pro`.

When you select these options, any document handled by the extension must contain an image field. The image field must be a string, and can either be the Cloud Storage URL of an object (e.g `gs://my-bucket.appspot.com/filename.png`). This image will then be provided as part of the prompt to Gemini Pro Vision.
**Image Field Configuration:**
During installation, you may specify an **Image Field**. This installation parameter is a string which corresponds to a field in Cloud Firestore documents.

The Gemini Pro Vision API has a limit on image sizes. For Google AI this limit is currently 1MB, and for Vertex AI this limit is 4MB. This extension will compress and resize images that fall above this limit.
When you set this parameter, any document handled by the extension must contain an image field. The image field must be a string containing the Cloud Storage URL of an object (e.g. `gs://my-bucket.appspot.com/filename.png`).

##### Gemini Pro Vision (deprecated)

This extension has historically supported calls to the (now deprecated) Gemini Pro Vision model on Google AI and Vertex AI APIs.

For the Gemini Pro Vision models, Google AI requires prompts to have both an image and a text part, whereas Vertex AI also allows gemini-pro-vision to be prompted with text only. If you have selected the Gemini Pro Vision model (deprecated) with Google AI as the provider, any document handled by the extension must contain an image field.

The Gemini Pro Vision API has a limit on image sizes. For Google AI this limit is currently 1MB, and for Vertex AI this limit is 4MB. This extension compresses and resizes images that fall above this limit.

### Troubleshooting timeout/PROCESSING errors
