
Large Multimodal Model Prompting with Gemini

Dear learner,

Introducing Large Multimodal Model Prompting with Gemini, a new short course built in collaboration with Google Cloud, and taught by Erwin Huizenga, Developer Advocate for Generative AI at Google Cloud.

Large Multimodal Models (LMMs) represent a significant evolution from language models by integrating different data modalities, allowing for more comprehensive outputs based on varied input types such as text, images, and video.

For LMMs, prompt structure becomes even more important. For example, placing text inputs, such as a patient’s medical history, before image inputs like an X-ray, can improve the model’s interpretation. Conversely, for tasks like image captioning, leading with the image may yield better results. In this course, you'll explore best practices for multimodal prompting, and learn how to properly set parameters for optimized results.
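
As a rough illustration of the ordering idea above, the sketch below builds a prompt as an ordered list of parts and changes the order depending on the task. It does not call Gemini or any SDK. `TextPart`, `ImagePart`, and `build_prompt` are hypothetical names introduced here, and the patient text and image URI are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ImagePart:
    uri: str  # hypothetical location of an image file


Part = Union[TextPart, ImagePart]


def build_prompt(context: List[TextPart], images: List[ImagePart],
                 instruction: TextPart, image_first: bool) -> List[Part]:
    """Return the prompt parts in the order they would be sent to a model."""
    if image_first:
        # Captioning-style tasks: lead with the image, then any text.
        return [*images, *context, instruction]
    # Interpretation-style tasks: background text first, then the image.
    return [*context, *images, instruction]


# Placeholder inputs, invented for this example only.
history = TextPart("Patient history: (example text goes here).")
xray = ImagePart("gs://example-bucket/xray.png")

diagnostic = build_prompt([history], [xray],
                          TextPart("Describe any notable findings."),
                          image_first=False)
caption = build_prompt([], [xray],
                       TextPart("Write a one-line caption."),
                       image_first=True)

for part in diagnostic:
    print(type(part).__name__)
```

Keeping the ordering decision in one function makes it easy to try both orders on the same inputs and compare the responses, since which order works better depends on the task.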

Additionally, you’ll learn how to integrate Gemini with external APIs and databases using function calling, so you can infuse your applications with real-time data and dynamic content.

Enroll Today

In detail, you’ll explore:

Introduction to Gemini Models: Learn the differences and use cases for Gemini Nano, Pro, Flash, and Ultra. Understand how to select optimal models based on capability, latency, and cost.
Multimodal Prompting and Parameter Control: Learn techniques for structuring effective text-image-video prompts. Fine-tune key parameters like temperature, top_p, and top_k to control model creativity vs. determinism.
Best Practices for Multimodal Prompting: Get hands-on experience with prompt engineering for Gemini multimodal models, including role assignment, task decomposition, and formatting.
Creating Use Cases with Images: Build engaging multimodal applications like interior design assistants and receipt itemization tools.
Developing Use Cases with Videos: Implement "needle in the haystack" semantic video search powered by Gemini's large context window.
Integrating Real-Time Data with Function Calling: Extend Gemini with external knowledge and live data via function calling and API integration. Combine Gemini's Natural Language Understanding (NLU) capabilities with APIs for up-to-date facts and interactive services.
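
To make the temperature, top_p, and top_k parameters from the list above more concrete, here is a minimal sketch of how such parameters are commonly defined for token sampling. It is a general illustration, not Gemini's internal implementation. The function names, the toy logits, and the order in which the filters are applied are all assumptions of this example.

```python
import math
import random
from typing import Dict


def filter_distribution(logits: Dict[str, float], temperature: float = 1.0,
                        top_k: int = 0, top_p: float = 1.0) -> Dict[str, float]:
    """Apply temperature, then top-k, then top-p; return renormalized probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    # top_k: keep only the k most likely tokens (0 means "no limit" here).
    if top_k > 0:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from the filtered distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy logits invented for this example.
logits = {"sofa": 2.0, "chair": 1.0, "lamp": 0.5, "rug": -1.0}
narrow = filter_distribution(logits, temperature=0.2, top_k=2, top_p=0.9)
wide = filter_distribution(logits, temperature=1.5)

print(sorted(narrow), sorted(wide))
print(sample(wide, random.Random(0)))
```

With the low-temperature settings, nearly all probability mass lands on the most likely token, so the output is close to deterministic. The higher temperature with no filtering keeps every candidate available, which is the "creative" end of the trade-off.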

Start building advanced AI applications that can reason across multiple data modalities today!

Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.

Details

Learn state-of-the-art techniques for getting the most out of multimodal AI with Google’s Gemini model family.
Leverage the power of Gemini’s cross-modal attention to fuse information from text, images, and video for complex reasoning tasks.
Extend Gemini’s capabilities with external knowledge and live data via function calling and API integration.
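
The function-calling flow mentioned above can be sketched without any SDK: the application declares a function, the model replies with a structured request to call it, and the application runs the function and hands the result back. The code below simulates only the application side. The declaration layout, `dispatch`, `get_exchange_rate`, and the returned rate are hypothetical placeholders, not the Vertex AI API or real data.

```python
import json
from typing import Any, Callable, Dict


def get_exchange_rate(base: str, target: str) -> Dict[str, Any]:
    """Stand-in for a live API call; returns a fixed, made-up value."""
    placeholder_rates = {("USD", "EUR"): 0.5}  # invented number, not real data
    return {"base": base, "target": target,
            "rate": placeholder_rates.get((base, target))}


# What the model would be shown: a name, a description, and a parameter schema.
TOOL_DECLARATIONS = [{
    "name": "get_exchange_rate",
    "description": "Look up the exchange rate between two currencies.",
    "parameters": {
        "type": "object",
        "properties": {"base": {"type": "string"},
                       "target": {"type": "string"}},
        "required": ["base", "target"],
    },
}]

# Maps declared names to the Python callables that implement them.
REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_exchange_rate": get_exchange_rate,
}


def dispatch(function_call: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a model-requested call and run the matching local function."""
    name = function_call["name"]
    if name not in REGISTRY:
        return {"name": name, "response": {"error": f"unknown function: {name}"}}
    declared = next(d for d in TOOL_DECLARATIONS if d["name"] == name)
    args = function_call.get("args", {})
    missing = [p for p in declared["parameters"]["required"] if p not in args]
    if missing:
        return {"name": name, "response": {"error": f"missing arguments: {missing}"}}
    return {"name": name, "response": REGISTRY[name](**args)}


# A simulated model reply asking the application to call the function.
simulated_call = {"name": "get_exchange_rate",
                  "args": {"base": "USD", "target": "EUR"}}
result = dispatch(simulated_call)
print(json.dumps(result))
```

In a real application, the result would be sent back to the model as a follow-up message so it can compose a natural-language answer. Validating the name and required arguments before calling anything keeps a malformed model request from reaching the external service.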

Lesson	Video	Code
Introduction	video
Introduction to Gemini Models	video
Multimodal Prompting and Parameter Control	video	code
Best Practices for Multimodal Prompting	video
Creating Use Cases with Images	video	code
Developing Use Cases with Videos	video	code
Integrating Real-Time Data with Function Calling	video	code
Conclusion	video
How to Set Up your Google Cloud Account - Try it out Yourself (optional)		code

Try it out Yourself

Thank you for taking the time to go through this course!

Due to technical requirements, we cannot provide you with a lab environment to run the notebooks. However, in this document, we’ll walk you step by step through setting up Google Cloud and doing multimodal prompting with Gemini models via Vertex AI on your own. This is completely optional. You can learn to use Gemini by simply viewing the course.

💻 You can access the official GitHub repository of this course from here.

The GitHub repository contains:

- 4 folders, which include the notebooks and supplementary files for each lesson.
  - The notebooks can run as Google Colabs as they are. (Open the links in a new tab.)
- A requirements.txt file to help you set up your environment (if running locally).

NOTE: The steps below will walk you through how to set up your Google Cloud account to run these notebooks ONLY as Colabs.
