Dear learner,
Introducing Large Multimodal Model Prompting with Gemini, a new short course built in collaboration with Google Cloud, and taught by Erwin Huizenga, Developer Advocate for Generative AI at Google Cloud.
Large Multimodal Models (LMMs) represent a significant evolution from language models by integrating different data modalities, allowing for more comprehensive outputs based on varied input types such as text, images, and video.
For LMMs, prompt structure becomes even more important. For example, placing text inputs, such as a patient’s medical history, before image inputs like an X-ray, can improve the model’s interpretation. Conversely, for tasks like image captioning, leading with the image may yield better results. In this course, you'll explore best practices for multimodal prompting, and learn how to properly set parameters for optimized results.
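To make the ordering idea concrete, here is a minimal sketch in plain Python. It does not call Gemini or any SDK. `TextPart`, `ImagePart`, and `build_prompt` are hypothetical names invented for illustration, and the file names and patient text are made up. It only shows how the same two inputs can be arranged context-first or image-first before being sent to a model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ImagePart:
    uri: str  # hypothetical image location, e.g. a local file or storage path


Part = Union[TextPart, ImagePart]


def build_prompt(text: str, image_uri: str, image_first: bool = False) -> List[Part]:
    """Return the prompt parts in the order they would be sent to the model."""
    text_part, image_part = TextPart(text), ImagePart(image_uri)
    return [image_part, text_part] if image_first else [text_part, image_part]


# Context first: the (made-up) medical history precedes the X-ray.
diagnosis_prompt = build_prompt("Patient history: persistent cough.", "xray.png")

# Image first: the picture precedes the captioning instruction.
caption_prompt = build_prompt("Write a short caption.", "photo.jpg", image_first=True)
```

Which order works better for a given task is something to check empirically. The sketch only makes the two arrangements easy to compare.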
Additionally, you’ll learn how to integrate Gemini with external APIs and databases using function calling, with the ability to infuse your applications with real-time data and dynamic content.
In detail, you’ll explore:
- Introduction to Gemini Models: Learn the differences and use cases for Gemini Nano, Pro, Flash, and Ultra. Understand how to select optimal models based on capability, latency, and cost.
- Multimodal Prompting and Parameter Control: Learn techniques for structuring effective text-image-video prompts. Tune key sampling parameters such as temperature, top_p, and top_k to control the trade-off between model creativity and determinism.
- Best Practices for Multimodal Prompting: Get hands-on experience with prompt engineering for Gemini multimodal models, including role assignment, task decomposition, and formatting.
- Creating Use Cases with Images: Build engaging multimodal applications like interior design assistants and receipt itemization tools.
- Developing Use Cases with Videos: Implement "needle in the haystack" semantic video search powered by Gemini's large context window.
- Integrating Real-Time Data with Function Calling: Extend Gemini with external knowledge and live data via function calling and API integration. Combine Gemini's Natural Language Understanding (NLU) capabilities with APIs for up-to-date facts and interactive services.
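
As a rough illustration of what temperature, top_k, and top_p do, the toy sampler below picks one token from a small set of made-up scores. It is a sketch under stated assumptions, not Gemini's actual decoding code. The function name `sample_token`, the filter order (temperature, then top_k, then top_p), and the example logits are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, Optional


def sample_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token from raw scores using temperature, top_k, and top_p."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy (fully deterministic) decoding.
        return max(logits, key=logits.get)

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # top_k: keep only the k most probable tokens (at least one).
    if top_k is not None:
        ranked = ranked[: max(1, top_k)]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    tokens, probs = zip(*kept)
    # random.choices renormalizes the remaining weights for us.
    return rng.choices(tokens, weights=probs, k=1)[0]


example_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}  # made-up scores
greedy = sample_token(example_logits, temperature=0)
narrow = sample_token(example_logits, top_k=1)
```

Lower temperature and smaller top_k or top_p values concentrate the choice on the most likely tokens. Larger values leave more room for variety.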
Start building advanced AI applications that can reason across multiple data modalities today!
Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.
- Learn state-of-the-art techniques for getting the most out of multimodal AI with Google’s Gemini model family.
- Leverage the power of Gemini’s cross-modal attention to fuse information from text, images, and video for complex reasoning tasks.
- Extend Gemini’s capabilities with external knowledge and live data via function calling and API integration.
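
For readers new to function calling, the general pattern is that the model returns a structured request naming a function and its arguments, the application runs that function, and the result is passed back to the model. The sketch below simulates only the application side in plain Python. `get_exchange_rate`, `TOOLS`, `handle_function_call`, and the hard-coded rate are hypothetical and stand in for a real API. The dictionary shape of the simulated model request is likewise an assumption, not the SDK's actual response format.

```python
import json
from typing import Any, Callable, Dict


def get_exchange_rate(base: str, target: str) -> Dict[str, Any]:
    """Stand-in for a live API call; returns a hard-coded, made-up rate."""
    return {"base": base, "target": target, "rate": 0.9}


# Registry of functions the application is willing to run on the model's behalf.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_exchange_rate": get_exchange_rate,
}


def handle_function_call(call: Dict[str, Any]) -> str:
    """Run the requested function and serialize its result for the model."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested an unknown function: {name}")
    result = TOOLS[name](**call["args"])
    return json.dumps({"name": name, "response": result})


# Simulated structured request, as if produced by the model.
model_call = {"name": "get_exchange_rate", "args": {"base": "USD", "target": "EUR"}}
tool_reply = handle_function_call(model_call)
```

Keeping an explicit registry means the application, not the model, decides which functions can be executed.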
Lesson | Video | Code |
---|---|---|
Introduction | video | |
Introduction to Gemini Models | video | |
Multimodal Prompting and Parameter Control | video | code |
Best Practices for Multimodal Prompting | video | |
Creating Use Cases with Images | video | code |
Developing Use Cases with Videos | video | code |
Integrating Real-Time Data with Function Calling | video | code |
Conclusion | video | |
How to Set Up your Google Cloud Account - Try it out Yourself (optional) | | code |
Thank you for taking the time to go through this course!
Due to technical requirements, we cannot provide you with a lab environment to run the notebooks. However, this document walks you step by step through setting up Google Cloud and doing multimodal prompting with Gemini models via Vertex AI on your own. This is completely optional. You can learn to use Gemini by simply viewing the course.
💻 You can access the official GitHub repository of this course here.
The GitHub repository contains:
- 4 folders, which include the notebooks and supplementary files for each lesson.
- The notebooks can run as Google Colabs as they are. (Open the links in a new tab.)
- A requirements.txt file to help you set up your environment (if running locally).
NOTE: The steps below will walk you through how to set up your Google Cloud account to run these notebooks ONLY as Colabs.