LargeMultimodalModelPromptingwithGemini


Dear learner,

Introducing Large Multimodal Model Prompting with Gemini, a new short course built in collaboration with Google Cloud, and taught by Erwin Huizenga, Developer Advocate for Generative AI at Google Cloud.

Large Multimodal Models (LMMs) represent a significant evolution from language models by integrating different data modalities, allowing for more comprehensive outputs based on varied input types such as text, images, and video.

For LMMs, prompt structure becomes even more important. For example, placing text inputs, such as a patient’s medical history, before image inputs like an X-ray, can improve the model’s interpretation. Conversely, for tasks like image captioning, leading with the image may yield better results. In this course, you'll explore best practices for multimodal prompting, and learn how to properly set parameters for optimized results.
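As a minimal sketch of the ordering idea above, the snippet below assembles the same two inputs in either order. It uses plain Python only: `ImagePart`, `build_prompt`, the file names, and the example wording are all hypothetical stand-ins, not the course's code or the Vertex AI SDK.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ImagePart:
    """Stand-in for an image input; `uri` is a hypothetical placeholder path."""
    uri: str


PromptPart = Union[str, ImagePart]


def build_prompt(text: str, image: ImagePart, text_first: bool = True) -> List[PromptPart]:
    """Return the prompt parts in the order they would be sent to the model."""
    return [text, image] if text_first else [image, text]


# Context-heavy task: the (made-up) history comes first, then the image.
xray_prompt = build_prompt(
    "Patient history: persistent cough for three weeks. Describe the X-ray.",
    ImagePart("xray.png"),
    text_first=True,
)

# Captioning task: lead with the image, then the instruction.
caption_prompt = build_prompt(
    "Write a one-sentence caption for this image.",
    ImagePart("photo.jpg"),
    text_first=False,
)
```

Which order works better is an empirical question for a given task, so it may be worth trying both and comparing the responses.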

Additionally, you’ll learn how to integrate Gemini with external APIs and databases using function calling, with the ability to infuse your applications with real-time data and dynamic content.

Enroll Today

In detail, you’ll explore:

  • Introduction to Gemini Models: Learn the differences and use cases for Gemini Nano, Pro, Flash, and Ultra. Understand how to select optimal models based on capability, latency, and cost.
  • Multimodal Prompting and Parameter Control: Learn techniques for structuring effective text-image-video prompts. Adjust key sampling parameters such as temperature, top_p, and top_k to trade off model creativity against determinism.
  • Best Practices for Multimodal Prompting: Get hands-on experience with prompt engineering for Gemini multimodal models, including role assignment, task decomposition, and formatting.
  • Creating Use Cases with Images: Build engaging multimodal applications like interior design assistants and receipt itemization tools.
  • Developing Use Cases with Videos: Implement "needle in the haystack" semantic video search powered by Gemini's large context window.
  • Integrating Real-Time Data with Function Calling: Extend Gemini with external knowledge and live data via function calling and API integration. Combine Gemini's Natural Language Understanding (NLU) capabilities with APIs for up-to-date facts and interactive services.
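To illustrate how temperature, top_k, and top_p mentioned in the list above shape token selection, here is a toy sketch over a four-token vocabulary. The function names, the logits, and the order in which the three filters are applied are assumptions made for illustration; the Gemini service may combine them differently.

```python
import math
import random
from typing import Dict, Optional


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Apply temperature, then top_k, then top_p, and renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature rescales logits: low values sharpen, high values flatten.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # top_k keeps only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # top_p keeps the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from the filtered distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up logits for an interior-design style completion.
toy_logits = {"sofa": 2.0, "chair": 1.0, "lamp": 0.5, "rug": -1.0}
focused = filter_distribution(toy_logits, temperature=0.2, top_k=2, top_p=0.9)
diverse = filter_distribution(toy_logits, temperature=1.5)
```

With the low-temperature settings only the most likely token survives the filters, while the higher temperature keeps all four candidates in play, which is the creativity versus determinism trade-off in miniature.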

Start building advanced AI applications that can reason across multiple data modalities today!

Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.

Details

  • Learn state-of-the-art techniques for getting the most out of multimodal AI with Google’s Gemini model family.

  • Leverage the power of Gemini’s cross-modal attention to fuse information from text, images, and video for complex reasoning tasks.

  • Extend Gemini’s capabilities with external knowledge and live data via function calling and API integration.
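The function-calling pattern can be sketched without any SDK: the model returns a structured request naming a function and its arguments, the application runs that function, and the result goes back to the model. Everything below is hypothetical, including `get_exchange_rate`, the placeholder rate, and the dictionary shape used for the simulated model output.

```python
from typing import Any, Callable, Dict


def get_exchange_rate(base: str, target: str) -> Dict[str, Any]:
    """Hypothetical tool. A real version would call an external API."""
    placeholder_rates = {("USD", "EUR"): 0.5}  # made-up value, not real data
    return {"base": base, "target": target, "rate": placeholder_rates[(base, target)]}


# Registry mapping the function names the model may request to local callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_exchange_rate": get_exchange_rate,
}


def handle_function_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the function named in a model response and package the result."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested an unknown function: {name}")
    result = TOOLS[name](**call["args"])
    return {"name": name, "response": result}


# Simulated model output: a structured request instead of free text.
simulated_call = {"name": "get_exchange_rate", "args": {"base": "USD", "target": "EUR"}}
tool_result = handle_function_call(simulated_call)
```

Keeping the callable registry explicit means the application, not the model, decides which functions can actually run.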

| Lesson | Video | Code |
|---|---|---|
| Introduction | video | |
| Introduction to Gemini Models | video | |
| Multimodal Prompting and Parameter Control | video | code |
| Best Practices for Multimodal Prompting | video | |
| Creating Use Cases with Images | video | code |
| Developing Use Cases with Videos | video | code |
| Integrating Real-Time Data with Function Calling | video | code |
| Conclusion | video | |
| How to Set Up your Google Cloud Account - Try it out Yourself (optional) | | code |

Try it out Yourself

Thank you for taking the time to go through this course!

Due to technical requirements, we cannot provide you with a lab environment to run the notebooks. However, this document walks you step by step through setting up Google Cloud and doing multimodal prompting with Gemini models via Vertex AI on your own. This is completely optional. You can learn to use Gemini by simply viewing the course.

💻 You can access the official GitHub repository of this course here.

The GitHub repository contains:

NOTE: The steps below will walk you through how to set up your Google Cloud account to run these notebooks ONLY as Colabs.