This cookbook provides a structured learning path for using the Gemini API, focusing on hands-on tutorials and practical examples.
For comprehensive API documentation, visit ai.google.dev.
With Gemini 2 we are offering a new SDK (google-genai, v1.0). The updated SDK is fully compatible with all Gemini API models and features, including recent additions like the Live API (audio and video streaming), improved tool usage (code execution, function calling, and integrated Google Search grounding), and media generation (Imagen). This SDK allows you to connect to the Gemini API through either Google AI Studio or Vertex AI.
The google-generativeai package will continue to support the original Gemini models. It can also be used with Gemini 2 models, but only with a limited feature set. All new features will be developed in the new Google GenAI SDK. See the migration guide for details.
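As a minimal sketch of the new SDK's entry point (assuming the `google-genai` package is installed and a `GOOGLE_API_KEY` environment variable holds your key; the prompt text is just an illustration):

```python
# Minimal sketch of the new Google GenAI SDK (google-genai).
# Assumes `pip install google-genai` and a GOOGLE_API_KEY environment variable.
import os

def make_client():
    """Create a Gemini API client from the GOOGLE_API_KEY env var."""
    from google import genai  # imported lazily so the sketch is readable without the package
    return genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

if os.environ.get("GOOGLE_API_KEY"):
    client = make_client()
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Explain search grounding in one sentence.",
    )
    print(response.text)
```

The same `client.models` surface also exposes the tool-usage and media-generation features mentioned above; see the Get Started guide for the full walkthrough.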
This cookbook is organized into two main categories:
- Quick Starts: Step-by-step guides covering both introductory topics ("Get Started") and specific API features.
- Examples: Practical use cases demonstrating how to combine multiple features.
We also showcase Demos in separate repositories, illustrating end-to-end applications of the Gemini API.
Here are the recent additions and updates to the Gemini API and the Cookbook:
- Gemini 2.0 models: Explore the capabilities of the latest Gemini 2.0 models! See the Get Started Guide.
- Imagen: Get started with our image generation model with this brand new Imagen guide!
- Recently Added Guides:
- Thinking model: Discover the thinking model capabilities.
- Invoice and Form Data Extraction: Analyze PDFs with structured outputs.
- Code execution: Generate and run Python code to solve complex tasks and even output graphs.
The quickstarts section contains step-by-step tutorials to get you started with Gemini and learn about its specific features.
To begin, you'll need:
- A Google account.
- An API key (create one in Google AI Studio).
We recommend starting with the following:
- Authentication: Set up your API key for access.
- Get started: Get started with Gemini models and the Gemini API, covering basic prompting and multimodal input.
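The authentication step boils down to making your key available to the SDK and the cookbook notebooks. A hedged sketch (the key value here is a placeholder, not a real key):

```shell
# Expose the API key created in Google AI Studio as an environment variable.
# "YOUR_API_KEY" is a placeholder — substitute the key from your own account.
export GOOGLE_API_KEY="YOUR_API_KEY"

# The SDK and the cookbook notebooks read it from the environment:
echo "GOOGLE_API_KEY set: $([ -n "$GOOGLE_API_KEY" ] && echo yes || echo no)"
```

In Colab, the same key is typically stored as a secret instead of an exported variable; the Authentication quickstart covers both paths.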
Then, explore the other quickstart tutorials to learn about individual features:
- Get started with Live API: Get started with the Live API with this comprehensive overview of its capabilities.
- Grounding: Use Google Search for grounded responses.
- Code execution: Generate and run Python code to solve complex tasks and even output graphs.
- And many more
These examples demonstrate how to combine multiple Gemini API features or 3rd-party tools to build more complex applications.
- Plotting and mapping Live: Mix Live API and Code execution to solve complex tasks live.
- Search grounding for research report: Use Grounding to improve the quality of your research report
- 3D Spatial understanding: Use Gemini 3D spatial abilities to understand 3D scenes
- Gradio and live API: Use gradio to deploy your own instance of the Live API
- And many many more
These fully functional, end-to-end applications showcase the power of Gemini in real-world scenarios.
- Gemini API quickstart: Python Flask App running with the Google AI Gemini API, designed to get you started building with Gemini's multi-modal capabilities
- Multimodal Live API Web Console: React-based starter app for using the Multimodal Live API over a websocket
- Google AI Studio Starter Applets: A collection of small apps that demonstrate how Gemini can be used to create interactive experiences
The Gemini API is a REST API. You can call it directly using tools like curl
(see REST examples), or use one of our official SDKs:
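As a hedged sketch of a direct REST call (assuming `GOOGLE_API_KEY` is set; the model name and prompt are illustrative), the request is a JSON payload POSTed to the `generateContent` endpoint:

```shell
# Build the request body for a generateContent call.
PAYLOAD='{"contents": [{"parts": [{"text": "Say hello in one word."}]}]}'
echo "$PAYLOAD"

# Only issue the request if a key is actually available.
if [ -n "${GOOGLE_API_KEY:-}" ]; then
  curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${GOOGLE_API_KEY}" \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
fi
```

The official SDKs wrap this same endpoint, so anything shown in the REST examples can also be done through them.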
Ask a question on the Google AI Developer Forum.
For enterprise developers, the Gemini API is also available on Google Cloud Vertex AI. See this repo for examples.
Contributions are welcome! See CONTRIBUTING.md for details.
Thank you for developing with the Gemini API! We’re excited to see what you create.