Skip to content

maxi-w/gemini-computer-use

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gemini 2.0 Computer Use

Setup

This project uses PDM to manage Python dependencies. Find installation instructions here: https://pdm-project.org/en/latest/.

Step 1

Clone the project repository and install dependencies with:

git clone https://github.com/maxi-w/gemini-computer-use.git
cd gemini-computer-use
pdm install

Step 2

Set an environment variable with your GOOGLE_API_KEY to use Gemini:

export GOOGLE_API_KEY=YOUR_API_KEY

Step 3

Run your computer agent with a goal:

pdm run start "search for cat images with google"

Roadmap

  • Simple implementation of screenshot understanding and computer tool use.
  • Improve structure of actions e.g. with JSON mode.
  • Checkout tool use as in Gemini SDK Docs.
  • Improve prompt to prevent some unwanted behaviours.
  • Explore different grounding info formats (2d box vs point, order of coordinates, scaling).
  • Make the agent decide when it's done with the task.
  • Explore Multimodal Live API for screen input Docs

Note: Feel free to open an issue to discuss improvements.

About

Explore Building Computer Use Agents with Gemini 2.0

Topics

Resources

License

Stars

Watchers

Forks

Languages