ErikBjare · Oct 10, 2024
diff --git a/docs/cli.rst b/docs/cli.rst
@@ -10,14 +10,30 @@ gptme provides the following commands:
 
 This is the full CLI reference. For a more concise version, run ``gptme --help``.
 
+.. rubric:: gptme
+
+You can skip confirmation prompts and run in non-interactive mode to terminate when all prompts have been completed:
+
+.. code-block:: bash
+
+    gptme --non-interactive --no-confirm 'create a snake game using curses in snake.py, dont run it' '-' 'make the snake green and the apple red'
+
+This should make it first write snake.py, then make the change in a following prompt.
+
+The '-' is special "multiprompt" syntax that tells gptme to let the assistant finish work on one prompt (run until no more tool calls) before continuing with the next.
+
 .. click:: gptme.cli:main
    :prog: gptme
    :nested: full
 
+.. rubric:: gptme-server
+
 .. click:: gptme.server:main
    :prog: gptme-server
    :nested: full
 
+.. rubric:: gptme-eval
+
 .. click:: gptme.eval:main
    :prog: gptme-eval
    :nested: full
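
Review note: the multiprompt invocation documented in this hunk can also be assembled programmatically. The following is a minimal sketch, assuming only the ``--non-interactive`` and ``--no-confirm`` flags and the ``-`` separator shown in the diff; ``build_multiprompt_argv`` is a hypothetical helper and not part of gptme.

```python
import shlex


def build_multiprompt_argv(prompts: list[str]) -> list[str]:
    """Interleave prompts with the '-' separator (hypothetical helper, not a gptme API)."""
    argv = ["gptme", "--non-interactive", "--no-confirm"]
    for i, prompt in enumerate(prompts):
        if i > 0:
            # '-' marks the boundary between two consecutive prompts.
            argv.append("-")
        argv.append(prompt)
    return argv


prompts = [
    "create a snake game using curses in snake.py, dont run it",
    "make the snake green and the apple red",
]
argv = build_multiprompt_argv(prompts)

# Print a shell-safe version of the command for inspection.
print(shlex.join(argv))

# To actually run it (requires gptme to be installed and configured):
# import subprocess; subprocess.run(argv, check=True)
```

Passing the list straight to ``subprocess.run`` avoids shell quoting problems with prompts that contain apostrophes or commas.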
diff --git a/docs/conf.py b/docs/conf.py
@@ -4,7 +4,6 @@
 # https://www.sphinx-doc.org/en/master/usage/configuration.html
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
-
 import re
 from datetime import date
 
@@ -116,6 +115,7 @@ def setup(app):
     ("py:class", "pathlib.Path"),
     ("py:class", "flask.app.Flask"),
     ("py:class", "gptme.tools.python.T"),
+    ("py:class", "threading.Thread"),
 ]
 
 # -- Options for HTML output -------------------------------------------------

diff --git a/docs/getting-started.rst b/docs/getting-started.rst
@@ -29,8 +29,7 @@ To initiate a new chat or select an existing one, execute:
 
 This will show you a list of past chats, allowing you to select one or start a new one.
 
-Writing a file
-**************
+.. rubric:: Writing a file
 
 You can then interact with the assistant. Lets start by asking it to write code.
 
@@ -50,8 +49,7 @@ You can then interact with the assistant. Lets start by asking it to write code.
 
 The assistant will prompt for your confirmation and save the file, as requested.
 
-Making changes
-**************
+.. rubric:: Making changes
 
 We can also start chats and request changes directly from the command line. The contents of any mentioned text files will be included as context, and the assistant will generate patches to apply the requested changes:
 
@@ -74,42 +72,23 @@ We can also start chats and request changes directly from the command line. The
    System: Patch applied
 
 .. note::
-    With the browser extras installed, the assistant can also process URLs included in the prompt.
+    With the :ref:`tools:browser` extras installed, the assistant can also process URLs included in the prompt.
 
-Other tools
-***********
+More tools
+**********
 
-You can read about other tools on the :doc:`tools` page.
+You can read about all the other tools on the :doc:`tools` page.
 
-Other interfaces
-****************
+These include :ref:`tools:shell` and :ref:`tools:python`, as well as how to set up :ref:`tools:browser` and use :ref:`tools:vision`.
 
-There are other ways to interact with the assistant:
+Interfaces
+**********
 
-Command line
-^^^^^^^^^^^^
+There are several ways to interact with gptme:
 
-Commands can also be executed directly from the command line. For example, one can skip confirmation prompts and  run in non-interactive mode to terminate when all prompts have been completed:
-
-.. code-block:: bash
-
-    gptme --non-interactive --no-confirm 'create a snake game using curses in snake.py, dont run it' '-' 'make the snake green and the apple red'
-
-This should make it first write snake.py, then make the change in a following prompt. The '-' is special "multiprompt" syntax that tells the assistant to wait for the next prompt before continuing.
-
-Web UI
-^^^^^^
-
-To run the assistant in a web interface, execute:
-
-.. code-block:: bash
-
-    gptme-server
-
-This should let you view your chats in a web browser and make basic requests.
-
-.. note::
-    The web interface is still in development and is not fully functional (no confirmation prompts or streaming).
+- :doc:`CLI <cli>`
+- :ref:`server:web ui`
+- :doc:`bot`
 
 Support
 -------

diff --git a/docs/server.rst b/docs/server.rst
@@ -3,7 +3,7 @@ Server
 
 .. note::
    The server and web UI is still in development and does not have all the features of the CLI.
-   It does not support streaming, doesn't ask for confirmation before executing, lacks the ability to interrupt generations, etc.
+   It does not support streaming, doesn't ask for confirmation before executing, lacks the ability to interrupt responses and tool calls, etc.
 
 gptme has a minimal REST API with very minimalistic web UI.
 
@@ -16,6 +16,12 @@ It can be started by running the following command:
 Web UI
 ------
 
+.. code-block:: bash
+
+    gptme-server
+
+This should let you view your chats in a web browser and make basic requests.
+
 You can then access the web UI by visiting http://localhost:5000 in your browser.
 
-For more usage, see `the CLI documentation <cli.html#gptme-server>`_.
+For more usage, see :ref:`the CLI documentation <cli:gptme-server>`.
diff --git a/docs/tools.rst b/docs/tools.rst
@@ -3,24 +3,31 @@ Tools
 
 Tools available in gptme.
 
-The main tools can be grouped in the following categories:
+The tools can be grouped into the following categories:
 
-- execution
+- Execution
 
   - `Shell`_
   - `Python`_
   - `Tmux`_
+  - `Subagent`_
 
-- filesystem
+- Files
 
+  - `Read`_
   - `Save`_
   - `Patch`_
 
-- network
+- Network
 
   - `Browser`_
 
-- chat management
+- Vision
+
+  - `Screenshot`_
+  - `Vision`_
+
+- Chat management
 
   - `Chats`_
 
@@ -45,6 +52,20 @@ Tmux
     :members:
     :noindex:
 
+Subagent
+--------
+
+.. automodule:: gptme.tools.subagent
+    :members:
+    :noindex:
+
+Read
+----
+
+.. automodule:: gptme.tools.read
+    :members:
+    :noindex:
+
 Save
 ----
 
@@ -59,13 +80,27 @@ Patch
     :members:
     :noindex:
 
+Screenshot
+----------
+
+.. automodule:: gptme.tools.screenshot
+    :members:
+    :noindex:
+
 Browser
 -------
 
 .. automodule:: gptme.tools.browser
     :members:
     :noindex:
 
+Vision
+------
+
+.. automodule:: gptme.tools.vision
+    :members:
+    :noindex:
+
 Chats
 -----
 

diff --git a/gptme/tools/base.py b/gptme/tools/base.py
@@ -1,6 +1,7 @@
 import logging
 from collections.abc import Callable, Generator
 from dataclasses import dataclass, field
+from textwrap import indent
 from typing import Literal, Protocol, TypeAlias
 
 from lxml import etree
@@ -58,11 +59,21 @@ def get_doc(self, doc: str | None = None) -> str:
             doc = ""
         else:
             doc += "\n\n"
+        if self.instructions:
+            doc += f"""
+.. rubric:: Instructions
+
+.. code-block:: markdown
+
+{indent(self.instructions, "    ")}\n\n"""
         if self.examples:
-            doc += (
-                f"# Examples\n\n{transform_examples_to_chat_directives(self.examples)}"
-            )
-        return doc
+            doc += f"""
+.. rubric:: Examples
+
+{transform_examples_to_chat_directives(self.examples)}\n\n
+"""
+        # doc += """.. rubric:: Members"""
+        return doc.strip()
 
     def __eq__(self, other):
         if not isinstance(other, ToolSpec):

diff --git a/gptme/tools/browser.py b/gptme/tools/browser.py
@@ -8,6 +8,13 @@
 .. note::
 
     This is an experimental feature. It needs some work to be more robust and useful.
+
+To use the browser tool, you need to have the ``playwright`` Python package installed along with gptme, which you can install with:
+
+.. code-block:: bash
+
+    pipx install 'gptme[browser]'
+    gptme '/shell playwright install chromium'
 """
 
 import importlib.util
@@ -19,7 +26,7 @@
 import tempfile
 from typing import Literal
 
-from .base import ToolSpec
+from .base import ToolSpec, ToolUse
 
 has_playwright = importlib.util.find_spec("playwright") is not None
 
@@ -40,64 +47,37 @@
 To browse the web, you can use the `read_url`, `search`, and `screenshot_url` functions in Python.
 """.strip()
 
-examples = """
+examples = f"""
 ### Answer question from URL with browsing
 User: find out which is the latest ActivityWatch version from superuserlabs.org
 Assistant: Let's browse the site.
-```ipython
-read_url("https://superuserlabs.org/")
-```
+{ToolUse("ipython", [], "read_url('https://superuserlabs.org/')").to_output()}
 System:
-```https://superuserlabs.org/
-...
-[ActivityWatch](https://activitywatch.net/)
-...
-```
+{ToolUse("https://superuserlabs.org/", [], "... [ActivityWatch](https://activitywatch.net/) ...".strip()).to_output()}
 Assistant: Couldn't find the answer on the page. Following link to the ActivityWatch website.
-```ipython
-read_url("https://activitywatch.net/")
-```
+{ToolUse("ipython", [], "read_url('https://activitywatch.net/')").to_output()}
 System:
-```https://activitywatch.net/
-...
-Download latest version v0.12.2
-...
-```
+{ToolUse("https://activitywatch.net/", [], "... Download latest version v0.12.2 ...".strip()).to_output()}
 Assistant: The latest version of ActivityWatch is v0.12.2
 
 ### Searching
 User: who is the founder of ActivityWatch?
 Assistant: Let's search for that.
-```ipython
-search("ActivityWatch founder")
-```
+{ToolUse("ipython", [], "search('ActivityWatch founder')").to_output()}
 System:
-```results
-1. [ActivityWatch](https://activitywatch.net/)
-...
-```
+{ToolUse("results", [], "1. [ActivityWatch](https://activitywatch.net/) ...").to_output()}
 Assistant: Following link to the ActivityWatch website.
-```ipython
-read_url("https://activitywatch.net/")
-```
+{ToolUse("ipython", [], "read_url('https://activitywatch.net/')").to_output()}
 System:
-```https://activitywatch.net/
-...
-The ActivityWatch project was founded by Erik Bjäreholt in 2016.
-...
-```
+{ToolUse("https://activitywatch.net/", [], "... The ActivityWatch project was founded by Erik Bjäreholt in 2016. ...".strip()).to_output()}
 Assistant: The founder of ActivityWatch is Erik Bjäreholt.
 
 ### Take screenshot of page
 User: take a screenshot of the ActivityWatch website
 Assistant: Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
-```ipython
-screenshot_url("https://activitywatch.net")
-```
+{ToolUse("ipython", [], "screenshot_url('https://activitywatch.net')").to_output()}
 System:
-```
-Screenshot saved to screenshot.png
-```
+{ToolUse("result", [], "Screenshot saved to screenshot.png").to_output()}
 """.strip()
 
 

diff --git a/gptme/tools/chats.py b/gptme/tools/chats.py
@@ -9,7 +9,7 @@
 from typing import TYPE_CHECKING
 
 from ..message import Message
-from .base import ToolSpec
+from .base import ToolSpec, ToolUse
 
 if TYPE_CHECKING:
     from ..logmanager import LogManager
@@ -193,13 +193,11 @@ def read_chat(conversation: str, max_results: int = 5, incl_system=False) -> Non
 The chats tool allows you to list, search, and summarize past conversation logs.
 """
 
-examples = """
+examples = f"""
 ### Search for a specific topic in past conversations
 User: Can you find any mentions of "python" in our past conversations?
 Assistant: Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
-```ipython
-search_chats("python")
-```
+{ToolUse("ipython", [], "search_chats('python')").to_output()}
 """
 
 tool = ToolSpec(
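
Review note: the examples in ``browser.py`` and ``chats.py`` now render their code blocks through ``ToolUse(...).to_output()`` inside an f-string. The real ``ToolUse`` class is not shown in this diff, so the following is only a sketch under an assumption: that ``to_output`` produces the same Markdown fence the removed lines spelled out by hand. ``ToolUseSketch`` is a hypothetical stand-in, not the gptme class.

```python
from dataclasses import dataclass, field

# Built programmatically so this snippet contains no literal fence.
FENCE = "`" * 3


@dataclass
class ToolUseSketch:
    """Hypothetical stand-in for gptme's ToolUse (assumed behaviour only)."""

    tool: str
    args: list[str] = field(default_factory=list)
    content: str = ""

    def to_output(self) -> str:
        # Assumed format: fence + tool name (and any args), content, closing fence.
        header = " ".join([self.tool, *self.args])
        return f"{FENCE}{header}\n{self.content}\n{FENCE}"


out = ToolUseSketch("ipython", [], "search_chats('python')").to_output()

# Same pattern as the diff: the rendered block is interpolated into the examples.
examples = f"""
User: Can you find any mentions of "python" in our past conversations?
Assistant: Certainly! I'll search our past conversations.
{out}
""".strip()
print(examples)
```

Generating the block from one place means the example format can change without editing every tool's examples string.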