PrefectHQ · jlowin · Jan 14, 2024 · Jan 14, 2024
diff --git a/docs/ai/interactive/assistants.md b/docs/ai/interactive/assistants.md
@@ -1,6 +1,9 @@
 # Working with assistants
 
-Many of Marvin's features are standalone functions, without memory. However, interactive conversation is one of the most powerful ways to work with LLMs, allowing collaboration, context discovery, and feedback. OpenAI's assistants API makes this possible while handling stateful complexities like system messages, history, and separate threads. Marvin's assistants API is a Pythonic way to take advantage of those features/
+Marvin has an intuitive API for working with OpenAI assistants. Assistants are a powerful way to interact with LLMs, allowing you to maintain state, context, and multiple threads of conversation.
+
+The need to manage all this state makes the assistants API very different from the more familiar "chat" APIs that OpenAI and other providers offer. The benefit of abandoning the more traditional request/response pattern of user messages and AI responses is that assistants can invoke more powerful workflows, including calling custom functions and posting multiple messages related to their progress. Marvin's developer experience is focused on making all that interactive, stateful power as accessible as possible.
+
 
 <div class="admonition abstract">
   <p class="admonition-title">What it does</p>
@@ -17,7 +20,7 @@ Many of Marvin's features are standalone functions, without memory. However, int
     from marvin.beta.assistants import Assistant, pprint_messages
 
     # create an assistant
-    ai = Assistant(name="Marvin",  instructions="You the Paranoid Android.")
+    ai = Assistant(name="Marvin", instructions="You are the Paranoid Android.")
 
     # send a message to the assistant and have it respond
     response = ai.say('Hello, Marvin!')
@@ -112,7 +115,7 @@ A major advantage of using Marvin's assistants API is that you can add your own
 
     # Integrate custom tools with the assistant
     ai = Assistant(name="Marvin", tools=[visit_url])
-    response = ai.say("Count how many HN front page titles mention LLMs")
+    response = ai.say("What's the top story on Hacker News?")
 
     # pretty-print the response
     pprint_messages(response)
@@ -284,8 +287,39 @@ As part of a run, the assistant may decide to use one or more tools to generate
 
 You can use an assistant's `say` method to simulate a simple request/response pattern against the assistant's default thread. However, for more advanced control, in particular for maintaining multiple conversations at once, you'll want to manage  threads directly.
 
-To run a thread with an assistant, use its `run` method. This will return a `Run` object that represents the OpenAI run. You can use this object to inspect all actions the assistant took, including tool use, messages posted, and more.
+To run a thread with an assistant, use its `run` method: 
+```python
+thread.run(assistant=assistant)
+``` 
+
+This will return a `Run` object that represents the OpenAI run. You can use this object to inspect all actions the assistant took, including tool use, messages posted, and more.
+
+!!! tip "Assistant lifecycle management applies to threads"
+    When threads are `run` with an assistant, the same lifecycle management rules apply as when you use the assistant's `say` method. In the above example, lazy lifecycle management is used for convenience. See [lifecycle management](#lifecycle-management) for more information.
+
+!!! warning "Threads are locked while running"
+    When an assistant is running a thread, the thread is locked and no other messages can be added to it. This applies to both user and assistant messages.
+
+### Reading messages
+
+To read the messages in a thread, use its `get_messages` method:
+
+```python
+messages = thread.get_messages()
+```
+
+Messages are always returned in ascending order by timestamp, and the last 20 messages are returned by default.
+
+To control the output, you can provide the following parameters:
+    - `limit`: the number of messages to return (1-100)
+    - `before_message`: only return messages chronologically earlier than this message ID
+    - `after_message`: only return messages chronologically later than this message ID
+
+#### Printing messages
+
+Messages are not strings, but structured message objects. Marvin has a few utilities to help you print them in a human-readable way, most notably the `pprint_messages` function used throughout this doc.
 
+### Full example with threads
 
 !!! example "Running a thread"
     This example creates an assistant with a tool that can roll dice, then instructs the assistant to roll two--no, five--dice:
@@ -295,60 +329,35 @@ To run a thread with an assistant, use its `run` method. This will return a `Run
     from marvin.beta.assistants.formatting import pprint_messages
     import random
 
-    # write a function to be used as a tool
+    # write a function for the assistant to use
     def roll_dice(n_dice: int) -> list[int]:
         return [random.randint(1, 6) for _ in range(n_dice)]
 
     ai = Assistant(name="Marvin", tools=[roll_dice])
 
-    # create a new thread to track history
+    # create a thread - you could pass an ID to resume a conversation
     thread = Thread()
 
-    # add any number of user messages to the thread
-    thread.add("Hello")
+    # add a user message to the thread
+    thread.add("Hello!")
 
     # run the thread with the AI to produce a response
     thread.run(ai)
 
-    # post more messages
-    thread.add("please roll two dice")
-    thread.add("actually roll five dice")
+    # post two more user messages
+    thread.add("Please roll two dice")
+    thread.add("Actually--roll five dice")
 
-    # run the thread again with the latest messages
+    # run the thread again to generate a new response
     thread.run(ai)
 
-    # print the messages
+    # see all the messages
     pprint_messages(thread.get_messages())
     ```
 
     !!! success "Result"
         ![](/assets/images/ai/assistants/advanced.png)
 
-!!! tip "Assistant lifecycle management applies to threads"
-    When threads are `run` with an assistant, the same lifecycle management rules apply as when you use the assistant's `say` method. In the above example, lazy lifecycle management is used for conveneince. See [lifecycle management](#lifecycle-management) for more information.
-
-!!! warning "Threads are locked while running"
-    When an assistant is running a thread, the thread is locked and no other messages can be added to it. This applies to both user and assistant messages.
-
-### Reading messages
-
-To read the messages in a thread, use the `get_messages` method:
-
-```python
-messages = thread.get_messages()
-```
-
-Messages are always returned in ascending order by timestamp, and the last 20 messages are returned by default.
-
-To control the output, you can provide the following parameters:
-    - `limit`: the number of messages to return (1-100)
-    - `before_message`: only return messages chronologically earlier than this message ID
-    - `after_message`: only return messages chronologically later than this message ID
-
-#### Printing messages
-
-Messages are not strings, but structured message objects. Marvin has a few utilities to help you print them in a human-readable way, most notably the `pprint_messages` function used throughout in this doc.
-
 ### Async support
 
 Every `Thread` method has a corresponding async version. To use the async API, append `_async` to the method name.
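The `_async` naming rule in the docs above can be illustrated with a small stand-in class. This is only a sketch: `DemoThread` is hypothetical and is not Marvin's `Thread`, and wrapping the async method with `asyncio.run` is an assumption made for illustration, not necessarily how Marvin pairs its sync and async methods.

```python
import asyncio


class DemoThread:
    """Hypothetical stand-in; not Marvin's Thread class."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    async def add_async(self, message: str) -> str:
        # async variant: same name as the sync method plus `_async`
        await asyncio.sleep(0)  # placeholder for a network call
        self.messages.append(message)
        return message

    def add(self, message: str) -> str:
        # sync variant: drives the async variant to completion
        return asyncio.run(self.add_async(message))


async def main(thread: DemoThread) -> None:
    # inside a coroutine, await the `_async` variant directly
    await thread.add_async("Hello from async code")


thread = DemoThread()
thread.add("Hello from sync code")
asyncio.run(main(thread))
```

Inside a running event loop, only the `_async` variant of this sketch is usable, because `asyncio.run` cannot be called from a coroutine.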

diff --git a/docs/assets/images/ai/assistants/advanced.png b/docs/assets/images/ai/assistants/advanced.png
diff --git a/docs/assets/images/ai/assistants/code_interpreter.png b/docs/assets/images/ai/assistants/code_interpreter.png
diff --git a/docs/assets/images/ai/assistants/custom_tools.png b/docs/assets/images/ai/assistants/custom_tools.png
diff --git a/docs/assets/images/ai/assistants/quickstart.png b/docs/assets/images/ai/assistants/quickstart.png
diff --git a/docs/assets/images/ai/assistants/sin_x.png b/docs/assets/images/ai/assistants/sin_x.png
diff --git a/src/marvin/beta/applications/state/state.py b/src/marvin/beta/applications/state/state.py
@@ -3,7 +3,7 @@
 from typing import Optional, Union
 
 from jsonpatch import JsonPatch
-from pydantic import BaseModel, Field, PrivateAttr
+from pydantic import BaseModel, Field, PrivateAttr, SerializeAsAny
 
 from marvin.types import Tool
 from marvin.utilities.tools import tool_from_function
@@ -26,7 +26,7 @@ class JSONPatchModel(BaseModel, populate_by_name=True):
 
 
 class State(BaseModel):
-    value: Union[BaseModel, dict] = {}
+    value: SerializeAsAny[Union[BaseModel, dict]] = {}
     _last_saved_value: Optional[Union[BaseModel, dict]] = PrivateAttr(None)
 
     def render(self) -> str:

diff --git a/src/marvin/beta/assistants/assistants.py b/src/marvin/beta/assistants/assistants.py
@@ -13,11 +13,12 @@
 from marvin.utilities.logging import get_logger
 from marvin.utilities.openai import get_openai_client
 
-from .threads import Thread
+from .threads import Thread, ThreadMessage
 
 if TYPE_CHECKING:
     from .runs import Run
 
+
 logger = get_logger("Assistants")
 
 
@@ -60,31 +61,30 @@ async def say_async(
         message: str,
         file_paths: Optional[list[str]] = None,
         thread: Optional[Thread] = None,
+        return_user_message: bool = False,
         **run_kwargs,
-    ) -> "Run":
+    ) -> list[ThreadMessage]:
         """
         A convenience method for adding a user message to the assistant's
         default thread, running the assistant, and returning the assistant's
         messages.
         """
         thread = thread or self.default_thread
 
-        last_message = await thread.get_messages_async(limit=1)
-        if last_message:
-            last_msg_id = last_message[0].id
-        else:
-            last_msg_id = None
-
         # post the message
-        if message:
-            await thread.add_async(message, file_paths=file_paths)
+        user_message = await thread.add_async(message, file_paths=file_paths)
 
         # run the thread
         async with self:
             await thread.run_async(assistant=self, **run_kwargs)
 
         # load all messages, including the user message
-        response_messages = await thread.get_messages_async(after_message=last_msg_id)
+        response_messages = await thread.get_messages_async(
+            after_message=user_message.id
+        )
+
+        if return_user_message:
+            response_messages = [user_message] + response_messages
         return response_messages
 
     def __enter__(self):
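The control flow of the revised `say_async` (post the user message, run the thread, fetch everything after the user message, optionally prepend it) can be exercised without an API key using an in-memory stand-in. This is a rough sketch: `FakeThread`, `FakeMessage`, and the free function `say` are hypothetical names invented here, and integer IDs are an assumption made to keep the "after" comparison simple.

```python
import asyncio
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class FakeMessage:
    id: int
    role: str
    content: str


@dataclass
class FakeThread:
    """Hypothetical in-memory thread; not Marvin's Thread class."""

    messages: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    async def add_async(self, content: str, role: str = "user") -> FakeMessage:
        msg = FakeMessage(id=next(self._ids), role=role, content=content)
        self.messages.append(msg)
        return msg

    async def run_async(self) -> None:
        # pretend the assistant posts exactly one reply
        await self.add_async("Life? Don't talk to me about life.", role="assistant")

    async def get_messages_async(self, after_message: Optional[int] = None) -> list:
        if after_message is None:
            return list(self.messages)
        return [m for m in self.messages if m.id > after_message]


async def say(thread: FakeThread, message: str, return_user_message: bool = False) -> list:
    # mirrors the order of operations in the diff above
    user_message = await thread.add_async(message)
    await thread.run_async()
    responses = await thread.get_messages_async(after_message=user_message.id)
    if return_user_message:
        responses = [user_message] + responses
    return responses


thread = FakeThread()
only_replies = asyncio.run(say(thread, "Hello, Marvin!"))
with_prompt = asyncio.run(say(thread, "Still there?", return_user_message=True))
```

Using the ID of the message just posted as the cursor means the result contains only what the run produced, which is why the earlier "look up the last message first" step is no longer needed.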

diff --git a/src/marvin/beta/assistants/runs.py b/src/marvin/beta/assistants/runs.py
@@ -80,6 +80,7 @@ async def _handle_step_requires_action(self):
                         tools=tools,
                         function_name=tool_call.function.name,
                         function_arguments_json=tool_call.function.arguments,
+                        return_string=True,
                     )
                 except CancelRun as exc:
                     logger.debug(f"Ending run with data: {exc.data}")

diff --git a/src/marvin/utilities/tools.py b/src/marvin/utilities/tools.py
@@ -143,7 +143,7 @@ def call_function_tool(
     function_name: str,
     function_arguments_json: str,
     return_string: bool = False,
-):
+) -> str:
     tool = next(
         (
             tool