
New logging schema for conversations and continue mode #91

Closed
simonw opened this issue Jul 11, 2023 · 6 comments
Labels
enhancement New feature or request


simonw commented Jul 11, 2023

To implement continue mode I'm going to need to persist these things to the database.

Which means I need a whole new schema, since I'm switching to using ULID IDs as part of this work.

Originally posted by @simonw in #85 (comment)

@simonw simonw added this to the 0.5 milestone Jul 11, 2023
@simonw simonw added the enhancement New feature or request label Jul 11, 2023

simonw commented Jul 11, 2023

Current schema:

llm/docs/logging.md

Lines 78 to 91 in 23eeb0f

CREATE TABLE "logs" (
   [id] INTEGER PRIMARY KEY,
   [model] TEXT,
   [prompt] TEXT,
   [system] TEXT,
   [prompt_json] TEXT,
   [options_json] TEXT,
   [response] TEXT,
   [response_json] TEXT,
   [reply_to_id] INTEGER REFERENCES [logs]([id]),
   [chat_id] INTEGER REFERENCES [logs]([id]),
   [duration_ms] INTEGER,
   [datetime_utc] TEXT
);

But I want to log conversations and responses instead. Here's what those look like at the code level right now:

llm/llm/models.py

Lines 47 to 52 in 23eeb0f

@dataclass
class Conversation:
    model: "Model"
    id: str = field(default_factory=lambda: str(ULID()).lower())
    name: Optional[str] = None
    responses: List["Response"] = field(default_factory=list)

llm/llm/models.py

Lines 74 to 89 in 23eeb0f

class Response(ABC):
    def __init__(
        self,
        prompt: Prompt,
        model: "Model",
        stream: bool,
        conversation: Optional[Conversation] = None,
    ):
        self.prompt = prompt
        self._prompt_json = None
        self.model = model
        self.stream = stream
        self._chunks: List[str] = []
        self._done = False
        self._response_json = None
        self.conversation = conversation

I think I'm going to replace or remove the LogMessage class entirely:

llm/llm/models.py

Lines 32 to 44 in 23eeb0f

@dataclass
class LogMessage:
    model: str  # Actually the model.model_id string
    prompt: str  # Simplified string version of prompt
    system: Optional[str]  # Simplified string of system prompt
    prompt_json: Optional[Dict[str, Any]]  # Detailed JSON of prompt
    options_json: Dict[str, Any]  # Any options e.g. temperature
    response: str  # Simplified string version of response
    response_json: Optional[Dict[str, Any]]  # Detailed JSON of response
    reply_to_id: Optional[int]  # ID of message this is a reply to
    chat_id: Optional[int]  # ID of chat this is a part of (ID of first message in thread)
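As an illustration of the new shape, here is a simplified stand-in sketch (not the library's real classes, and `uuid4` substitutes for ULID here to avoid a dependency) showing how a conversation just accumulates response objects:

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid

# Simplified stand-ins for llm's Conversation/Response classes;
# a real ULID would be the id, uuid4().hex is a dependency-free substitute.
@dataclass
class Response:
    prompt: str
    text: str

@dataclass
class Conversation:
    model: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    name: Optional[str] = None
    responses: List[Response] = field(default_factory=list)

conversation = Conversation(model="gpt-3.5-turbo")
conversation.responses.append(Response("Three pelican names", "Percy, Pip, Paula"))
conversation.responses.append(Response("Two more", "Penny, Pablo"))
# Continue mode replays conversation.responses as context for the next prompt.
```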


simonw commented Jul 11, 2023

I'm a bit nervous about these ULIDs, which look like this:

>>> str(ulid.ULID()).lower()
'01h51r2j69dbj1qma2874bywvw'
>>> str(ulid.ULID()).lower()
'01h51r2jvwkx2d8t8dweynrkvb'
>>> str(ulid.ULID()).lower()
'01h51r2kay2hqj9ymw5fmckmhw'

(They are case-insensitive; I think lower-case is visually prettier.)
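A ULID is a 48-bit millisecond timestamp followed by 80 random bits, Crockford base32 encoded into 26 characters, which is why they sort chronologically. A minimal stdlib-only sketch of that composition (not the actual python-ulid implementation):

```python
import os
import time

# Crockford base32 alphabet, lower-cased (no i, l, o or u) - same as ULID uses.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

def make_ulid(timestamp_ms=None):
    """48-bit millisecond timestamp + 80 random bits, as 26 base32 chars."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits of capacity; top 2 bits are zero
        chars.append(ALPHABET[value & 0b11111])
        value >>= 5
    return "".join(reversed(chars))

print(make_ulid())  # e.g. '01h51r2j69dbj1qma2874bywvw'
```

Because the timestamp occupies the most significant bits, sorting the strings lexicographically sorts records by creation time.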

The downside of these is that they aren't things people can type, unlike llm --chat 34 "continue prompt". People will have to copy-and-paste them.

But... I expect most CLI usage of the continue mode to use llm --continue instead, which just uses the most recent ID without you needing to specify it.

The benefit of them is that they're globally unique, like UUIDs - which is great news if you want to e.g. run prompts on your local machine and then upload them to a shared space later.

Since shared prompt libraries feel like a useful thing to support, I'm going to use ULIDs.


simonw commented Jul 11, 2023

Calling them "conversations" also means the llm --chat 34 option should perhaps be renamed to something a bit longer:

llm --conversation 34

Since -c is already taken by --continue, maybe I should use a surprising shortcut letter like -x?


simonw commented Jul 11, 2023

I think the option is --cid - where C ID is short for Conversation ID.


simonw commented Jul 11, 2023

OK, I got the new schema working and got --continue mode working too - but I still need to update how llm logs works, get all the existing tests passing, and add new tests and documentation.

I'll move that work to a PR.


simonw commented Jul 11, 2023

Here's the new schema:

CREATE TABLE [conversations] (
   [id] TEXT PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
);
CREATE TABLE [responses] (
   [id] TEXT PRIMARY KEY,
   [model] TEXT,
   [prompt] TEXT,
   [system] TEXT,
   [prompt_json] TEXT,
   [options_json] TEXT,
   [response] TEXT,
   [response_json] TEXT,
   [conversation_id] TEXT REFERENCES [conversations]([id]),
   [duration_ms] INTEGER,
   [datetime_utc] TEXT
);
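The schema above can be exercised directly with Python's built-in sqlite3 module (an illustrative sketch; llm itself manages its database through sqlite-utils), using the example ULIDs from earlier in this thread as keys:

```python
import sqlite3

SCHEMA = """
CREATE TABLE [conversations] (
   [id] TEXT PRIMARY KEY,
   [name] TEXT,
   [model] TEXT
);
CREATE TABLE [responses] (
   [id] TEXT PRIMARY KEY,
   [model] TEXT,
   [prompt] TEXT,
   [system] TEXT,
   [prompt_json] TEXT,
   [options_json] TEXT,
   [response] TEXT,
   [response_json] TEXT,
   [conversation_id] TEXT REFERENCES [conversations]([id]),
   [duration_ms] INTEGER,
   [datetime_utc] TEXT
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# One conversation with two responses, keyed by ULIDs:
cid = "01h51r2j69dbj1qma2874bywvw"
db.execute(
    "insert into conversations (id, model) values (?, ?)",
    (cid, "gpt-3.5-turbo"),
)
for rid, prompt, response in [
    ("01h51r2jvwkx2d8t8dweynrkvb", "Three pelican names", "Percy, Pip, Paula"),
    ("01h51r2kay2hqj9ymw5fmckmhw", "Two more", "Penny, Pablo"),
]:
    db.execute(
        "insert into responses (id, model, prompt, response, conversation_id)"
        " values (?, ?, ?, ?, ?)",
        (rid, "gpt-3.5-turbo", prompt, response, cid),
    )

# Continue mode needs every prior response for a conversation, oldest first.
# Because ULIDs sort chronologically, ordering by id gives creation order:
rows = db.execute(
    "select prompt, response from responses"
    " where conversation_id = ? order by id",
    (cid,),
).fetchall()
print(rows)
```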
