register_models() plugin hook #65
Conversation
For continue mode, it's all about building the prompt. Here's the current implementation (lines 176 to 184 in 9190051).
The key bit is in lines 552 to 555 in 9190051.
So maybe this is all about building a prompt an alternative way, perhaps like this:

```python
# History is a list of logged messages:
prompt = Prompt.from_history(history, prompt, system)
```
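A minimal sketch of what that `Prompt.from_history()` idea could look like, using the join-with-newlines approach described further down for the default continue mode; the shape of `history` and the field names here are assumptions:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prompt:
    prompt: str
    system: Optional[str] = None

    @classmethod
    def from_history(
        cls,
        history: List[Tuple[str, str]],  # (prompt_text, response_text) pairs
        prompt: str,
        system: Optional[str] = None,
    ) -> "Prompt":
        # Flatten previous prompts and responses into one text prompt,
        # joined with newlines, with the new prompt at the end.
        parts = []
        for previous_prompt, previous_response in history:
            parts.append(previous_prompt)
            parts.append(previous_response)
        parts.append(prompt)
        return cls(prompt="\n".join(parts), system=system)
```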
OpenAI chat models work by building up that `messages` list instead.
It looks like PaLM2 has a separate chat mode, which is mainly documented through code examples: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/chat.py
Untangling chat mode for PaLM 2 is a bigger job than I want to take on right now - more notes here: #20 (comment)
I'm not happy with how streaming works yet. I think models should have the following methods:
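Based on the `model.prompt(...)` / `model.stream(...)` split discussed later in this thread, one possible shape for those methods (names, signatures, and the blocking-wraps-streaming arrangement are all assumptions):

```python
class Model:
    def stream(self, prompt, system=None, **options):
        # Yield chunks of text as they are generated.
        # Subclasses implement this; a non-streaming model can yield once.
        raise NotImplementedError

    def prompt(self, prompt, system=None, **options):
        # Blocking call: exhaust the stream and return the full text.
        return "".join(self.stream(prompt, system=system, **options))
```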
I was contemplating if a class was really needed here. Then I remembered that there are some LLMs that run locally that have significant startup costs (loading data into memory/the GPU) - which is a good fit for a class, because that way they can be loaded once and then reused for subsequent prompt calls.
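That expensive-startup argument might look like this in practice: the weights get loaded lazily the first time, then every later prompt reuses them (a sketch; `load_weights()` is a stand-in for whatever the real loading step would be):

```python
def load_weights(path):
    # Stand-in for an expensive step: reading weights into RAM or the GPU.
    ...


class LocalModel:
    def __init__(self, weights_path):
        self.weights_path = weights_path
        self._loaded = None  # Nothing loaded until the first prompt

    def _ensure_loaded(self):
        if self._loaded is None:
            self._loaded = load_weights(self.weights_path)
        return self._loaded

    def prompt(self, prompt, system=None, **options):
        weights = self._ensure_loaded()  # Paid once, reused afterwards
        ...  # run inference against the loaded weights
```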
Copying in some miscellaneous notes I made on the API design: The streaming vs not streaming thing still feels a bit ugly. The key problem I'm trying to solve is providing a generic interface to an LLM that will enable the following:
I want the simplest possible API to implement in order to solve all of these - adding new models should be as easy as possible. So maybe two methods: … Perhaps have a … If there is ever a model that only does streaming the subclass can still implement … Maybe:

```python
class VertexModel(Model):
    class Options(Model.Options):
        temperature: float

    class Response:
        ...
```
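One way that nested `Options` class could work, assuming pydantic handles the validation (a sketch of the idea, not the final design):

```python
from pydantic import BaseModel, ValidationError


class Model:
    class Options(BaseModel):
        pass


class VertexModel(Model):
    class Options(Model.Options):
        temperature: float = 0.5


# Option values supplied by a caller get validated before execution:
try:
    VertexModel.Options(temperature="not-a-number")
except ValidationError as error:
    print(error)
```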
It would be good to have a Python library utility mechanism for this logging, since then users could log prompts that their app is using easily. Maybe it's the … Logging should be off by default but easy to opt into, maybe by providing a … So the internals documentation ends up covering these concepts: …
For OpenAI functions I could introduce a … So the API for a model is:

```python
response = model.prompt(...)
```

Or:

```python
for chunk in model.stream(...):
    ...
```

```python
for response in model.chain(...):
    print(response.text())
```

Maybe this:

```python
for response in model.chain_stream(...):
    for token in response:
        print(token, end="")
```

I'm going with … What abstraction could I build so other chains can be easily constructed? Like there's a thing where the user gets to define a function that takes a response and decides if there should be another prompt (which the functions stuff can then be built on):

```python
def next(response, prompt):
    return None  # or str or Prompt

model.chain(next)

model.chain(lambda response: "fact check this: $input", once=True)
```

So …
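A rough sketch of how that callback-driven `chain()` could work internally, assuming the blocking `prompt()` method from earlier; the `max_steps` guard and the generator shape are assumptions:

```python
def chain(model, first_prompt, next, max_steps=10):
    # Run the initial prompt, then keep asking the user-supplied callback
    # whether there should be a follow-up prompt.
    response = model.prompt(first_prompt)
    yield response
    for _ in range(max_steps - 1):
        follow_up = next(response, first_prompt)
        if follow_up is None:
            return  # The callback decided the chain is finished
        response = model.prompt(follow_up)
        yield response
```

The `once=True` variant would then just be sugar for a callback that returns its follow-up the first time and `None` afterwards.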
I'm going to try a bit of documentation-driven development here.
Here's that first draft of the internals documentation - next step, actually implement what's described there: https://github.com/simonw/llm/blob/c2ec8a9c60ac38d152ed48ba8c7c067c8d2c9859/docs/python-api.md
The default implementation of continue mode can be really basic: it just grabs the text version of the prompts and responses from the previous messages into a list and joins them with newlines.
I'm still not clear on the best way to truncate messages in continue mode; right now I'm going to leave that and allow the model to return an error - but it would be good to have a strategy for that involving automatic truncating later on.
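One automatic truncation strategy that could be layered on later: drop the oldest exchanges until the combined text fits under a budget (a sketch only; the character budget and the shape of `history` are assumptions):

```python
def truncate_history(history, max_characters=8000):
    # history: list of (prompt_text, response_text) tuples, oldest first.
    # Drop whole exchanges from the front until the remainder fits the budget.
    kept = list(history)
    while kept and sum(len(p) + len(r) for p, r in kept) > max_characters:
        kept.pop(0)
    return kept
```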
I don't think I should solve continuation mode until I've re-implemented logging to the database. Although... I do want continue mode to be possible even without having database logging, at least at the Python API layer.
Got all the tests passing, partly by disabling the DB logging test.
I think that test fails because of timezone differences between GitHub Actions and my laptop. |
Docs can also be previewed here: https://llm--65.org.readthedocs.build/en/65/python-api.html
I think a … Maybe there should be a way in which these chain together, for the purposes of modeling conversations in situations where the SQLite log isn't being used? Then perhaps you could do this:

```python
response = model.prompt("Ten names for a pet pelican")
print(response.text())
response2 = response.reply("Make them more grandiose")
print(response2.text())
```

Problem with … Does it make sense for that …
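One way `reply()` could work without any database: each response keeps a reference to its own prompt and to the response it was a reply to, so the conversation lives on the response chain itself rather than on the model instance (a sketch; the attribute and method names are assumptions):

```python
class Response:
    def __init__(self, model, prompt, text, previous=None):
        self.model = model
        self.prompt = prompt
        self._text = text
        self.previous = previous  # The Response this one was a reply to

    def text(self):
        return self._text

    def history(self):
        # Walk back up the chain to rebuild the conversation, oldest first.
        node, items = self, []
        while node is not None:
            items.append((node.prompt, node._text))
            node = node.previous
        return list(reversed(items))

    def reply(self, prompt):
        # Hand the accumulated history back to the model for the next prompt.
        return self.model.prompt(prompt, history=self.history())
```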
Another option: the model itself could keep an in-memory cache of its previous prompts, such that you can then reply via the model. I'm not keen on this though, because the conversation state shouldn't be shared by every user of the model instance in a situation like …
The current signature for … (lines 28 to 33 in 69ce584).
If it also grew a …
What would this look like if everything was represented in terms of chains of Prompts and Responses? Those chains could then be serialized and deserialized to SQLite, or to JSON or other formats too. Especially when functions start coming into play, there's something very interesting about storing a high fidelity representation of the full sequence of prompts and responses that got to the most recent state.
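Following that idea, serializing a chain could just be a walk back over the linked responses, emitting one dictionary per exchange; the same structure would round-trip to JSON or to SQLite rows (a sketch under that assumption; the field names are invented):

```python
import json


def serialize_chain(response):
    # Walk from the newest response back to the first, then reverse so the
    # serialized conversation reads oldest-first.
    items = []
    node = response
    while node is not None:
        items.append({"prompt": node.prompt, "response": node.text()})
        node = node.previous
    return json.dumps(list(reversed(items)), indent=2)
```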
Whether or not something should stream is currently a property of the Response. That works: the …
Something very broken there. I tried …
And I get this error when I run it: …
Fixed the …
Now that some models live in plugins, I'm tempted to add this helper:

```python
from llm import get_model

gpt4 = get_model("gpt-4")
```

This will provide Python API level access to both the model plugins mechanism and the aliases mechanism.
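A sketch of how `get_model()` might tie the two mechanisms together: resolve aliases first, then search the models that plugins have registered (the helper names here are placeholders, not the real implementation):

```python
class UnknownModelError(KeyError):
    pass


def load_aliases():
    # Placeholder: would read the alias registry
    return {"4": "gpt-4", "chatgpt": "gpt-3.5-turbo"}


def models_from_plugins():
    # Placeholder: would collect models registered via the plugin hook
    return []


def get_model(name):
    model_id = load_aliases().get(name, name)
    for model in models_from_plugins():
        if model.model_id == model_id:
            return model
    raise UnknownModelError(name)
```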
This means that if you do this:

```bash
llm -m markov -o length -1
```

You will see an error message rather than have the command hang waiting for a prompt to be entered on stdin.
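That markov model presumably arrives through the register_models() hook this branch adds. A sketch of what a plugin registering it might look like, assuming a pluggy-style `@hookimpl` and a `register` callback; the Markov implementation and its method signature are invented for illustration:

```python
import random

import llm


class Markov(llm.Model):
    model_id = "markov"

    def stream(self, prompt, system=None, length=20, **options):
        # Toy generator: emit words picked at random from the prompt itself.
        words = prompt.split()
        for _ in range(length):
            yield random.choice(words) + " "


@llm.hookimpl
def register_models(register):
    register(Markov())
```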
To ensure any changes made by ruff --fix are reformatted.
- Moved a whole bunch of things from llm/cli.py into llm/__init__.py
- Switched plugin listings to use importlib.metadata to avoid deprecation warning
- iter_prompt() is now a method on Model, not on Response
I wanted to rebase this branch, but GitHub said there were conflicts. Following https://stackoverflow.com/a/50012219/6083 I ran these commands locally:

```bash
git checkout register-models
git rebase main
# There was a conflict in setup.py which I fixed
git add setup.py
git rebase --continue
git push --force-with-lease
```
I don't like how that rebase changed the committer dates on those commits.
Looks like I can fix that with:

```bash
git filter-repo --commit-callback '
if commit.committer_date == b"Mon Jul 10 08:39:00 2023 -0700":
    commit.committer_date = commit.author_date
' --force
```
That didn't quite work - it didn't actually update the commit dates. This helped debug it:

```bash
git filter-repo --commit-callback '
print(repr(commit.committer_date))
' --force
```

Turns out those dates look like this: `b"1689003540 -0700"` - seconds since the epoch plus a UTC offset, not a formatted date string. So I ran this command:

```bash
git filter-repo --commit-callback '
if commit.committer_date == b"1689003540 -0700":
    commit.committer_date = commit.author_date
' --force
```

And then force pushed it all to main:

```bash
git push --force-with-lease origin main
```
Turned that into a TIL: https://til.simonwillison.net/git/git-filter-repo |
register_models #53

TODO:

- Prototype of functions support to further validate the design
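Since the functions prototype is meant to build on the chain mechanism discussed above, one possible shape: a chain callback that looks for a requested function call in the response, executes it, and feeds the result back as the next prompt (entirely a sketch; the `function_call()` accessor and the follow-up prompt format are invented):

```python
import json


def function_calling_next(functions):
    # functions: mapping of function name -> Python callable
    def next(response, prompt):
        call = response.function_call()  # Invented accessor for this sketch
        if call is None:
            return None  # No function requested; the chain ends here
        result = functions[call["name"]](**json.loads(call["arguments"]))
        # Send the result back to the model as the follow-up prompt
        return "Result of {}: {}".format(call["name"], json.dumps(result))
    return next


# Usage: model.chain(function_calling_next({"get_weather": get_weather}))
```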