Initial plugin design #1
I hacked together this prototype as a starting point:

    import time

    import llm
    from sqlite_migrate import Migrations

    migration = Migrations("datasette_llm")


    @migration()
    def create_usage_table(db):
        db["llm_usage"].create({
            "id": int,
            "created": float,
            "model": str,
            "purpose": str,
            "actor_id": str,
            "input_tokens": int,
            "output_tokens": int,
        }, pk="id")


    class WrappedModel:
        def __init__(self, model, datasette, purpose=None):
            self.model = model
            self.datasette = datasette
            self.purpose = purpose

        async def prompt(self, prompt, system=None, actor_id=None, **kwargs):
            response = self.model.prompt(prompt, system=system, **kwargs)

            async def done(response):
                # Log usage against current actor_id and purpose
                usage = await response.usage()
                input_tokens = usage.input
                output_tokens = usage.output
                db = self.datasette.get_database("llm")
                # A None actor_id is bound as SQL NULL, so no string formatting is needed
                await db.execute_write("""
                    insert into llm_usage (created, model, purpose, actor_id, input_tokens, output_tokens)
                    values (:created, :model, :purpose, :actor_id, :input_tokens, :output_tokens)
                """, {
                    "created": time.time(),
                    "model": self.model.model_id,
                    "purpose": self.purpose,
                    "actor_id": actor_id,
                    "input_tokens": input_tokens,
                    "output_tokens": output_tokens,
                })

            await response.on_done(done)
            return response

        def __repr__(self):
            return f"WrappedModel: {self.model.model_id}"


    class LLM:
        def __init__(self, datasette):
            self.datasette = datasette

        def get_async_models(self):
            # WrappedModel requires the datasette instance as its second argument
            return [
                WrappedModel(model, self.datasette)
                for model in llm.get_async_models()
            ]

        def get_async_model(self, model_id=None, purpose=None):
            return WrappedModel(
                llm.get_async_model(model_id), self.datasette, purpose=purpose
            )

Usage:

    from datasette.app import Datasette
    from datasette_llm import LLM

    ds = Datasette()
    await ds.invoke_startup()
    llm = LLM(ds)
    m = llm.get_async_model("gpt-4o-mini")
    r = await m.prompt("hi")
    await r.text()
This design stores timestamps as a float, which I haven't actually done before. I don't want to store them as strings because that's too wasteful. Integers are also an option, but I'd like to capture time at a finer grain than one second, so I could use ms-since-epoch.

REAL in SQLite is always 8 bytes. Integer can be 1, 2, 3, 4, 6 or 8 bytes depending on size: https://www.sqlite.org/datatype3.html

So I think 4 bytes for an integer timestamp and 6 bytes for a ms integer timestamp.
I'm going to do integer ms since epoch, which may then prompt other features in the Datasette ecosystem to support that format better.
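A quick way to sanity-check those byte counts is the sketch below. The helper names `ms_since_epoch` and `sqlite_integer_bytes` are made up for illustration, and the size check only mirrors the 1, 2, 3, 4, 6 or 8 byte widths mentioned above (it ignores any special handling SQLite has for very small values).

```python
import time


def ms_since_epoch() -> int:
    """Current time as integer milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def sqlite_integer_bytes(value: int) -> int:
    """Smallest of the 1, 2, 3, 4, 6 or 8 byte widths that holds a signed value."""
    for size in (1, 2, 3, 4, 6, 8):
        bits = size * 8
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return size
    raise OverflowError("value does not fit in a 64-bit signed integer")


if __name__ == "__main__":
    now_ms = ms_since_epoch()
    print(now_ms, sqlite_integer_bytes(now_ms))
    print(now_ms // 1000, sqlite_integer_bytes(now_ms // 1000))
```

With current dates this agrees with the estimate: seconds fit in 4 bytes and milliseconds need 6.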
To make that

    async def my_view(datasette):
        model = datasette.llm.get_async_model("gpt-4o-mini")

This would work by having a

Second option:

    from datasette_llm_usage import LLM
    llm = LLM(datasette)

This avoids the

I'm leaning to the second option, because of Python's optional typing. I'll implement that first, then maybe add a
Did some brainstorming with Claude about how allowances should work: https://gist.github.com/simonw/5339a4bc71508e553cee73ab00b350eb

A tricky thing about allowances is that I want to have one per model family - so a user might get 100,000 free OpenAI tokens per day, 10,000 Anthropic etc - but that needs to take into account the difference in price between models. GPT-4o-mini is 2.5 / 0.15 = 16.7 times cheaper than GPT-4o.

I'm going to go with a "credits" system where we count credits internally that then map to token allowances, but implement a UI feature that shows you things like "16,000 GPT-4o-mini tokens (1,000 GPT-4o tokens) left" - so users never have to think about those raw credit numbers out of context.
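Here is a rough sketch of how credits might map back to per-model token counts for that UI message. It assumes one credit equals one millionth of a cent, so a token costs as many credits as the model's price in cents per million tokens. That unit, and the names `INPUT_PRICE`, `tokens_for_credits` and `describe_remaining`, are assumptions for illustration and not part of the plugin. Only input prices are considered.

```python
# Input prices in US cents per million tokens ($2.50 and $0.15 per million).
INPUT_PRICE = {"gpt-4o": 250, "gpt-4o-mini": 15}


def tokens_for_credits(credits: int, model_id: str) -> int:
    """How many input tokens of model_id the given credits would buy."""
    return credits // INPUT_PRICE[model_id]


def describe_remaining(credits: int, primary: str, reference: str) -> str:
    """Human-readable balance expressed in tokens of two models."""
    return (
        f"{tokens_for_credits(credits, primary):,} {primary} tokens "
        f"({tokens_for_credits(credits, reference):,} {reference} tokens) left"
    )


if __name__ == "__main__":
    print(describe_remaining(250_000, "gpt-4o-mini", "gpt-4o"))
```

The point of the design is that the stored number is model-independent, while the display is always in terms of a concrete model.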
Minor problem: where does the information about the relative prices of the model families live? I could put it in the LLM plugins but I've avoided baking in pricing information so far because I don't want to ship a new plugin version any time the prices change.

I think for the moment that stuff goes in this plugin, and can be overridden in configuration.
We can only track usage and allowances against a model that we've got pricing information for, so we can specify that a model won't be available via this plugin unless it has been configured.

I'm tempted to track credits as floating point numbers. I know that's bad practice for real money accounting, but here I don't think there's any harm in the occasional floating point inaccuracy creeping in.
If I track usage as integers, maybe I do it in the equivalent of thousandths-of-a-cent? Gemini 1.5 Flash 8B is so cheap that even a 100-token input costs less than 1/1000th of a cent though. Round that up to 1? I think that would be OK.
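The round-up idea can be done entirely in integer arithmetic. This is a sketch under assumptions: prices are expressed in cents per million tokens, the unit is a thousandth of a cent, and `cost_units` is a hypothetical helper name.

```python
def cost_units(tokens: int, cents_per_million: int, units_per_cent: int = 1000) -> int:
    """Cost in integer units (thousandths of a cent by default), rounded up.

    Any non-zero usage costs at least 1 unit; zero tokens cost nothing.
    """
    if tokens <= 0:
        return 0
    numerator = tokens * cents_per_million * units_per_cent
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for positive b
    return -(-numerator // 1_000_000)


if __name__ == "__main__":
    # 100 input tokens at 3 cents per million tokens is 0.3 units, rounded up to 1
    print(cost_units(100, 3))
    # One million tokens at 250 cents per million is 250 cents, or 250,000 units
    print(cost_units(1_000_000, 250))
```

Rounding up means tiny prompts are slightly overcharged, which errs on the side of the allowance running out early and not late.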
I just got Flash 8B to write me a Haiku and it cost:

Wow Gemini Flash 8B is cheap.

    llm -m gemini-1.5-flash-8b-latest 'describe image in detail' -a https://static.simonwillison.net/static/2024/recraft-ai.jpg -u

https://static.simonwillison.net/static/2024/recraft-ai.jpg

That's:
How do I map a user to an allowance? Maybe I punt on that for the moment, and just support global allowances for a specific Datasette instance. I can add per-user allowances later on.
The credit mechanism would make it possible to have an allowance that spans multiple models, which is probably better overall. I'm going to implement that.

So the simple initial version of allowances says that:

I'm also going to have a little bit of a denormalization where the number of remaining credits in the allowance is stored on that table.
Another challenge: the Gemini models charge differently for <128,000 tokens vs. >128,000 tokens.
Actually I do want to support multiple allowances - if a user goes wild with the enrichments feature and burns through all their tokens I'd still like them to be able to use the MUCH cheaper query assistant out of a separate budget.

I think allowances have an optional
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class Price:
        name: str
        model_id: str
        # Maximum prompt size this price applies to; None means no limit
        size_limit: Optional[int]
        # The values used in PRICES work out to US cents per million tokens
        # (gpt-4o input is 250, i.e. $2.50 per million), hence the
        # division by 1,000,000 in cost_in_cents()
        input_token_cost_10000th_cent: int
        output_token_cost_10000th_cent: int

        def cost_in_cents(self, input_tokens: int, output_tokens: int) -> float:
            return (
                input_tokens * self.input_token_cost_10000th_cent
                + output_tokens * self.output_token_cost_10000th_cent
            ) / 1000000


    PRICES = [
        Price("gemini-1.5-flash", "gemini-1.5-flash", 128000, 7, 30),
        Price("gemini-1.5-flash-128k", "gemini-1.5-flash", None, 15, 60),
        Price("gemini-1.5-flash-8b", "gemini-1.5-flash-8b", 128000, 3, 15),
        Price("gemini-1.5-flash-8b-128k", "gemini-1.5-flash-8b", None, 7, 30),
        Price("gemini-1.5-pro", "gemini-1.5-pro", 128000, 125, 500),
        Price("gemini-1.5-pro-128k", "gemini-1.5-pro", None, 250, 1000),
        Price("claude-3.5-sonnet", "claude-3.5-sonnet", None, 300, 1500),
        Price("claude-3-opus", "claude-3-opus", None, 1500, 7500),
        Price("claude-3-haiku", "claude-3-haiku", None, 25, 125),
        Price("claude-3.5-haiku", "claude-3.5-haiku", None, 100, 500),
        Price("gpt-4o", "gpt-4o", None, 250, 1000),
        Price("gpt-4o-mini", "gpt-4o-mini", None, 15, 60),
        Price("o1-preview", "o1-preview", None, 1500, 6000),
        Price("o1-mini", "o1-mini", None, 300, 1200),
    ]

Using prices from https://tools.simonwillison.net/llm-prices
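Picking the right row for a prompt could look something like the sketch below. It uses a trimmed-down copy of the `Price` dataclass with shortened field names so that it runs on its own. The `price_for` helper is hypothetical, and treating a prompt of exactly `size_limit` tokens as belonging to the cheaper tier is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Price:
    name: str
    model_id: str
    size_limit: Optional[int]  # None means the price applies at any size
    input_cost: int
    output_cost: int


PRICES = [
    Price("gemini-1.5-pro", "gemini-1.5-pro", 128000, 125, 500),
    Price("gemini-1.5-pro-128k", "gemini-1.5-pro", None, 250, 1000),
    Price("gpt-4o", "gpt-4o", None, 250, 1000),
]


def price_for(
    model_id: str, input_tokens: int, prices: List[Price] = PRICES
) -> Optional[Price]:
    """Return the price tier for a prompt, or None if the model is not configured."""
    candidates = [p for p in prices if p.model_id == model_id]
    # Try size-limited tiers first (smallest limit first), unlimited tiers last
    candidates.sort(key=lambda p: (p.size_limit is None, p.size_limit or 0))
    for price in candidates:
        if price.size_limit is None or input_tokens <= price.size_limit:
            return price
    return None
```

Returning `None` for an unknown model fits the earlier idea that a model without pricing information is simply not available through the plugin.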
I pushed an initial alpha, mainly to reserve the name on PyPI. Still a bunch more needed:
The goal of this plugin is to track token usage of various LLM models when used by Datasette plugins, to help provide a mechanism for things like only allowing X tokens per day (for free demo apps) or allowing customers of SaaS platforms to purchase more token allowance.
Will use the new features from LLM 0.19: