Initial plugin design #1

Open
simonw opened this issue Dec 2, 2024 · 18 comments
Labels
enhancement New feature or request

Comments

simonw commented Dec 2, 2024

The goal of this plugin is to track token usage across LLM models when they are used by Datasette plugins. That provides a mechanism for things like allowing only X tokens per day (for free demo apps) or letting customers of SaaS platforms purchase more token allowance.

Will use the new features from LLM 0.19:

simonw added the enhancement label Dec 2, 2024

simonw commented Dec 2, 2024

I hacked together this prototype as a starting point:

import time

import llm
from sqlite_migrations import Migrations

migration = Migrations("datasette_llm")

@migration()
def create_usage_table(db):
    db["llm_usage"].create({
        "id": int,
        "created": float,
        "model": str,
        "purpose": str,
        "actor_id": str,
        "input_tokens": int,
        "output_tokens": int,
    }, pk="id")


class WrappedModel:
    def __init__(self, model, datasette, purpose=None):
        self.model = model
        self.datasette = datasette
        self.purpose = purpose

    async def prompt(self, prompt, system=None, actor_id=None, **kwargs):
        response = self.model.prompt(prompt, system=system, **kwargs)
        async def done(response):
            # Log usage against current actor_id and purpose
            usage = await response.usage()
            input_tokens = usage.input
            output_tokens = usage.output
            db = self.datasette.get_database("llm")
            # Binding None to :actor_id stores NULL, so no SQL templating is needed
            await db.execute_write("""
            insert into llm_usage (created, model, purpose, actor_id, input_tokens, output_tokens)
            values (:created, :model, :purpose, :actor_id, :input_tokens, :output_tokens)
            """, {
                "created": time.time(),
                "model": self.model.model_id,
                "purpose": self.purpose,
                "actor_id": actor_id,
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
            })
        await response.on_done(done)
        return response

    def __repr__(self):
        return f"WrappedModel: {self.model.model_id}"


class LLM:
    def __init__(self, datasette):
        self.datasette = datasette

    def get_async_models(self):
        # WrappedModel requires the datasette instance as its second argument
        return [WrappedModel(model, self.datasette) for model in llm.get_async_models()]

    def get_async_model(self, model_id=None, purpose=None):
        return WrappedModel(
            llm.get_async_model(model_id), self.datasette, purpose=purpose
        )

Usage:

from datasette.app import Datasette
from datasette_llm import LLM
ds = Datasette()

await ds.invoke_startup()
llm = LLM(ds)
m = llm.get_async_model("gpt-4o-mini")
r = await m.prompt("hi")
await r.text()

simonw commented Dec 2, 2024

This design stores timestamps as a float, which I haven't actually done before. I don't want to store them as strings because it's too wasteful. Integers are also an option, but I'd like to capture time more finely grained than one second, so I could use ms-since-epoch (int(time.time() * 1000)).

REAL in SQLite is always 8 bytes. An INTEGER takes 0, 1, 2, 3, 4, 6 or 8 bytes depending on the magnitude of the value: https://www.sqlite.org/datatype3.html

So I think that means 4 bytes for a seconds-since-epoch integer timestamp and 6 bytes for a milliseconds-since-epoch one.
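
As a rough check of those sizes, here is a minimal sketch that mirrors the integer storage widths described on the linked SQLite page. The helper name `sqlite_integer_bytes` is made up for this illustration; it is not part of SQLite or Datasette.

```python
def sqlite_integer_bytes(value: int) -> int:
    """Payload bytes SQLite needs for a signed integer (hypothetical helper)."""
    if value in (0, 1):
        # Schema format 4 encodes the constants 0 and 1 with no payload bytes
        return 0
    for size in (1, 2, 3, 4, 6, 8):
        limit = 1 << (size * 8 - 1)  # two's-complement range for this width
        if -limit <= value < limit:
            return size
    raise OverflowError("SQLite integers are at most 64 bits")


seconds = 1733097600     # 2024-12-02 00:00:00 UTC
millis = seconds * 1000  # the same instant as ms since epoch

print(sqlite_integer_bytes(seconds))  # 4
print(sqlite_integer_bytes(millis))   # 6
```

The 4-byte figure for seconds only holds until the value passes 2**31 - 1 in 2038, after which it would also need 6 bytes.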

simonw commented Dec 2, 2024

I'm going to use integer milliseconds since the epoch, which may then lead to other features in the Datasette ecosystem that better support that format.

simonw commented Dec 2, 2024

To make that llm object available I considered two patterns:

async def my_view(datasette):
    model = datasette.llm.get_async_model("gpt-4o-mini")

This would work by having a startup() plugin hook that set up datasette.llm as a new property for other plugins to access.

Second option:

from datasette_llm_usage import LLM

llm = LLM(datasette)

This avoids the datasette.llm extra property trick, at the expense of a more verbose import.

I'm leaning toward the second option because it plays better with Python's optional typing. I'll implement that first, then maybe add a datasette.llm shortcut in the future if it feels right.

simonw commented Dec 2, 2024

Did some brainstorming with Claude about how allowances should work: https://gist.github.com/simonw/5339a4bc71508e553cee73ab00b350eb

A tricky thing about allowances is that I want to have one per model family, so a user might get 100,000 free OpenAI tokens per day, 10,000 Anthropic tokens and so on, but that needs to take into account the difference in price between models. At $0.15 versus $2.50 per million input tokens, GPT-4o-mini is 2.5 / 0.15 = 16.7 times cheaper than GPT-4o.

I'm going to go with a "credits" system where we count credits internally that then map to token allowances, but implement a UI feature that shows you things like "16,000 GPT-4o-mini tokens (1,000 GPT-4o tokens) left" - so users never have to think about those raw credit numbers out of context.
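
A minimal sketch of how that display could be derived from a credit balance. Everything here is an assumption for illustration: the names `CREDITS_PER_INPUT_TOKEN`, `tokens_left` and `describe_remaining` are hypothetical, one credit is taken to be one millionth of a cent, and only input-token prices are considered.

```python
# Assumption: 1 credit = one millionth of a cent, so the per-token credit cost
# equals the price in cents per million input tokens ($2.50 and $0.15).
CREDITS_PER_INPUT_TOKEN = {"gpt-4o": 250, "gpt-4o-mini": 15}


def tokens_left(credits: int, model_id: str) -> int:
    """Whole input tokens the remaining credits would buy for this model."""
    return credits // CREDITS_PER_INPUT_TOKEN[model_id]


def describe_remaining(credits: int, primary: str, reference: str) -> str:
    """Express one credit balance as token counts for two models."""
    return "{:,} {} tokens ({:,} {} tokens) left".format(
        tokens_left(credits, primary), primary,
        tokens_left(credits, reference), reference,
    )


print(describe_remaining(250_000, "gpt-4o-mini", "gpt-4o"))
# 16,666 gpt-4o-mini tokens (1,000 gpt-4o tokens) left
```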

simonw commented Dec 2, 2024

Minor problem: where does the information about the relative prices of the model families live?

I could put it in the LLM plugins but I've avoided baking in pricing information so far because I don't want to ship a new plugin version any time the prices change.

I think for the moment that stuff goes in this plugin, and can be overridden in configuration.

simonw commented Dec 2, 2024

We can only track usage and allowances against a model that we've got pricing information for, so we can specify that a model won't be available via this plugin unless it has been configured.

simonw commented Dec 2, 2024

I'm tempted to track credits as floating point numbers. I know that's bad practice for real money accounting, but here I don't think there's any harm in the occasional floating point inaccuracy creeping in.

simonw commented Dec 2, 2024

If I track usage as integers, maybe I do it in the equivalent of thousandths-of-a-cent?

Gemini 1.5 Flash 8B is so cheap that even a 100 token input costs less than 1/1000th of a cent though. Round that up to 1? I think that would be OK.
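
The round-up idea can be sketched with pure integer arithmetic. This is an illustration only: the function name is hypothetical, and the Flash 8B input price of $0.0375 per million tokens is an assumption, expressed here as 375 hundredths of a cent per million so it stays an integer.

```python
def cost_in_thousandths_of_cent(tokens: int, hundredths_of_cent_per_million: int) -> int:
    """Integer cost, rounded up so any non-zero usage costs at least 1 unit."""
    # tokens * (price / 100) cents per 1,000,000 tokens * 1000 units per cent
    # simplifies to tokens * price / 100_000
    numerator = tokens * hundredths_of_cent_per_million
    return -(-numerator // 100_000)  # ceiling division without floats


FLASH_8B_INPUT = 375  # assumed price: $0.0375 per million input tokens

print(cost_in_thousandths_of_cent(100, FLASH_8B_INPUT))        # 1 (exact value is 0.375)
print(cost_in_thousandths_of_cent(1_000_000, FLASH_8B_INPUT))  # 3750, i.e. 3.75 cents
```

Rounding up per prompt overcharges tiny requests slightly, which matches the "round that up to 1" trade-off described above.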

simonw commented Dec 2, 2024

I just got Flash 8B to write me a Haiku and it cost:

6 input, 22 output:
$0.000004 or 0.0004 cents

simonw commented Dec 2, 2024

Wow Gemini Flash 8B is cheap.

llm -m gemini-1.5-flash-8b-latest 'describe image in detail' -a https://static.simonwillison.net/static/2024/recraft-ai.jpg -u

The image is a digital illustration of a cartoon raccoon. 

The raccoon is light grayish-brown with distinctive white stripes on its tail and body. It has large, expressive eyes and a cheerful, slightly open-mouthed expression. It is holding a sign that says "I LOVE TRASH" in a simple, bold font. 

The background is a light, neutral beige or gray color. 

Small, light brown hearts are scattered lightly around the raccoon.

The image is presented within a digital interface, likely for design or creation purposes, as there are controls and options for resolution, file format (PNG, JPG, SVG, Lottie), style diversity settings, visibility, a Christmas customization option, and a "re-craft" button. There are also color settings showing hex code colors and a count of 7 colors.
Token usage: 263 input, 169 output

https://static.simonwillison.net/static/2024/recraft-ai.jpg

That's:

Total cost: $0.000035
Total cost: 0.0035 cents
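
Both cost figures quoted in this thread can be reproduced with a few lines of arithmetic. The per-million prices below ($0.0375 input, $0.15 output for Flash 8B) are assumptions that are not stated in the thread, and `cost_usd` is a hypothetical helper.

```python
# Assumed Gemini 1.5 Flash 8B prices in USD per million tokens
INPUT_USD_PER_MILLION = 0.0375
OUTPUT_USD_PER_MILLION = 0.15


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for one prompt/response pair."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


for input_tokens, output_tokens in [(6, 22), (263, 169)]:
    usd = cost_usd(input_tokens, output_tokens)
    print(f"{input_tokens} input, {output_tokens} output: ${usd:.6f} or {usd * 100:.4f} cents")
# 6 input, 22 output: $0.000004 or 0.0004 cents
# 263 input, 169 output: $0.000035 or 0.0035 cents
```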

simonw commented Dec 2, 2024

How do I map a user to an allowance?

Maybe I punt on that for the moment, and just support global allowances for a specific Datasette instance. I can add per-user allowances later on.

simonw commented Dec 2, 2024

The credit mechanism would make it possible to have an allowance that spans multiple models, which is probably better overall. I'm going to implement that.

So the simple initial version of allowances says that:

  • An allowance is enforced over the entire instance
  • An allowance counts credits; different models then cost a different number of credits per token to use
  • Only models that are explicitly configured to be usable (with pricing information provided) can be used with an allowance
  • Allowances can optionally be configured to reset at midnight UTC

I'm also going to have a little bit of a denormalization where the number of remaining credits in the allowance is stored on that table.
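
The rules above could be sketched roughly as follows. This is not the plugin's implementation: the `Allowance` class, the `AllowanceExhausted` exception, the `CREDITS_PER_TOKEN` table and its credit rates are all hypothetical, chosen only to show an instance-wide credit balance that resets at midnight UTC and rejects unconfigured models.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


class AllowanceExhausted(Exception):
    """Hypothetical error raised when a prompt would exceed the allowance."""


# Hypothetical configuration: credits per (input, output) token for each
# explicitly enabled model. Models missing from here cannot be used.
CREDITS_PER_TOKEN = {"gpt-4o-mini": (15, 60), "gpt-4o": (250, 1000)}


@dataclass
class Allowance:
    daily_credits: int
    credits_remaining: int  # denormalized running balance
    resets_daily: bool = True
    last_reset: Optional[date] = None

    def charge(self, model_id, input_tokens, output_tokens, now=None):
        today = (now or datetime.now(timezone.utc)).astimezone(timezone.utc).date()
        if self.resets_daily and self.last_reset != today:
            # First charge on a new UTC day: top the balance back up
            self.credits_remaining = self.daily_credits
            self.last_reset = today
        if model_id not in CREDITS_PER_TOKEN:
            raise ValueError(f"{model_id} has no pricing configured")
        input_rate, output_rate = CREDITS_PER_TOKEN[model_id]
        cost = input_tokens * input_rate + output_tokens * output_rate
        if cost > self.credits_remaining:
            raise AllowanceExhausted(model_id)
        self.credits_remaining -= cost
        return cost
```

Token counts are only known once a response has completed, so a real implementation would probably check the balance before prompting and deduct afterward rather than doing both in one call.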

simonw commented Dec 2, 2024

Another challenge: the Gemini models charge differently for prompts under 128,000 tokens vs. over 128,000 tokens.

simonw commented Dec 2, 2024

Actually I do want to support multiple allowances - if a user goes wild with the enrichments feature and burns through all their tokens I'd still like them to be able to use the MUCH cheaper query assistant out of a separate budget.

simonw commented Dec 2, 2024

I think allowances have an optional purpose which, if present, causes that allowance to be used instead of the default one for prompts with a matching purpose.
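
One possible reading of that lookup, as a small sketch. The `select_allowance` function, the dictionary layout (with `None` as the key for the default allowance) and the purpose strings are all assumptions made for this example.

```python
from typing import Dict, Optional


def select_allowance(
    allowances: Dict[Optional[str], int], purpose: Optional[str] = None
) -> Optional[str]:
    """Return the key of the allowance to charge for a prompt.

    A purpose-specific allowance wins when one is configured; otherwise the
    default allowance (stored under the key None) is used.
    """
    if purpose is not None and purpose in allowances:
        return purpose
    return None


# Remaining credits per allowance; None is the default budget
allowances = {None: 1_000_000, "query-assistant": 50_000}

print(select_allowance(allowances, "query-assistant"))  # query-assistant
print(select_allowance(allowances, "enrichments"))      # None
```

With this layout, exhausting the default budget through enrichments would leave the separate query assistant budget untouched.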

simonw commented Dec 2, 2024

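from dataclasses import dataclass
from typing import Optional


@dataclass
class Price:
    name: str
    model_id: str
    # Prompt-size ceiling for this price tier; None means no limit
    size_limit: Optional[int]
    # Given the division by 1,000,000 below, these values work out to
    # cents per million tokens (250 == $2.50 per million)
    input_token_cost_10000th_cent: int
    output_token_cost_10000th_cent: int

    def cost_in_cents(self, input_tokens: int, output_tokens: int) -> float:
        return (
            input_tokens * self.input_token_cost_10000th_cent
            + output_tokens * self.output_token_cost_10000th_cent
        ) / 1000000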


PRICES = [
    Price("gemini-1.5-flash", "gemini-1.5-flash", 128000, 7, 30),
    Price("gemini-1.5-flash-128k", "gemini-1.5-flash", None, 15, 60),
    Price("gemini-1.5-flash-8b", "gemini-1.5-flash-8b", 128000, 3, 15),
    Price("gemini-1.5-flash-8b-128k", "gemini-1.5-flash-8b", None, 7, 30),
    Price("gemini-1.5-pro", "gemini-1.5-pro", 128000, 125, 500),
    Price("gemini-1.5-pro-128k", "gemini-1.5-pro", None, 250, 1000),
    Price("claude-3.5-sonnet", "claude-3.5-sonnet", None, 300, 1500),
    Price("claude-3-opus", "claude-3-opus", None, 1500, 7500),
    Price("claude-3-haiku", "claude-3-haiku", None, 25, 125),
    Price("claude-3.5-haiku", "claude-3.5-haiku", None, 100, 500),
    Price("gpt-4o", "gpt-4o", None, 250, 1000),
    Price("gpt-4o-mini", "gpt-4o-mini", None, 15, 60),
    Price("o1-preview", "o1-preview", None, 1500, 6000),
    Price("o1-mini", "o1-mini", None, 300, 1200),
]

Using prices from https://tools.simonwillison.net/llm-prices

>>> from datasette_llm_usage import PRICES
>>> for price in PRICES:
...     print(price.model_id, price.cost_in_cents(1000, 100))
... 
gemini-1.5-flash 0.01
gemini-1.5-flash 0.021
gemini-1.5-flash-8b 0.0045
gemini-1.5-flash-8b 0.01
gemini-1.5-pro 0.175
gemini-1.5-pro 0.35
claude-3.5-sonnet 0.45
claude-3-opus 2.25
claude-3-haiku 0.0375
claude-3.5-haiku 0.15
gpt-4o 0.35
gpt-4o-mini 0.021
o1-preview 2.1
o1-mini 0.42
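
The listing above prints two rows for each Gemini model because each has two price tiers. A minimal sketch of how `size_limit` might be used to pick the right tier follows. The `price_for` function is hypothetical, the field names are shortened for the example, only three rows are copied from the table, and treating a prompt of exactly 128,000 tokens as the cheaper tier is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Price:
    name: str
    model_id: str
    size_limit: Optional[int]
    input_cost: int   # shortened field names for this sketch
    output_cost: int


PRICES = [
    Price("gemini-1.5-flash", "gemini-1.5-flash", 128000, 7, 30),
    Price("gemini-1.5-flash-128k", "gemini-1.5-flash", None, 15, 60),
    Price("gpt-4o-mini", "gpt-4o-mini", None, 15, 60),
]


def price_for(
    model_id: str, input_tokens: int, prices: List[Price] = PRICES
) -> Optional[Price]:
    """Pick the tier whose size_limit covers the prompt, or None if unpriced."""
    candidates = [p for p in prices if p.model_id == model_id]
    # Try limited tiers from smallest to largest, then the unlimited tier
    candidates.sort(key=lambda p: (p.size_limit is None, p.size_limit or 0))
    for price in candidates:
        if price.size_limit is None or input_tokens <= price.size_limit:
            return price
    return None


print(price_for("gemini-1.5-flash", 1_000).name)    # gemini-1.5-flash
print(price_for("gemini-1.5-flash", 200_000).name)  # gemini-1.5-flash-128k
```

Returning `None` for an unknown model fits the earlier idea that a model without pricing information is simply unavailable through the plugin.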

simonw added a commit that referenced this issue Dec 2, 2024
simonw added a commit that referenced this issue Dec 2, 2024

simonw commented Dec 2, 2024

I pushed an initial alpha, mainly to reserve the name on PyPI. Still a bunch more needed:

  • Mechanism for cutting off users if they run out of credits
  • Mechanism for populating the credit allowance
  • The daily refresh thing

simonw added a commit that referenced this issue Jan 9, 2025
Added /-/llm-usage-simple-prompt and /-/llm-usage-credits pages

Documented new TokensExhausted exception

Refs #1, #2