rfc: anthropic cache usage #25684
Conversation
One more thing I'm facing right now is PromptTemplate - I can't find a way to use it with the Anthropic cache because it wipes out the 'cache_control' attribute from the text block.
Here it just extracts the text and throws everything else away. Are there any ideas / workarounds?
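For context, the block format that gets stripped looks roughly like this (a minimal sketch of an Anthropic-style cached content block; not code from this thread):

```python
from langchain_core.messages import SystemMessage

# A system message whose content is a list of structured blocks rather
# than a plain string. Rendering through a plain PromptTemplate collapses
# this to a single string, which is how cache_control gets lost.
cached_system = SystemMessage(
    content=[
        {
            "type": "text",
            "text": "You are a helpful assistant. <long static context here>",
            # This attribute is what templating strips away.
            "cache_control": {"type": "ephemeral"},
        }
    ]
)
```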
thinking of adding support for something like this, any thoughts? #25674
lgtm
Hi @mrdrprofuroboros, could you share your approach with caching?
Oh, sorry, I missed your message.

```python
from copy import deepcopy
from typing import Any, Dict, List

from langchain_core.messages import SystemMessage
from langchain_core.prompt_values import ChatPromptValue


def to_cached(content: str | List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mark the last content block with Anthropic's cache_control attribute."""
    if isinstance(content, str):
        return [{
            "type": "text",
            "text": content,
            "cache_control": {"type": "ephemeral"},
        }]
    if isinstance(content[-1], str):
        # Promote a trailing plain string to a structured text block.
        content[-1] = {
            "type": "text",
            "text": content[-1],
            "cache_control": {"type": "ephemeral"},
        }
    else:
        content[-1]["cache_control"] = {"type": "ephemeral"}
    return content


def add_cache_control(compiled_chat: ChatPromptValue) -> ChatPromptValue:
    """
    Anthropic supports at most 4 blocks with cache_control, so we set
    - 1 on tools
    - 1 on the first and 1 on the last system block
    - 1 on the most recent message whose index is a multiple of 5
    """
    system_messages = []
    other_messages = []
    for message in compiled_chat.messages:
        if isinstance(message, SystemMessage):
            if not system_messages:
                # Cache the first system block.
                system_messages.append(SystemMessage(to_cached(message.content)))
            elif isinstance(message.content, list):
                # Merge any further system messages into the first one.
                system_messages[0].content.extend(message.content)
            else:
                system_messages[0].content.append(message.content)
        else:
            other_messages.append(deepcopy(message))
    if system_messages:
        # Cache the last system block as well.
        system_messages[0].content = to_cached(system_messages[0].content)
    messages = system_messages + other_messages
    # Never set cache_control on the last message since it's constantly changing.
    last_cached = max(0, (len(messages) - 2) // 5 * 5)
    messages[last_cached].content = to_cached(messages[last_cached].content)
    return ChatPromptValue(messages=messages)
```

### and then

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import RunnableLambda

if isinstance(model, ChatAnthropic):
    model = RunnableLambda(add_cache_control) | model
```
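To tie this back to the PromptTemplate question above, one hypothetical way to wire it up end to end (the prompt contents and model name here are placeholders, not from the thread):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

prompt = ChatPromptTemplate.from_messages(
    [("system", "Long static instructions go here."), ("human", "{question}")]
)
model = ChatAnthropic(model="claude-3-5-sonnet-20240620")

# Rendering the template produces plain text blocks, so cache_control
# is (re)applied afterwards, just before the model call.
chain = prompt | RunnableLambda(add_cache_control) | model
```

Depending on the library version at the time, the prompt-caching beta header may also have been required on the Anthropic side.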
Hey @baskaryan, can we merge this branch? We can explicitly warn that this data is not standardized and one could use it at their own risk of deprecation.
closing in favor of #27087
will have a standardized format out in #27087! should land and release later today
That’s great news, thank you!
out in langchain-anthropic
Sorry, 0.2.3! Had to patch so that usage_metadata['input_tokens'] is the sum of all input tokens, including cache read and cache creation tokens.
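With that release, the cache counts show up in the standardized usage metadata, roughly like this (a sketch assuming langchain-anthropic >= 0.2.3; field names per langchain-core's UsageMetadata):

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
response = llm.invoke("Hello!")

usage = response.usage_metadata
# input_tokens is the sum of regular, cache-read, and cache-creation input tokens.
print(usage["input_tokens"], usage["output_tokens"], usage["total_tokens"])
# Cache-specific counts live under input_token_details, when present:
print(usage.get("input_token_details"))  # e.g. {"cache_read": 0, "cache_creation": 1024}
```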
Alternative to #25644
What do we want to do with anthropic cache token counts and UsageMetadata?
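For reference, Anthropic's Messages API reports cache tokens in separate usage fields; the sketch below shows one plausible mapping into the standardized UsageMetadata shape (illustrative only - the field names come from Anthropic's usage object and langchain-core's UsageMetadata, not from this PR's diff):

```python
# Raw Anthropic usage, as returned by the Messages API:
raw = {
    "input_tokens": 10,
    "output_tokens": 50,
    "cache_creation_input_tokens": 1024,
    "cache_read_input_tokens": 0,
}

# Standardized UsageMetadata folds the cache counts into input_tokens
# and keeps the breakdown under input_token_details:
all_input = (
    raw["input_tokens"]
    + raw["cache_creation_input_tokens"]
    + raw["cache_read_input_tokens"]
)
usage_metadata = {
    "input_tokens": all_input,
    "output_tokens": raw["output_tokens"],
    "total_tokens": all_input + raw["output_tokens"],
    "input_token_details": {
        "cache_creation": raw["cache_creation_input_tokens"],
        "cache_read": raw["cache_read_input_tokens"],
    },
}
```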