
rfc: anthropic cache usage #25684

Closed

Conversation

@baskaryan (Collaborator) commented Aug 22, 2024

Alternative to #25644

What do we want to do with anthropic cache token counts and UsageMetadata?

  1. Do nothing for now: UsageMetadata is meant to be standard across models, and we don't know what other providers will do with caching. Plus the info is there in AIMessage.response_metadata. The main issue is that right now cached input tokens don't show up anywhere (neither in input_tokens nor total_tokens).
  2. Just add cached input tokens to total_tokens, so at least there's some way to tell that those tokens exist.
  3. Store cache token counts as their own keys on AIMessage.usage_metadata for Anthropic outputs (the current implementation in this PR). Easy to implement and makes this info easily accessible to the user, but will this lock us into something we can't break in the future?
  4. Add cache token counts to the existing "input_tokens" key (Anthropic's data.usage.input_tokens does not include cached tokens, so this isn't double counting). Not useful for estimating pricing, but useful for general token counting.
  5. Add something like a "cost_adjusted_input_tokens" key that counts tokens in terms of cost (1.5 * cache_creation + 1 * input + 0.1 * cache_read). Useful for estimating prices, but makes it a little harder to tell when you've got cache hits/misses. Is that important? (See the sketch after this list.)
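
A minimal sketch of what options 3 and 5 could look like, assuming Anthropic's raw usage block carries input_tokens, output_tokens, cache_creation_input_tokens, and cache_read_input_tokens; the helper names below are illustrative, not the PR's implementation:

from typing import Dict


def usage_with_cache_keys(raw_usage: Dict[str, int]) -> Dict[str, int]:
    # Option 3: surface cache counts as their own keys on usage_metadata.
    input_tokens = raw_usage.get("input_tokens", 0)
    cache_creation = raw_usage.get("cache_creation_input_tokens", 0)
    cache_read = raw_usage.get("cache_read_input_tokens", 0)
    output_tokens = raw_usage.get("output_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + cache_creation + cache_read + output_tokens,
        "cache_creation_input_tokens": cache_creation,
        "cache_read_input_tokens": cache_read,
    }


def cost_adjusted_input_tokens(raw_usage: Dict[str, int]) -> float:
    # Option 5: weight each token type by its relative price.
    return (
        1.5 * raw_usage.get("cache_creation_input_tokens", 0)
        + 1.0 * raw_usage.get("input_tokens", 0)
        + 0.1 * raw_usage.get("cache_read_input_tokens", 0)
    )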

@efriis efriis added the partner label Aug 22, 2024

@efriis efriis self-assigned this Aug 22, 2024
@baskaryan baskaryan requested a review from ccurme August 22, 2024 22:54
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Aug 22, 2024
@baskaryan baskaryan added the 08 RFC Request for Comment. Solicit design input from other contributors label Aug 23, 2024
@mrdrprofuroboros

One more thing I'm facing right now is PromptTemplate: I can't find a way to use it with the Anthropic cache because it wipes out the 'cache_control' attribute from the text block.

text, template_format=template_format

Here it just extracts text and throws all the rest away

Are there any ideas / workarounds?

@baskaryan (Collaborator, Author)

One more thing I'm facing right now is PromptTemplate: I can't find a way to use it with the Anthropic cache because it wipes out the 'cache_control' attribute from the text block.

text, template_format=template_format

Here it just extracts text and throws all the rest away
Are there any ideas / workarounds?

Thinking of adding support for something like this. Any thoughts? #25674

@mrdrprofuroboros

lgtm
So far I just made a runnable lambda that injects cache_control after template compilation. This might be a bit cleaner for me since I substitute models and e.g. groq doesn't like that sort of dict messages, so I have to strip them anyway.
Does that make sense to control in templates? Some param to either keep dicts or flatten everything down?
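
For illustration only: the stripping step described above could be a small lambda along these lines; flatten_content is a made-up name, not a langchain helper.

from copy import deepcopy
from langchain_core.prompt_values import ChatPromptValue


def flatten_content(prompt: ChatPromptValue) -> ChatPromptValue:
    # Collapse list-of-blocks content back to plain strings for providers
    # that reject dict content blocks (cache_control etc. is dropped).
    messages = []
    for m in prompt.messages:
        m = deepcopy(m)
        if isinstance(m.content, list):
            m.content = "".join(
                block.get("text", "") if isinstance(block, dict) else block
                for block in m.content
            )
        messages.append(m)
    return ChatPromptValue(messages=messages)

Something like RunnableLambda(flatten_content) | model would then cover the non-Anthropic case.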

@efriis efriis assigned baskaryan and unassigned efriis Aug 30, 2024
@huynhphuchuy

lgtm so far I just made a runnable lambda that injects cache_control after template compilation. this might be a bit cleaner for me since I substitute models and e.g. groq doesn't like that sort of dict messages and I have to strip them anyway Does that make sense to control in templates? some param to either keep dicts or flatten everything down

Hi @mrdrprofuroboros, could you share your approach with PromptTemplate? Thanks :((

@mrdrprofuroboros

Oh, sorry, I missed your message.
Sure, here's what I came up with. It's rather hacky, but it works for me:

from copy import deepcopy
from typing import Any, Dict, List

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.runnables import RunnableLambda


def to_cached(content: str | List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mark the last text block of a message's content with cache_control."""
    if isinstance(content, str):
        return [{
            "type": "text",
            "text": content,
            "cache_control": {"type": "ephemeral"},
        }]
    if isinstance(content[-1], str):
        content[-1] = {
            "type": "text",
            "text": content[-1],
            "cache_control": {"type": "ephemeral"},
        }
    else:
        content[-1]["cache_control"] = {"type": "ephemeral"}
    return content


def add_cache_control(compiled_chat: ChatPromptValue) -> ChatPromptValue:
    """
    Anthropic supports a maximum of 4 blocks with cache_control, so we'll set
    - 1 on tools
    - 1 on the first and last system block
    - 1 on the last even 5th message
    """
    system_messages = []
    other_messages = []

    for message in compiled_chat.messages:
        if isinstance(message, SystemMessage):
            # Merge all system messages into a single SystemMessage.
            if not system_messages:
                system_messages.append(SystemMessage(to_cached(message.content)))
            else:
                system_messages[0].content.append(message.content)
        else:
            other_messages.append(deepcopy(message))

    if system_messages:
        # Mark the last system block as cacheable.
        system_messages[0].content = to_cached(system_messages[0].content)

    messages = system_messages + other_messages
    # Cache up to the largest index that's a multiple of 5, but never the
    # last message since it's constantly changing.
    last_cached = (len(messages) - 2) // 5 * 5
    messages[last_cached].content = to_cached(messages[last_cached].content)
    return ChatPromptValue(messages=messages)


### and then

if isinstance(model, ChatAnthropic):
    model = RunnableLambda(add_cache_control) | model

@mrdrprofuroboros commented Oct 1, 2024

Hey @baskaryan, can we merge this branch? We can explicitly warn that this data is not standardized and that one would use it at their own risk of deprecation.

mrdrprofuroboros added a commit to mrdrprofuroboros/langchain that referenced this pull request Oct 1, 2024
@baskaryan (Collaborator, Author)

Closing in favor of #27087.

@baskaryan baskaryan closed this Oct 3, 2024
@baskaryan (Collaborator, Author)

Hey @baskaryan, can we merge this branch? We can explicitly warn that this data is not standardized and that one would use it at their own risk of deprecation.

Will have a standardized format out in #27087! Should land and release later today.

@mrdrprofuroboros

That’s great news, thank you!

@baskaryan (Collaborator, Author) commented Oct 4, 2024

That’s great news, thank you!

out in langchain-anthropic 0.2.2

@baskaryan (Collaborator, Author)

That’s great news, thank you!

out in langchain-anthropic 0.2.2

Sorry, 0.2.3! Had to patch so that usage_metadata['input_tokens'] is the sum of all input tokens, including cache read and cache creation tokens.
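
For illustration, a rough example of what this looks like after that patch, assuming the standardized shape from #27087 reports the cache breakdown under input_token_details (the model name and numbers below are made up):

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
msg = llm.invoke("hello")

# input_tokens now includes cache creation and cache read tokens;
# the breakdown lives under input_token_details.
print(msg.usage_metadata)
# e.g. {"input_tokens": 1550, "output_tokens": 12, "total_tokens": 1562,
#       "input_token_details": {"cache_creation": 300, "cache_read": 1200}}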
