
Community: gather token usage info in BedrockChat during generation #19127

Merged

Conversation

@dmenini (Contributor) commented Mar 15, 2024

This PR calculates token usage for prompts and completions directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that downstream tasks can access them easily.

This makes it possible to define a callback for token tracking and cost calculation, similar to what is done for OpenAI (see OpenAICallbackHandler). I plan on adding a BedrockCallbackHandler later.
Keeping track of tokens in a callback is already possible today, but it requires passing the llm explicitly, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner.

Thanks for your reviews. FYI @baskaryan, @hwchase17
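
For context, a minimal sketch (not part of this PR) of what such a Bedrock token-tracking callback could look like, assuming the generation result exposes llm_output["usage"] with prompt_tokens and completion_tokens as described later in this thread; the class name is purely illustrative:

    from typing import Any, Dict

    from langchain_core.callbacks import BaseCallbackHandler
    from langchain_core.outputs import LLMResult


    class BedrockTokenUsageHandler(BaseCallbackHandler):
        """Illustrative handler that accumulates token counts from llm_output."""

        def __init__(self) -> None:
            self.prompt_tokens = 0
            self.completion_tokens = 0

        def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
            # llm_output is populated by the chat model; "usage" is the field added by this PR.
            usage: Dict[str, int] = (response.llm_output or {}).get("usage", {})
            self.prompt_tokens += usage.get("prompt_tokens", 0)
            self.completion_tokens += usage.get("completion_tokens", 0)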


@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: models Related to LLMs or chat model modules 🔌: anthropic Primarily related to Anthropic integrations 🤖:improvement Medium size change to existing code to handle new use-cases labels Mar 15, 2024
@dmenini dmenini changed the title Improvement (community): gather token usage info in BedrockChat during generation Community: gather token usage info in BedrockChat during generation Mar 15, 2024
@esoler-sage (Contributor) commented:
Hey! For Bedrock you may want to use the token counts returned in the "x-amzn-bedrock-*" response headers, such as:

  • x-amzn-bedrock-output-token-count
  • x-amzn-bedrock-input-token-count

instead of relying on the Anthropic-only tokenizer 🤔?
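
As an aside, a minimal sketch of where these headers come from, assuming a plain boto3 bedrock-runtime client (the model ID and request body below are illustrative):

    import json

    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-v2",  # illustrative model ID
        body=json.dumps(
            {"prompt": "\n\nHuman: Hello\n\nAssistant:", "max_tokens_to_sample": 50}
        ),
    )

    # botocore exposes the raw HTTP headers on every response.
    headers = response["ResponseMetadata"]["HTTPHeaders"]
    input_tokens = int(headers.get("x-amzn-bedrock-input-token-count", 0))
    output_tokens = int(headers.get("x-amzn-bedrock-output-token-count", 0))
    print(input_tokens, output_tokens)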

@dmenini (Contributor, Author) commented Mar 20, 2024

> Hey! For Bedrock you may want to use the token counts returned in the "x-amzn-bedrock-*" response headers, such as:
>
>   • x-amzn-bedrock-output-token-count
>   • x-amzn-bedrock-input-token-count
>
> instead of relying on the Anthropic-only tokenizer 🤔?

Thanks for this comment! I didn't know about those headers. It definitely makes sense, and I'll rework my implementation accordingly.

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Mar 21, 2024
@dmenini (Contributor, Author) commented Mar 21, 2024

The PR is updated! Token counters are now read from the response headers for simple (non-streaming) generation: the resulting LLMOutput has the fields model_id and usage, which contain the token counters.

NB: in the streaming case, usage will be an empty dict, since extracting the tokens is more complicated. The headers are not available there, and one would have to extract the token counters from the body according to each model provider's structure. Will do it at a later time :)
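
Concretely, a downstream caller could then read the counters roughly like this (a sketch, assuming the llm_output field names described above; the model ID is illustrative):

    from langchain_community.chat_models import BedrockChat
    from langchain_core.messages import HumanMessage

    chat = BedrockChat(model_id="anthropic.claude-v2")  # illustrative model ID
    result = chat.generate([[HumanMessage(content="Hello!")]])

    # llm_output carries the model id and the token counters read from the headers.
    print(result.llm_output["model_id"])
    print(result.llm_output["usage"])  # e.g. {"prompt_tokens": ..., "completion_tokens": ...}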

@baskaryan baskaryan added 🔌: aws Primarily related to Amazon Web Services (AWS) integrations and removed 🔌: anthropic Primarily related to Anthropic integrations labels Mar 26, 2024
@baskaryan (Collaborator) commented: cc @3coins

Review thread on the following snippet from the diff:

    "prompt_tokens": int(
        headers.get("x-amzn-bedrock-input-token-count", 0)
    ),
    "completion_tokens": int(
        headers.get("x-amzn-bedrock-output-token-count", 0)
    ),


This is great! I was looking to achieve the same thing to enable cost monitoring on the Anthropic Bedrock models. Should we also add total_tokens (sum of prompt_tokens + completion_tokens), to keep it compatible with the OpenAI model?

dmenini (Contributor, Author) replied:

Thanks for the comment! I added the total count.
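
The resulting usage entry then looks roughly like this (values illustrative):

    usage = {
        "prompt_tokens": 12,
        "completion_tokens": 5,
        "total_tokens": 17,  # prompt_tokens + completion_tokens, for compatibility with OpenAI
    }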

@pratik60 commented Mar 28, 2024

Thanks! You have linting errors, fyi.

Hopefully one of the project maintainers has a chance to look at this, and get it merged soon!

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 28, 2024
@baskaryan baskaryan enabled auto-merge (squash) March 28, 2024 18:51
@baskaryan baskaryan merged commit f704232 into langchain-ai:master Mar 28, 2024
59 checks passed
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…ation (langchain-ai#19127)

@Sukitly (Contributor) commented Apr 1, 2024
Excellent work!
I'm curious about the difference between reporting "usage" in the response body versus in the headers. According to the Bedrock API documentation, "usage" is included in the response body as follows:

{
    "id": string,
    "model": string,
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": string
        }
    ],
    "stop_reason": string,
    "stop_sequence": string,
    "usage": {
        "input_tokens": integer,
        "output_tokens": integer
    }
}

Do you guys have any insights on this? @dmenini @esoler-sage @pratik60
Thanks!

@esoler-sage (Contributor) commented Apr 1, 2024

I think usage in the headers applies to synchronous calls, while an async call's last chunk contains usage in the body 🤔

Update: Also, the docs you are pointing to use the new Messages API from Anthropic, which is only available on (some) Anthropic models.
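
For the body-based case, extracting the counters from a parsed (non-streaming) Messages API response like the one quoted above would look roughly like this (the response dict is illustrative):

    # Illustrative parsed response body in the Anthropic Messages API shape shown above.
    body = {
        "type": "message",
        "content": [{"type": "text", "text": "Hello!"}],
        "usage": {"input_tokens": 12, "output_tokens": 5},
    }

    usage = body.get("usage", {})
    prompt_tokens = usage.get("input_tokens", 0)
    completion_tokens = usage.get("output_tokens", 0)
    total_tokens = prompt_tokens + completion_tokens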

hinthornw pushed a commit that referenced this pull request Apr 26, 2024
…ation (#19127)

Labels
  • 🔌: aws Primarily related to Amazon Web Services (AWS) integrations
  • 🤖:improvement Medium size change to existing code to handle new use-cases
  • lgtm PR looks good. Use to confirm that a PR is ready for merging.
  • Ɑ: models Related to LLMs or chat model modules
  • size:M This PR changes 30-99 lines, ignoring generated files.
5 participants