
community[minor]: Add UpstashRatelimitHandler #21885

Merged 22 commits from upstash-callback into langchain-ai:master on Jun 7, 2024

Conversation

CahidArda (Contributor)

Adding an `UpstashRatelimitHandler` callback for rate limiting based on the number of chain invocations or LLM token usage.

For more details, see the upstash/ratelimit-py repository or the notebook guide included in this PR.
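A minimal usage sketch, assuming the constructor arguments described in the notebook guide (the stand-in chain and identifier are illustrative, not part of the PR text):

    from langchain_core.runnables import RunnableLambda
    from upstash_ratelimit import FixedWindow, Ratelimit
    from upstash_redis import Redis

    from langchain_community.callbacks import UpstashRatelimitHandler

    # Limit each identifier (e.g. a user id or IP) to 10 chain invocations
    # and 1000 LLM tokens per 60-second window.
    handler = UpstashRatelimitHandler(
        identifier="user-id",
        request_ratelimit=Ratelimit(
            redis=Redis.from_env(),
            limiter=FixedWindow(max_requests=10, window=60),
        ),
        token_ratelimit=Ratelimit(
            redis=Redis.from_env(),
            limiter=FixedWindow(max_requests=1000, window=60),
        ),
    )

    chain = RunnableLambda(str)  # stand-in for a real chain
    chain.invoke("hello", config={"callbacks": [handler]})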

Twitter handle: @CahidArda

@dosubot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on May 19, 2024

@dosubot added the 🤖:improvement label (Medium size change to existing code to handle new use-cases) on May 19, 2024
@CahidArda force-pushed the upstash-callback branch 2 times, most recently from ab4dd42 to 13fd6c7, on May 19, 2024 19:13
@eyurtsev (Collaborator) left a comment:

Looks good overall, some minor comments only.

Inline review comment on this docstring snippet:

    Example:
    .. code-block:: python

    from upstash_redis import Redis

Collaborator: Missing indentation in the code block example.
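For reference, the indentation being asked for would look like this in the docstring (reST):

    Example:
        .. code-block:: python

            from upstash_redis import Redis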

Inline review comment on this hunk:

    @@ -72,6 +72,10 @@
     from langchain_community.callbacks.trubrics_callback import (
         TrubricsCallbackHandler,
     )
    +from langchain_community.callbacks.upstash_ratelimit_callback import (
    +    UpstashRatelimitError,  # noqa: F401

Collaborator: Could you remove the F401 please, to match the other callbacks?

@eyurtsev (Collaborator)

After thinking a bit more about this -- I am not sure this is a good design for rate limiting.

Why is it implemented via a callback handler? Ideally it would just be a part of the chain that can wait until it can issue a request.

@CahidArda (Contributor, Author)

> After thinking a bit more about this -- I am not sure this is a good design for rate limiting.
>
> Why is it implemented via a callback handler? Ideally it would just be a part of the chain that can wait until it can issue a request.

I wanted to use callbacks because I felt it would make adding request- or token-based rate limiting very easy.

I guess something like this would work for request-based rate limiting:

    # request based: a hypothetical rate-limiting runnable as the first step
    request_limiter = UpstashRatelimit("ip")
    other_step = RunnableLambda(str)

    chain = request_limiter | other_step
    chain.invoke("some input")

But I think token-based would be more complex. We would need a step before the LLM starts to stop the chain, and another step after the LLM to count the tokens. Or we'd somehow wrap the model step to do both, but I don't know if this is possible in LangChain:

    # token based: somehow wrap the model step with the rate limiter
    other_step = RunnableLambda(str)
    model = ChatOpenAI()
    model_with_ratelimit = UpstashRatelimit("ip", model=model)

    chain = other_step | model_with_ratelimit
    chain.invoke("some input")

Review comment on this hunk in the lock file:

    @@ -1,4 +1,4 @@
    -# This file is automatically @generated by Poetry 1.7.1 and should not be changed by hand.
    +# This file is automatically @generated by Poetry 1.8.2 and should not be changed by hand.

Collaborator: Could you undo the changes in the lock file?

@eyurtsev (Collaborator)

We generally don't want to assume that callbacks must be blocking for execution.

What use case is this callback handler helping to solve, given that it's raising an exception?

Is the goal to apply different (lower) rate limits on a given deployment than the ones specified by the model provider?

@CahidArda (Contributor, Author)

> We generally don't want to assume that callbacks must be blocking for execution.
>
> What use case is this callback handler helping to solve, given that it's raising an exception?
>
> Is the goal to apply different (lower) rate limits on a given deployment than the ones specified by the model provider?

Yes, with the callback it becomes possible to allow n requests per minute/hour/day from an IP address or a given user. It's also possible to rate limit based on the number of tokens.
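A sketch of that use case (hedged: it assumes the handler raises `UpstashRatelimitError` when the limit is hit, as the import in the diff above suggests; the stand-in chain and identifier are illustrative):

    from langchain_core.runnables import RunnableLambda
    from upstash_ratelimit import FixedWindow, Ratelimit
    from upstash_redis import Redis

    from langchain_community.callbacks import (
        UpstashRatelimitError,
        UpstashRatelimitHandler,
    )

    # Allow 10 requests per minute per identifier (e.g. the caller's IP).
    handler = UpstashRatelimitHandler(
        identifier="caller-ip",
        request_ratelimit=Ratelimit(
            redis=Redis.from_env(),
            limiter=FixedWindow(max_requests=10, window=60),
        ),
    )

    chain = RunnableLambda(str)  # stand-in for a real chain
    try:
        chain.invoke("hello", config={"callbacks": [handler]})
    except UpstashRatelimitError:
        print("Rate limit reached, try again later.")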

@eyurtsev (Collaborator) commented May 23, 2024

I suspect a better design would be to create a chat model wrapper. It's potentially a bit more work for the user, but it won't have any unexpected issues associated with the callback not being blocking.

@CahidArda Anyway, let me know if you'd still like to merge -- if so, could you remove the changes from the lock file? (I assume they're unnecessary for this PR?)
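For illustration, a rough sketch of the wrapper design suggested above, using the upstash_ratelimit client directly; the class and polling loop are hypothetical, not part of this PR:

    import time

    from langchain_openai import ChatOpenAI
    from upstash_ratelimit import FixedWindow, Ratelimit
    from upstash_redis import Redis

    ratelimit = Ratelimit(
        redis=Redis.from_env(),
        limiter=FixedWindow(max_requests=10, window=60),
    )

    class RateLimitedModel:
        """Hypothetical wrapper: waits for a free slot, then delegates to the model."""

        def __init__(self, model, identifier: str):
            self.model = model
            self.identifier = identifier

        def invoke(self, input, **kwargs):
            # Poll until the limiter admits the request; a real implementation
            # would wait until the window resets instead of busy-polling.
            while not ratelimit.limit(self.identifier).allowed:
                time.sleep(1)
            return self.model.invoke(input, **kwargs)

    model = RateLimitedModel(ChatOpenAI(), identifier="caller-ip")

Unlike the callback design, this waits for capacity rather than raising an exception mid-chain.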

@eyurtsev added the waiting-on-author label (PR Status: Confirmation from author is required) on May 23, 2024
@dosubot added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) on May 23, 2024
@CahidArda (Contributor, Author)

Hi @eyurtsev,

I think we can go ahead with the callback, if that's okay.

As for the lock file: I tried to remove the changes, but then the linter errors saying the lock file is not compatible with the toml file. If I also revert the changes in the toml file, tests fail saying upstash_ratelimit was not found. So I added upstash_ratelimit, and bumped the upstash_redis version while I was at it.

@CahidArda (Contributor, Author)

Hi @eyurtsev,

Have you had the chance to review the changes?

@CahidArda (Contributor, Author)

Hi again @eyurtsev,

The JavaScript version of this PR was merged recently. Have you had a chance to review the latest changes in this PR? 😄

@eyurtsev changed the title from "community: Add UpstashRatelimitHandler" to "community[minor]: Add UpstashRatelimitHandler" on Jun 5, 2024
@eyurtsev enabled auto-merge (squash) on June 5, 2024 15:39
@eyurtsev (Collaborator) commented Jun 5, 2024

@CahidArda apologies, I was on vacation until yesterday! Merging.

@eyurtsev disabled auto-merge on June 5, 2024 15:53
@eyurtsev (Collaborator) commented Jun 5, 2024

@CahidArda could you address the side effects for the optional imports? We can merge then.
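For context, langchain_community typically avoids such import-time side effects by lazy-loading optional integrations; a sketch of that general pattern, not necessarily the exact code in this PR:

    # langchain_community/callbacks/__init__.py (illustrative excerpt)
    import importlib
    from typing import Any

    _module_lookup = {
        "UpstashRatelimitHandler": "langchain_community.callbacks.upstash_ratelimit_callback",
        "UpstashRatelimitError": "langchain_community.callbacks.upstash_ratelimit_callback",
    }

    def __getattr__(name: str) -> Any:
        # Import the submodule only on first attribute access, so importing
        # langchain_community.callbacks does not require upstash_ratelimit.
        if name in _module_lookup:
            module = importlib.import_module(_module_lookup[name])
            return getattr(module, name)
        raise AttributeError(f"module {__name__} has no attribute {name}")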


@CahidArda (Contributor, Author)

Hope you had a great holiday! 🌴

I fixed the side effects.

@eyurtsev (Collaborator) commented Jun 7, 2024

Taking over to resolve merge conflicts.

@eyurtsev enabled auto-merge (squash) on June 7, 2024 20:54
@eyurtsev merged commit 6c07eb0 into langchain-ai:master on Jun 7, 2024
44 checks passed
@hinthornw pushed a commit that referenced this pull request on Jun 20, 2024:
Adding `UpstashRatelimitHandler` callback for rate limiting based on
number of chain invocations or LLM token usage.

For more details, see the [upstash/ratelimit-py repository](https://github.com/upstash/ratelimit-py) or the notebook guide included in this PR.

Twitter handle: @CahidArda

---------

Co-authored-by: Eugene Yurtsev <[email protected]>