community[minor]: Add UpstashRatelimitHandler #21885
Conversation
looks good overall, some minor comments only
libs/community/langchain_community/callbacks/upstash_ratelimit_callback.py
```
Example:
.. code-block:: python

from upstash_redis import Redis
```
missing indentation in the code block example
```diff
@@ -72,6 +72,10 @@
 from langchain_community.callbacks.trubrics_callback import (
     TrubricsCallbackHandler,
 )
+from langchain_community.callbacks.upstash_ratelimit_callback import (
+    UpstashRatelimitError,  # noqa: F401
```
Could you remove the F401 please to match the other callbacks?
After thinking a bit more about this -- I am not sure that this is a good design for rate limiting. Why is it implemented via a callback handler? Ideally it would just be a part of the chain that can wait until it can issue a request?
I wanted to use callbacks because I felt like they would make adding request- or token-based rate limiting very easy. I guess something like this would work for request-based rate limiting:

```python
# request based
request_limiter = UpstashRatelimit("ip")
other_step = RunnableLambda(str)
chain = request_limiter | other_step
chain.invoke("input")
```

But I think token-based would be more complex. We would need a step before the LLM starts to stop the chain, and another step after the LLM to count the tokens. Or we could somehow wrap the model step to do both, but I don't know if this is possible in LangChain:

```python
# token based
other_step = RunnableLambda(str)
model = ChatOpenAI()
model_with_ratelimit = UpstashRatelimit("ip", model=model)
chain = other_step | model_with_ratelimit
chain.invoke("input")
```
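The "rate limiting as a step in the chain" idea discussed above can be sketched in plain Python, with no Upstash or LangChain dependency. `FixedWindowLimiter` and `rate_limited_step` are hypothetical names for illustration; a real chain step would raise (or wait) the same way before passing its input through:

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds per identifier."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._counts: dict[str, tuple[float, int]] = {}

    def allow(self, identifier: str) -> bool:
        now = time.monotonic()
        start, count = self._counts.get(identifier, (now, 0))
        if now - start >= self.window:
            # Window expired: start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._counts[identifier] = (start, count + 1)
        return True


limiter = FixedWindowLimiter(limit=2, window=60.0)


def rate_limited_step(value: str, identifier: str = "ip") -> str:
    # A chain step raises here instead of silently continuing, which
    # mirrors the handler's behaviour of stopping the chain at the limit.
    if not limiter.allow(identifier):
        raise RuntimeError("rate limit exceeded")
    return value


print(rate_limited_step("hello"))  # first call: allowed
print(rate_limited_step("world"))  # second call: allowed; a third would raise
```

In LangChain terms, wrapping such a function in a `RunnableLambda` would make it composable with `|`, which is the chain-step design being weighed against the callback design here.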
libs/community/poetry.lock
```diff
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 1.7.1 and should not be changed by hand.
+# This file is automatically @generated by Poetry 1.8.2 and should not be changed by hand.
```
could you undo the changes in the lock file?
We generally don't want to assume that callbacks must be blocking for execution. What use case is this callback handler helping to solve, given that it's raising an exception? Is the goal to apply different (lower) rate limits on a given deployment than the ones specified by the model provider?
Yes, with the callback it becomes possible to allow n requests from an IP address or some user per minute/hour/day. It's also possible to rate limit based on the number of tokens.
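The token-based case mentioned here needs two hooks: a check before the LLM runs, and an accounting step after it returns with the actual usage. A minimal sketch under assumed names (`TokenBudget`, `check`, `record` are illustrative, not the library's API):

```python
class TokenBudget:
    """Track LLM token usage per identifier against a fixed budget."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self._used: dict[str, int] = {}

    def check(self, identifier: str) -> None:
        # Called before the LLM runs (cf. a callback's on_llm_start):
        # refuse the request once the budget is already spent.
        if self._used.get(identifier, 0) >= self.budget:
            raise RuntimeError(f"token limit reached for {identifier}")

    def record(self, identifier: str, tokens: int) -> None:
        # Called after the LLM returns (cf. on_llm_end), with the token
        # count read from the model's response metadata.
        self._used[identifier] = self._used.get(identifier, 0) + tokens


budget = TokenBudget(budget=100)
budget.check("user-1")       # passes: nothing used yet
budget.record("user-1", 80)
budget.check("user-1")       # still under the 100-token budget
budget.record("user-1", 30)  # 110 tokens used in total
# budget.check("user-1") would now raise RuntimeError
```

Because the token count is only known after the call, a purely pre-call limiter cannot enforce this alone, which is why the callback's paired start/end hooks are attractive for the token-based case.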
I suspect a better design would be to create a chat model wrapper -- potentially a bit more work for the user, but it won't have any unexpected issues associated with the callback not being blocking. @CahidArda Anyway, let me know if you'd still like to merge -- if so, could you remove the changes from the lock file? (I assume they're unnecessary for this PR?)
Hi @eyurtsev, I think we can go ahead with the callback if that's okay. As for the lockfile: I tried to remove the changes, but then the linter errors saying the lock file is not compatible with the toml file. If I instead remove the changes in the toml file, tests fail saying that upstash_ratelimit was not found. So I added upstash_ratelimit and bumped the upstash_redis version while I was at it.
Hi @eyurtsev, have you had a chance to review the changes?
Hi again @eyurtsev, the JavaScript version of this PR was merged recently. Have you had a chance to review the latest changes here? 😄
@CahidArda apologies, I was on vacation until yesterday! Merging.
libs/community/tests/unit_tests/callbacks/test_upstash_ratelimit_callback.py
libs/community/langchain_community/callbacks/upstash_ratelimit_callback.py
@CahidArda could you address the side effects for the optional imports, and then we can merge?
Hope you had a great holiday! 🌴 I fixed the side effects.
Taking over to resolve merge conflicts.
Adding the `UpstashRatelimitHandler` callback for rate limiting based on number of chain invocations or LLM token usage. For more details, see the [upstash/ratelimit-py repository](https://github.com/upstash/ratelimit-py) or the notebook guide included in this PR.

Twitter handle: @CahidArda

Co-authored-by: Eugene Yurtsev <[email protected]>
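The general shape of the merged handler — a callback that counts work per identifier and raises before the model runs once the budget is spent — can be sketched without Redis or LangChain. All names here (`RatelimitExceeded`, `RatelimitHandler`, the in-memory counter) are illustrative; the real `UpstashRatelimitHandler` delegates the counting to Upstash Ratelimit over Redis:

```python
class RatelimitExceeded(Exception):
    """Raised to stop the chain when the identifier is over its limit."""


class RatelimitHandler:
    """Callback-style limiter: check the per-identifier request budget
    in the pre-LLM hook and raise once it is exhausted."""

    def __init__(self, identifier: str, max_requests: int) -> None:
        self.identifier = identifier
        self.max_requests = max_requests
        self._requests = 0  # the real handler stores this in Redis

    def on_llm_start(self) -> None:
        # Invoked by the framework before each model call.
        if self._requests >= self.max_requests:
            raise RatelimitExceeded(self.identifier)
        self._requests += 1


handler = RatelimitHandler("user-123", max_requests=2)
handler.on_llm_start()  # request 1: allowed
handler.on_llm_start()  # request 2: allowed
try:
    handler.on_llm_start()  # request 3: over the limit
except RatelimitExceeded as exc:
    print(f"blocked: {exc}")
```

This also makes the earlier review concern concrete: the enforcement lives in a hook that raises, so callers must be prepared to catch the error rather than have the chain wait for capacity.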