Add safeguard on tokens returned by functions #576
Conversation
@@ -518,7 +518,8 @@ def handle_ai_response(self, response_message):
            self.interface.function_message(f"Running {function_name}({function_args})")
            try:
                function_args["self"] = self  # need to attach self to arg since it's dynamically linked
-               function_response_string = function_to_call(**function_args)
+               function_response = function_to_call(**function_args)
+               function_response_string = validate_function_response(function_response)
We are now validating function outputs before putting them into a function response message (in `agent.messages`).
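For orientation, here is a minimal sketch of how the validated string might then be recorded; the message-packaging details are an assumption for illustration, not the PR's exact code:

```python
# Illustrative sketch only (message-packaging details assumed, not taken from
# this PR): the point is that the validated/truncated string, rather than the
# raw return value, is what ends up in the agent's message history.
def append_function_result(messages: list, function_name: str, raw_result) -> None:
    function_response_string = validate_function_response(raw_result)  # helper added in this PR
    messages.append(
        {
            "role": "function",
            "name": function_name,
            "content": function_response_string,
        }
    )
```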
@@ -712,7 +713,13 @@ def summarize_messages_inplace(self, cutoff=None, preserve_last_N_messages=True)
                pass

        message_sequence_to_summarize = self.messages[1:cutoff]  # do NOT get rid of the system message
        printd(f"Attempting to summarize {len(message_sequence_to_summarize)} messages [1:{cutoff}] of {len(self.messages)}")
+       if len(message_sequence_to_summarize) == 1:
This is an additional catch to prevent infinite summarization of a single message (I noticed this sort of loop happening when I ran the `http_request` overflow example).
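A rough sketch of the guard (the exception type and wording here are illustrative, not the PR's exact code):

```python
# Illustrative sketch: if the slice selected for summarization contains only a
# single message, summarizing it cannot free up meaningful context, so fail out
# instead of looping forever. (Exception type chosen here for illustration.)
message_sequence_to_summarize = self.messages[1:cutoff]  # do NOT get rid of the system message
if len(message_sequence_to_summarize) == 1:
    raise ValueError(
        f"Summarize error: can't summarize a single message "
        f"(messages[1:{cutoff}] has length {len(message_sequence_to_summarize)})"
    )
```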
@@ -55,6 +55,9 @@
CORE_MEMORY_PERSONA_CHAR_LIMIT = 2000
CORE_MEMORY_HUMAN_CHAR_LIMIT = 2000

+# Function return limits
+FUNCTION_RETURN_CHAR_LIMIT = 2000
There were a few ways to cap the message output (e.g., compute it as some fraction of `agent.context_window`), but I figured we might as well do something simpler and set a max character length, similar to what we already do for human/persona. There's a chance the math can still go wrong ("go wrong" = enter some sort of "summarize deadlock" with a 2000-character truncated HTTP request result) if the context window is too low, but with 8k I don't think we should have this problem. Also, we make this assumption for persona/human anyway and it's working fine.
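For comparison, the fraction-of-context-window alternative mentioned above could look roughly like this (a sketch under assumed numbers; the PR simply uses the flat constant):

```python
# Option used in this PR: a flat character cap, mirroring the persona/human limits.
FUNCTION_RETURN_CHAR_LIMIT = 2000

# Alternative sketch (not what the PR does): derive the cap from the agent's
# context window, assuming a rough chars-per-token ratio and reserving only a
# small fraction of the window for any single function result.
def function_return_char_limit(context_window: int, fraction: float = 0.1, chars_per_token: int = 4) -> int:
    return int(context_window * fraction * chars_per_token)

# With an 8k window this allows roughly 3276 characters per function return:
# function_return_char_limit(8192) -> 3276
```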
raise ValueError(function_response_string)

else:
    if strict:
Many functions (user-made, etc.) will return non-strings like dicts, so we should leave `strict=False` and attempt to cast to a string instead of failing.
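Putting the pieces together, a condensed sketch of the validation logic described above (paraphrased from the diff; the keyword argument names and the exact truncation notice are assumptions):

```python
FUNCTION_RETURN_CHAR_LIMIT = 2000  # mirrors the new constant added in this PR

def validate_function_response(function_response, strict: bool = False, truncate: bool = True) -> str:
    """Cast a function/tool return value to a string and cap its length."""
    if isinstance(function_response, str):
        function_response_string = function_response
    elif function_response is None:
        # functions/tools are allowed to return None (no output)
        function_response_string = "None"
    else:
        if strict:
            # in strict mode, a non-string return value is treated as an error
            raise ValueError(function_response)
        # many user-made functions return dicts etc., so cast instead of failing
        function_response_string = str(function_response)

    if truncate and len(function_response_string) > FUNCTION_RETURN_CHAR_LIMIT:
        # make the truncation informative so the agent knows output was cut off
        function_response_string = (
            function_response_string[:FUNCTION_RETURN_CHAR_LIMIT]
            + f"... [NOTE: function output was truncated since it exceeded the character limit ({FUNCTION_RETURN_CHAR_LIMIT})]"
        )
    return function_response_string
```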
Tested after adding the more informative truncation method.
* swapping out hardcoded str for prefix (forgot to include in #569)
* add extra failout when the summarizer tries to run on a single message
* added function response validation code, currently will truncate responses based on character count
* added return type hints (functions/tools should either return strings or None)
* discuss function output length in custom function section
* made the truncation more informative
Closes #553
Please describe the purpose of this pull request
We currently do not check the size of function responses to make sure they are under X tokens.
This is not a big deal for the base function set, but it is a problem for custom functions (e.g., the HTTP request function in extras), which can return huge responses. This will trigger a summarization lock event and seems to have the potential to "brick" agents by nuking their message state.
How to test
Modify `http_request` to return an arbitrarily long response.

Side note: if you use something like `" ".join(["TEST"] * 5000)`, you'll get a 400 error from OpenAI because it identifies "TEST" as a specific bad request input.
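For example, the override could look roughly like this (the `http_request` signature is an assumption based on the extras function set, and the repeated word is arbitrary per the side note above):

```python
# Hypothetical override of the extras http_request function for testing the
# overflow behavior: ignore the real request and return a very long string.
# (Avoid repeating "TEST", which OpenAI rejects with a 400 error.)
def http_request(self, method: str, url: str, payload_json: str = None) -> str:
    return " ".join(["banana"] * 5000)
```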
Demonstrating the bug

On `main`, if you override a function call like the HTTP request call to return an arbitrarily large text file, it'll brick the agent by causing a summarize chain that cannot complete.

Demonstrating the bug fix (this PR)
With this fix, you'll instead get a warning that the function output was truncated.
Have you tested this PR?
Yes, see above.