Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating model to gpt-4o-mini #21

Closed
wants to merge 35 commits into from
Closed

Updating model to gpt-4o-mini #21

wants to merge 35 commits into from

Conversation

bilalba
Copy link

@bilalba bilalba commented Jul 29, 2024

Inspired by https://github.com/Acters/humanify-ng/tree/GPT4o, I decided to make a PR after testing some of the changes from there.

The openai sdk is updated to version 4.52.7 addressing #19

gpt-tokenizer is used instead of gpt3-encoder addressing #5

I am seeking to address #4 with a good enough solution of using gpt-4o-mini with the assumption that it would be intelligent enough.

I have updated the model to gpt-4o-mini and removed the option to use 4k context, since gpt-4o-mini is already cheap enough and is not constrained by context length.

However, I have still capped the context at 32k. In my testing, Using a larger context would not result in enough variable name replacements. I.e. it would only change a few function and variable names.

@Acters
Copy link

Acters commented Jul 30, 2024

https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

I just noticed this too.
I also noticed you decreased the number of tokens to 32k. how does this change affect it? is the 128k limit too much even with the soft cap reducing it to be at 1/4 the amount of tokens? Or is it because the output tokens are at 16k for the 4o-mini versions?

I also was making some changes to the prompt, the function name and descriptions to include the parameters for renaming as I noticed it doesn't change parameter names, which I think is because it is not the same as a "variable" in the context for the AI.

@0xdevalias
Copy link

0xdevalias commented Jul 31, 2024

I also noticed you decreased the number of tokens to 32k. how does this change affect it? is the 128k limit too much even with the soft cap reducing it to be at 1/4 the amount of tokens? Or is it because the output tokens are at 16k for the 4o-mini versions?

@Acters The last paragraph of the PR description seems to already address this?:

However, I have still capped the context at 32k. In my testing, Using a larger context would not result in enough variable name replacements. I.e. it would only change a few function and variable names.

@Acters
Copy link

Acters commented Jul 31, 2024

I also noticed you decreased the number of tokens to 32k. how does this change affect it? is the 128k limit too much even with the soft cap reducing it to be at 1/4 the amount of tokens? Or is it because the output tokens are at 16k for the 4o-mini versions?

@Acters The last paragraph of the PR description seems to already address this?:

However, I have still capped the context at 32k. In my testing, Using a larger context would not result in enough variable name replacements. I.e. it would only change a few function and variable names.

// TODO: Check how much tokens we usually need for response

This is something I saw and was also wondering about. I improperly expressed myself, and caused some misunderstanding.

I do have it return all things renamed, so I also assumed for him to be wrong. only sometimes will it return partially renamed JS as he mentioned.
Running it multiple times gives different results but if it is more consistent, then it is OK. However, I worry that the amount of input is not enough in larger code samples for which the AI will not get a clear idea about a variable or function, unless a certain amount of context is given. with 32k context, this might be a fine trade off.

Note that JSON mode is always enabled when the model is generating arguments as part of function calling.

I was wondering the reason why gpt would be limiting the amount changed. I am guessing this is because the json that the function returns is part of the token amount? which is odd to me because I was expecting the names were the cost and the overhead on constructing the json format was just something that was not touched by the AI. I guess I am mistaken.

Also, I was wondering why we are bundling both functions renaming and variables renaming into one OpenAI function call.
Are we sure that we do not wish to make use of the Parallel Function Calling feature to split the renaming to three separate calls for function names, variable names, and parameter names?(as currently we don't have parameters properly being renamed)

or if we want to be fancy, we can look into how OpenAI's Assistant API to make use of it more properly by building up the thread with the JS code/files, then execute the runs. https://platform.openai.com/docs/api-reference/runs
I see it as a way to fill the larger input context of 128k for larger code bases, and likely improve the renames. The "Code Interpreter" sounds interesting, too.

@bilalba
Copy link
Author

bilalba commented Jul 31, 2024

@Acters Did you see it return all the variable names replaced when you ran it with 128k context? I only saw a few. I presume that it only tackles a few problems every query, despite having all the context and output tokens.

I do like your idea about using threads to fill the context and querying it multiple times, so it gets a chance to 'think' and address all the replacements.

I also really like your idea on Parallel Function Calling. I imagine it would serve as a significant improvement.

Would you like to add those features as a part of another PR? I believe all of these three things are separate enhancements that can go on top of each other.

@jehna Would appreciate your input as the owner and maintainer of this repository.

@Acters
Copy link

Acters commented Jul 31, 2024

@Acters Did you see it return all the variable names replaced when you ran it with 128k context? I only saw a few. I presume that it only tackles a few problems every query, despite having all the context and output tokens.

I do like your idea about using threads to fill the context and querying it multiple times, so it gets a chance to 'think' and address all the replacements.

I also really like your idea on Parallel Function Calling. I imagine it would serve as a significant improvement.

Would you like to add those features as a part of another PR? I believe all of these three things are separate enhancements that can go on top of each other.

@jehna Would appreciate your input as the owner and maintainer of this repository.

I will like to note that JSON mode was not enforced on function calling, Which this new API package version should fix the JSON format problems. (#12 #17 and #22 )

While this is a stepping stone for fixing some of the issues mentioned, I'm seeing my above mentioned concerns as more of something that is part of the #4 enhancement, since when this project(and tracked issues) were made, these new features were clearly not available. So I decided to start transforming my own fork and trying out some novel ideas in private. For now the changes I made that you transported here seem fine. I doubt the 32k context window is a large enough concern especially since it is an improvement already from when the project was made with the 16k and 4k context limits. I do want to implement the parallel function calling before refactoring into the full featured assistant API features from the basic chat completion features.

Likely would be great improvement to give the entire file/code at once and will reduce the amount of times the ChatGPT will return the reduced amount of chat completions. Also, it should be possible that the parallel feature will come in handy to reduce the complex nature of handling multiple tasks: renaming variable, functions, and adding parameters into the mix. (As currently, Humanify doesn't support parameter renaming)

@0xdevalias
Copy link

0xdevalias commented Aug 1, 2024

Would you like to add those features as a part of another PR? I believe all of these three things are separate enhancements that can go on top of each other.

@bilalba @Acters While including them in a new PR is probably a good idea, updating the OpenAI SDK version is likely going to be a blocker on all of them:

The openai sdk is updated to version 4.52.7 addressing #19

I wonder if those 'base SDK upgrade' aspects of this PR could be separated off, such that they can be tested to 'continue working as expected', and get merged sooner rather than later.

Then the larger changes around gpt-4o-mini, json mode, parallel function calling, etc could be handled in separate/independent PR's as needed; without being blocked on the outdated SDK version.

@Acters
Copy link

Acters commented Aug 6, 2024

Would you like to add those features as a part of another PR? I believe all of these three things are separate enhancements that can go on top of each other.

@bilalba @Acters While including them in a new PR is probably a good idea, updating the OpenAI SDK version is likely going to be a blocker on all of them:

The openai sdk is updated to version 4.52.7 addressing #19

I wonder if those 'base SDK upgrade' aspects of this PR could be separated off, such that they can be tested to 'continue working as expected', and get merged sooner rather than later.

Then the larger changes around gpt-4o-mini, json mode, parallel function calling, etc could be handled in separate/independent PR's as needed; without being blocked on the outdated SDK version.

Changing the model will become very necessary as gpt3.5 will eventually be deprecated as their new gpt4o mini is meant to be a naturally next generational equivalent in both cost and speed. It is just a name change and not a breaking change.

Another thing to note is that because of the new SDK, the code for the parsing is changed heavily to be compatible with how the OpenAI API communicates with us.

This means that JSON mode is forced on without our input. So the fixPerhapsBrokenResponse function should be removed as it causes new errors. This is what caused the problems for you @bilalba when you were testing the larger contexts.

The Using a larger context would not result in enough variable name replacements issue is not caused by the AI but from the fixPerhapsBrokenResponse breaking the json format. The json format from OpenAI uses ' single quotes, where as the fixPerhapsBrokenResponse will replace random parts with " double quotes which is more problematic when the context is larger as there will be larger amount of broken json vs a smaller context that will reduce the amount of broken json content.

We need to delete the fixPerhapsBrokenResponse function now that json mode is enforced with the new API updates and will not return broken json

@jehna
Copy link
Owner

jehna commented Aug 11, 2024

Hey! Seems that this PR got auto-closed because of the release of v2.

Thank you for your hard work and interest on improving humanify, especially @bilalba for maintaining a modernised fork while I've been unresponsive.

The v2 has now dependabot and automated merges for dependency updates that pass the tests, so dependencies should be easier to be kept up to date. I've also made gpt-4o-mini the default model and added the long awaited JSON mode with the new structured outputs.

The v2 does not count tokens anymore, but it uses the same AST-based precise approach that's been working for local renames. This should ensure that all the variable names are renamed, while not overloading the context limit of gpt-4o and others. I've experienced that some models (like Claude Opus) is much better at utilizing the full context window than others (like gpt-4o). By directing the LLM's focus to a small part that's only the fraction of its context window has worked the best in my testing. The v2 window size is super small now, but I'd be happy to increase it (or make it configurable via cli flag) if needed.

@0xdevalias
Copy link

Seems that this PR got auto-closed because of the release of v2.

I think that was probably because of this:

Also, it looks like the v2 branch may have been started from scratch (first commit: e9770a6), rather than continuing on from the historical branch (https://github.com/jehna/humanify/tree/v1). Why was that?

Originally posted by @0xdevalias in #31 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants