
Planning for smart utilization of 3.5 #6

Open
CiberNin opened this issue Apr 4, 2023 · 1 comment
CiberNin commented Apr 4, 2023

GPT 3.5 is so much cheaper ($0.002 vs $0.06 per 1K tokens), not to mention it usually returns faster and is less throttled.
Given that, it makes sense to always at least attempt to use GPT 3.5 first.

Given we're going to try GPT 3.5 first, how do we determine when to fall back to GPT 4?

  1. When compiling the prompt for our completion, if it leaves fewer than n tokens remaining for the completion, where n is the smallest number we expect could hold an expected completion. IMO 500 tokens is a reasonable amount to reserve, but that's a variable that could use empirical measurement.
  2. When receiving the completion: we should prompt for the answer to be wrapped in delimiters, so we can detect when GPT 3.5 ran out of tokens partway through its attempted answer.
  3. When checking whether the completion actually fixed the current error. It may be worth retrying here while slowly ratcheting up the temperature, or feeding the new error back in. We need a way to check whether GPT is just introducing new errors that occur before the original error would.

Additionally, the code should include future-proofing for fallback to the 32K model using rules 1 & 2 (since it's not smarter, just bigger), obviously disabled behind a flag. Similarly, allow disabling the 8K GPT-4 model using the same system.
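The fallback rules above could be sketched roughly like this. The context limits, the reserved-token count, the `count_tokens` helper, and the `<answer>` delimiters are all assumptions for illustration, not part of wolverine:

```python
# Hypothetical sketch of fallback rules 1 and 2; numbers and helpers are
# assumptions, not wolverine's actual API.

CONTEXT_LIMITS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}
RESERVED_COMPLETION_TOKENS = 500  # rule 1: a guess that could use empirical tuning

def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; a rough word-based estimate only.
    return len(text.split())

def pick_model(prompt: str) -> str:
    """Rule 1: use 3.5 only if enough room remains for the completion."""
    needed = count_tokens(prompt) + RESERVED_COMPLETION_TOKENS
    if needed <= CONTEXT_LIMITS["gpt-3.5-turbo"]:
        return "gpt-3.5-turbo"
    return "gpt-4"

def completion_was_truncated(completion: str) -> bool:
    """Rule 2: ask for the answer wrapped in delimiters; a missing closing
    delimiter suggests the model ran out of tokens mid-answer."""
    stripped = completion.strip()
    return not (stripped.startswith("<answer>") and stripped.endswith("</answer>"))
```

In practice `count_tokens` would call the model's real tokenizer, since a word count badly underestimates token usage for code.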

@biobootloader
Owner

Good ideas! I've done limited experimentation with using 3.5-turbo with wolverine (it's now added as a flag). It sometimes works but sometimes fails to return valid JSON. A quick optimization might be to try a few iterations with 3.5, and if it keeps returning invalid JSON, automatically retry with 4.
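That retry-then-escalate idea could look something like this. `MAX_35_ATTEMPTS` and the `send(model, prompt)` callback are assumptions for the sketch, not wolverine's real interface:

```python
import json

MAX_35_ATTEMPTS = 3  # assumed retry budget before escalating to GPT-4

def get_valid_json(send, prompt):
    """Try gpt-3.5-turbo a few times; if it keeps returning invalid JSON,
    fall back to gpt-4. `send(model, prompt)` is a hypothetical wrapper
    around the chat completion call that returns the raw response text."""
    for _ in range(MAX_35_ATTEMPTS):
        raw = send("gpt-3.5-turbo", prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: retry on the cheap model
    # 3.5 exhausted its attempts; escalate to the stronger model
    return json.loads(send("gpt-4", prompt))
```

Combined with rule 3 above, each retry could also bump the temperature slightly before giving up on 3.5.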
