`--format code` only works sometimes, leads to syntax errors #751
Comments
You can modify prompt templates, as described in #498 and implemented in #581, to attempt to get better quality code from your language models. From my experience, even the most highly trained large language models often make mistakes with programming languages, such as relying on modules that may not be installed, or calling functions that look plausible but do not exist. Some LLMs purport to run code in a sandboxed environment, but these, too, are not guaranteed to work with any particular programming language or real-world runtime. In addition, I've found that even after providing a prompt template that tells an LLM not to include anything other than code, many LLMs include explanatory text, such as the text in your screenshot. Prompt engineering may fix this, but future LLM developments may render your prompt ineffective.
Sounds good, thank you!! Sorry if I didn't make it clear - do you think raising an exception when code is not generated would be feasible? For example, if parsing with Tree Sitter (https://tree-sitter.github.io/tree-sitter/) does not work, then I would raise an error if the user has passed in `--format code`.

Appreciate the discussion! I don't think this issue has to do with generating better quality code, but simply with the unreliability of language models, and ways to constrain that unreliability. For example, I have tried setting the temperature parameter to zero, but that still leads to irreproducible results: sometimes code is generated and sometimes not, as in the example I shared above. Hope that helps for additional context!
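Concretely, the check I'm imagining would look something like this, assuming the py-tree-sitter and tree-sitter-python packages (the binding API has changed across versions, so treat this purely as a sketch):

```python
# Sketch only: requires py-tree-sitter and tree-sitter-python; the binding API
# differs between versions, so these calls follow the current README style.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)


def check_code_reply(reply: str) -> str:
    """Raise if a --format code reply has syntax errors; otherwise return it."""
    tree = parser.parse(bytes(reply, "utf8"))
    if tree.root_node.has_error:
        raise ValueError(
            "--format code was requested, but the reply does not parse as Python"
        )
    return reply
```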
Thank you for mentioning Tree Sitter! I haven't tried that myself. Our general guidance is that whenever an AI model generates code for a person, the person needs to do a code review as diligently as if another person had written it. If you have a code change that would improve the quality of code generated via LLMs in Jupyter AI, I encourage you to open a pull request against Jupyter AI, or to build a new extension to further improve our code. Thanks again!
Description
When I teach courses, it is hard to get students who are new to programming to understand the limitations of LLMs, so they often take the output at face value.
The `--format code` directive in the cells is very helpful to illustrate what the expected output is. However, if I write the prompt incorrectly or make typos, then the `--format code` directive should throw an exception (I think? I'm new to this) if the output is not formatted as code. Here is an example:
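Roughly, the failing cells look like the following (the model alias and prompt here are illustrative): sometimes the reply comes back as clean code, and sometimes it includes explanatory prose around the code, so the generated cell fails with a SyntaxError when run.

```
%%ai chatgpt --format code
Write a function that reads a CSV file and plots a histogram of one of its columns
```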
Reproduce
See above for the prompt that leads to a syntax error, running in https://colab.research.google.com/github/jaanli/language-model-notebooks/blob/main/notebooks/getting-started.ipynb
Expected behavior
When the user declares an intent such as `--format code`, an exception should be thrown if the output does not adhere to the user's intent or stated format. For example, in SGLang (https://github.com/sgl-project/sglang) it is possible to declare intents and then catch errors if the output is not in the expected format. Is support for these types of frameworks planned? I'd love to see what this would take and am happy to try to contribute :)
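As a rough illustration of the behavior I'm hoping for (every name here is hypothetical, and this uses only the standard library rather than Tree Sitter or SGLang):

```python
import ast


class CodeFormatError(ValueError):
    """Hypothetical: raised when a --format code reply is not valid code."""


def ensure_python_code(reply: str) -> str:
    """Raise CodeFormatError instead of silently returning a reply that will not run."""
    try:
        ast.parse(reply)
    except SyntaxError as exc:
        raise CodeFormatError(
            "The reply did not parse as Python; re-run the cell or rephrase the prompt."
        ) from exc
    return reply


# In a course notebook, the failure can then be surfaced to students explicitly:
try:
    ensure_python_code("Sure! Here is the code you asked for:\nprint('hi')")
except CodeFormatError as err:
    print(err)
```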