Context sensitive help #3556

Closed

Conversation

pudepiedj
Contributor

All the apps/executables in llama.cpp/examples currently use the same help text, based on lists in common.cpp that don't distinguish between the arguments/parameters implemented in a particular app, so it's hard to know which ones should work (implemented), which shouldn't (not implemented), and which for some reason don't (bug).

The discussion #3514 details early thoughts and shows output from the latest implementation, which scans all the .cpp source files in examples/ and identifies which ones use one of the gpt_params attributes by searching on params.attribute. At the moment this is implemented in Python in find-implemented-args.py, but it could be ported to C++, and some part would obviously need to be (TODO) to incorporate it into, say, common.cpp.

readcommonh.py grabs all the attributes from common.h and is then imported into find-implemented-args.py, providing up-to-date scans of common/common.h for all current param attributes (although there are a couple that escape this scan - logit_bias and lora_adapter). The attributes come in two syntactically distinct forms: the params.attribute form (e.g. n_threads) and the common.cpp help form (e.g. --threads) that is printed by ./bin/app --help. find-implemented-args.py matches these two forms to perform substitutions/links.

About 12 implemented args/parameters are not listed in the master version of common.cpp, so they have been added in the context-sensitive-help branch of my fork - --embedding, --beams, --ppl-stride, --ppl-output-type, --memory-f32, --no-mmap, --mlock, --use-color, --nprobs, --alias, --infill, --prompt-file - but the fprintf() sequence will also need to be checked (TODO).

Discussion #3514 shows a complete output run based on current values; only the help variety is really needed, but if this is of any general interest we can work on a C++ version that could be built into common.cpp along with the rest of the help.

Written and tested on Apple M2 MAX (38 core) with 32GB RAM running Sonoma 14.0.

@pudepiedj
Contributor Author

pudepiedj commented Oct 9, 2023

Trailing ws removed.
Added final newlines to find-implemented-args.py and cmap-example.cpp.
And VSC will do it automatically once you know how (mea culpa):
settings.json

{
    "files.trimTrailingWhitespace": true,
    "files.trimFinalNewlines": true,
    "files.insertFinalNewline": true
}

@KerfuffleV2
Collaborator

Absolutely nothing against you personally, but I really, really don't like the approach of scanning source files to try to determine that information.

I think it's likely to be pretty brittle, hard to understand/confusing and overly complicated. A much simpler approach would just be to make a big enum of the possible options and have a function that parses/generates help based on a list of options.

Then the individual examples can just pass a list of the options they support and "context-sensitive help" is the result.

@pudepiedj
Contributor Author

> Absolutely nothing against you personally, but I really, really don't like the approach of scanning source files to try to determine that information.
>
> I think it's likely to be pretty brittle, hard to understand/confusing and overly complicated. A much simpler approach would just be to make a big enum of the possible options and have a function that parses/generates help based on a list of options.
>
> Then the individual examples can just pass a list of the options they support and "context-sensitive help" is the result.

Thanks for the response. I understand your reservations, and I don't take constructive criticism personally, so no worries there, but I think you're underestimating the overhead to keep the enum you suggest up to date. I did consider that, as well as the "binary coding" option I outlined in the discussion, before rejecting it, but both ultimately demand some kind of automation or they become prohibitively cumbersome.

Unless it's done automatically, which brings us back to some version of where we are, every time there is a new implementation or an enhancement to the whole suite of apps, someone needs to go through and change everything, and in my experience that ceases to happen pretty quickly and is itself a bit "brittle".

As for complicated: if the underlying code is written tightly according to consistent standards - and it is - then it isn't that complicated; if it were, I wouldn't have been able to do it!

@KerfuffleV2
Collaborator

KerfuffleV2 commented Oct 9, 2023

> I don't take constructive criticism personally

I definitely respect that! I still hate saying something negative about someone else's work, but in this case I'm really not in favor of this type of solution.

> I think you're underestimating the overhead to keep the enum you suggest up to date.

I don't think it would be that bad. Most examples fall into one of three categories: Supports everything, supports only a few things, supports everything except for a few options.

The option interface could export a couple convenience functions like one that returns all the options: examples that support all except a few can just remove the ones they don't support. Examples that only support a few can start with an empty list of supported options and just add the ones they support, and of course the examples that support everything just don't need to do anything.

If there are other common sets of options then there could be other convenience functions that provide them as a starting point.

Just an example, main supports most options but not parallel generation right now, so it could grab the preset "all options" list (or maybe "all inference options" which might exclude stuff like perplexity, etc) and just remove the parallel generation items that it doesn't support.

On the other hand, an application like batched just supports the model file, prompt and number of parallel sequences so it could start with an empty list of options and just add those three.

There would still be the need to maintain a master list of options and associated help text, parsing those options, etc, but that's the case with the current status quo and also your changes as far as I know. So that part isn't really the issue. (Not sure if I explained it clearly enough. If not, let me know and I can show a more concrete example of what I'm talking about.)
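
A minimal sketch of what that could look like (every name here - llama_example_opt, common_all_options, common_parse - is hypothetical, not existing llama.cpp API):

// Hypothetical sketch of the "list of supported options" idea.
#include <algorithm>
#include <vector>

enum llama_example_opt {
    OPT_MODEL,
    OPT_PROMPT,
    OPT_N_PARALLEL,
    OPT_PERPLEXITY,
};

// Convenience function: every option the common parser knows about.
static std::vector<llama_example_opt> common_all_options() {
    return { OPT_MODEL, OPT_PROMPT, OPT_N_PARALLEL, OPT_PERPLEXITY };
}

int main(int argc, char ** argv) {
    // A main-style example supports almost everything, so it starts from
    // the full list and removes what it doesn't implement.
    std::vector<llama_example_opt> opts = common_all_options();
    opts.erase(std::remove(opts.begin(), opts.end(), OPT_N_PARALLEL), opts.end());

    // A batched-style example starts empty and adds its three options.
    std::vector<llama_example_opt> batched_opts = { OPT_MODEL, OPT_PROMPT, OPT_N_PARALLEL };

    // A shared helper would then parse argv and print --help from the list:
    // common_parse(argc, argv, opts);
    (void) argc; (void) argv; (void) batched_opts;
    return 0;
}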

@ggerganov
Owner

Thanks for this work - interesting approach. I'm inclined to say no atm, as I think we are over-engineering this problem. But let me think on it for some time and see if I will change my mind.

@ggerganov added the demo (Demonstrate some concept or idea, not intended to be merged) label on Oct 10, 2023
@pudepiedj
Contributor Author

> Thanks for this work - interesting approach. I'm inclined to say no atm, as I think we are over-engineering this problem. But let me think on it for some time and see if I will change my mind.

No problem at all. @KerfuffleV2 and I had a conversation about it, as you will have seen, but it's absolutely your call. And apologies (again) for my incompetence causing you extra work when so many PR runs failed because of my lack of experience with GitHub, VSC and the system. I think the version I recently pushed will compile, and I've fixed the previous failure to catch logit_bias and lora_adapter, but the ./bin/cmap-example executable doesn't produce any output unless run in debug mode, for reasons that currently escape me.

Anyway, I've learned a lot about the whole suite of apps by doing this, so nothing is lost. Thanks for your patience and, more particularly, for the whisper.cpp and llama.cpp projects, which are fantastic.

@pudepiedj
Contributor Author

pudepiedj commented Oct 10, 2023

> > I don't take constructive criticism personally
>
> I definitely respect that! I still hate saying something negative about someone else's work, but in this case I'm really not in favor of this type of solution.
>
> > I think you're underestimating the overhead to keep the enum you suggest up to date.
>
> I don't think it would be that bad. Most examples fall into one of three categories: Supports everything, supports only a few things, supports everything except for a few options.
>
> The option interface could export a couple convenience functions like one that returns all the options: examples that support all except a few can just remove the ones they don't support. Examples that only support a few can start with an empty list of supported options and just add the ones they support, and of course the examples that support everything just don't need to do anything.
>
> If there are other common sets of options then there could be other convenience functions that provide them as a starting point.
>
> Just an example, main supports most options but not parallel generation right now, so it could grab the preset "all options" list (or maybe "all inference options" which might exclude stuff like perplexity, etc) and just remove the parallel generation items that it doesn't support.
>
> On the other hand, an application like batched just supports the model file, prompt and number of parallel sequences so it could start with an empty list of options and just add those three.
>
> There would still be the need to maintain a master list of options and associated help text, parsing those options, etc, but that's the case with the current status quo and also your changes as far as I know. So that part isn't really the issue. (Not sure if I explained it clearly enough. If not, let me know and I can show a more concrete example of what I'm talking about.)

I suspect that this horse has already bolted, but just for the sake of completeness I feel I should respond to what appear, at least, to be misreadings of what I have done in your evaluation.

As I indicated before, your solution seems to involve some kind of manual intervention in every app to indicate which argument/parameter pairs it supports. The apps don't decide for themselves which they support: unless it is done manually, each app must be scanned - either by a fallible human or by some kind of program - to ascertain which they are; this is exactly what find-implemented-args.py does, except that it does it for all the apps in any specified directory.

The "master list" you mention - as things stand - arises from a combination of common.h and common.cpp exactly as they are already; no further list is required by readcommonh.py or find-implementation-args.py. Currently, the only manual part comes from the need to associate gpt-param attributes with command-line arguments, but even that could be eliminated if all the param.attributes were exactly matched to each --attribute in those command lines. In view of the stated desire to reduce duplication of code stated elsewhere and marked as high priority, rendering them the same would enhance that process and make the substitution dictionary redundant.

My point, in sum, is that if a developer for some reason wanted to add --dice-carrots as a command-line option, he would need to do three things: add dice_carrots to common.h as a gpt_params attribute; add -dc, --dice-carrots to common.cpp as a help item, along with the associated if (arg == "-dc" || arg == "--dice-carrots") check; and implement whatever is involved, in however many apps can do it, using params.dice_carrots with associated code. find-implemented-args.py and readcommonh.py (creadcommonh.cpp, a stepping-stone to a C++ implementation of the latter, is already uploaded) will then automatically add -dc, --dice-carrots (default: 0) don't cut yourself to the contextual help for every app in which dice-carrots is implemented, and it's not necessary even to think about keeping a master list somewhere else. Provided developers stick to the house style of gpt_params, params.attribute and the associated --help entry (and why wouldn't they?), everything else is automatic.
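
For concreteness, a sketch of those three touch points collapsed into one illustrative file (real code would split them between common.h, common.cpp and the example; --dice-carrots is this thread's running hypothetical, not a real option):

// Illustrative only: the hypothetical --dice-carrots option end to end.
#include <cstdio>
#include <cstdlib>
#include <cstring>

// 1. common.h: the new gpt_params attribute.
struct gpt_params {
    int dice_carrots = 0;
};

int main(int argc, char ** argv) {
    gpt_params params;

    // 2. common.cpp: the argv check plus the --help entry.
    for (int i = 1; i < argc; i++) {
        if ((strcmp(argv[i], "-dc") == 0 || strcmp(argv[i], "--dice-carrots") == 0) && i + 1 < argc) {
            params.dice_carrots = atoi(argv[++i]);
        } else if (strcmp(argv[i], "--help") == 0) {
            printf("  -dc, --dice-carrots N  don't cut yourself (default: %d)\n", params.dice_carrots);
            return 0;
        }
    }

    // 3. the example: this params.dice_carrots usage is exactly the
    //    pattern that find-implemented-args.py scans for.
    printf("dicing %d carrots\n", params.dice_carrots);
    return 0;
}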

@cebtenzzre
Collaborator

@pudepiedj I don't think running a python script should be part of the process of adding a new option to a llama.cpp example (or examples), especially one that tries to parse C++ code with a regex.

@KerfuffleV2
Collaborator

KerfuffleV2 commented Oct 10, 2023

> your solution seems to involve some kind of manual intervention in every app to indicate which argument/parameter pairs it supports.

Basically, but that doesn't mean the options have to be dealt with individually for each app.

> The apps don't decide for themselves which they support

I mean, they basically do; otherwise you could just use all options for every example. In reality, some apps don't implement the stuff that's required to respect/be compatible with certain options.

Just for example, main currently doesn't support parallel generation. That feature would have to be implemented.

The "master list" you mention - as things stand - arises from a combination of

I think you misunderstood me. I wasn't saying maintaining the master list was something specific to your approach. I was saying it was an issue (if you want to look at it that way - I do, personally) that was shared by all three approaches: the status quo, your approach and what I was suggesting as well.

The reason I brought that up is because I don't think that requirement would really be part of the ideal solution. What I was suggesting was a relatively simple compromise. I mean, if I add that --dice-carrots option to my example and no other examples implement it/use it, then why does it have to be added to a central location? It's specific to that one example.

> for every app in which dice-carrots is implemented

This is basically the issue, you can't reliably tell if it's "implemented" using a regex.

#ifdef LLAMA_SUPPORT_CARROT_CHOPPING
 printf("%d", params.dice_carrots);
#endif

Your script will think I implemented dice-carrots whether or not it was compiled with LLAMA_SUPPORT_CARROT_CHOPPING enabled. Right?

gpt_params my_params;
// Stuff
printf("%d", my_params.dice_carrots);

Your script won't recognize that I "implement" dice-carrots because I didn't call my variable exactly params. Right? But why am I required to call the variable something specific?

Also, it doesn't look like your regexes will handle -> or whitespace correctly. [->\.] matches -, or > or . (I think the \ is redundant there because . isn't special in a character class). You could possibly fix it by changing it to something like \s*(?:->|\.)\s*, but this is just to illustrate how hard it is to do this with regexes. They just fall short of being able to reliably parse C even if you were the world's foremost expert on regexes, which I'm not and you probably aren't either.
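
A standalone check (not code from the PR or the script) makes the difference visible:

// Demonstrates why [->\.] mishandles "->" and whitespace: a character
// class matches exactly one character, so "->" (two characters) and
// "params . dice_carrots" both escape the naive pattern.
#include <iostream>
#include <regex>
#include <string>

int main() {
    const std::regex naive(R"(params[->\.]dice_carrots)");
    const std::regex fixed(R"(params\s*(?:->|\.)\s*dice_carrots)");

    for (const std::string s : { "params.dice_carrots",
                                 "params->dice_carrots",
                                 "params . dice_carrots" }) {
        std::cout << s << ": naive=" << std::regex_search(s, naive)
                  << " fixed=" << std::regex_search(s, fixed) << "\n";
    }
    // Only the first line prints naive=1; all three print fixed=1.
    return 0;
}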

Also, if I happen to name some random variable params, weird stuff will probably happen (at least if it is a struct that has a field that also exists in gpt_params). Those restrictions/effects are difficult to predict and pretty unintuitive in my opinion.

It's really not maintaining lists or whatever that I was concerned about, it was arbitrary/unintuitive effects and restrictions, and also the difficulty of reliably determining stuff like "this example implements this feature".

edit: Another thing just occurred to me. I have some changes in #3543 that move the sampling-related parameters out of gpt_params into their own struct (which gpt_params holds an instance of). It's definitely not a sure thing that the pull I mentioned gets accepted eventually, but the idea of moving common options into a separate structure to clean up the code/make it easier to deal with is definitely something that could happen in different pulls. Would we have to say "No, you can't do that. Everything has to be at the top level in gpt_params"?

If not, how will you deal with stuff like:

gpt_params params;
llama_sampling_params & sampling_params = params.sampling_params;

and that kind of thing (an approach I used a lot in my change to avoid having to write params.sampling_params.whatever a billion times).

@pudepiedj
Contributor Author

> @pudepiedj I don't think running a python script should be part of the process of adding a new option to a llama.cpp example (or examples), especially one that tries to parse C++ code with a regex.

You are right, and I wouldn't dream of it. I just find Python easier to script in for "proof of concept" purposes; a working version would need to be in C++.

@slaren
Collaborator

slaren commented Oct 10, 2023

I think that there are two big categories of flags, model related and sampling related. Nearly everything else is specific to one or two examples. So maybe it would be good to have common parsing for these two big categories (that can be toggled per example), and then just do the rest of the parsing in each example? We could extend the command line parsing functions in common.cpp to support adding new flags dynamically.

I would avoid any over-engineered solution because examples are supposed to be simple and easy to understand. It may be preferable to duplicate some code than to make the code harder to understand by adding more layers.

@pudepiedj
Contributor Author

> I think that there are two big categories of flags, model related and sampling related. Nearly everything else is specific to one or two examples. So maybe it would be good to have common parsing for these two big categories (that can be toggled per example), and then just do the rest of the parsing in each example? We could extend the command line parsing functions in common.cpp to support adding new flags dynamically.
>
> I would avoid any over-engineered solution because examples are supposed to be simple and easy to understand. It may be preferable to duplicate some code than to make the code harder to understand by adding more layers.

These are great points! Thank you! I absolutely think (amongst many things) that we are at the point where we need some kind of common parsing/naming if this project is to go on.

My issue in this conversation has absolutely nothing to do with the relative merits of Python and C, and my ineptitude in both should prove that lack of expertise in coding is not in itself disproof of concept. Honestly: who (with any sense) cares?

But the question you raise is really important: what is over-engineered? I am a very poor coder, but I [still hope that I can] think, and the trouble with ad hoc solutions is not that they are over-engineered, but that they are under-engineered: they limit their scope by their lack of ambition, by their lack of vision.

No doubt we can create heaps of ad hoc solutions to any number of perceived intermediate problems, but the key is that what we are doing can be generalised wherever we want to go.

We should not accept "fix-it" temporary solutions: observed limitations should feed back to redefine fundamental concepts (modus tollendo tollens), and so we shouldn't build limitations into this fantastic project by opting for quick fixes to systemic difficulties. Specifically: if we need (right now) to regularise the syntax/naming of the code, we should do it. Otherwise, we are building on sand.

@goerch
Collaborator

goerch commented Oct 10, 2023

@pudepiedj : we are not making the rules, the lead contributors are (and the success of the project proves them right).

@slaren
Collaborator

slaren commented Oct 10, 2023

> But the question you raise is really important: what is over-engineered?

In this case, my main concern is that I should be able to understand everything in an example without having to look too far. If I see something like this in an example, it is fairly obvious what it does and where the value of n_parallel comes from:

add_model_arguments(&mparams);
add_sampling_arguments(&sparams);
add_argument({"-np", "--parallel"}, "number of parallel sequences to decode", 1, [&](auto s) {
    n_parallel = std::stoi(s);
});
parse_arguments(argc, argv);

auto [model, ctx] = init_model(mparams);

// ...
llama_token token = sample_token(ctx, sparams, ...);

When you add another abstraction layer on top of all of this, I am no longer able to understand at a glance what is happening in the code. In some cases that complexity may be completely justified, but in this case simplicity is the main concern, because the reason I am looking at the example is to learn how to use llama.cpp, not to learn about command line parsing schemes.

This is just an example, I am not saying that the command line parsing should look exactly like that (that code isn't even C++11), and I don't want to discourage you from exploring different options either.

@KerfuffleV2
Collaborator

> If I see something like this in an example,

That's very, very similar to the approach I was thinking of. The only real difference is that I might have done it where you pass an enum of the argument type, like ARG_INT, ARG_FLOAT, etc., instead of calling the parse functions in a lambda. (I hadn't really thought it out to the point of how to get the result into a custom variable, though, so maybe the lambda approach is the only reasonable way to do it.)
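
For what it's worth, a minimal sketch of that enum-typed variant (all names hypothetical; a pointer to the destination variable is one way to get the result into a custom variable):

// Hypothetical sketch: the argument table carries a type tag and a
// pointer to the destination instead of a parsing lambda.
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

enum arg_type { ARG_INT, ARG_FLOAT };

struct arg_def {
    std::string flag;
    arg_type    type;
    void *      dest; // written through according to `type`
};

static void parse_args(int argc, char ** argv, const std::vector<arg_def> & defs) {
    for (int i = 1; i + 1 < argc; i++) {
        for (const auto & d : defs) {
            if (d.flag == argv[i]) {
                if (d.type == ARG_INT)   { *(int   *) d.dest = atoi(argv[i + 1]); }
                if (d.type == ARG_FLOAT) { *(float *) d.dest = (float) atof(argv[i + 1]); }
            }
        }
    }
}

int main(int argc, char ** argv) {
    int   n_parallel = 1;
    float temp       = 0.8f;
    parse_args(argc, argv, {
        { "--parallel", ARG_INT,   &n_parallel },
        { "--temp",     ARG_FLOAT, &temp       },
    });
    printf("n_parallel = %d, temp = %.2f\n", n_parallel, temp);
    return 0;
}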

@pudepiedj
Contributor Author

Lots of interesting things to consider have been mentioned and many deserve long consideration, especially about the way to organise multi-contributor projects and the extent to which some kind of central control is eventually necessary, but this isn't the place for that. I think this PR has served its purpose, so I will close it.

@pudepiedj pudepiedj closed this Oct 11, 2023