-
Notifications
You must be signed in to change notification settings - Fork 11k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass pointer to params in llama_init_from_file #1902
Conversation
Especially with golang bindings, calling by value has the side-effect of values not being copied correctly. This has been observed with the bindings in go-skynet/go-llama.cpp#105.
This is needed until ggml-org/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <[email protected]>
This is needed until ggml-org/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <[email protected]>
however this break other bindings. |
it might need other bindings to update and sync, but does it break them? I doubt that. The only difference is that now bindings have to pass a pointer instead of a struct. Pretty easy to adapt |
Happy to try other suggestions in case maintainers don't agree with the approach. This is by far the less invasive I've found. I'd avoid duplicate this function in llama.cpp just for golang, and pass by reference is quite common to avoid unnecessary memory allocation. |
@ggerganov what's your thoughts on this? Currently I have to carry this patch into the binding code, which makes it painful to maintain across updates. The bindings are used also by https://github.com/go-skynet/LocalAI , making it really cumbersome to carry around. Happy to explore other paths. Also, now that openllama is available, would make sense to port the binding entirely here in the llama repo and add test suites? This is tied also to #351 . I'd be glad to give a stab at it. |
I guess we can make the change, but before that can you try to track down which one of the parameter gets corrupted and at which point. I think you might have some other issue lingering there. In general, I prefer to pass parameter structs by value. For example, this is the pattern in
I now notice that we accidentally broke the pattern in I'm not worried about memory allocations as these are very tiny structures, usually on the stack. |
I did tracked down with gdb - it is affecting only the booleans of params, at least. Although even if checking carefully one step at the time, it is not clear how that happens, as soon as it enters
(I did forgot to take the gdb output logs, but can collect those as well again if needed, but there is nothing more interesting to show from those as well). And strangely enough, this started to happen ONLY when linking against CUDA libraries. The problem appears only inside The parameters are correctly set by To note,
Gotcha, but that would leave me in the cold with the patch to carry along in the binding, and to document that for the users. I'm not sure what to do at this point - any pointer is appreciated here. I'd also avoid to duplicate code, but if that's what I'm left to, I guess I'll have to live with it. |
How about writing a wrapper that gets compiled along with your binding:
|
I am curious to understand why "pass by value" is not working. Also, even we want to use poitner to pass, we should use const * as it is a readonly parameter. |
@@ -2618,8 +2618,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s | |||
|
|||
struct llama_context * llama_init_from_file( | |||
const char * path_model, | |||
struct llama_context_params params) { | |||
const struct llama_context_params * params_ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@howard0su const is being used here? not sure what do you mean
tried that, but same results... |
It only happens when linking against CUDA - it smells has to do with the objects created by nvcc + the linking, even if I can't see an obvious reason how linking could break that. I'm going to check if SSE could be blamed here ( thanks @lu-zero for the tip) and will collect gdb logs. |
From what we could see from the gdb session the new data in the struct is a bogus copy (it is more evident on booleans but all the data seems wrong). I hadn't time to look into the disasm to see exactly what is happening.
Those structs are larger than a pointer and their usage is read-only that I could see, so passing them as pointer shouldn't hurt at all. We spare some memcpy that somehow get wrong in this specific case (to be seen if it is because of gcc + nvcc objects with cgo driving the linker being mislinked or something stranger). |
Just curious, can you confirm that when linking with CUDA, the initial list of parameters up to struct llama_context_params {
int n_ctx; // text context
int n_batch; // prompt processing batch size
int n_gpu_layers; // number of layers to store in VRAM
int main_gpu; // the GPU that is used for scratch and small tensors
float tensor_split[LLAMA_MAX_DEVICES]; // how to split layers across multiple GPUs
bool low_vram; // if true, reduce VRAM usage at the cost of performance
int seed; // RNG seed, -1 for random
... That is, you get bogus values for Need to confirm these because there are 2 contradicting statements so far:
and
I suspect that the "lonely" 1-byte |
@ggerganov @lu-zero and I captured the gdb session when this is happening (always with CUDA on):
This doesn't seem to be quite visible unless it affects boolean with a jump from 0 a to a bigger number - You can notice the behavior with We went ahead, and modified the
And can confirm that the misalignment doesn't happen anymore:
|
I think this is the proper solution, we just need to make sure what's the best way to order the members and put comments to keep that order. |
I'm still not entirely convinced that pass by value is the best as it shows how delicate it is in such situations and the fact that requires special comments and attention would make me uncomfortable with to have it in the code, however It's not my call. I'm already happy that the issue is fixed! closing this PR then in favor of the other one. |
The common practice with struct members is to order them by decreasing type sizes so that no holes are left. Of course it's not always convenient, but when holes have to be left, I like to replace them with a comment, and possibly add an explicit pad member if compatibility with packed is required. A great tool to inspect structures during development is "pahole", it shows the offsets and sizes of each members of the requested structs, location and size of holes, and the total vs used size. On a related note, I don't know if the copy time nor memory usage are of any importance here, but the bools could be advantageously replaced with single bits. |
Especially with golang bindings, calling by value has the side-effect of values not being copied correctly. This has been observed with go-skynet/go-llama.cpp#105 , only when enabling CUDA and linking against it.
I've tried to trace this back together with @lu-zero with gdb - however there seems to be no obvious reason why this is happening. We suspect something going on during the linking process with golang/gcc versions, however using a pointer fixes the issue and makes the binding work as expected.