Pass pointer to params in llama_init_from_file #1902

mudler · 2023-06-16T21:49:01Z

Especially with golang bindings, calling by value has the side-effect of values not being copied correctly. This has been observed with go-skynet/go-llama.cpp#105 , only when enabling CUDA and linking against it.

I've tried to trace this back together with @lu-zero with gdb - however there seems to be no obvious reason why this is happening. We suspect something going on during the linking process with golang/gcc versions, however using a pointer fixes the issue and makes the binding work as expected.

Especially with golang bindings, calling by value has the side-effect of values not being copied correctly. This has been observed with the bindings in go-skynet/go-llama.cpp#105.

This is needed until ggml-org/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <[email protected]>

howard0su · 2023-06-17T11:40:51Z

however this break other bindings.

mudler · 2023-06-17T11:44:05Z

however this break other bindings.

it might need other bindings to update and sync, but does it break them? I doubt that. The only difference is that now bindings have to pass a pointer instead of a struct. Pretty easy to adapt

mudler · 2023-06-17T11:53:14Z

Happy to try other suggestions in case maintainers don't agree with the approach. This is by far the less invasive I've found. I'd avoid duplicate this function in llama.cpp just for golang, and pass by reference is quite common to avoid unnecessary memory allocation.

mudler · 2023-06-17T12:05:13Z

@ggerganov what's your thoughts on this? Currently I have to carry this patch into the binding code, which makes it painful to maintain across updates. The bindings are used also by https://github.com/go-skynet/LocalAI , making it really cumbersome to carry around.

Happy to explore other paths. Also, now that openllama is available, would make sense to port the binding entirely here in the llama repo and add test suites? This is tied also to #351 . I'd be glad to give a stab at it.

ggerganov · 2023-06-17T16:10:24Z

I guess we can make the change, but before that can you try to track down which one of the parameter gets corrupted and at which point. I think you might have some other issue lingering there.

In general, I prefer to pass parameter structs by value. For example, this is the pattern in ggml:

I now notice that we accidentally broke the pattern in llama.cpp with the introduction of llama_model_quantize_params:

https://github.com/ggerganov/llama.cpp/blob/0711a5f6dce7f04c2a791b14bc47f7d4cb545408/llama.h#L150-L155

I'm not worried about memory allocations as these are very tiny structures, usually on the stack.
So if anything, I would prefer to change llama_model_quantize to pass the params by value and keep the pattern consistent.

mudler · 2023-06-17T17:24:59Z

I guess we can make the change, but before that can you try to track down which one of the parameter gets corrupted and at which point. I think you might have some other issue lingering there.

I did tracked down with gdb - it is affecting only the booleans of params, at least. Although even if checking carefully one step at the time, it is not clear how that happens, as soon as it enters llama_init_from_file, at least booleans are mixed up.

load_model: f16 true                                                                                                                                                                                                 
load_model: lparams.f16 true                                                                              
load_model: lparams.low_vram false                                                                        
load_model: lparams.mlock false                                                                           
load_model: lparams.use_mmap true
ggml_init_cublas: found 1 CUDA devices:                                                                                                                                                                              
  Device 0: Tesla T4                                                                                      
llama_init_from_file: f16 false                                                                           
llama_init_from_file: mmap false                                                                          
llama_init_from_file: mlock true                                                                          
llama_init_from_file: low_vram true

(I did forgot to take the gdb output logs, but can collect those as well again if needed, but there is nothing more interesting to show from those as well).

And strangely enough, this started to happen ONLY when linking against CUDA libraries.

The problem appears only inside llama_init_from_file : https://github.com/go-skynet/go-llama.cpp/blob/7ad833b67070fd3ec46d838f5e38d21111013f98/binding.cpp#L634 , outside of it the parameters are all correct.

The parameters are correctly set by go, and are wrong only inside llama_init_from_file (before, and after the function call the parameters are all correct. It seems the corruption happen when copying into another structure into the memory).

To note, llama_init_from_file is in a different object than the binding, which I tried also to compile changing C++ standards, but with no difference.

In general, I prefer to pass parameter structs by value. For example, this is the pattern in ggml:
* https://github.com/ggerganov/llama.cpp/blob/0711a5f6dce7f04c2a791b14bc47f7d4cb545408/ggml.h#L495

* https://github.com/ggerganov/llama.cpp/blob/0711a5f6dce7f04c2a791b14bc47f7d4cb545408/ggml.h#L1231-L1242
I now notice that we accidentally broke the pattern in llama.cpp with the introduction of llama_model_quantize_params:

https://github.com/ggerganov/llama.cpp/blob/0711a5f6dce7f04c2a791b14bc47f7d4cb545408/llama.h#L150-L155

I'm not worried about memory allocations as these are very tiny structures, usually on the stack. So if anything, I would prefer to change llama_model_quantize to pass the params by value and keep the pattern consistent.

Gotcha, but that would leave me in the cold with the patch to carry along in the binding, and to document that for the users. I'm not sure what to do at this point - any pointer is appreciated here. I'd also avoid to duplicate code, but if that's what I'm left to, I guess I'll have to live with it.

bullno1 · 2023-06-18T01:40:41Z

How about writing a wrapper that gets compiled along with your binding:

int
go_llama_init_from_file(const char * path, struct llama_context_params * params) {
    return llama_init_from_file(path, *params);
}

howard0su · 2023-06-18T01:54:36Z

I am curious to understand why "pass by value" is not working. Also, even we want to use poitner to pass, we should use const * as it is a readonly parameter.

mudler · 2023-06-18T06:29:11Z

llama.cpp

@@ -2618,8 +2618,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s

 struct llama_context * llama_init_from_file(
                             const char * path_model,
-            struct llama_context_params   params) {
+         const struct llama_context_params *  params_ptr) {


@howard0su const is being used here? not sure what do you mean

mudler · 2023-06-18T06:29:40Z

How about writing a wrapper that gets compiled along with your binding:
int
go_llama_init_from_file(const char * path, struct llama_context_params * params) {
    return llama_init_from_file(path, *params);
}

tried that, but same results...

mudler · 2023-06-18T06:38:05Z

It only happens when linking against CUDA - it smells has to do with the objects created by nvcc + the linking, even if I can't see an obvious reason how linking could break that. I'm going to check if SSE could be blamed here ( thanks @lu-zero for the tip) and will collect gdb logs.

lu-zero · 2023-06-18T06:38:24Z

I am curious to understand why "pass by value" is not working. Also, even we want to use poitner to pass, we should use const * as it is a readonly parameter.

From what we could see from the gdb session the new data in the struct is a bogus copy (it is more evident on booleans but all the data seems wrong). I hadn't time to look into the disasm to see exactly what is happening.

I'm not worried about memory allocations as these are very tiny structures, usually on the stack.
So if anything, I would prefer to change llama_model_quantize to pass the params by value and keep the pattern consistent.

Those structs are larger than a pointer and their usage is read-only that I could see, so passing them as pointer shouldn't hurt at all. We spare some memcpy that somehow get wrong in this specific case (to be seen if it is because of gcc + nvcc objects with cgo driving the linker being mislinked or something stranger).

ggerganov · 2023-06-18T06:47:53Z

Just curious, can you confirm that when linking with CUDA, the initial list of parameters up to low_vram are all bogus:

    struct llama_context_params {
        int n_ctx;                             // text context
        int n_batch;                           // prompt processing batch size
        int n_gpu_layers;                      // number of layers to store in VRAM
        int main_gpu;                          // the GPU that is used for scratch and small tensors
        float tensor_split[LLAMA_MAX_DEVICES]; // how to split layers across multiple GPUs
        bool low_vram;                         // if true, reduce VRAM usage at the cost of performance
        int seed;                              // RNG seed, -1 for random
        ...

That is, you get bogus values for n_ctx, n_batch, n_gpu_layers, main_gpu, tensor_split

Need to confirm these because there are 2 contradicting statements so far:

I did tracked down with gdb - it is affecting only the booleans of params, at least. Although even if checking carefully one step at the time, it is not clear how that happens, as soon as it enters llama_init_from_file, at least booleans are mixed up.

and

From what we could see from the gdb session the new data in the struct is a bogus copy (it is more evident on booleans but all the data seems wrong).

I suspect that the "lonely" 1-byte bool low_vram surrounded by the big, bad 4-byte floats and ints somehow messes up the alignment of the struct and want to confirm that is the case. If yes, then the solution would be to just move low_vram together with it's little bool brothers and optionally add some extra padding if necessary

mudler · 2023-06-19T12:45:18Z

@ggerganov @lu-zero and I captured the gdb session when this is happening (always with CUDA on):

GNU gdb (Ubuntu 12.1-3ubuntu2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...
warning: File "/usr/share/go-1.20/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-loa
d".
To enable execution of this file add
        add-auto-load-safe-path /usr/share/go-1.20/src/runtime/runtime-gdb.py
line to your configuration file "/home/ubuntu/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/ubuntu/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb) set logging enable
Copying output to gdb.txt.
Copying debug output to gdb.txt.
(gdb) b load_model
bBreakpoint 1 at 0x4b75e0: file binding.cpp, line 593.
(gdb) b llama_init_from_file
Breakpoint 2 at 0x4e3bb0: file /home/ubuntu/go-llama.cpp-3/llama.cpp/llama.cpp, line 2617.
(gdb) run
Starting program: /home/ubuntu/go-llama.cpp-3/test -ngl 10

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com 
Enable debuginfod for this session? (y or [n]) n
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffa5bff000 (LWP 23782)]
[New Thread 0x7fffa53fe000 (LWP 23783)]
[New Thread 0x7fff9ffff000 (LWP 23785)]
[New Thread 0x7fffa4bfd000 (LWP 23784)]
[New Thread 0x7fff9f7fe000 (LWP 23786)]
[New Thread 0x7fff9effd000 (LWP 23787)]

Thread 1 "test" hit Breakpoint 1, load_model (fname=0xa421b0 "./models/7B/ggml-model-q4_0.bin", n_ctx=128, n_seed=0, memory_f16=true, mlock=false, embeddings=
true, mmap=true, low_vram=false, vocab_only=false, n_gpu_layers=10, n_batch=0, maingpu=0xa421e0 "", tensorsplit=0xa42200 "") at binding.cpp:593
593     void* load_model(const char *fname, int n_ctx, int n_seed, bool memory_f16, bool mlock, bool embeddings, bool mmap, bool low_vram, bool vocab_only, it n_gpu_layers, int n_batch, const char *maingpu, const char *tensorsplit) {
(gdb) list
588         
589         return params;
590     }
591
592
593     void* load_model(const char *fname, int n_ctx, int n_seed, bool memory_f16, bool mlock, bool embeddings, bool mmap, bool low_vram, bool vocab_only, in
t n_gpu_layers, int n_batch, const char *maingpu, const char *tensorsplit) {
594         // load the model
595        llama_context_params lparams = llama_context_default_params();
596
597         lparams.n_ctx      = n_ctx;
(gdb) c
Continuing.
[New Thread 0x7fff9e7fc000 (LWP 23791)]
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4
[New Thread 0x7fff9dffb000 (LWP 23792)]
load_model seed 0
load_model embeddings true
load_model n_gpu_layer 10
load_model f16 true
load_model mmap true
load_model low_vram false
n_ctx  128
n_batch  0
n_gpu_layers  10
main_gpu  0
tensor_split   0.000000
low_vram  0
seed  0
f16_kv  1
logits_all  0
vocab_only  0
use_mmap  1
use_mlock  0
embedding  1
progress_callback  (nil)
progress_callback_user_data  (nil)

Thread 1 "test" hit Breakpoint 2, llama_init_from_file (path_model=0xa421b0 "./models/7B/ggml-model-q4_0.bin", params=<error reading variable: Cannot access m
emory at address 0x0>) at /home/ubuntu/go-llama.cpp-3/llama.cpp/llama.cpp:2617
2617                struct llama_context_params   params) {
(gdb) c
Continuing.
n_ctx  128
n_batch  0
n_gpu_layers  10
main_gpu  0
tensor_split   0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
low_vram  0
seed  0
f16_kv  255
logits_all  255
vocab_only  23
use_mmap  0
use_mlock  192
embedding  128
progress_callback  0x100000001
progress_callback_user_data  (nil)
error loading model: failed to open ./models/7B/ggml-model-q4_0.bin: No such file or directory
llama_init_from_file: failed to load model

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
ggml_cuda_free_data (tensor=0x0) at /home/ubuntu/go-llama.cpp-3/llama.cpp/ggml-cuda.cu:2280
2280        if (tensor->backend != GGML_BACKEND_GPU && tensor->backend != GGML_BACKEND_GPU_SPLIT) {

This doesn't seem to be quite visible unless it affects boolean with a jump from 0 a to a bigger number - You can notice the behavior with use_mlock for instance that jumps from 0 to 192, so it switches the boolean from false to true.

We went ahead, and modified the llama_context_params structure to:

    struct llama_context_params {
        bool low_vram;                         // if true, reduce VRAM usage at the cost of performance
        bool f16_kv;     // use fp16 for KV cache
        bool logits_all; // the llama_eval() call computes all logits, not just the last one
        bool vocab_only; // only load the vocabulary, no weights
        bool use_mmap;   // use mmap if possible
        bool use_mlock;  // force system to keep model in RAM
        bool embedding;  // embedding mode only

        int seed;                              // RNG seed, -1 for random
        int n_ctx;                             // text context
        int n_batch;                           // prompt processing batch size
        int n_gpu_layers;                      // number of layers to store in VRAM
        int main_gpu;                          // the GPU that is used for scratch and small tensors
        float tensor_split[LLAMA_MAX_DEVICES]; // how to split layers across multiple GPUs
        // called with a progress value between 0 and 1, pass NULL to disable
        llama_progress_callback progress_callback;
        // context pointer passed to the progress callback
        void * progress_callback_user_data;
    };

And can confirm that the misalignment doesn't happen anymore:

ubuntu@gpu-friend:~/go-llama.cpp-3$ gdb --args ./test -ngl 10                                                                                                 
GNU gdb (Ubuntu 12.1-3ubuntu2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...
warning: File "/usr/share/go-1.20/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-loa
d".
To enable execution of this file add
        add-auto-load-safe-path /usr/share/go-1.20/src/runtime/runtime-gdb.py
line to your configuration file "/home/ubuntu/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/ubuntu/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb) b load_model
bBreakpoint 1 at 0x4b75e0: file binding.cpp, line 593.
(gdb) b llama_init_from_file
Breakpoint 2 at 0x4e3b90: file /home/ubuntu/go-llama.cpp-3/llama.cpp/llama.cpp, line 2617.
(gdb) run
Starting program: /home/ubuntu/go-llama.cpp-3/test -ngl 10

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com 
Enable debuginfod for this session? (y or [n]) n
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffa5bff000 (LWP 28075)]
[New Thread 0x7fff9ffff000 (LWP 28076)]
[New Thread 0x7fffa53fe000 (LWP 28077)]
[New Thread 0x7fffa4bfd000 (LWP 28078)]
[New Thread 0x7fff9f7fe000 (LWP 28079)]

Thread 1 "test" hit Breakpoint 1, load_model (fname=0xa421b0 "./models/7B/ggml-model-q4_0.bin", n_ctx=128, n_seed=0, memory_f16=true, mlock=false, embeddings=
true, mmap=true, low_vram=false, vocab_only=false, n_gpu_layers=10, n_batch=0, maingpu=0xa421e0 "", tensorsplit=0xa42200 "") at binding.cpp:593
593     void* load_model(const char *fname, int n_ctx, int n_seed, bool memory_f16, bool mlock, bool embeddings, bool mmap, bool low_vram, bool vocab_only, in
t n_gpu_layers, int n_batch, const char *maingpu, const char *tensorsplit) {
(gdb) list
588         
589         return params;
590     }
591
592
593     void* load_model(const char *fname, int n_ctx, int n_seed, bool memory_f16, bool mlock, bool embeddings, bool mmap, bool low_vram, bool vocab_only, in
t n_gpu_layers, int n_batch, const char *maingpu, const char *tensorsplit) {
594         // load the model
595        llama_context_params lparams = llama_context_default_params();
596
597         lparams.n_ctx      = n_ctx;
(gdb) c
Continuing.
[New Thread 0x7fff9effd000 (LWP 28083)]
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla T4
[New Thread 0x7fff9e7fc000 (LWP 28084)]
load_model seed 0
load_model embeddings true
load_model n_gpu_layer 10
load_model f16 true
load_model mmap true
load_model low_vram false
n_ctx  128
n_batch  0
n_gpu_layers  10
main_gpu  0
tensor_split   0.000000
low_vram  0
seed  0
f16_kv  1
logits_all  0
vocab_only  0
use_mmap  1
use_mlock  0
embedding  1
progress_callback  (nil)
progress_callback_user_data  (nil)
Thread 1 "test" hit Breakpoint 2, llama_init_from_file (path_model=0xa421b0 "./models/7B/ggml-model-q4_0.bin", params=<error reading variable: Cannot access $
emory at address 0x1>) at /home/ubuntu/go-llama.cpp-3/llama.cpp/llama.cpp:2617
2617                struct llama_context_params   params) {
(gdb) c
Continuing.
n_ctx  128
n_batch  0
n_gpu_layers  10
main_gpu  0
tensor_split   0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
low_vram  0
seed  0
f16_kv  1
logits_all  0
vocab_only  0
use_mmap  1
use_mlock  0
embedding  1
progress_callback  0x8000000000
progress_callback_user_data  0xa00000000
error loading model: failed to open ./models/7B/ggml-model-q4_0.bin: No such file or directory
llama_init_from_file: failed to load model

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
ggml_cuda_free_data (tensor=0x0) at /home/ubuntu/go-llama.cpp-3/llama.cpp/ggml-cuda.cu:2280
2280        if (tensor->backend != GGML_BACKEND_GPU && tensor->backend != GGML_BACKEND_GPU_SPLIT) {
(gdb)

mudler · 2023-06-19T12:56:17Z

I've created #1936 that seems to copy without misalignment, however, It feels relying solely on #1936 is particularly risky and fragile instead of passing by reference.

ggerganov · 2023-06-19T16:05:45Z

I've created #1936 that seems to copy without misalignment, however, It feels relying solely on #1936 is particularly risky and fragile instead of passing by reference.

I think this is the proper solution, we just need to make sure what's the best way to order the members and put comments to keep that order.

mudler · 2023-06-19T16:41:09Z

I've created #1936 that seems to copy without misalignment, however, It feels relying solely on #1936 is particularly risky and fragile instead of passing by reference.

I think this is the proper solution, we just need to make sure what's the best way to order the members and put comments to keep that order.

I'm still not entirely convinced that pass by value is the best as it shows how delicate it is in such situations and the fact that requires special comments and attention would make me uncomfortable with to have it in the code, however It's not my call. I'm already happy that the issue is fixed!

closing this PR then in favor of the other one.

wtarreau · 2023-06-24T14:09:01Z

I've created #1936 that seems to copy without misalignment, however, It feels relying solely on #1936 is particularly risky and fragile instead of passing by reference.

I think this is the proper solution, we just need to make sure what's the best way to order the members and put comments to keep that order.

The common practice with struct members is to order them by decreasing type sizes so that no holes are left. Of course it's not always convenient, but when holes have to be left, I like to replace them with a comment, and possibly add an explicit pad member if compatibility with packed is required. A great tool to inspect structures during development is "pahole", it shows the offsets and sizes of each members of the requested structs, location and size of holes, and the total vs used size.

On a related note, I don't know if the copy time nor memory usage are of any importance here, but the bools could be advantageously replaced with single bits.

Pass pointer to params in llama_init_from_file

200892a

Especially with golang bindings, calling by value has the side-effect of values not being copied correctly. This has been observed with the bindings in go-skynet/go-llama.cpp#105.

mudler added a commit to go-skynet/go-llama.cpp that referenced this pull request Jun 16, 2023

Workaround llama_init_from_file parameter copy

76e4416

This is needed until ggml-org/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <[email protected]>

mudler added a commit to go-skynet/go-llama.cpp that referenced this pull request Jun 16, 2023

Workaround llama_init_from_file parameter copy

c55308a

This is needed until ggml-org/llama.cpp#1902 is addressed/merged. Signed-off-by: mudler <[email protected]>

mudler commented Jun 18, 2023

View reviewed changes

mudler mentioned this pull request Jun 19, 2023

Workaround struct misalignment during value-copy #1936

Merged

mudler closed this Jun 19, 2023

mudler mentioned this pull request Sep 20, 2023

how to resolve this error? trying to run cublas go-skynet/go-llama.cpp#218

Closed

ggerganov mentioned this pull request Apr 15, 2024

Added a bool fold_lowercase to whisper_context_params ggerganov/whisper.cpp#2005

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pass pointer to params in llama_init_from_file #1902

Pass pointer to params in llama_init_from_file #1902

mudler commented Jun 16, 2023 •

edited

Loading

howard0su commented Jun 17, 2023

mudler commented Jun 17, 2023 •

edited

Loading

mudler commented Jun 17, 2023 •

edited

Loading

mudler commented Jun 17, 2023

ggerganov commented Jun 17, 2023

mudler commented Jun 17, 2023 •

edited

Loading

bullno1 commented Jun 18, 2023

howard0su commented Jun 18, 2023

mudler Jun 18, 2023

mudler commented Jun 18, 2023

mudler commented Jun 18, 2023

lu-zero commented Jun 18, 2023

ggerganov commented Jun 18, 2023 •

edited

Loading

mudler commented Jun 19, 2023 •

edited

Loading

mudler commented Jun 19, 2023 •

edited

Loading

ggerganov commented Jun 19, 2023

mudler commented Jun 19, 2023 •

edited

Loading

wtarreau commented Jun 24, 2023

Pass pointer to params in llama_init_from_file #1902

Pass pointer to params in llama_init_from_file #1902

Conversation

mudler commented Jun 16, 2023 • edited Loading

howard0su commented Jun 17, 2023

mudler commented Jun 17, 2023 • edited Loading

mudler commented Jun 17, 2023 • edited Loading

mudler commented Jun 17, 2023

ggerganov commented Jun 17, 2023

mudler commented Jun 17, 2023 • edited Loading

bullno1 commented Jun 18, 2023

howard0su commented Jun 18, 2023

mudler Jun 18, 2023

Choose a reason for hiding this comment

mudler commented Jun 18, 2023

mudler commented Jun 18, 2023

lu-zero commented Jun 18, 2023

ggerganov commented Jun 18, 2023 • edited Loading

mudler commented Jun 19, 2023 • edited Loading

mudler commented Jun 19, 2023 • edited Loading

ggerganov commented Jun 19, 2023

mudler commented Jun 19, 2023 • edited Loading

wtarreau commented Jun 24, 2023

mudler commented Jun 16, 2023 •

edited

Loading

mudler commented Jun 17, 2023 •

edited

Loading

mudler commented Jun 17, 2023 •

edited

Loading

mudler commented Jun 17, 2023 •

edited

Loading

ggerganov commented Jun 18, 2023 •

edited

Loading

mudler commented Jun 19, 2023 •

edited

Loading

mudler commented Jun 19, 2023 •

edited

Loading

mudler commented Jun 19, 2023 •

edited

Loading