sync : ggml (backend v2) #3912
Conversation
examples/finetune/finetune.cpp (Outdated)
```diff
     gf = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gf->order = (enum ggml_cgraph_eval_order) order;
-    gb = ggml_new_graph(ctx_compute);
+    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false);
     gb_tmp = params.common.use_checkpointing
-              ? ggml_new_graph(ctx_compute)
+              ? ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false)
```
@slaren Does this look OK?
I think so, I don't know if `gb` here needs grads or not.
`gb` needs grads, because `gb` also contains the `gf` nodes, which have grads. Changing the `bool grads` argument from `false` to `true` resolves a triggered assert in ggml.c `ggml_graph_cpy`:

`GGML_ASSERT(dst->grads != NULL);`

With this change finetune runs; I will report back if the results are good as well.
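To make the constraint concrete, here is a minimal standalone sketch (my own illustration, not the finetune code) of the pattern under discussion, assuming the graph API from this sync - `ggml_new_graph_custom(ctx, size, grads)`, `ggml_graph_cpy(src, dst)` and `ggml_build_backward_expand(ctx, gf, gb, keep)`. If `gb` were created with `grads = false`, the `ggml_graph_cpy` call below would hit the assert quoted above, because `gf` carries gradient slots:

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // f(x) = sum(x * x), with x marked as a trainable parameter (so x->grad exists)
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    ggml_set_param(ctx, x);
    struct ggml_tensor * f = ggml_sum(ctx, ggml_mul(ctx, x, x));

    // forward graph: allocated with grads = true
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, 2048, true);
    ggml_build_forward_expand(gf, f);

    // backward graph: also needs grads = true, because it is seeded with a copy of
    // gf, whose nodes have grads - with grads = false the copy would trigger
    // GGML_ASSERT(dst->grads != NULL) in ggml_graph_cpy
    struct ggml_cgraph * gb = ggml_new_graph_custom(ctx, 2048, true);
    ggml_graph_cpy(gf, gb);
    ggml_build_backward_expand(ctx, gf, gb, /*keep =*/ true);

    ggml_free(ctx);
    return 0;
}
```

This matches the observation above that `gb` also contains the `gf` nodes, which is why both graphs need gradient storage.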
Should `ggml_graph_cpy` be changed to allow skipping the grads if the src has them but not the dst?
There was an additional regression, unrelated to this PR, in finetune and train-text-from-scratch due to the new YaRN rope implementation. After changing the `bool grads` argument to `true` and applying #3974 to fix the backward process of rope, the output of finetune is correct.
Just to note, lines 1805 and 1807 below need that change as well; I missed them on my first attempt to copy this fix. Also, the mentioned regression and the #3974 fix seem to be critical, because otherwise finetune produces LoRAs without any progress from one checkpoint to another.
Pinging @xaedes - in case you get the chance to take a look and see if the training examples work as expected.
@ggerganov, is finetune on this branch expected to produce a file with the same hash as on the master branch, given identical RNG states at the beginning? Or should testing be more manual, like querying the resulting LoRAs? I'll run a few short tests on a 3B model to check today.
@CoruNethron The results should be identical for the same RNG state.
Hmm, it doesn't seem to calculate the correct context size for a 70B model I tried ([…]). Other models I tried seemed to work (Mistral, Orca3B, CausalLM 14B). It doesn't seem related to GPU support; I tried compiling for CPU only and it made no difference.
@KerfuffleV2 And this does not fail on `master`, right?
Yes, that's correct. I also just tried with a Q2_K 70B and it failed the same way - the numbers also didn't change. So quantization, GPU or no GPU doesn't seem to matter.
I will look into the training examples.
@KerfuffleV2 Should be fixed now. @xaedes Thanks!
Thanks. I can confirm it seems good now.
examples/finetune/finetune.cpp (Outdated)
```diff
@@ -1769,7 +1769,7 @@ int main(int argc, char ** argv) {
     alloc = ggml_allocr_new_measure(tensor_alignment);
     gf = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gf->order = (enum ggml_cgraph_eval_order) order;
-    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false);
+    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gb_tmp = params.common.use_checkpointing
               ? ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false)
```
I think `gb_tmp` also needs the `grads=true` argument.
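For reference, here is a compilable paraphrase (not the actual `finetune.cpp` code) of where the discussion ends up - all three graphs allocated with grads, as the later commit message "train : allocate grads for gb_tmp" also indicates. `TRAIN_MAX_NODES` is a hypothetical stand-in for `LLAMA_TRAIN_MAX_NODES`:

```c
#include <stddef.h>
#include "ggml.h"

#define TRAIN_MAX_NODES 16384  // hypothetical stand-in for LLAMA_TRAIN_MAX_NODES

// Sketch of the allocation pattern under discussion: the forward graph gf, the
// backward graph gb, and (when gradient checkpointing is used) the temporary
// graph gb_tmp are all created with grads = true, since all of them end up
// holding nodes that carry gradients.
static void alloc_train_graphs(struct ggml_context * ctx_compute,
                               bool use_checkpointing,
                               struct ggml_cgraph ** gf,
                               struct ggml_cgraph ** gb,
                               struct ggml_cgraph ** gb_tmp) {
    *gf = ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true);
    *gb = ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true);
    *gb_tmp = use_checkpointing
        ? ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true)
        : NULL;
}
```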
This merge mostly applies 875fb42
We should probably merge this soon. Has anybody found any issues with the latest version of this branch?
I did some testing with ROCm, mainly just loading the model and running a few blocks of perplexity. I didn't notice a difference in performance, and the models I tested returned identical perplexity results compared to master. Tested: […]
I couldn't test with Persimmon due to missing CUDA ops (#4041): it's not possible to offload layers or even run perplexity when compiled with CUDA/ROCm. I was able to load the model and do a little text generation though, and it seemed fine. Mistral and LLaMA2 70B were the ones that had issues with this pull previously; currently everything seems to work as well as master. (Not that my approval really means anything. Also, I didn't test anything esoteric like training models.)
* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64 ggml-ci
* sync : ggml-cuda ggml-ci
* llama : fix save/load state context size ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
With this change, you no longer get a core dump when you hit a GGML_ASSERT, and you can't even catch the assertion with gdb without e.g. […]

Could we please change the […]

Yes, I changed it to […]
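To illustrate the distinction being discussed, here is a hypothetical example (these are not ggml's actual macros): an assert that terminates via `exit()` ends the process "cleanly", so there is no core dump and gdb has no signal to stop on, while one that calls `abort()` raises SIGABRT, which gdb breaks on and which can produce a core file:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical assert macros, only to contrast the two failure modes.
// exit(1): normal process termination - no SIGABRT, no core dump.
#define MY_ASSERT_EXIT(x) \
    do { if (!(x)) { fprintf(stderr, "assert failed: %s\n", #x); exit(1); } } while (0)

// abort(): raises SIGABRT - gdb stops here and a core dump can be written.
#define MY_ASSERT_ABORT(x) \
    do { if (!(x)) { fprintf(stderr, "assert failed: %s\n", #x); abort(); } } while (0)

int main(void) {
    MY_ASSERT_ABORT(2 + 2 == 4); // passes
    MY_ASSERT_ABORT(0 == 1);     // fails: run under gdb to land directly on the abort()
    return 0;
}
```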
This is a first step towards bringing the new ggml backend interface to `llama.cpp`. There should be no functional change at this point - merely transitioning to the new API in some places where it is necessary. This PR will likely remain open until we confirm that everything works correctly, so help with testing will be very much appreciated. The main parts of the code where we expect issues are the training examples: […]
I'll put a notice in the README to direct people here. In general, if you care about some specific functionality in `llama.cpp`, please check out this branch, make sure that it works as expected, and post a comment below. This will help to ensure that it does not break when this is merged. For more detailed information about this change: […]
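For readers unfamiliar with the new API, here is a minimal standalone sketch (my own example, not code from this PR) of the graph usage that `llama.cpp` is migrating to: graphs are now allocated inside a `ggml_context` via `ggml_new_graph` / `ggml_new_graph_custom`, which is what allows custom graph sizes to be passed in the training examples:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // c = a + b on two small f32 vectors
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(a, 2.0f);
    ggml_set_f32(b, 3.0f);
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    // graph allocated in the context: default size, no grads (inference-style usage)
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0)); // expected: 5.0

    ggml_free(ctx);
    return 0;
}
```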