sync : ggml (backend v2) #3912
Conversation
examples/finetune/finetune.cpp (Outdated)
```diff
     gf = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gf->order = (enum ggml_cgraph_eval_order) order;
-    gb = ggml_new_graph(ctx_compute);
+    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false);
     gb_tmp = params.common.use_checkpointing
-              ? ggml_new_graph(ctx_compute)
+              ? ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false)
```
@slaren Does this look OK?
I think so, I don't know if `gb` here needs grads or not.
`gb` needs grads, because `gb` also contains the `gf` nodes, which have grads. Changing the `bool grads` argument from `false` to `true` resolves a triggered assert in ggml.c `ggml_graph_cpy`:

`GGML_ASSERT(dst->grads != NULL);`

With this change finetune runs; I will report back if the results are good as well.
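To make the constraint concrete, here is a minimal standalone sketch (my own illustration, not the finetune code) of the pattern under discussion, assuming the graph API from this sync - `ggml_new_graph_custom(ctx, size, grads)`, `ggml_graph_cpy(src, dst)` and `ggml_build_backward_expand(ctx, gf, gb, keep)`. If `gb` were created with `grads = false`, the `ggml_graph_cpy` call below would hit the assert quoted above, because `gf` carries gradient slots:

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // f(x) = sum(x * x), with x marked as a trainable parameter (so x->grad exists)
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    ggml_set_param(ctx, x);
    struct ggml_tensor * f = ggml_sum(ctx, ggml_mul(ctx, x, x));

    // forward graph: allocated with grads = true
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, 2048, true);
    ggml_build_forward_expand(gf, f);

    // backward graph: also needs grads = true, because it is seeded with a copy of
    // gf, whose nodes have grads - with grads = false the copy would trigger
    // GGML_ASSERT(dst->grads != NULL) in ggml_graph_cpy
    struct ggml_cgraph * gb = ggml_new_graph_custom(ctx, 2048, true);
    ggml_graph_cpy(gf, gb);
    ggml_build_backward_expand(ctx, gf, gb, /*keep =*/ true);

    ggml_free(ctx);
    return 0;
}
```

This matches the observation above that `gb` also contains the `gf` nodes, which is why both graphs need gradient storage.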
Should `ggml_graph_cpy` be changed to allow skipping the grads if the src has them but not the dst?
There was an additional regression, unrelated to this PR, in finetune and train-text-from-scratch due to the new YaRN rope implementation. After changing the `bool grads` argument to `true` and applying #3974 to fix the backward process of rope, the output of finetune is correct.
Just to note, lines 1805 and 1807 below need that change as well; I missed them on my first attempt to copy this fix. Also, the mentioned regression and the #3974 fix seem to be critical, because otherwise finetune produces LoRAs without any progress from one checkpoint to another.
Pinging @xaedes - in case you get the chance to take a look and see if the training examples work as expected.
@ggerganov, is finetune on this branch expected to produce a file with the same hash as on the master branch, given identical RNG states at the beginning? Or should testing be more manual, like querying the resulting LoRAs? I'll run a few short tests on a 3B model to check today.
@CoruNethron The results should be identical for the same RNG state.
Hmm, it doesn't seem to calculate the correct context size for a 70B model I tried ([…]). Other models I tried seemed to work (Mistral, Orca3B, CausalLM 14B). It doesn't seem related to GPU support; I tried compiling for CPU only and it made no difference.
@KerfuffleV2 And this does not fail on `master`, right?
Yes, that's correct. I also just tried with a Q2_K 70B and it failed the same way - the numbers also didn't change. So quantization, GPU or no GPU doesn't seem to matter.
I will look into the training examples.
@KerfuffleV2 Should be fixed now. @xaedes Thanks!
Thanks. I can confirm it seems good now.
examples/finetune/finetune.cpp (Outdated)
```diff
@@ -1769,7 +1769,7 @@ int main(int argc, char ** argv) {
     alloc = ggml_allocr_new_measure(tensor_alignment);
     gf = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gf->order = (enum ggml_cgraph_eval_order) order;
-    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false);
+    gb = ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, true);
     gb_tmp = params.common.use_checkpointing
               ? ggml_new_graph_custom(ctx_compute, LLAMA_TRAIN_MAX_NODES, false)
```
I think `gb_tmp` also needs the `grads=true` argument.
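For reference, here is a compilable paraphrase (not the actual `finetune.cpp` code) of where the discussion ends up - all three graphs allocated with grads, as the later commit message "train : allocate grads for gb_tmp" also indicates. `TRAIN_MAX_NODES` is a hypothetical stand-in for `LLAMA_TRAIN_MAX_NODES`:

```c
#include <stddef.h>
#include "ggml.h"

#define TRAIN_MAX_NODES 16384  // hypothetical stand-in for LLAMA_TRAIN_MAX_NODES

// Sketch of the allocation pattern under discussion: the forward graph gf, the
// backward graph gb, and (when gradient checkpointing is used) the temporary
// graph gb_tmp are all created with grads = true, since all of them end up
// holding nodes that carry gradients.
static void alloc_train_graphs(struct ggml_context * ctx_compute,
                               bool use_checkpointing,
                               struct ggml_cgraph ** gf,
                               struct ggml_cgraph ** gb,
                               struct ggml_cgraph ** gb_tmp) {
    *gf = ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true);
    *gb = ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true);
    *gb_tmp = use_checkpointing
        ? ggml_new_graph_custom(ctx_compute, TRAIN_MAX_NODES, true)
        : NULL;
}
```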
This merge mostly applies 875fb42
We should probably merge this soon. Has anybody found any issues with the latest version of this branch?
I did some testing with ROCm, mainly just loading the model and running a few blocks of perplexity. I didn't notice a difference in performance, and the models I tested returned identical perplexity results compared to master. Tested: […]
I couldn't test with Persimmon due to missing CUDA ops (#4041): it's not possible to offload layers or even run perplexity when compiled with CUDA/ROCm. I was able to load the model and do a little text generation though, and it seemed fine. Mistral and LLaMA2 70B were the ones that had issues with this pull previously; currently everything seems to work as well as master. (Not that my approval really means anything. Also, I didn't test anything esoteric like training models.)
* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64 ggml-ci
* sync : ggml-cuda ggml-ci
* llama : fix save/load state context size ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
With this change, you no longer get a core dump when you hit a GGML_ASSERT, and you can't even catch the assertion with gdb without e.g. […]

Could we please change the […]

Yes, I changed it to […]
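To illustrate the distinction being discussed, here is a hypothetical example (these are not ggml's actual macros): an assert that terminates via `exit()` ends the process "cleanly", so there is no core dump and gdb has no signal to stop on, while one that calls `abort()` raises SIGABRT, which gdb breaks on and which can produce a core file:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical assert macros, only to contrast the two failure modes.
// exit(1): normal process termination - no SIGABRT, no core dump.
#define MY_ASSERT_EXIT(x) \
    do { if (!(x)) { fprintf(stderr, "assert failed: %s\n", #x); exit(1); } } while (0)

// abort(): raises SIGABRT - gdb stops here and a core dump can be written.
#define MY_ASSERT_ABORT(x) \
    do { if (!(x)) { fprintf(stderr, "assert failed: %s\n", #x); abort(); } } while (0)

int main(void) {
    MY_ASSERT_ABORT(2 + 2 == 4); // passes
    MY_ASSERT_ABORT(0 == 1);     // fails: run under gdb to land directly on the abort()
    return 0;
}
```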
This is a first step towards bringing the new ggml backend interface to `llama.cpp`. There should be no functional change at this point - merely transitioning to the new API in some places where it is necessary. This PR will likely remain open until we confirm that everything works correctly, so help with testing will be very much appreciated. The main parts of the code where we expect issues are the training examples: […]
I'll put a notice in the README to direct people here. In general, if you care about some specific functionality in `llama.cpp`, please check out this branch, make sure that it works as expected, and post a comment below. This will help to ensure that it does not break when this is merged. For more detailed information about this change: […]
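For readers unfamiliar with the new API, here is a minimal standalone sketch (my own example, not code from this PR) of the graph usage that `llama.cpp` is migrating to: graphs are now allocated inside a `ggml_context` via `ggml_new_graph` / `ggml_new_graph_custom`, which is what allows custom graph sizes to be passed in the training examples:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // c = a + b on two small f32 vectors
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(a, 2.0f);
    ggml_set_f32(b, 3.0f);
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    // graph allocated in the context: default size, no grads (inference-style usage)
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    printf("c[0] = %f\n", ggml_get_f32_1d(c, 0)); // expected: 5.0

    ggml_free(ctx);
    return 0;
}
```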