
unexpected shut down when number of tokens is large #134

Closed
HeMuling opened this issue Mar 14, 2023 · 3 comments
Labels
duplicate This issue or pull request already exists

Comments

HeMuling commented Mar 14, 2023

I found that the LLaMA-7B model shuts down unexpectedly when the number of tokens in the prompt reaches some value, approximately 500.
This cannot be solved by setting the number of tokens to predict very high (e.g. 204800).

my initialization is:

./main -m ./models/7B/ggml-model-q4_0.bin \
-n 204800 \
-t 8 \
--repeat_penalty 1.0 \
--color -i \
-r "HeMuling:" \
--temp 1.0 \
-f ./models/p.txt

where p.txt is a file containing some prompts; the log reports main: number of tokens in prompt = 486.
The program shut down unexpectedly after a few interactions. The last output shows:

Allice:like how big
HeMuling

main: mem per token = 14434244 bytes
main:     load time =  1400.10 ms
main:   sample time =    21.30 ms
main:  predict time = 79072.03 ms / 154.74 ms per token
main:    total time = 88429.08 ms

I am using an M1 Mac with 16GB RAM.

I am wondering whether there is a limitation in the program or whether I did something wrong.


Khalilbz commented Mar 14, 2023

I have the same problem as you; here are my tests.

I changed the -n parameter to -n 1048: the output got almost ~50% longer, but it was still not able to generate long text. I then changed it to -n 4096 and got almost the same length as with -n 1048.
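
That pattern is consistent with output being capped by a fixed context window rather than by -n. Below is a minimal sketch of the arithmetic, using the token counts from the original report and assuming a compiled-in context of 512 tokens; the cap and the clamping logic here are illustrative, not the actual main.cpp code.

// Illustrative only: output length is bounded by the context window,
// not by -n, once the prompt has consumed most of the context.
#include <algorithm>
#include <cstdio>

int main() {
    const int n_ctx     = 512;     // assumed compiled-in context size
    const int n_prompt  = 486;     // "number of tokens in prompt" from the report
    const int n_predict = 204800;  // tokens requested via -n

    // Tokens that can actually be generated before the context fills up:
    const int n_room = std::max(0, n_ctx - n_prompt);
    const int n_out  = std::min(n_predict, n_room);
    printf("requested %d tokens, context leaves room for only %d\n",
           n_predict, n_out);  // prints: ... room for only 26
    return 0;
}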

Info:
CPU: 8 cores
RAM: 16G
Model: 7B
RAM used during generation is 4.6G

256 Limit (Almost 8 lines)

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.
Here is a list of 100 sentences in the context of IT 1. IT is important is terms of technology 2. IT is important in terms of technology. 3. Information Technology is important in terms of technology. 4. Information Technology is important in terms of technology. 5. Information Technology is important in terms of technology. 6. Information Technology is important in terms of technology. 7. Information Technology is important in terms of technology. 8. Information Technology is important in terms of technology. 9. Information Technology is important in terms of technology. 10. Information Technology is important in terms of technology. 11. Information Technology is important in terms of technology. 12. Information Technology is important in terms of technology. 13. Information Technology is important in terms of technology. 14. Information Technology is important in terms of technology. 15. Information Technology is important in terms of technology. 16. Information Technology is important in terms of technology. 17. Information Technology is important in terms of technology. 18. Information Technology is important in terms of technology. 19. Information Technology is important in terms of technology. 20. Information Technology is important in terms of technology. 21. Information Technology is important in terms of technology. 22. Information Technology is important

main: mem per token = 14434244 bytes
main:     load time =  2390.74 ms
main:   sample time =   156.97 ms
main:  predict time = 58222.04 ms / 205.01 ms per token
main:    total time = 61601.12 ms

1048 Limit (Almost 12 lines)

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.
Here is a list of 100 sentences in the context of IT 1. IT is important is terms of technology 2. IT is very important in terms of technology 3. IT is important in terms of technology. 4. IT is very important in terms of technology 5. IT is important in terms of technology 6. IT is very important in terms of technology 7. IT is important in terms of technology 8. IT is very important in terms of technology 9. IT is important in terms of technology 10. IT is very important in terms of technology 11. IT is important in terms of technology 12. IT is very important in terms of technology 13. IT is important in terms of technology 14. IT is very important in terms of technology 15. IT is important in terms of technology 16. IT is very important in terms of technology 17. IT is important in terms of technology 18. IT is very important in terms of technology 19. IT is important in terms of technology 20. IT is very important in terms of technology 21. IT is important in terms of technology 22. IT is very important in terms of technology 23. IT is important in terms of technology 24. IT is very important in terms of technology 25. IT is important in terms of technology 26. IT is very important in terms of technology 27. IT is important in terms of technology 28. IT is very important in terms of technology 29. IT is important in terms of technology 30. IT is very important in terms of technology 31. IT is important in terms of technology 32. IT is very important in terms of technology 33. IT is important in terms of technology 34. IT is very important in terms of technology 35. IT is important in terms of technology 36. IT is very important in terms of technology 37. IT is important in terms of technology 38. IT is very important in terms of technology 39. IT is important in terms of technology 40. IT is very important in terms of technology 41. IT is important in terms of technology 42. IT is very important in terms of technology 43. IT is important in terms of technology 44. IT is very important in terms

main: mem per token = 14434244 bytes
main:     load time =  2677.93 ms
main:   sample time =   287.36 ms
main:  predict time = 108363.11 ms / 212.06 ms per token
main:    total time = 112125.02 ms

I have tried the 13B model; it is very slow, uses about 8G of RAM, and the output was about 9 lines with -n 1048 and about 12 lines with -n 4048.


AndrewKeYanzhe commented Mar 14, 2023

Duplicate of #71

Depending on how much memory you have, you can increase the context size to get longer outputs. On a 64GB machine I was able to have a 12k context with the 7B model and a 2k context with the 65B model. You can change it here

Originally posted by @eous in #71 (comment)
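
As a rough sense of why context size is memory-bound: the KV cache grows linearly with the context length. The back-of-envelope sketch below uses LLaMA-7B's published shape (32 layers, 4096 embedding width) and assumes an f32 cache, as in llama.cpp of this era; treat the element size as an assumption.

// Back-of-envelope KV cache size (assumptions: f32 cache, LLaMA-7B shape).
#include <cstdio>

int main() {
    const long long n_layer = 32;    // LLaMA-7B layer count
    const long long n_embd  = 4096;  // LLaMA-7B embedding width
    const long long n_ctx   = 2048;  // proposed context size
    const long long elem    = 4;     // bytes per element, assuming an f32 cache

    // Keys and values each hold n_layer * n_ctx * n_embd elements.
    const long long bytes = 2 * n_layer * n_ctx * n_embd * elem;
    printf("KV cache at n_ctx=%lld: %.2f GiB\n", n_ctx,
           bytes / (1024.0 * 1024.0 * 1024.0));  // ~2.00 GiB
    return 0;
}

Doubling n_ctx doubles this figure, which is why a 12k context is feasible on a 64GB machine but tight on 16GB.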

HeMuling (Author) commented

Problem solved, thanks to @AndrewKeYanzhe's help. Here is the solution:

in the file main.cpp, change the llama_model_load call (line 794 at the linked commit; the exact line number may differ in your checkout):
https://github.com/ggerganov/llama.cpp/blob/460c48254098b28d422382a2bbff6a0b3d7f7e17/main.cpp#L794

to the following (the number can be adjusted according to available RAM):

if (!llama_model_load(params.model, model, vocab, 2048)) {  // TODO: set context from user input ??

the program should then use a 2048-token context window
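
The in-line TODO hints at the cleaner long-term fix: take the context size from user input instead of hard-coding it. Here is a minimal, hypothetical sketch; the --ctx_size flag name and this parsing code are illustrative, not llama.cpp's actual argument parser.

// Hypothetical sketch: reading the context size from the command line
// instead of hard-coding it. Not llama.cpp's actual implementation.
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char ** argv) {
    int n_ctx = 512;  // assumed default context size
    for (int i = 1; i < argc - 1; i++) {
        if (strcmp(argv[i], "--ctx_size") == 0) {
            n_ctx = atoi(argv[i + 1]);  // take the next argument as the size
        }
    }
    printf("using context size: %d tokens\n", n_ctx);
    // llama_model_load(params.model, model, vocab, n_ctx) would go here.
    return 0;
}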

then run the following in the terminal to recompile:

cd llama.cpp
make
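
After recompiling, re-running the original ./main invocation should get past the ~500-token point where it previously shut down, at the cost of extra memory for the larger context.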

gjmulder added the duplicate label on Mar 15, 2023