
unexpected shut down when number of tokens is large #134

Closed
HeMuling opened this issue Mar 14, 2023 · 3 comments
Labels
duplicate This issue or pull request already exists

Comments

HeMuling commented Mar 14, 2023

I found that the LLaMA-7B model shuts down unexpectedly when the number of tokens in the prompt reaches some value, approximately 500.
This cannot be solved by setting the number of tokens to predict very high (e.g. 204800).

my initialization is:

./main -m ./models/7B/ggml-model-q4_0.bin \
-n 204800 \
-t 8 \
--repeat_penalty 1.0 \
--color -i \
-r "HeMuling:" \
--temp 1.0 \
-f ./models/p.txt

where p.txt is a file containing some prompts; the log reports main: number of tokens in prompt = 486.
The program shut down unexpectedly after a few interactions. The last output shows:

Allice:like how big
HeMuling

main: mem per token = 14434244 bytes
main:     load time =  1400.10 ms
main:   sample time =    21.30 ms
main:  predict time = 79072.03 ms / 154.74 ms per token
main:    total time = 88429.08 ms

I am using an M1 Mac with 16GB RAM.

I am wondering whether there is a limitation in the program or whether I did something wrong.


Khalilbz commented Mar 14, 2023

I have the same problem as you; here are my tests.

I changed the -n parameter to -n 1048: the output got almost ~50% longer, but it was still not able to generate long text. I then changed it to -n 4096 and got almost the same length as with -n 1048.
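
That pattern is consistent with output being capped by a fixed context window rather than by -n. Below is a minimal sketch of the arithmetic, using the token counts from the original report and assuming a compiled-in context of 512 tokens; the cap and the clamping logic here are illustrative, not the actual main.cpp code.

// Illustrative only: output length is bounded by the context window,
// not by -n, once the prompt has consumed most of the context.
#include <algorithm>
#include <cstdio>

int main() {
    const int n_ctx     = 512;     // assumed compiled-in context size
    const int n_prompt  = 486;     // "number of tokens in prompt" from the report
    const int n_predict = 204800;  // tokens requested via -n

    // Tokens that can actually be generated before the context fills up:
    const int n_room = std::max(0, n_ctx - n_prompt);
    const int n_out  = std::min(n_predict, n_room);
    printf("requested %d tokens, context leaves room for only %d\n",
           n_predict, n_out);  // prints: ... room for only 26
    return 0;
}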

Info:
CPU: 8 cores
RAM: 16G
Model: 7B
RAM used during generation is 4.6G

256 Limit (Almost 8 lines)

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.
Here is a list of 100 sentences in the context of IT 1. IT is important is terms of technology 2. IT is important in terms of technology. 3. Information Technology is important in terms of technology. 4. Information Technology is important in terms of technology. 5. Information Technology is important in terms of technology. 6. Information Technology is important in terms of technology. 7. Information Technology is important in terms of technology. 8. Information Technology is important in terms of technology. 9. Information Technology is important in terms of technology. 10. Information Technology is important in terms of technology. 11. Information Technology is important in terms of technology. 12. Information Technology is important in terms of technology. 13. Information Technology is important in terms of technology. 14. Information Technology is important in terms of technology. 15. Information Technology is important in terms of technology. 16. Information Technology is important in terms of technology. 17. Information Technology is important in terms of technology. 18. Information Technology is important in terms of technology. 19. Information Technology is important in terms of technology. 20. Information Technology is important in terms of technology. 21. Information Technology is important in terms of technology. 22. Information Technology is important

main: mem per token = 14434244 bytes
main:     load time =  2390.74 ms
main:   sample time =   156.97 ms
main:  predict time = 58222.04 ms / 205.01 ms per token
main:    total time = 61601.12 ms

1048 Limit (Almost 12 lines)

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.
Here is a list of 100 sentences in the context of IT 1. IT is important is terms of technology 2. IT is very important in terms of technology 3. IT is important in terms of technology. 4. IT is very important in terms of technology 5. IT is important in terms of technology 6. IT is very important in terms of technology 7. IT is important in terms of technology 8. IT is very important in terms of technology 9. IT is important in terms of technology 10. IT is very important in terms of technology 11. IT is important in terms of technology 12. IT is very important in terms of technology 13. IT is important in terms of technology 14. IT is very important in terms of technology 15. IT is important in terms of technology 16. IT is very important in terms of technology 17. IT is important in terms of technology 18. IT is very important in terms of technology 19. IT is important in terms of technology 20. IT is very important in terms of technology 21. IT is important in terms of technology 22. IT is very important in terms of technology 23. IT is important in terms of technology 24. IT is very important in terms of technology 25. IT is important in terms of technology 26. IT is very important in terms of technology 27. IT is important in terms of technology 28. IT is very important in terms of technology 29. IT is important in terms of technology 30. IT is very important in terms of technology 31. IT is important in terms of technology 32. IT is very important in terms of technology 33. IT is important in terms of technology 34. IT is very important in terms of technology 35. IT is important in terms of technology 36. IT is very important in terms of technology 37. IT is important in terms of technology 38. IT is very important in terms of technology 39. IT is important in terms of technology 40. IT is very important in terms of technology 41. IT is important in terms of technology 42. IT is very important in terms of technology 43. IT is important in terms of technology 44. IT is very important in terms

main: mem per token = 14434244 bytes
main:     load time =  2677.93 ms
main:   sample time =   287.36 ms
main:  predict time = 108363.11 ms / 212.06 ms per token
main:    total time = 112125.02 ms

I have tried the 13B model; it is very slow, uses about 8G of RAM, and the output was about 9 lines with -n 1048 and about 12 lines with -n 4048.


AndrewKeYanzhe commented Mar 14, 2023

Duplicate of #71

Depending on how much memory you have, you can increase the context size to get longer outputs. On a 64GB machine I was able to have a 12k context with the 7B model and a 2k context with the 65B model. You can change it here

Originally posted by @eous in #71 (comment)
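
As a rough sense of why context size is memory-bound: the KV cache grows linearly with the context length. The back-of-envelope sketch below uses LLaMA-7B's published shape (32 layers, 4096 embedding width) and assumes an f32 cache, as in llama.cpp of this era; treat the element size as an assumption.

// Back-of-envelope KV cache size (assumptions: f32 cache, LLaMA-7B shape).
#include <cstdio>

int main() {
    const long long n_layer = 32;    // LLaMA-7B layer count
    const long long n_embd  = 4096;  // LLaMA-7B embedding width
    const long long n_ctx   = 2048;  // proposed context size
    const long long elem    = 4;     // bytes per element, assuming an f32 cache

    // Keys and values each hold n_layer * n_ctx * n_embd elements.
    const long long bytes = 2 * n_layer * n_ctx * n_embd * elem;
    printf("KV cache at n_ctx=%lld: %.2f GiB\n", n_ctx,
           bytes / (1024.0 * 1024.0 * 1024.0));  // ~2.00 GiB
    return 0;
}

Doubling n_ctx doubles this figure, which is why a 12k context is feasible on a 64GB machine but tight on 16GB.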

HeMuling (Author) commented

Problem solved, thanks to @AndrewKeYanzhe's help. Here is the solution:

in the file main.cpp, change the llama_model_load call (line 794 at the linked commit; the exact line number may differ in your checkout):
https://github.com/ggerganov/llama.cpp/blob/460c48254098b28d422382a2bbff6a0b3d7f7e17/main.cpp#L794

to the following (the number can be adjusted according to available RAM):

if (!llama_model_load(params.model, model, vocab, 2048)) {  // TODO: set context from user input ??

the program should then use a 2048-token context window
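
The in-line TODO hints at the cleaner long-term fix: take the context size from user input instead of hard-coding it. Here is a minimal, hypothetical sketch; the --ctx_size flag name and this parsing code are illustrative, not llama.cpp's actual argument parser.

// Hypothetical sketch: reading the context size from the command line
// instead of hard-coding it. Not llama.cpp's actual implementation.
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char ** argv) {
    int n_ctx = 512;  // assumed default context size
    for (int i = 1; i < argc - 1; i++) {
        if (strcmp(argv[i], "--ctx_size") == 0) {
            n_ctx = atoi(argv[i + 1]);  // take the next argument as the size
        }
    }
    printf("using context size: %d tokens\n", n_ctx);
    // llama_model_load(params.model, model, vocab, n_ctx) would go here.
    return 0;
}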

then run the following in the terminal to recompile:

cd llama.cpp
make
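
After recompiling, re-running the original ./main invocation should get past the ~500-token point where it previously shut down, at the cost of extra memory for the larger context.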

gjmulder added the duplicate label on Mar 15, 2023