Memory Overflows when Using CodeLlama7B Model on Titan XP #587

Closed
ClarkWain opened this issue Oct 18, 2023 · 8 comments
Labels: bug (Something isn't working)

Comments

@ClarkWain

Hello,

I have been using your Tabby project and it's been very helpful. However, I've encountered an issue regarding memory management when I use the CodeLlama7B model.

Here's a detailed description of the problem:

Description

After starting the server with the CodeLlama7B model on a Titan XP, the initial memory usage is around 6-7GB. After a single code completion, memory usage increases to roughly 8GB. Following a few more code completions, the system throws an Out-Of-Memory (OOM) error.

Here is the error log:

2023-10-18T08:50:49.589409Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:165: Starting server, this might takes a few minutes...
2023-10-18T08:52:40.522802Z INFO tabby::serve: crates/tabby/src/serve/mod.rs:183: Listening at 0.0.0.0:8080
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDA failed with error out of memory

I've tested other models including StarCoder1B, StarCoder3B, and StarCoder7B, and they all work well.
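
For reference, the per-completion growth can be watched from the host with plain nvidia-smi polling; the /v1/completions request below follows Tabby's documented completion API, but the exact payload is an assumption for this build:

# Poll memory usage of all visible GPUs once per second
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1

# In another shell, trigger completions directly against the server to reproduce the growth
curl -X POST http://localhost:8080/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"language": "python", "segments": {"prefix": "def fib(n):\n    "}}'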

ClarkWain added the bug label Oct 18, 2023
@ClarkWain
Author

Here is the command:

docker run -it \
  --gpus all -p 8080:8080 -v /data1/docker_main:/data \
  tabbyml/tabby \
  serve --model TabbyML/CodeLlama-7B --device cuda
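
If the box has more than one GPU, the same command can also be pinned to a single card with Docker's device syntax, which keeps the per-card memory numbers easier to read (device index 0 here is just an example):

docker run -it \
  --gpus '"device=0"' -p 8080:8080 -v /data1/docker_main:/data \
  tabbyml/tabby \
  serve --model TabbyML/CodeLlama-7B --device cuda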

@wsxiaoys
Member

Thank you for reporting this. It seems to be a recurring issue, as reported in #541 (comment).

I'm investigating to identify the culprit between versions 0.2.0 and 0.3.0.

@wsxiaoys
Member

Would you mind sharing the output of your health endpoint? You can acquire it using the following command:

curl -X POST http://localhost:8080/v1/health

@ClarkWain
Author

> Would you mind sharing the output of your health endpoint? You can acquire it using the following command:
>
> curl -X POST http://localhost:8080/v1/health

{"model":"TabbyML/CodeLlama-7B","device":"cuda","compute_type":"auto","arch":"x86_64","cpu_info":"Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz","cpu_count":32,"cuda_devices":["NVIDIA TITAN Xp","NVIDIA TITAN Xp","NVIDIA TITAN Xp","NVIDIA TITAN Xp"],"version":{"build_date":"2023-10-10","build_timestamp":"2023-10-10T02:46:12.584424280Z","git_sha":"3580d6f5510060714266d3031ee352d43826d56d","git_describe":"v0.2.2"}}

@wsxiaoys
Member

Since the issue mentioned in #541 (comment) states that the out-of-memory (OOM) problem is resolved with version 0.3.0, would you mind trying out version 0.3.0 to check if the OOM issue persists?
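
For reference, that only means pinning the image tag in the original command; a minimal sketch, assuming the Docker Hub tag for that release is 0.3.0:

docker run -it \
  --gpus all -p 8080:8080 -v /data1/docker_main:/data \
  tabbyml/tabby:0.3.0 \
  serve --model TabbyML/CodeLlama-7B --device cuda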

@ClarkWain
Author

> Since the issue mentioned in #541 (comment) states that the out-of-memory (OOM) problem is resolved with version 0.3.0, would you mind trying out version 0.3.0 to check if the OOM issue persists?

OK, I will try.

@ClarkWain
Author

I have tried it. The memory usage of CodeLlama7B stays within the range of 7GB to 8GB, and there is no OOM anymore. Thanks
