
obj_size <= remaining_buffer_size #680

Open
qzq-123 opened this issue Jan 20, 2025 · 1 comment

Comments


qzq-123 commented Jan 20, 2025

Does anyone know what obj_size and remaining_buffer_size refer to? Where can I adjust them?

[Screenshot: error output showing the failed assertion `obj_size <= remaining_buffer_size`]

Container startup parameters:

docker run --rm -it --net host --shm-size=20g \
  --ulimit memlock=-1 --ulimit stack=67108864 --gpus all

I have an A5000 GPU and am running the Qwen2.5-3B-Instruct model. This command returns normal results:

python3 ../run.py --input_text "Hello, what is your name?" --max_output_len=50 --tokenizer_dir ./tmp/Qwen/3B/ --engine_dir=./tmp/Qwen/3B/trt_engines/int4_weight_only/1-gpu/

But starting the backend service fails with the error shown above:
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=${MODEL_FOLDER}
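For anyone hitting this: my assumption (not confirmed in this thread) is that the `obj_size <= remaining_buffer_size` assertion comes from Triton's Python backend shared-memory pool, which fails when a request or response object exceeds the preallocated region. A sketch of a possible workaround, using Triton's documented `--backend-config` flags to enlarge that pool (the byte sizes here are illustrative, not tuned values):

```shell
# Sketch, assuming the assertion originates in the Python backend's
# shared-memory manager. These flags are part of Triton's Python backend
# configuration; verify they apply to your tensorrtllm_backend version.
tritonserver \
  --model-repository=${MODEL_FOLDER} \
  --backend-config=python,shm-default-byte-size=16777216 \
  --backend-config=python,shm-growth-byte-size=1048576
```

If you launch through launch_triton_server.py, check whether your version forwards extra arguments to tritonserver; otherwise invoking tritonserver directly as above lets you test the setting. Keeping a generous container `--shm-size` (as in the docker run command above) is also necessary, since the pool is allocated from /dev/shm.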


mutkach commented Jan 28, 2025

I've encountered the same problem with MLlama.
