Does anyone know what obj_size and remaining_buffer_size refer to? Where can I adjust them?
Container startup parameters:
docker run --rm -it --net host --shm-size=20g \
    --ulimit memlock=-1 --ulimit stack=67108864 --gpus all
I have an A5000 GPU and am running the Qwen2.5-3B-Instruct model:
python3 ../run.py --input_text "Hello, what is your name?" --max_output_len=50 --tokenizer_dir ./tmp/Qwen/3B/ --engine_dir=./tmp/Qwen/3B/trt_engines/int4_weight_only/1-gpu/
This produces normal results.
But starting the backend service reports an error:
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=${MODEL_FOLDER}