Skip to content

Latest commit

 

History

History

Llama3

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 

Llama-3

1. Supported Models

  • Llama-3-8B

2. Setup

2.1 Start Docker Container

Refer to the README.md

2.2 Clone Model

# 1. create directory
mkdir -p /data/mtt/models /data/mtt/models_convert

# 2. clone model
cd /data/mtt/models
git lfs install
git clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3-8B-Instruct.git

If you encounter any issues while downloading the model, please refer to the README.

2.3 Weight Conversion

python -m mttransformer.convert_weight \
	--in_file /data/mtt/models/Meta-Llama-3-8B-Instruct/ \
	--saved_dir /data/mtt/models_convert/Meta-Llama-3-8B-Instruct-fp16-tp1-convert/ \
	--tensor-para-size 1

3. Inference

  • start server
python -m vllm.entrypoints.openai.api_server \
    --model /data/mtt/models_convert/Meta-Llama-3-8B-Instruct-fp16-tp1-convert/ \
    --trust-remote-code \
    --tensor-parallel-size 1 \
    -pp 1 \
    --block-size 64 \
    --max-model-len 4096 \
    --disable-log-stats \
    --disable-log-requests \
    --gpu-memory-utilization 0.95 \
    --device "musa"
  • send message
curl http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "/data/mtt/models_convert/Meta-Llama-3-8B-Instruct-fp16-tp1-convert/",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
}'