About multiple-thread attention computation on CPU using zero-inference example. #886

luckyq · 2024-04-04T22:12:38Z

Hi,

I am trying to test the attention computation on the CPU with zero-interference.

I use the following command to run the script.

BSZ=96
LOG_DIR=$BASE_LOG_DIR/${MODEL_NAME}_bs${BSZ}
mkdir -p  $LOG_DIR
deepspeed --num_gpus 1 run_model.py --dummy --model ${FULL_MODEL_NAME} --batch-size ${BSZ} --cpu-offload --pin-memory 1 --offload-dir /tmp/data/dxu --gen-len 32 --pin-memory 1 --kv-offload --async_kv_offload

Duing the exection, I only saw one core is used.

However, there are many processes are created for the run.

Are there any other parameters or env I should configure to enable multiple-cores?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

About multiple-thread attention computation on CPU using zero-inference example. #886

About multiple-thread attention computation on CPU using zero-inference example. #886

luckyq commented Apr 4, 2024

About multiple-thread attention computation on CPU using zero-inference example. #886

About multiple-thread attention computation on CPU using zero-inference example. #886

Comments

luckyq commented Apr 4, 2024