Control printing information using NEURAL_SPEED_VERBOSE #26

zhenwei-intel · 2024-01-04T03:21:00Z

Type of Change

feature
API changed

Description

Cherry-picked from intel/intel-extension-for-transformers#1054.
Adds NEURAL_SPEED_VERBOSE support to the C++ and Python APIs.

Enable verbose mode and control tracing information using the NEURAL_SPEED_VERBOSE environment variable.

Available modes:

0: Print all tracing information (both evaluation time and operator profiling).
1: Print evaluation time, i.e. the time taken for each evaluation.
2: Profile individual operators to identify performance bottlenecks within the model.
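The mode table above can be read as two independent switches, one for evaluation timing and one for operator profiling. The following is a minimal sketch of that mapping in Python, not the code from this PR: the helper name `tracing_flags` and the returned keys are hypothetical, and treating an unset variable as "no tracing" is an assumption.

```python
import os

def tracing_flags(env=None):
    """Map NEURAL_SPEED_VERBOSE to the outputs listed in the mode table.

    Hypothetical helper for illustration; the names are not taken from the
    Neural Speed sources.
    """
    env = os.environ if env is None else env
    raw = env.get("NEURAL_SPEED_VERBOSE")
    if raw is None:
        # Assumption: with the variable unset, no tracing is printed.
        return {"eval_time": False, "op_profiling": False}
    mode = int(raw)  # raises ValueError for non-numeric input
    if mode not in (0, 1, 2):
        raise ValueError(f"unsupported NEURAL_SPEED_VERBOSE value: {raw!r}")
    return {
        "eval_time": mode in (0, 1),     # modes 0 and 1 print evaluation time
        "op_profiling": mode in (0, 2),  # modes 0 and 2 profile operators
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the sketch easy to exercise, for example `tracing_flags({"NEURAL_SPEED_VERBOSE": "2"})`.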

Example:

NEURAL_SPEED_VERBOSE=1 ./build/bin/run_llama -m runtime_outs/ne_llama_q_int4_jblas_cint8_g32.bin -p "once upon a time, a little girl" -n 10

...................................................................................................
model_init_from_file: support_jblas_kv = 0
model_init_from_file: kv self size =  128.00 MB

system_info: n_threads = 56 / 112 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | F16C = 1 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 10, n_keep = 0


 once upon a time, a little girl named Lily lived in a small village nestled
model_print_timings:        load time =  2233.65 ms
model_print_timings:      sample time =     7.48 ms /    10 runs   (    0.75 ms per token)
model_print_timings: prompt eval time =   222.95 ms /     9 tokens (   24.77 ms per token)
model_print_timings:        eval time =   408.06 ms /     9 runs   (   45.34 ms per token)
model_print_timings:       total time =  2653.31 ms
========== eval time log of each prediction ==========
prediction   0, time: 222.95ms
prediction   1, time: 43.97ms
prediction   2, time: 43.83ms
prediction   3, time: 43.74ms
prediction   4, time: 43.80ms
prediction   5, time: 50.83ms
prediction   6, time: 45.16ms
prediction   7, time: 46.75ms
prediction   8, time: 44.81ms
prediction   9, time: 45.19ms
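The per-prediction log above separates the prompt evaluation (prediction 0) from the subsequent token generations. As a rough sketch, the log can be post-processed in Python to get first-token latency and the average next-token latency. The function name `summarize_eval_log` is hypothetical, and the regular expression assumes the exact line format shown in the output above.

```python
import re

# Matches lines such as "prediction   3, time: 43.74ms" (format assumed from the log above).
LINE_RE = re.compile(r"prediction\s+(\d+), time:\s*([\d.]+)ms")

def summarize_eval_log(text):
    """Return (first_token_ms, average_next_token_ms) from a verbose-mode log."""
    times = [float(ms) for _, ms in LINE_RE.findall(text)]
    if not times:
        raise ValueError("no prediction lines found")
    first, rest = times[0], times[1:]
    # With a single prediction there is no next-token average to report.
    avg_next = sum(rest) / len(rest) if rest else None
    return first, avg_next

sample = """\
prediction   0, time: 222.95ms
prediction   1, time: 43.97ms
prediction   2, time: 43.83ms
prediction   3, time: 43.74ms
prediction   4, time: 43.80ms
prediction   5, time: 50.83ms
prediction   6, time: 45.16ms
prediction   7, time: 46.75ms
prediction   8, time: 44.81ms
prediction   9, time: 45.19ms
"""

first, avg_next = summarize_eval_log(sample)
print(f"first token: {first:.2f} ms, next-token average: {avg_next:.2f} ms")
```

On the sample log, the first-token value equals the reported prompt eval time (222.95 ms), and the next-token average agrees with the 45.34 ms per token reported by model_print_timings.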

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: zhenwei-intel <[email protected]>

a32543254

LGTM

zhenwei-intel added 2 commits January 4, 2024 11:10

Control printing information using NEURAL_SPEED_VERBOSE

2c46267

Signed-off-by: zhenwei-intel <[email protected]>

fix test

dfbeb93

Signed-off-by: zhenwei-intel <[email protected]>

zhenwei-intel requested review from a32543254 and zhentaoyu January 4, 2024 03:21

a32543254 approved these changes Jan 4, 2024


zhentaoyu approved these changes Jan 4, 2024


VincyZhang merged commit a8d9e7d into main Jan 4, 2024
9 checks passed

zhenwei-intel deleted the lzw/loginfo branch January 8, 2024 01:11

DDEle pushed a commit to DDEle/neural-speed that referenced this pull request Feb 15, 2024

correct brgemm slm size (intel#26)

d644c21
