This repository was archived by the owner on Aug 30, 2024. It is now read-only.

[LLM Runtime] Enable phi-2&phi-1.5&phi-1 #78

Merged
merged 30 commits into from
Jan 25, 2024

Conversation

intellinjun
Contributor

@intellinjun intellinjun commented Jan 22, 2024

Type of Change

Feature (model enabling)
API changed: no

Description

Model enabling:

  • phi-1
  • phi-1.5
  • phi-2

Detailed changes:

  • convert_phi.py for phi-1/1.5/2
  • inference for phi-1/1.5/2
  • partial_rotary_factor support
  • quantization for phi-1/1.5/2
  • MHA & FFN fusion for phi-1/1.5/2
  • pybinding for phi-1/1.5/2
  • extension test for phi-2
  • update README and requirements
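As a rough sketch of what the quantization step does (a minimal NumPy illustration with my own function names, not the actual neural-speed kernels): group-wise 4-bit quantization keeps one scale per group of weights, which is what the group sizes in precision tags such as q4_j_i8_g128 and q4_j_i8_g32 refer to.

```python
# Hypothetical sketch of group-wise int4 weight quantization; the names
# and layout here are illustrative, not neural-speed's internal format.
import numpy as np

def quantize_q4_groupwise(w: np.ndarray, group_size: int = 128):
    """Quantize a flat float weight tensor to 4-bit integers per group."""
    w = w.reshape(-1, group_size)                       # one row per group
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range: -8..7
    scale[scale == 0] = 1.0                             # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct float weights from 4-bit codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_q4_groupwise(w, group_size=128)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()   # bounded by ~scale/2 within each group
```

Smaller groups (e.g. 32 instead of 128) track the weight distribution more closely and reduce quantization error, at the cost of storing more scales.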

@intellinjun
Contributor Author


intellinjun and others added 5 commits January 23, 2024 18:57
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
@intellinjun
Contributor Author


@intellinjun intellinjun requested review from Zhenzhong1, airMeng and zhenwei-intel and removed request for airMeng January 24, 2024 07:17
@intellinjun intellinjun marked this pull request as ready for review January 24, 2024 07:18
Contributor

@a32543254 a32543254 left a comment


LGTM

@a32543254
Contributor

Could you post phi-2's performance data here?

@a32543254
Contributor

Please also update https://github.com/intel/neural-speed/blob/main/docs/supported_models.md and add an extension test for phi2.

@intellinjun
Contributor Author

@zhentaoyu
Contributor

Does its model architecture have any differences compared with other GPT-like models (llama, gpt-j)?

Signed-off-by: intellinjun <[email protected]>
@intellinjun
Contributor Author

Does its model architecture have any differences compared with other GPT-like models (llama, gpt-j)?

It uses partial RoPE, controlled by the parameter "partial_rotary_factor".
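For illustration, a minimal NumPy sketch of partial RoPE (my own code, using the interleaved-pair convention of GPT-J, not necessarily neural-speed's kernel layout): only the first partial_rotary_factor * head_dim dimensions of each head are rotated, and the rest pass through unchanged.

```python
# Illustrative partial rotary embedding; the factor 0.4 is an example
# value, not a claim about phi's shipped configuration.
import numpy as np

def partial_rope(x, position, partial_rotary_factor=0.4, base=10000.0):
    """Apply rotary embedding to the leading slice of one head vector.

    x: (head_dim,) query or key vector at a given token position.
    """
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * partial_rotary_factor)  # dims that get rotated
    x_rot, x_pass = x[:rot_dim], x[rot_dim:]
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[0::2], x_rot[1::2]                # interleaved pairs
    out = np.empty_like(x_rot)
    out[0::2] = x1 * cos - x2 * sin                  # 2-D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return np.concatenate([out, x_pass])             # tail is untouched

q = np.arange(80, dtype=np.float64)                  # head_dim = 80 example
rotated = partial_rope(q, position=5)
```

With partial_rotary_factor = 1.0 this reduces to ordinary full RoPE, the behaviour of the llama/gpt-j style models the reviewer was comparing against.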

@intellinjun
Contributor Author

Please also update https://github.com/intel/neural-speed/blob/main/docs/supported_models.md and add an extension test for phi2.

done

@airMeng
Contributor

airMeng commented Jan 25, 2024

How about the performance?

@intellinjun
Contributor Author

How about the performance?

Here is the performance test result:
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

@intellinjun
Contributor Author

Could you post phi-2's performance data here?

Here is the performance test result:
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

@a32543254
Contributor

a32543254 commented Jan 25, 2024

Could you post phi-2's performance data here?

Here is the performance test result: https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

| Model | Input | Output | Batchsize | Cores/Instance | Precision | Eval Time per Token | Memory | 1st Latency | Total Time | P90 Latency | P99 Latency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| phi2 | 32 | 32 | 1 | 32 | q4_j_i8_g128 | 13.65 | | 33.41 | 456.65 | 15.45 | 33.41 |
| phi2 | 1024 | 32 | 1 | 32 | q4_j_i8_g128 | 14.04 | 1835.72 | 329.43 | 764.56 | 14.28 | 329.43 |
| phi2 | 2012 | 32 | 1 | 32 | q4_j_i8_g128 | 16.14 | 2397.97 | 782.88 | 1283.21 | 16.3 | 782.88 |
| phi2 | 32 | 32 | 1 | 48 | q4_j_i8_g128 | 17.84 | 2820.39 | 41.44 | 594.62 | 17.98 | 41.44 |
| phi2 | 1024 | 32 | 1 | 48 | q4_j_i8_g128 | 18.7 | 2656.64 | 332.41 | 912.11 | 18.9 | 332.41 |
| phi2 | 2012 | 32 | 1 | 48 | q4_j_i8_g128 | 21.03 | 2610.86 | 702.2 | 1354.16 | 21.19 | 702.2 |
| phi2 | 32 | 32 | 1 | 56 | q4_j_i8_g128 | 13.18 | 2742.12 | 26.26 | 434.8 | 13.36 | 26.26 |
| phi2 | 1024 | 32 | 1 | 56 | q4_j_i8_g128 | 14.51 | 2648.44 | 313.79 | 763.55 | 14.72 | 313.79 |
| phi2 | 2012 | 32 | 1 | 56 | q4_j_i8_g128 | 20.62 | 2627.29 | 670.58 | 1309.75 | 21.39 | 670.58 |
| phi2 | 32 | 32 | 1 | 32 | q4_j_i8_g32 | 15.92 | 1919.43 | 55.29 | 548.68 | 16.07 | 55.29 |
| phi2 | 1024 | 32 | 1 | 32 | q4_j_i8_g32 | 17.55 | 2113.5 | 740.11 | 1284.13 | 17.77 | 740.11 |
| phi2 | 2012 | 32 | 1 | 32 | q4_j_i8_g32 | 19.63 | 2547.23 | 1582.4 | 2191.04 | 19.82 | 1582.4 |
| phi2 | 32 | 32 | 1 | 48 | q4_j_i8_g32 | 17.07 | 2761.54 | 67.44 | 596.51 | 17.19 | 67.44 |
| phi2 | 1024 | 32 | 1 | 48 | q4_j_i8_g32 | 18.57 | 2625.89 | 670.55 | 1246.1 | 18.96 | 670.55 |
| phi2 | 2012 | 32 | 1 | 48 | q4_j_i8_g32 | 24.97 | 2646.65 | 1354.37 | 2128.29 | 25.2 | 1354.37 |
| phi2 | 32 | 32 | 1 | 56 | q4_j_i8_g32 | 20.81 | 2702.85 | 62.48 | 707.69 | 20.96 | 62.48 |
| phi2 | 1024 | 32 | 1 | 56 | q4_j_i8_g32 | 23.17 | 2635.03 | 649.97 | 1368.26 | 23.35 | 649.97 |
| phi2 | 2012 | 32 | 1 | 56 | q4_j_i8_g32 | 20.15 | 2641.73 | 1367.48 | 1992.27 | 20.37 | 1367.48 |
| phi2 | 32 | 32 | 1 | 32 | q4_0 | 19.82 | 1775.53 | 157.67 | 772.19 | 20.04 | 157.67 |
| phi2 | 1024 | 32 | 1 | 32 | q4_0 | 24.11 | 2336.25 | 4470.92 | 5218.26 | 24.51 | 4470.92 |
| phi2 | 2012 | 32 | 1 | 32 | q4_0 | 27.32 | 2861.43 | 9642.05 | 10488.9 | 27.72 | 9642.05 |
| phi2 | 32 | 32 | 1 | 48 | q4_0 | 20.23 | 2692.87 | 149.22 | 776.39 | 20.24 | 149.22 |
| phi2 | 1024 | 32 | 1 | 48 | q4_0 | 20.52 | 2602.2 | 3529.12 | 4165.3 | 21.04 | 3529.12 |
| phi2 | 2012 | 32 | 1 | 48 | q4_0 | 28.7 | 2773.79 | 7662.59 | 8552.42 | 29.01 | 7662.59 |
| phi2 | 32 | 32 | 1 | 56 | q4_0 | 20.65 | 2665.96 | 129.45 | 769.73 | 20.82 | 129.45 |
| phi2 | 1024 | 32 | 1 | 56 | q4_0 | 22.04 | 2623.41 | 3316.49 | 3999.72 | 22.64 | 3316.49 |
| phi2 | 2012 | 32 | 1 | 56 | q4_0 | 30.76 | 2731.6 | 7060.52 | 8014.12 | 31.08 | 7060.52 |
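Reading the table: assuming "Eval Time per Token" is in milliseconds (units are not stated in the report), decode throughput is simply its reciprocal, e.g. for the fastest q4_j_i8_g128 row:

```python
def tokens_per_second(ms_per_token: float) -> float:
    # throughput is the reciprocal of per-token decode latency
    return 1000.0 / ms_per_token

# 13.18 ms/token (q4_j_i8_g128, 56 cores, 32-token prompt)
print(round(tokens_per_second(13.18), 1))  # ~75.9 tokens/s
```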

@VincyZhang VincyZhang merged commit c212d89 into main Jan 25, 2024
10 checks passed

7 participants