This repository was archived by the owner on Aug 30, 2024. It is now read-only.

[LLM Runtime] Enable phi-2&phi-1.5&phi-1 #78

Merged
merged 30 commits into from
Jan 25, 2024

Conversation

intellinjun
Contributor

@intellinjun intellinjun commented Jan 22, 2024

Type of Change

Feature (model enabling)
API changed: no

Description

Model enabling:

  • phi-1
  • phi-1.5
  • phi-2

Detailed changes:

  • convert_phi.py for phi-1/1.5/2
  • inference for phi-1/1.5/2
  • partial_rotary_factor support
  • quantization for phi-1/1.5/2
  • MHA & FFN fusion for phi-1/1.5/2
  • pybinding for phi-1/1.5/2
  • extension test for phi-2
  • update README and requirements
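As a rough sketch of what the quantization step does (a minimal NumPy illustration with my own function names, not the actual neural-speed kernels): group-wise 4-bit quantization keeps one scale per group of weights, which is what the group sizes in precision tags such as q4_j_i8_g128 and q4_j_i8_g32 refer to.

```python
# Hypothetical sketch of group-wise int4 weight quantization; the names
# and layout here are illustrative, not neural-speed's internal format.
import numpy as np

def quantize_q4_groupwise(w: np.ndarray, group_size: int = 128):
    """Quantize a flat float weight tensor to 4-bit integers per group."""
    w = w.reshape(-1, group_size)                       # one row per group
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range: -8..7
    scale[scale == 0] = 1.0                             # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct float weights from 4-bit codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_q4_groupwise(w, group_size=128)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()   # bounded by ~scale/2 within each group
```

Smaller groups (e.g. 32 instead of 128) track the weight distribution more closely and reduce quantization error, at the cost of storing more scales.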

@intellinjun
Contributor Author


intellinjun and others added 5 commits January 23, 2024 18:57
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: intellinjun <[email protected]>
@intellinjun
Contributor Author


@intellinjun intellinjun requested review from Zhenzhong1, airMeng and zhenwei-intel and removed request for airMeng January 24, 2024 07:17
@intellinjun intellinjun marked this pull request as ready for review January 24, 2024 07:18
Contributor

@a32543254 a32543254 left a comment


LGTM

@a32543254
Contributor

Could you post phi-2's performance data here?

@a32543254
Contributor

Please also update https://github.com/intel/neural-speed/blob/main/docs/supported_models.md and add an extension test for phi2.

@intellinjun
Contributor Author

@zhentaoyu
Contributor

Does its model architecture have any differences compared with other GPT-like models (llama, gpt-j)?

Signed-off-by: intellinjun <[email protected]>
@intellinjun
Contributor Author

Does its model architecture have any differences compared with other GPT-like models (llama, gpt-j)?

It uses partial RoPE, controlled by the parameter "partial_rotary_factor".
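For illustration, a minimal NumPy sketch of partial RoPE (my own code, using the interleaved-pair convention of GPT-J, not necessarily neural-speed's kernel layout): only the first partial_rotary_factor * head_dim dimensions of each head are rotated, and the rest pass through unchanged.

```python
# Illustrative partial rotary embedding; the factor 0.4 is an example
# value, not a claim about phi's shipped configuration.
import numpy as np

def partial_rope(x, position, partial_rotary_factor=0.4, base=10000.0):
    """Apply rotary embedding to the leading slice of one head vector.

    x: (head_dim,) query or key vector at a given token position.
    """
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * partial_rotary_factor)  # dims that get rotated
    x_rot, x_pass = x[:rot_dim], x[rot_dim:]
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[0::2], x_rot[1::2]                # interleaved pairs
    out = np.empty_like(x_rot)
    out[0::2] = x1 * cos - x2 * sin                  # 2-D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return np.concatenate([out, x_pass])             # tail is untouched

q = np.arange(80, dtype=np.float64)                  # head_dim = 80 example
rotated = partial_rope(q, position=5)
```

With partial_rotary_factor = 1.0 this reduces to ordinary full RoPE, the behaviour of the llama/gpt-j style models the reviewer was comparing against.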

@intellinjun
Contributor Author

Please also update https://github.com/intel/neural-speed/blob/main/docs/supported_models.md and add an extension test for phi2.

done

@airMeng
Contributor

airMeng commented Jan 25, 2024

How about the performance?

@intellinjun
Contributor Author

How about the performance?

Here is the performance test result:
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

@intellinjun
Contributor Author

Could you post phi-2's performance data here?

Here is the performance test result:
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

@a32543254
Contributor

a32543254 commented Jan 25, 2024

Could you post phi-2's performance data here?

Here is the performance test result: https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/47/artifact/report.html

| Model | Input | Output | Batchsize | Cores/Instance | Precision | Eval Time per Token | Memory | 1st Latency | Total Time | P90 Latency | P99 Latency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| phi2 | 32 | 32 | 1 | 32 | q4_j_i8_g128 | 13.65 | | 33.41 | 456.65 | 15.45 | 33.41 |
| phi2 | 1024 | 32 | 1 | 32 | q4_j_i8_g128 | 14.04 | 1835.72 | 329.43 | 764.56 | 14.28 | 329.43 |
| phi2 | 2012 | 32 | 1 | 32 | q4_j_i8_g128 | 16.14 | 2397.97 | 782.88 | 1283.21 | 16.3 | 782.88 |
| phi2 | 32 | 32 | 1 | 48 | q4_j_i8_g128 | 17.84 | 2820.39 | 41.44 | 594.62 | 17.98 | 41.44 |
| phi2 | 1024 | 32 | 1 | 48 | q4_j_i8_g128 | 18.7 | 2656.64 | 332.41 | 912.11 | 18.9 | 332.41 |
| phi2 | 2012 | 32 | 1 | 48 | q4_j_i8_g128 | 21.03 | 2610.86 | 702.2 | 1354.16 | 21.19 | 702.2 |
| phi2 | 32 | 32 | 1 | 56 | q4_j_i8_g128 | 13.18 | 2742.12 | 26.26 | 434.8 | 13.36 | 26.26 |
| phi2 | 1024 | 32 | 1 | 56 | q4_j_i8_g128 | 14.51 | 2648.44 | 313.79 | 763.55 | 14.72 | 313.79 |
| phi2 | 2012 | 32 | 1 | 56 | q4_j_i8_g128 | 20.62 | 2627.29 | 670.58 | 1309.75 | 21.39 | 670.58 |
| phi2 | 32 | 32 | 1 | 32 | q4_j_i8_g32 | 15.92 | 1919.43 | 55.29 | 548.68 | 16.07 | 55.29 |
| phi2 | 1024 | 32 | 1 | 32 | q4_j_i8_g32 | 17.55 | 2113.5 | 740.11 | 1284.13 | 17.77 | 740.11 |
| phi2 | 2012 | 32 | 1 | 32 | q4_j_i8_g32 | 19.63 | 2547.23 | 1582.4 | 2191.04 | 19.82 | 1582.4 |
| phi2 | 32 | 32 | 1 | 48 | q4_j_i8_g32 | 17.07 | 2761.54 | 67.44 | 596.51 | 17.19 | 67.44 |
| phi2 | 1024 | 32 | 1 | 48 | q4_j_i8_g32 | 18.57 | 2625.89 | 670.55 | 1246.1 | 18.96 | 670.55 |
| phi2 | 2012 | 32 | 1 | 48 | q4_j_i8_g32 | 24.97 | 2646.65 | 1354.37 | 2128.29 | 25.2 | 1354.37 |
| phi2 | 32 | 32 | 1 | 56 | q4_j_i8_g32 | 20.81 | 2702.85 | 62.48 | 707.69 | 20.96 | 62.48 |
| phi2 | 1024 | 32 | 1 | 56 | q4_j_i8_g32 | 23.17 | 2635.03 | 649.97 | 1368.26 | 23.35 | 649.97 |
| phi2 | 2012 | 32 | 1 | 56 | q4_j_i8_g32 | 20.15 | 2641.73 | 1367.48 | 1992.27 | 20.37 | 1367.48 |
| phi2 | 32 | 32 | 1 | 32 | q4_0 | 19.82 | 1775.53 | 157.67 | 772.19 | 20.04 | 157.67 |
| phi2 | 1024 | 32 | 1 | 32 | q4_0 | 24.11 | 2336.25 | 4470.92 | 5218.26 | 24.51 | 4470.92 |
| phi2 | 2012 | 32 | 1 | 32 | q4_0 | 27.32 | 2861.43 | 9642.05 | 10488.9 | 27.72 | 9642.05 |
| phi2 | 32 | 32 | 1 | 48 | q4_0 | 20.23 | 2692.87 | 149.22 | 776.39 | 20.24 | 149.22 |
| phi2 | 1024 | 32 | 1 | 48 | q4_0 | 20.52 | 2602.2 | 3529.12 | 4165.3 | 21.04 | 3529.12 |
| phi2 | 2012 | 32 | 1 | 48 | q4_0 | 28.7 | 2773.79 | 7662.59 | 8552.42 | 29.01 | 7662.59 |
| phi2 | 32 | 32 | 1 | 56 | q4_0 | 20.65 | 2665.96 | 129.45 | 769.73 | 20.82 | 129.45 |
| phi2 | 1024 | 32 | 1 | 56 | q4_0 | 22.04 | 2623.41 | 3316.49 | 3999.72 | 22.64 | 3316.49 |
| phi2 | 2012 | 32 | 1 | 56 | q4_0 | 30.76 | 2731.6 | 7060.52 | 8014.12 | 31.08 | 7060.52 |
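Reading the table: assuming "Eval Time per Token" is in milliseconds (units are not stated in the report), decode throughput is simply its reciprocal, e.g. for the fastest q4_j_i8_g128 row:

```python
def tokens_per_second(ms_per_token: float) -> float:
    # throughput is the reciprocal of per-token decode latency
    return 1000.0 / ms_per_token

# 13.18 ms/token (q4_j_i8_g128, 56 cores, 32-token prompt)
print(round(tokens_per_second(13.18), 1))  # ~75.9 tokens/s
```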

@VincyZhang VincyZhang merged commit c212d89 into main Jan 25, 2024
10 checks passed

7 participants